person

Anna Rogers

IT University of Copenhagen; LLM benchmarking critique

IT University of Copenhagen associate professor; vocal critic of how LLM benchmarks are constructed and reported. Frequent NLP community commentator on contamination, leaderboard inflation, and method hygiene.

current Associate Professor, IT University of Copenhagen

homepage @annargrs

Strategy positions

Evals-driven · mixed

Capability/risk evals gate deployment; evals are the load-bearing artefact

Argues that current benchmark practice in NLP is broken: data leakage, opaque test sets, and incentive-driven framing make many headline numbers unreliable.

How much of LLM 'reasoning' is actually pattern matching against contaminated test data? We don't know, and that's a problem for any safety claim that rests on benchmark performance.

✍ blog · Anna Rogers, Hai! · hackingsemantics.xyz · 2023 · faithful paraphrase

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Anna Rogers's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

Aleksander Mądry
shared 1 · J=1.00
MIT; ex-OpenAI head of preparedness
Alex Meinke
shared 1 · J=1.00
Apollo Research; deceptive alignment evaluations
Ali Rahimi
shared 1 · J=1.00
Google Brain ML researcher; 'Alchemy' speech
Arati Prabhakar
shared 1 · J=1.00
White House OSTP director (2022–2025)
Beth Barnes
shared 1 · J=1.00
Founder of METR; dangerous capability evaluations
Bo Li
shared 1 · J=1.00
UChicago / UIUC; AI safety evaluations
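
The J values in the entries above can be read as a Jaccard index over each person's set of strategy tags: the number of shared tags divided by the number of distinct tags across both people. The following is a minimal sketch under that assumption. The tag names and the `jaccard` helper are hypothetical illustrations, not this site's actual implementation.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; taken as 0.0 here when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical tag sets. Stance (e.g. "mixed") is not part of the tag,
# so people with opposite stances on the same tag still overlap fully.
rogers = {"evals-driven"}
neighbour = {"evals-driven"}
other = {"evals-driven", "pause"}  # "pause" is an invented second tag

shared = len(rogers & neighbour)
print(f"shared {shared} · J={jaccard(rogers, neighbour):.2f}")  # shared 1 · J=1.00
print(f"J={jaccard(rogers, other):.2f}")                        # J=0.50
```

Under this reading, J=1.00 with one shared tag means both people carry exactly the same single tag, which is why every neighbour listed scores identically.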

Record last updated 2026-04-25.