person

Anna Rogers
IT University of Copenhagen; LLM benchmarking critique
IT University of Copenhagen associate professor; vocal critic of how LLM benchmarks are constructed and reported. Frequent NLP community commentator on contamination, leaderboard inflation, and method hygiene.
Current: Associate Professor, IT University of Copenhagen
Strategy positions
Evals-driven (mixed)
Capability/risk evals gate deployment; evals are the load-bearing artefact.
Argues that current benchmark practice in NLP is broken: data leakage, opaque test sets, and incentive-driven framing make many headline numbers unreliable.
How much of LLM 'reasoning' is actually pattern matching against contaminated test data? We don't know, and that's a problem for any safety claim that rests on benchmark performance.
Closest strategy neighbours
By Jaccard overlap
Other people whose strategy tags overlap with Anna Rogers's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.
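As a rough illustration of why opposites can appear as neighbours, here is a minimal sketch of Jaccard overlap computed on tag identity alone. It assumes the standard definition (size of the intersection divided by size of the union); the record does not show how the site actually computes it. The profile names, the tag "tag-x", and the stance labels other than "mixed" are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Standard Jaccard overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical profiles mapping tag -> stance (illustrative values only).
profiles = {
    "person_a": {"evals-driven": "mixed"},
    "person_b": {"evals-driven": "opposes", "tag-x": "endorses"},
}

# Only the tag names (dict keys) are compared; the stance values are ignored,
# so two people with opposite stances on the same tag still overlap.
tags_a = set(profiles["person_a"])
tags_b = set(profiles["person_b"])
overlap = jaccard(tags_a, tags_b)
print(overlap)
```

Here the two hypothetical profiles share one tag out of two distinct tags in total, so they score as neighbours even though their stances on that tag differ.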
Record last updated 2026-04-25.