AGI Strategies

Fabien Roger

Anthropic alignment researcher; control evaluations

Anthropic alignment researcher whose work on AI control (designing evaluations that test whether models can subvert oversight even when they are trying to) has been widely cited in safety circles.

current Member of Technical Staff, Anthropic

Strategy positions

Evals-driven · endorses

Capability/risk evals gate deployment; evals are the load-bearing artefact

Argues that control evaluations (stress tests of whether AIs can subvert their own monitoring) are a load-bearing part of any sensible deployment regime.

AI control is the discipline of designing protocols that catch a model trying to subvert oversight, even when the model is much more capable than its monitors at the relevant tasks.
§ paper · AI Control: Improving Safety Despite Intentional Subversion · arXiv · 2024-06 · faithful paraphrase

Closest strategy neighbours

by jaccard overlap

Other people whose strategy tags overlap with Fabien Roger's. Overlap is computed on tag identity, not stance, so people with opposing positions can appear here if they reference the same tags.
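The J values below are Jaccard similarities over strategy-tag sets: the size of the intersection divided by the size of the union. A minimal sketch (the tag names are hypothetical, chosen only to reproduce a "shared 1 · J=1.00" entry):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag sets: if both people carry only the "evals-driven"
# tag, the intersection and union are the same single tag, so
# shared = 1 and J = 1/1 = 1.00, as in the entries below.
roger_tags = {"evals-driven"}
neighbour_tags = {"evals-driven"}
print(jaccard(roger_tags, neighbour_tags))  # → 1.0
```

Because the metric ignores stance, two people who merely discuss the same strategy tag score identically to two people who agree on it.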

  • Aleksander Mądry

    shared 1 · J=1.00

    MIT; ex-OpenAI head of preparedness

  • Alex Meinke

    shared 1 · J=1.00

    Apollo Research; deceptive alignment evaluations

  • Ali Rahimi

    shared 1 · J=1.00

    Google Brain ML researcher; 'Alchemy' speech

  • Anna Rogers

    shared 1 · J=1.00

    IT University of Copenhagen; LLM benchmarking critique

  • Arati Prabhakar

    shared 1 · J=1.00

    White House OSTP director (2022–2025)

  • Beth Barnes

    shared 1 · J=1.00

    Founder of METR; dangerous capability evaluations

Record last updated 2026-04-25.