AGI Strategies

Fabien Roger

Anthropic alignment researcher; control evaluations

Anthropic alignment researcher whose work on AI control (designing evaluations that test whether models can subvert oversight even when they are trying to) has been widely cited in safety circles.

current Member of Technical Staff, Anthropic

Strategy positions

Evals-driven · endorses

Capability/risk evals gate deployment; evals are the load-bearing artefact

Argues that control evaluations (stress tests of whether AIs can subvert their own monitoring) are a load-bearing part of any sensible deployment regime.

AI control is the discipline of designing protocols that catch a model trying to subvert oversight, even when the model is much more capable than its monitors at the relevant tasks.
§ paper · AI Control: Improving Safety Despite Intentional Subversion · arXiv · 2024-06 · faithful paraphrase

Closest strategy neighbours

by jaccard overlap

Other people whose strategy tags overlap with Fabien Roger's. Overlap is computed on tag identity, not stance, so people with opposing positions can appear here if they reference the same tags.
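The J values below are Jaccard similarities over strategy-tag sets: the size of the intersection divided by the size of the union. A minimal sketch (the tag names are hypothetical, chosen only to reproduce a "shared 1 · J=1.00" entry):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag sets: if both people carry only the "evals-driven"
# tag, the intersection and union are the same single tag, so
# shared = 1 and J = 1/1 = 1.00, as in the entries below.
roger_tags = {"evals-driven"}
neighbour_tags = {"evals-driven"}
print(jaccard(roger_tags, neighbour_tags))  # → 1.0
```

Because the metric ignores stance, two people who merely discuss the same strategy tag score identically to two people who agree on it.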

  • Aleksander Mądry

    shared 1 · J=1.00

    MIT; ex-OpenAI head of preparedness

  • Alex Meinke

    shared 1 · J=1.00

    Apollo Research; deceptive alignment evaluations

  • Ali Rahimi

    shared 1 · J=1.00

    Google Brain ML researcher; 'Alchemy' speech

  • Anna Rogers

    shared 1 · J=1.00

    IT University of Copenhagen; LLM benchmarking critique

  • Arati Prabhakar

    shared 1 · J=1.00

    White House OSTP director (2022–2025)

  • Beth Barnes

    shared 1 · J=1.00

    Founder of METR; dangerous capability evaluations

Record last updated 2026-04-25.