AGI Strategies

Adam Jermyn

Anthropic; previously astrophysics

Anthropic researcher whose path from theoretical astrophysics to AI safety is widely cited as a model for cross-field switching. Researches deceptive alignment risks.

Current role: Member of Technical Staff, Anthropic

Strategy positions

Alignment first · endorses

Solve technical alignment before capability thresholds close

Argues mechanistic understanding of model behavior, including how deceptive alignment could arise, is required to make safety guarantees credible.

Deceptive alignment is the scenario where a model behaves as if aligned during training but pursues different objectives at deployment. The question is whether we can rule it out empirically.
blog · Adam Jermyn, Anthropic · adamjermyn.com · 2022 · faithful paraphrase

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Adam Jermyn's. J is the Jaccard index: the number of shared tags divided by the size of the union of both people's tag sets. Overlap is computed on tag identity, not stance, so people with opposing positions can appear if they reference the same tags.

  • Aaron Courville

    shared 1 · J=1.00

    Université de Montréal; Deep Learning textbook co-author

  • Adam Kalai

    shared 1 · J=1.00

    Microsoft Research; AI fairness and safety

  • Agnes Callard

    shared 1 · J=1.00

    University of Chicago philosopher; aspiration theorist

  • Ajeya Cotra

    shared 1 · J=1.00

    Open Philanthropy researcher; 'biological anchors' forecaster

  • Alan Turing

    shared 1 · J=1.00

    Founder of theoretical computer science (1912–1954)

  • Alex Irpan

    shared 1 · J=1.00

    Google Brain alumnus; Sorta Insightful blog
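The J values above are Jaccard indices over strategy-tag sets. A minimal sketch of how such a score could be computed (the tag names here are hypothetical, chosen so that a single shared tag reproduces the J=1.00 seen in the list):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|. Defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag sets: each person carries one tag, and it is shared,
# so the index is 1/1 = 1.00, matching the neighbour list.
jermyn_tags = {"alignment-first"}
neighbour_tags = {"alignment-first"}
print(f"J={jaccard(jermyn_tags, neighbour_tags):.2f}")  # J=1.00
```

Note that a score of 1.00 with only one shared tag says little: any two people tagged with exactly the same single tag score a perfect overlap, which is why sparse tag sets make these rankings coarse.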

Record last updated 2026-04-25.