AGI Strategies

Adam Jermyn

Anthropic; previously astrophysics

Anthropic researcher whose path from theoretical astrophysics to AI safety is widely cited as a model for cross-field switching. Researches deceptive alignment risks.

Current role: Member of Technical Staff, Anthropic

Strategy positions

Alignment first · endorses

Solve technical alignment before capability thresholds close

Argues mechanistic understanding of model behavior, including how deceptive alignment could arise, is required to make safety guarantees credible.

Deceptive alignment is the scenario where a model behaves as if aligned during training but pursues different objectives at deployment. The question is whether we can rule it out empirically.
blog · Adam Jermyn, Anthropic · adamjermyn.com · 2022 · faithful paraphrase

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Adam Jermyn's. J is the Jaccard index: the number of shared tags divided by the size of the union of both people's tag sets. Overlap is computed on tag identity, not stance, so people with opposing positions can appear if they reference the same tags.

  • Aaron Courville

    shared 1 · J=1.00

    Université de Montréal; Deep Learning textbook co-author

  • Adam Kalai

    shared 1 · J=1.00

    Microsoft Research; AI fairness and safety

  • Agnes Callard

    shared 1 · J=1.00

    University of Chicago philosopher; aspiration theorist

  • Ajeya Cotra

    shared 1 · J=1.00

    Open Philanthropy researcher; 'biological anchors' forecaster

  • Alan Turing

    shared 1 · J=1.00

    Founder of theoretical computer science (1912–1954)

  • Alex Irpan

    shared 1 · J=1.00

    Google Brain alumnus; Sorta Insightful blog
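The J values above are Jaccard indices over strategy-tag sets. A minimal sketch of how such a score could be computed (the tag names here are hypothetical, chosen so that a single shared tag reproduces the J=1.00 seen in the list):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|. Defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag sets: each person carries one tag, and it is shared,
# so the index is 1/1 = 1.00, matching the neighbour list.
jermyn_tags = {"alignment-first"}
neighbour_tags = {"alignment-first"}
print(f"J={jaccard(jermyn_tags, neighbour_tags):.2f}")  # J=1.00
```

Note that a score of 1.00 with only one shared tag says little: any two people tagged with exactly the same single tag score a perfect overlap, which is why sparse tag sets make these rankings coarse.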

Record last updated 2026-04-25.