person
Adam Jermyn
Anthropic; previously astrophysics
Anthropic researcher whose path from theoretical astrophysics to AI safety is widely cited as a model for cross-field switching. Researches deceptive alignment risks.
Current role: Member of Technical Staff, Anthropic
Strategy positions
Alignment first: endorses
Solve technical alignment before capability thresholds close. Argues that mechanistic understanding of model behavior, including how deceptive alignment could arise, is required to make safety guarantees credible.
Deceptive alignment is the scenario in which a model behaves as if aligned during training but pursues different objectives at deployment. The open question is whether we can rule it out empirically.
Closest strategy neighbours
By Jaccard overlap. Other people whose strategy tags overlap with Adam Jermyn's. Overlap is computed on tag identity, not stance, so people with opposing positions can appear if they reference the same tags.
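The neighbour ranking described above can be sketched as follows. This is a minimal illustration, assuming each person's strategy positions are reduced to a set of tag strings; the function names and the example people and tags are hypothetical, not taken from the record.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap: |intersection| / |union| (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def closest_neighbours(target: set[str], others: dict[str, set[str]]):
    """Rank other people by tag overlap with the target's tags.

    Overlap is on tag identity only, so someone who opposes a tag the
    target endorses still counts toward the score.
    """
    return sorted(
        ((name, jaccard(target, tags)) for name, tags in others.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Hypothetical example data:
target_tags = {"alignment-first", "interpretability"}
others = {
    "Person A": {"alignment-first", "compute-governance"},
    "Person B": {"interpretability", "alignment-first"},
}
# closest_neighbours(target_tags, others) ranks Person B (overlap 1.0)
# ahead of Person A (overlap 1/3).
```

Because the score ignores stance, Person B ranks first even if their positions on the shared tags were the opposite of the target's.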
Record last updated 2026-04-25.