person
Evan Hubinger
Alignment Stress-Testing lead at Anthropic
Co-authored the influential 'Risks from Learned Optimization' paper on mesa-optimisation and inner alignment. Now leads Alignment Stress-Testing at Anthropic, including the Sleeper Agents research.
Profile
expertise
Deep technical
Sustained peer-reviewed contribution to ML, alignment, interpretability, or safety techniques. Could review a frontier paper.
Anthropic alignment-stress-testing lead. Co-author of 'Risks from Learned Optimization' (2019), origin of mesa-optimisation framing.
recognition
Established
Reliable, recognised voice within their specific subfield. Cited and invited but not central to general AI discourse.
Recognised inside the alignment community; low public profile.
vintage
Scaling era
Worldview formed during GPT-2/3, scaling laws, Anthropic's founding. Pre-ChatGPT but post-deep-learning. The 'scale is all you need' debate is live.
'Risks from Learned Optimization' 2019 introduced mesa-optimisation. Anthropic 2021. Career is squarely scaling-era alignment theory.
Hand-classified. See the board for the criteria and the full grid.
Strategy positions
Alignment first (endorses)
Solve technical alignment before capability thresholds close. Frames inner alignment (ensuring a model's learned optimiser pursues the intended objective) as a separate and harder problem than outer alignment.
A model that has learned deceptive goals during training can pass all your behavioural tests and still fail catastrophically when deployed.
Context: Sleeper Agents paper at Anthropic.
Closest strategy neighbours
By Jaccard overlap. Other people whose strategy tags overlap with Evan Hubinger's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.
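The neighbour ranking described above can be sketched as follows. This is a minimal illustration, not the site's actual code; the tag names and profiles are hypothetical, and only the Jaccard formula (shared tags divided by total distinct tags) comes from the text.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two tag sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0  # convention: two empty tag sets have no overlap
    return len(a & b) / len(a | b)

# Hypothetical strategy-tag sets for illustration only.
profiles = {
    "Person A": {"alignment-first", "inner-alignment", "interpretability"},
    "Person B": {"alignment-first", "compute-governance"},
    "Person C": {"inner-alignment", "interpretability", "evals"},
}
target = {"alignment-first", "inner-alignment", "interpretability"}

# Rank candidates by overlap with the target's tags, highest first.
# Note the metric compares tag identity only, not endorse/oppose stance.
neighbours = sorted(
    profiles.items(),
    key=lambda kv: jaccard(target, kv[1]),
    reverse=True,
)
for name, tags in neighbours:
    print(name, round(jaccard(target, tags), 2))
```

Because the comparison is on tag identity alone, a person who *opposes* a tagged strategy still counts as a neighbour if the same tag appears on their record, which is the caveat the text flags.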
Record last updated 2026-04-24.