person

Chris Olah
Anthropic interpretability co-founder; inventor of modern mech interp
The most-cited mechanistic interpretability researcher. Co-founded the interpretability team at Anthropic that produced circuits, superposition, and monosemanticity work.
Profile
expertise
Frontier builder
Currently or recently led training, architecture, or safety work on a frontier model. Hands on the loss curve.
Founded mechanistic interpretability as a subfield (Distill papers, circuits thread). Anthropic interpretability team lead. Hands-on technical work on frontier model internals.
recognition
Field-leading
Widely known inside the AI and AI-safety community. Appears repeatedly in top venues, podcasts, or governance forums. Not a household name to outsiders.
The reference name in interpretability. Less public profile than CEOs or executives.
vintage
Deep-learning rise
Came up post-AlexNet. ImageNet, AlphaGo, transformer paper. DeepMind, Google Brain, FAIR establish the modern lab template.
Distill papers from 2017; circuits thread 2020. The interpretability subfield he founded is a deep-learning-era artefact.
Hand-classified. See the board for the criteria and the full grid.
Strategy positions
Interpretability bet (endorses)
Mechanistic interpretability is necessary and sufficient to know models are safe.
Frames mechanistic interpretability as the tool most likely to let us verify whether a model's cognition matches its stated goal.
I'm most optimistic about safety paths that give us some kind of detailed mechanistic understanding of neural networks.
Closest strategy neighbours
By Jaccard overlap. Other people whose strategy tags overlap with Chris Olah's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.
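The neighbour ranking above is computed by Jaccard overlap on tag sets. A minimal sketch of that metric, assuming tags are plain strings (the tag names below are hypothetical, not from the record):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two tag sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0  # convention: two empty sets share no tags
    return len(a & b) / len(a | b)

# Hypothetical tag sets for illustration.
olah_tags = {"interpretability-bet", "mechanistic-understanding"}
other_tags = {"interpretability-bet", "scaling-hypothesis"}
print(jaccard(olah_tags, other_tags))  # 1 shared tag out of 3 total
```

Because the measure uses tag identity only, two people who endorse and oppose the same position score identically, which is why opposites can appear as neighbours.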
Record last updated 2026-04-24.