person
Lucius Bushnaq
Apollo Research; mech interp
Senior researcher at Apollo Research; works on mechanistic interpretability and on detecting deceptive cognition in language models.
Current: Senior Researcher, Apollo Research
Strategy positions
Interpretability bet (endorses)
Position: Mechanistic interpretability is necessary and sufficient to know models are safe.
Argues interpretability tools are most valuable when explicitly designed to detect deceptive or strategic behaviours in models, not just to characterize benign features.
"Interpretability that only finds nice features misses the alignment-relevant ones. We need methods designed to surface the deceptive behaviours we are most worried about."
Closest strategy neighbours
By Jaccard overlap. Other people whose strategy tags overlap with Lucius Bushnaq's. Overlap is computed on tag identity, not stance, so people with opposite positions can appear if they reference the same tags.
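The neighbour list is computed by Jaccard overlap on strategy-tag sets. A minimal sketch of that metric, assuming tags are simple string sets (the tag names below are hypothetical, not actual database entries):

```python
def jaccard(tags_a: set[str], tags_b: set[str]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    if not tags_a and not tags_b:
        return 0.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)

# Illustrative tag sets for two researchers.
a = {"interpretability-bet", "deception-detection"}
b = {"interpretability-bet", "evals"}
print(round(jaccard(a, b), 3))  # 1 shared tag out of 3 total -> 0.333
```

Note that because the metric compares tag identity only, two people who *disagree* about "interpretability-bet" still score a nonzero overlap.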
Record last updated 2026-04-25.