AGI Strategies


Lucius Bushnaq

Apollo Research; mech interp

Senior researcher at Apollo Research; works on mechanistic interpretability and on detecting deceptive cognition in language models.

Current: Senior Researcher, Apollo Research

Strategy positions

Interpretability bet · endorses

Mechanistic interpretability is necessary and sufficient to know models are safe

Argues interpretability tools are most valuable when explicitly designed to detect deceptive or strategic behaviours in models, not just to characterize benign features.

Interpretability that only finds nice features misses the alignment-relevant ones. We need methods designed to surface the deceptive behaviours we are most worried about.
Article · Apollo Research · 2024 · faithful paraphrase

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Lucius Bushnaq's. Overlap is computed on tag identity, not stance, so people with opposing positions can appear if they reference the same tags.

  • Asma Ghandeharioun

    shared 1 · J=1.00

    Google DeepMind; 'Patchscopes' for LLM interpretability

  • Chris Olah

    shared 1 · J=1.00

    Anthropic co-founder; pioneer of modern mechanistic interpretability

  • Cynthia Rudin

    shared 1 · J=1.00

    Duke professor; interpretable ML pioneer

  • David Bau

    shared 1 · J=1.00

    Northeastern; mechanistic interpretability of LLMs

  • Fernanda Viégas

    shared 1 · J=1.00

    Harvard; ex-Google PAIR; data visualization

  • Jacob Andreas

    shared 1 · J=1.00

    MIT NLP; language models as belief reports
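The J values above are Jaccard similarities over each pair's strategy-tag sets. A minimal sketch of that computation (the tag names below are hypothetical examples, not the site's actual data):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag sets: one shared tag on each side gives J = 1.00,
# matching the "shared 1 · J=1.00" entries above.
bushnaq_tags = {"interpretability-bet"}
neighbour_tags = {"interpretability-bet"}

shared = bushnaq_tags & neighbour_tags
print(f"shared {len(shared)} · J={jaccard(bushnaq_tags, neighbour_tags):.2f}")
```

Because the ranking uses tag identity only, a neighbour who references the same tag with the opposite stance still scores J = 1.00, which is why opposites can appear in the list.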

Record last updated 2026-04-25.