person

David Bau

Northeastern; mechanistic interpretability of LLMs

Northeastern University professor whose group has produced widely cited work on locating and editing factual associations in transformer language models (ROME, MEMIT).

current Assistant Professor of Computer Sciences, Northeastern University

homepage @davidbau

Strategy positions

Interpretability bet · endorses

Mechanistic interpretability is necessary and sufficient to know models are safe

Argues mechanistic interpretability is making rapid progress in localizing and editing knowledge inside transformer weights; views this as a foundation for safety oversight.

“Factual associations in GPT correspond to localized, directly editable computations in mid-layer feed-forward modules.”

§ paper · Locating and Editing Factual Associations in GPT · arXiv / NeurIPS · 2022 · direct quote

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with David Bau's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

Asma Ghandeharioun
shared 1 · J=1.00
Google DeepMind; 'Patchscopes' for LLM interpretability
Chris Olah
shared 1 · J=1.00
Anthropic interpretability co-founder; inventor of modern mech interp
Cynthia Rudin
shared 1 · J=1.00
Duke professor; interpretable ML pioneer
Fernanda Viégas
shared 1 · J=1.00
Harvard; ex-Google PAIR; data visualization
Jacob Andreas
shared 1 · J=1.00
MIT NLP; language models as belief reports
John Wentworth
shared 1 · J=1.00
Independent alignment researcher; natural abstractions
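
For readers unfamiliar with the score, the "shared 1 · J=1.00" figures above can be reproduced with a minimal sketch of Jaccard overlap on tag sets. The tag names and the `jaccard` helper below are hypothetical, chosen only for illustration; how the site actually stores tags or handles empty sets is not stated in this record.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |A & B| / |A | B|.

    Assumption: two empty tag sets score 0.0 (the record does not say
    how that case is handled).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical tag sets: stance (endorses / opposes) is deliberately not
# part of the tag, matching "overlap is on tag identity, not stance".
bau = {"interpretability-bet"}
neighbour = {"interpretability-bet"}
broader = {"interpretability-bet", "some-other-tag"}

shared = len(bau & neighbour)
print(f"shared {shared} · J={jaccard(bau, neighbour):.2f}")
print(f"J={jaccard(bau, broader):.2f}")
```

Under these assumptions, two people who each hold exactly one identical tag score J=1.00 regardless of whether they endorse or oppose it, while a person with one extra tag scores lower against the same profile.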

Record last updated 2026-04-25.