AGI Strategies

person

David Bau

Northeastern; mechanistic interpretability of LLMs

Northeastern University professor whose group has produced widely cited work on locating and editing factual associations in transformer language models (ROME, MEMIT).

Current: Assistant Professor of Computer Sciences, Northeastern University

Strategy positions

Interpretability bet · endorses

Mechanistic interpretability is necessary and sufficient to know models are safe

Argues mechanistic interpretability is making rapid progress in localizing and editing knowledge inside transformer weights; views this as a foundation for safety oversight.

“Factual associations in GPT correspond to localized, directly editable computations in mid-layer feed-forward modules.”
§ paper · Locating and Editing Factual Associations in GPT · arXiv / NeurIPS · 2022 · direct quote

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with David Bau's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

  • Asma Ghandeharioun

    shared 1 · J=1.00

    Google DeepMind; 'Patchscopes' for LLM interpretability

  • Chris Olah

    shared 1 · J=1.00

    Anthropic interpretability co-founder; inventor of modern mech interp

  • Cynthia Rudin

    shared 1 · J=1.00

    Duke professor; interpretable ML pioneer

  • Fernanda Viégas

    shared 1 · J=1.00

    Harvard; ex-Google PAIR; data visualization

  • Jacob Andreas

    shared 1 · J=1.00

    MIT NLP; language models as belief reports

  • John Wentworth

    shared 1 · J=1.00

    Independent alignment researcher; natural abstractions
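
The "shared" and "J" figures in the list above can be reproduced with a standard Jaccard similarity over tag sets. The following is a minimal sketch, not the site's actual implementation: the function name `jaccard`, the tag name `interpretability-bet`, and the second tag `pause` are hypothetical and chosen only for illustration. Stance (endorses, opposes) is deliberately left out of the sets, which matches the note that overlap is on tag identity only.

```python
def jaccard(tags_a: set[str], tags_b: set[str]) -> float:
    """Return |A ∩ B| / |A ∪ B|, treating two empty sets as 0.0 overlap."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


# Hypothetical tag sets: one shared tag each, stance not recorded.
person_a = {"interpretability-bet"}
person_b = {"interpretability-bet"}

shared = len(person_a & person_b)
print(f"shared {shared} · J={jaccard(person_a, person_b):.2f}")  # shared 1 · J=1.00

# A neighbour carrying one extra (hypothetical) tag scores lower.
person_c = {"interpretability-bet", "pause"}
print(f"J={jaccard(person_a, person_c):.2f}")  # J=0.50
```

When every profile carries the same single tag, each pair scores J=1.00, so under this assumption the ranking cannot distinguish between neighbours.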

Record last updated 2026-04-25.