person
David Bau
Northeastern; mechanistic interpretability of LLMs
Northeastern University professor whose group has produced widely cited work on locating and editing factual associations in transformer language models (ROME, MEMIT).
current Assistant Professor of Computer Science, Northeastern University
Strategy positions
Interpretability bet: endorses
Mechanistic interpretability is necessary and sufficient to know models are safe
Argues that mechanistic interpretability is making rapid progress in localizing and editing knowledge inside transformer weights; views this as a foundation for safety oversight.
“Factual associations in GPT correspond to localized, directly editable computations in mid-layer feed-forward modules.”
Closest strategy neighbours
By Jaccard overlap. Other people whose strategy tags overlap with David Bau's. Overlap is computed on tag identity, not stance, so people with opposing stances can appear if they reference the same tags.
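The neighbour ranking described above can be sketched as follows. This is a minimal illustration, not the site's actual implementation; the tag names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical strategy-tag sets; overlap ignores whether each person
# endorses or rejects the tag, so opposing stances can still match.
bau_tags = {"interpretability-bet"}
other_tags = {"interpretability-bet", "compute-governance"}

print(jaccard(bau_tags, other_tags))  # shared: 1 tag, union: 2 tags → 0.5
```

Ranking all people by this score against Bau's tag set yields the "closest strategy neighbours" list.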
Record last updated 2026-04-25.