AGI Strategies

person

Asma Ghandeharioun

Google DeepMind; 'Patchscopes' for LLM interpretability

Senior research scientist at Google DeepMind; lead author of Patchscopes, a unifying framework for using language models to inspect their own internal representations.

Current: Senior Research Scientist, Google DeepMind

Strategy positions

Interpretability bet · endorses

Mechanistic interpretability is necessary and sufficient to know models are safe

Argues language models can be turned into interpretability tools for themselves; reframes mechanistic interpretation as a translation problem between hidden states and natural language.

“Patchscopes leverage the model's own ability to generate text to inspect its hidden representations, unifying many prior interpretability methods.”
§ paper · Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models · arXiv / Google DeepMind · 2024 · direct quote

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Asma Ghandeharioun's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

  • Chris Olah

    shared 1 · J=1.00

    Anthropic interpretability co-founder; inventor of modern mech interp

  • Cynthia Rudin

    shared 1 · J=1.00

    Duke professor; interpretable ML pioneer

  • David Bau

    shared 1 · J=1.00

    Northeastern; mechanistic interpretability of LLMs

  • Fernanda Viégas

    shared 1 · J=1.00

    Harvard; ex-Google PAIR; data visualization

  • Jacob Andreas

    shared 1 · J=1.00

    MIT NLP; language models as belief reports

  • John Wentworth

    shared 1 · J=1.00

    Independent alignment researcher; natural abstractions
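
For readers unfamiliar with the J score in this list, here is a minimal sketch of how a Jaccard overlap on tag identity could be computed. The tag name `interpretability-bet` and the per-person tag sets are assumptions made for illustration; the page does not publish its underlying tag data or its exact implementation.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical tag sets: each person carries a single strategy tag.
# Stance (endorses / opposes) is deliberately not part of the set,
# which is why people with opposite stances can still score J=1.00.
ghandeharioun_tags = {"interpretability-bet"}
neighbour_tags = {"interpretability-bet"}

shared = len(ghandeharioun_tags & neighbour_tags)
score = jaccard(ghandeharioun_tags, neighbour_tags)
print(f"shared {shared} · J={score:.2f}")  # prints "shared 1 · J=1.00"
```

Under this reading, every neighbour who shares the one tag and has no others scores J=1.00, which matches the values shown in the list.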

Record last updated 2026-04-25.