AGI Strategies

person

John Wentworth

Independent alignment researcher; natural abstractions

Independent alignment researcher who developed the 'natural abstractions hypothesis' as a framing for whether human concepts robustly transfer to learned representations.

Current: Alignment Researcher, Independent

Strategy positions

Interpretability bet · endorses

Mechanistic interpretability is necessary and sufficient to know models are safe

Argues alignment requires identifying the abstractions a model converges on; if these match human concepts, training-time supervision becomes far more reliable.

The natural abstractions hypothesis is roughly: a wide variety of cognitive systems will converge to use the same high-level abstractions for reasoning about the world.
blog · The Natural Abstraction Hypothesis: Implications and Evidence · LessWrong · 2021 · faithful paraphrase

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with John Wentworth's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

  • Asma Ghandeharioun

    shared 1 · J=1.00

    Google DeepMind; 'Patchscopes' for LLM interpretability

  • Chris Olah

    shared 1 · J=1.00

    Anthropic interpretability co-founder; inventor of modern mech interp

  • Cynthia Rudin

    shared 1 · J=1.00

    Duke professor; interpretable ML pioneer

  • David Bau

    shared 1 · J=1.00

    Northeastern; mechanistic interpretability of LLMs

  • Fernanda Viégas

    shared 1 · J=1.00

    Harvard; ex-Google PAIR; data visualization

  • Jacob Andreas

    shared 1 · J=1.00

    MIT NLP; language models as belief reports
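
The "shared" and "J" figures in the list above can be reproduced with a small sketch. It assumes each person is represented by a set of strategy tags and that J is the standard Jaccard index, |A ∩ B| / |A ∪ B|. The tag name `interpretability-bet` and the variable names are hypothetical, chosen only for illustration; the site's actual tag identifiers and implementation are not given here.

```python
def jaccard(tags_a: set, tags_b: set) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


# Hypothetical tag sets: one tag each, and it is the same tag.
wentworth = {"interpretability-bet"}
neighbour = {"interpretability-bet"}

shared = len(wentworth & neighbour)
print(f"shared {shared} · J={jaccard(wentworth, neighbour):.2f}")
# prints "shared 1 · J=1.00"
```

Because only tag identity enters the calculation, two people who hold opposite stances on the same single tag would still score J=1.00 in this sketch.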

Record last updated 2026-04-25.