John Wentworth
Independent alignment researcher; natural abstractions
Independent alignment researcher who developed the 'natural abstractions hypothesis' as a framing for whether human concepts robustly transfer to learned representations.
Current: Alignment Researcher, Independent
Strategy positions
Interpretability bet: endorses
Mechanistic interpretability is necessary and sufficient to know models are safe
Argues that alignment requires identifying the abstractions a model converges on; if these match human concepts, training-time supervision becomes far more reliable.
The natural abstractions hypothesis is roughly: a wide variety of cognitive systems will converge to use the same high-level abstractions for reasoning about the world.
Closest strategy neighbours
By Jaccard overlap. Other people whose strategy tags overlap with John Wentworth's. Overlap is measured on tag identity, not stance, so people who hold opposing positions can appear if they reference the same tags.
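A minimal sketch of how a Jaccard-based neighbour ranking could be computed, assuming each person is represented as a plain set of tag strings. The Jaccard index of two sets is |A ∩ B| / |A ∪ B|. The function names, tag strings, and people below are hypothetical, and the record does not say how ties or empty tag sets are handled; the choices here are assumptions.

```python
# Hypothetical sketch: rank people by Jaccard overlap of their strategy tags.
# Only tag identity enters the sets, so stance (endorse vs. oppose) is ignored.

def jaccard(a: set[str], b: set[str]) -> float:
    """Return |a & b| / |a | b|; assumed to be 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def closest_neighbours(target: str, people: dict[str, set[str]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k people whose tags overlap most with target's tags.

    Ties are broken alphabetically by name (an assumption, not from the record).
    """
    scores = [
        (name, jaccard(people[target], tags))
        for name, tags in people.items()
        if name != target
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical tag data for illustration only.
people = {
    "John Wentworth": {"interpretability-bet", "natural-abstractions"},
    "Person A": {"interpretability-bet"},
    # Person B might oppose these positions, but shares the same tags.
    "Person B": {"interpretability-bet", "natural-abstractions"},
    "Person C": {"governance"},
}

print(closest_neighbours("John Wentworth", people, k=2))
```

With this data, Person B scores 1.0 and Person A scores 0.5, which shows why an opponent who references the same tags can rank as a close neighbour.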
Record last updated 2026-04-25.