AGI Strategies

Control mechanism · AI artefact

Alignment first

Technical alignment is solvable before critical capability thresholds close, and aligned systems compose safely into aligned populations.

Mechanism

Invest primarily in interpretability, scalable oversight, and post-training methods so AI does what principals intend.
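As a concrete instance of the interpretability lever, here is a minimal linear-probe sketch: it checks whether a concept is linearly decodable from a model's hidden activations. The data and the planted feature are synthetic stand-ins, not anything from this page's sources.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy interpretability probe: is a concept linearly readable
    # from hidden activations?
    rng = np.random.default_rng(0)
    activations = rng.normal(size=(1000, 64))      # stand-in hidden states
    concept = (activations[:, 3] > 0).astype(int)  # planted binary feature

    probe = LogisticRegression(max_iter=1000).fit(activations, concept)
    print(probe.score(activations, concept))       # near 1.0: concept is decodable

The falsification signal below is exactly this kind of measurement failing to track capability: probe accuracy and its analogues falling, not rising, as models scale.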

What this name has meant

vintage drift

The name is stable; the content has shifted. A reader who acts on the label without asking which vintage is meant risks arguing with a position nobody currently holds.

2015

Solving the principal–agent problem for arbitrary specified values. A research agenda aimed at future agentic systems.

2026

Training models to produce helpful, honest, harmless outputs via RLHF and constitutional methods. Current alignment practice absorbs this label.
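At its core, the RLHF step this vintage names reduces to a pairwise preference loss on a reward model. A minimal sketch, assuming PyTorch; the toy scores are invented:

    import torch
    import torch.nn.functional as F

    def preference_loss(reward_chosen, reward_rejected):
        # Bradley-Terry pairwise loss: push the reward model to score
        # the human-preferred completion above the rejected one.
        return -F.logsigmoid(reward_chosen - reward_rejected).mean()

    # Toy scalar scores for three (chosen, rejected) completion pairs.
    chosen = torch.tensor([1.2, 0.4, 2.0])
    rejected = torch.tensor([0.3, 0.9, 1.1])
    print(preference_loss(chosen, rejected))  # shrinks as chosen outscores rejected

The fitted reward model then steers policy optimisation; constitutional methods replace some human labels with model-generated critiques checked against a written constitution.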

If it succeeds: what binds next

Aligned frontier AI exists. Principals now choose what to align to, and operator legitimacy becomes the binding constraint. The contest shifts to who gets to be the principal.

A strategy that produces a worse next problem than the one it solved has not done durable work.

Falsification signal

Interpretability and oversight methods stop scaling with model capability: stronger models become less, rather than more, inspectable.

A strategy held without a falsification signal is not a strategy; it is affiliation. Continued support after this signal lands is identity, not a bet. See the identity diagnostic.

Self-undermining threshold

overshoot risk

When alignment investment hollows out institutional capacity.

Concentrating talent and funding in alignment produces a shortage of the democratic and institutional capacity an aligned superintelligence would land in. Solved alignment, dysfunctional substrate.

Every strategy has a stable region where it reinforces itself and an unstable region where pursuit defeats it. The threshold between them is usually narrower than advocates acknowledge.

People on the record

103

Profiled figures appear first, with their tier in small caps. Each entry links to the person and their full quote record. Tag: alignment-first.

expertise mix · 29 profiled

Builds frontier systems · 4
Deep ML / safety technical · 18
Applied or adjacent technical · 1
Governance, policy, strategy · 3
Expert in another field · 3
Public-square commentator · 0

recognition mix

Mass-public recognition · 7
Known across the AI/safety field · 11
Recognised inside subfield · 11
Newer or less central voice · 0

A strategy whose endorsement skews to commentators or external-domain experts is in a different epistemic state from one endorsed mostly by frontier-builders. Read the mix across both axes; see the board for criteria. Counts cover the 29 profiled people on this strategy (74 unprofiled people excluded).

  • Ajeya Cotra

    Governance, policy, strategy · Recognised inside subfield

  • Alan Turing

    Deep ML / safety technical · Mass-public recognition

  • Anca Dragan

    Builds frontier systems · Known across the AI/safety field

  • Andrew G. Barto

    Deep ML / safety technical · Known across the AI/safety field

  • Brian Christian

    Expert in another field · Known across the AI/safety field

  • Buck Shlegeris

    Deep ML / safety technical · Recognised inside subfield

  • Claude Shannon

    Deep ML / safety technical · Mass-public recognition

  • Daniel Dewey

    Deep ML / safety technical · Recognised inside subfield

  • Doris Tsao

    Expert in another field · Known across the AI/safety field

  • Evan Hubinger

    Deep ML / safety technical · Recognised inside subfield

  • Iyad Rahwan

    Deep ML / safety technical · Known across the AI/safety field

  • Jan Leike

    Builds frontier systems · Known across the AI/safety field

  • John McCarthy

    Deep ML / safety technical · Mass-public recognition

  • John Schulman

    Builds frontier systems · Known across the AI/safety field

  • Joseph Carlsmith

    Governance, policy, strategy · Known across the AI/safety field

  • Marvin Minsky

    Deep ML / safety technical · Mass-public recognition

  • Nick Bostrom

    Governance, policy, strategy · Mass-public recognition

  • Norbert Wiener

    Expert in another field · Mass-public recognition

  • Owain Evans

    Deep ML / safety technical · Recognised inside subfield

  • Paul Christiano

    Builds frontier systems · Known across the AI/safety field

  • Richard Ngo

    Deep ML / safety technical · Recognised inside subfield

  • Rob Miles

    Applied or adjacent technical · Known across the AI/safety field

  • Rohin Shah

    Deep ML / safety technical · Recognised inside subfield

  • Ryan Greenblatt

    Deep ML / safety technical · Recognised inside subfield

  • Scott Aaronson

    Deep ML / safety technical · Known across the AI/safety field

  • Stuart Armstrong

    Deep ML / safety technical · Recognised inside subfield

  • Stuart Russell

    Deep ML / safety technical · Mass-public recognition

  • Victoria Krakovna

    Deep ML / safety technical · Recognised inside subfield

  • Wei Dai

    Deep ML / safety technical · Recognised inside subfield

  • Aaron Courville

    Université de Montréal; Deep Learning textbook co-author

  • Adam Jermyn

    Anthropic; previously astrophysics

  • Adam Kalai

    Microsoft Research; AI fairness and safety

  • Agnes Callard

    University of Chicago philosopher; aspiration theorist

  • Alex Irpan

    Google Brain alumnus; Sorta Insightful blog

  • Alex Pan

    Berkeley CHAI; reward hacking

  • Alex Turner

    DeepMind alignment researcher; shard theory co-originator

67 more on the record. See the full tag page: alignment-first

Load-bearing commitments

Worldview positions this strategy quietly assumes. If the claim fails empirically or philosophically, the strategy loses its target or its premise.

Values

Principals have determinate values AI can learn.

Fails if: values are contested or constructed; the strategy then loses its target.

AI nature

AI is a tool with controllable properties.

Fails if: AI has emergent agency; the tool frame collapses and alignment becomes negotiation.

Coordinates

Acts on · AI artefact
Coercion · Consent
Actor in control · Humans
Time horizon · Pre-transition
Legitimacy source · Technical

Conflicts, grouped by mechanism

0

No strict conflicts catalogued. This strategy pulls a lever that nothing else pulls in the opposite direction.

Complements, grouped by mechanism

5

Cross-side bridge

one AI-side, one world-side

One acts on the model, the other on institutions or culture. The bridge hedges against both artefact-level and substrate-level failure.

Governance first · Resilience first

Adjacent bet

different levers, loosely coupled

Different levers, different directions of action. They reinforce only via the general principle that covering more bets dominates covering fewer.

Open source maximalism

Stage-sequenced

one sets up the other

The pair is phase-offset: one acts before the transition, the other during or after. The first creates the conditions under which the second binds.

Cooperative AI

Same phase, different layer

same stage, distinct levers

Both are active in the same phase of the transition but act on different layers (model vs institution vs culture). They cover different failure modes inside the same window.

Pause

Same-lever twins

5

Both use the same lever in the same direction. Usually redundant inside a portfolio: each dollar or effort unit only buys one lever pull, even if two strategies are named.

AI containment · AI for safety · Counter AI AI · Interpretability first · Safe by construction AI

Axis position

What the strategy acts on · AI artefact
Coercion level · Consent
Actor in control · Humans as principals
Time horizon · Pre-transition
Legitimacy source · Technical

Source note: Alignment first strategy.md