AGI Strategies

Control mechanism · AI artefact

AI for safety

The same capability that makes AI dangerous makes it uniquely useful for automating alignment research and oversight.

Mechanism

Use AI for automated alignment research, interpretability, evaluation, and scalable oversight of stronger AI by weaker AI.
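A minimal sketch of the scalable-oversight piece of this mechanism, assuming hypothetical strong_model and weak_overseer callables (none of these names come from the source): the weaker model scores the stronger model's output and escalates low-confidence cases to human review.

```python
# Hypothetical weak-to-strong oversight loop: a weaker, trusted model
# scores a stronger model's outputs and escalates uncertain cases to
# human review. The model interfaces are assumptions, not real APIs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Review:
    output: str
    safety_score: float  # weak overseer's estimate in [0, 1]
    escalate: bool       # True -> route to human review

def oversee(task: str,
            strong_model: Callable[[str], str],
            weak_overseer: Callable[[str, str], float],
            threshold: float = 0.9) -> Review:
    """Run the strong model, score the result with the weak model,
    and escalate anything under the confidence threshold."""
    output = strong_model(task)
    score = weak_overseer(task, output)
    return Review(output=output, safety_score=score,
                  escalate=score < threshold)
```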

If it succeeds: what binds next

AI runs alignment research, and the question becomes whether the safety-AI itself is aligned: the problem recurs one level up.

A strategy that produces a worse next problem than the one it solved has not done durable work.

Coordinates

Acts on: AI artefact
Coercion: Consent
Actor in control: Humans as principals
Time horizon: During transition
Legitimacy source: Technical
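
One way to read these coordinates is as a small classification record. A minimal sketch, with field names that are assumptions of this example rather than anything from the source:

```python
# The five coordinates as a record; field names are illustrative,
# values are the ones listed above.
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinates:
    acts_on: str
    coercion: str
    actor_in_control: str
    time_horizon: str
    legitimacy_source: str

AI_FOR_SAFETY = Coordinates(
    acts_on="AI artefact",
    coercion="consent",
    actor_in_control="humans as principals",
    time_horizon="during transition",
    legitimacy_source="technical",
)
```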

Conflicts, grouped by mechanism (2)

Frame opposition

incompatible premises

The strategies accept different premises about what AI is or what the binding problem is. They conflict not on lever choice but on the frame that makes lever choice sensible.

Decouple reasoning from action · Abandon superintelligence

Complements, grouped by mechanism (4)

Same-lever reinforce

same lever, same pull, different mechanism

Both strategies pull the same lever in the same direction by different means. They stack: doing both amplifies the pull, at the cost of double-counting in portfolio audits.

Alignment first · Interpretability first · Counter AI AI

Same phase, different layer

same stage, distinct levers

Both are active in the same phase of the transition but act on different layers (model vs institution vs culture). They cover different failure modes inside the same window.

Acceleration

Same-lever twins (2)

Both use the same lever in the same direction. Inside a portfolio they are usually redundant: each dollar or unit of effort buys only one lever pull, even if two strategies are named. The sketch after the list below makes the audit concrete.

AI containment · Safe by construction AI
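
As both the reinforce and twin groupings warn, a portfolio audit has to notice when two named strategies buy only one lever pull. A minimal sketch of such a check, assuming hypothetical (lever, direction) annotations that are not part of the source:

```python
# Hypothetical portfolio audit for double-counting: group strategies by
# an assumed (lever, direction) annotation and flag same-lever groups.
from collections import defaultdict

def audit(portfolio: dict[str, tuple[str, str]]) -> list[list[str]]:
    """Return groups of strategies that pull the same lever in the
    same direction, i.e. candidates for double-counting."""
    by_lever: dict[tuple[str, str], list[str]] = defaultdict(list)
    for name, lever_and_direction in portfolio.items():
        by_lever[lever_and_direction].append(name)
    return [names for names in by_lever.values() if len(names) > 1]

# Illustrative annotations only; the lever labels are invented here.
flagged = audit({
    "AI for safety": ("model-level safety", "increase"),
    "AI containment": ("model-level safety", "increase"),
    "Safe by construction AI": ("model-level safety", "increase"),
    "Acceleration": ("capability growth", "increase"),
})
# flagged == [["AI for safety", "AI containment", "Safe by construction AI"]]
```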


Source note: AI for safety strategy.md