Control mechanism · ai artefact
AI for safety
The same capability that makes AI dangerous makes it uniquely useful for automating alignment research and oversight.
Mechanism
Use AI for automated alignment research, interpretability, evaluation, and scalable oversight of stronger AI by weaker AI.
If it succeeds: what binds next
AI runs alignment research. The question becomes whether the safety-AI itself is aligned; the original problem recurses one level up.
A strategy that produces a worse next problem than the one it solved has not done durable work.
Coordinates
Conflicts, grouped by mechanism
Frame opposition (2)
Incompatible premises: the strategies accept different premises about what AI is or what the binding problem is. They conflict not on lever choice but on the frame that makes lever choice sensible.
Complements, grouped by mechanism
Same-lever reinforce (4)
Same lever, same pull, different mechanism: both strategies pull the same lever in the same direction by different means. They stack: doing both amplifies the pull, at the cost of double-counting in portfolio audits.
Same phase, different layer
Same stage, distinct levers: both are active in the same phase of the transition but act on different layers (model vs. institution vs. culture). They cover different failure modes inside the same window.
Same-lever twins (2)
Both use the same lever in the same direction. Usually redundant inside a portfolio: each dollar or unit of effort buys only one lever pull, even if two strategies are named.
Axis position
Source note: AI for safety strategy.md