AGI Strategies

Institutional capacity · institutional

Voluntary restraint

Labs know more about what safety requires than regulators, and self-binding commitments capture that expertise without legislative lag.

Mechanism

Frontier labs commit, unilaterally or collectively, to safety practices such as responsible scaling policies (RSPs), frontier safety frameworks, and red-teaming pacts, with no external enforcement.

If it succeeds: what binds next

Labs restrain voluntarily. Restraint holds only until the first lab defects; the next binding problem is whether the remaining labs can credibly commit to retaliating against defectors.

A strategy that produces a worse next problem than the one it solved has not done durable work.

Falsification signal

Visible weakening of RSP text under capability pressure, combined with no meaningful penalty.

A strategy held without a falsification signal is not strategy; it is affiliation. Continued support after this signal lands is identity, not bet. See the identity diagnostic.

Self-undermining threshold

overshoot risk

When capability pressure reaches the level where a single lab defection flips the equilibrium.

Commitments that hold under mild capability pressure weaken visibly under strong pressure. Repeated RSP revisions are the observable pattern.

Every strategy has a stable region where it reinforces itself and an unstable region where pursuit defeats it. The threshold between them is usually narrower than advocates acknowledge.
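The threshold described above can be illustrated with a toy cascade model. This is a minimal sketch under stated assumptions, not a model taken from the source: the function name `defectors_at`, the per-lab `tolerances`, and the `spillover` term (extra pressure each defector places on the labs still holding) are all hypothetical, and the numbers are invented for illustration.

```python
from typing import List


def defectors_at(pressure: float, tolerances: List[float], spillover: float) -> int:
    """Count labs that abandon restraint once the cascade settles.

    Assumption (hypothetical): a lab defects when base capability pressure,
    plus spillover from each lab that has already defected, exceeds its
    own tolerance.
    """
    defected = 0
    while True:
        effective = pressure + spillover * defected
        now = sum(1 for t in tolerances if effective > t)
        if now == defected:  # fixed point: no further lab flips
            return defected
        defected = now


# Illustrative values only.
labs = [1.0, 1.2, 1.4, 1.6]
stable = defectors_at(0.9, labs, spillover=0.3)     # below every tolerance
unstable = defectors_at(1.05, labs, spillover=0.3)  # just above the lowest one
isolated = defectors_at(1.05, labs, spillover=0.0)  # same pressure, no spillover
```

In this sketch a small rise in pressure moves the outcome from no defection to full defection once spillover is present, while the same pressure without spillover costs only one lab. That gap is one way to read a single defection flipping the equilibrium.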

Addresses 1 failure scenario


Coordinates

Acts on: institutional
Coercion: consent
Actor in control: humans
Time horizon: pre-transition
Legitimacy source: self

Conflicts, grouped by mechanism (1)

Lever opposition

same lever, opposite pull

The pair's primary lever is the same; they pull it in opposite directions. A portfolio containing both is internally incoherent on that lever.

Governance first

Complements, grouped by mechanism (4)

Cross-side bridge

one AI-side, one world-side

One acts on the model, the other on institutions or culture. The bridge hedges against both artefact-level and substrate-level failure.

Alignment first
Interpretability first

Same phase, different layer

same stage, distinct levers

Both are active in the same phase of the transition but act on different layers (model vs institution vs culture). They cover different failure modes inside the same window.

Scientific accumulation
International AI agency

Axis position

What the strategy acts on: Institutional
Coercion level: Consent
Actor in control: Humans as principals
Time horizon: Pre-transition
Legitimacy source: Self

Source note: Voluntary restraint strategy.md