Institutional capacity • · institutional

Voluntary restraint

Labs know more about what safety requires than regulators, and self-binding commitments capture that expertise without legislative lag.

Mechanism

Frontier labs commit unilaterally or collectively to safety practices (RSPs, frontier safety frameworks, red-teaming pacts) without external enforcement.

If it succeeds: what binds next

Labs voluntarily restrain. Restraint holds until the first defector; the binding problem is the credibility of the commitment to defect against defectors.

A strategy that produces a worse next problem than the one it solved has not done durable work.

Falsification signal

Visible weakening of RSP text under capability pressure, combined with no meaningful penalty.

A strategy held without a falsification signal is not strategy; it is affiliation. Continued support after this signal lands is identity, not bet. See the identity diagnostic.

Self-undermining threshold

overshoot risk

When capability pressure reaches the level where a single lab defection flips the equilibrium.

Commitments that hold under mild capability pressure weaken visibly under strong pressure. Repeated RSP revisions are the observable pattern.

Every strategy has a stable region where it reinforces itself and an unstable region where pursuit defeats it. The threshold between them is usually narrower than advocates acknowledge.

Addresses 1 failure scenario

all scenarios →

Evaluation abandonment under pressure

Coordinates

Primary leverInstitutional capacity (neutral)

Acts oninstitutional

Coercionconsent

Actor in controlhumans

Time horizonpre transition

Legitimacy sourceself

Conflicts, grouped by mechanism

Lever opposition

same lever, opposite pull

The pair's primary lever is the same; they pull it in opposite directions. A portfolio containing both is internally incoherent on that lever.

Governance first

Complements, grouped by mechanism

Cross-side bridge

one AI-side, one world-side

One acts on the model, the other on institutions or culture. The bridge hedges against both artefact-level and substrate-level failure.

Alignment firstInterpretability first

Same phase, different layer

same stage, distinct levers

Both are active in the same phase of the transition but act on different layers (model vs institution vs culture). They cover different failure modes inside the same window.

Scientific accumulationInternational AI agency

Axis position

What the strategy acts onInstitutional

Coercion levelConsent

Actor in controlHumans as principals

Time horizonPre-transition

Legitimacy sourceSelf

Source note: Voluntary restraint strategy.md