Institutional capacity • · institutional
Voluntary restraint
Labs know more about what safety requires than regulators, and self-binding commitments capture that expertise without legislative lag.
Mechanism
Frontier labs commit unilaterally or collectively to safety practices (RSPs, frontier safety frameworks, red-teaming pacts) without external enforcement.
If it succeeds: what binds next
Labs voluntarily restrain. Restraint holds until the first defector; the binding problem is the credibility of the commitment to defect against defectors.
A strategy that produces a worse next problem than the one it solved has not done durable work.
Falsification signal
Visible weakening of RSP text under capability pressure, combined with no meaningful penalty.
A strategy held without a falsification signal is not strategy; it is affiliation. Continued support after this signal lands is identity, not bet. See the identity diagnostic.
Self-undermining threshold
overshoot riskWhen capability pressure reaches the level where a single lab defection flips the equilibrium.
Commitments that hold under mild capability pressure weaken visibly under strong pressure. Repeated RSP revisions are the observable pattern.
Every strategy has a stable region where it reinforces itself and an unstable region where pursuit defeats it. The threshold between them is usually narrower than advocates acknowledge.
Addresses 1 failure scenario
all scenarios →Coordinates
Conflicts, grouped by mechanism
1Lever opposition
same lever, opposite pullThe pair's primary lever is the same; they pull it in opposite directions. A portfolio containing both is internally incoherent on that lever.
Complements, grouped by mechanism
4Cross-side bridge
one AI-side, one world-sideOne acts on the model, the other on institutions or culture. The bridge hedges against both artefact-level and substrate-level failure.
Same phase, different layer
same stage, distinct leversBoth are active in the same phase of the transition but act on different layers (model vs institution vs culture). They cover different failure modes inside the same window.
Axis position
Source note: Voluntary restraint strategy.md