AGI Strategies

Control mechanism · AI artefact

Counter AI AI

AI attacks unfold at speeds humans cannot observe; defence must therefore run at AI speed, with guardian AI systems continuously evaluating adversary AIs.

Mechanism

Build and deploy a population of guardian AIs to detect, monitor, fact-check, and respond to other AIs in real time.
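The mechanism above can be sketched as a monitoring loop. This is a minimal, hypothetical illustration assuming a "guardian" is any scoring function that maps an AI-produced output to a suspicion score in [0, 1]; all names and thresholds are illustrative, not from the source.

```python
from typing import Callable

# Assumption: a guardian scores an output's suspiciousness in [0, 1].
Guardian = Callable[[str], float]

def flag_outputs(outputs: list[str], guardians: list[Guardian],
                 threshold: float = 0.5) -> list[str]:
    """Return outputs whose maximum guardian score crosses the
    threshold; using max() means one confident guardian is enough
    to trigger review by the rest of the population."""
    return [text for text in outputs
            if max(g(text) for g in guardians) >= threshold]

# Two toy guardians: a keyword check and a length heuristic.
marker_guardian: Guardian = lambda t: 1.0 if "exfiltrate" in t else 0.0
length_guardian: Guardian = lambda t: min(len(t) / 10_000, 1.0)

suspicious = flag_outputs(
    ["benign summary", "plan to exfiltrate credentials"],
    [marker_guardian, length_guardian],
)
```

The population structure matters: adding guardians with different detection criteria widens coverage without retraining any single one.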

Falsification signal

The best guardian system is fooled by a model one generation newer.

A strategy held without a falsification signal is not a strategy; it is affiliation. Continued support after this signal lands is identity, not a bet. See the identity diagnostic.
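The falsification signal can be made operational as a measurement rather than a vibe. A minimal sketch, assuming we score a guardian's ability to separate a newer model's red-team outputs from benign traffic; the scoring rule and the zero cutoff are illustrative assumptions:

```python
from typing import Callable

def discrimination(guardian: Callable[[str], float],
                   adversary_outputs: list[str],
                   benign_outputs: list[str],
                   threshold: float = 0.5) -> float:
    """Detection rate on adversary outputs minus false-positive rate
    on benign outputs. At or below zero, the guardian can no longer
    separate the newer model from background traffic: the
    falsification signal has landed. (Illustrative metric, not from
    the source.)"""
    hits = sum(guardian(o) >= threshold for o in adversary_outputs)
    false_alarms = sum(guardian(o) >= threshold for o in benign_outputs)
    return hits / len(adversary_outputs) - false_alarms / len(benign_outputs)

# A guardian keyed to yesterday's tell is blind to a newer model
# that no longer exhibits it:
stale_guardian = lambda t: 1.0 if "exfiltrate" in t else 0.0
score = discrimination(stale_guardian,
                       adversary_outputs=["quietly siphon credentials"],
                       benign_outputs=["benign summary"])
signal_fired = score <= 0.0
```

Running this check against each new model generation turns "the best guardian is fooled" into a number that either stays positive or does not.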

Addresses 3 failure scenarios.

Coordinates

Acts on: AI artefact
Coercion: Consent
Actor in control: Multi-AI
Time horizon: During transition
Legitimacy source: Technical

Conflicts, grouped by mechanism: 0

No strict conflicts catalogued. This strategy pulls a lever that nothing else pulls in the opposite direction.

Complements, grouped by mechanism: 5

Same-lever reinforce

same lever, same pull, different mechanism

Both strategies pull the same lever in the same direction by different means. They stack: doing both amplifies the pull, at the cost of double-counting in portfolio audits.

AI for safety · Interpretability first

Stage-sequenced

one sets up the other

The pair is phase-offset: one acts before the transition, the other during or after. The first creates the conditions under which the second binds.

Information integrity first · Differential technology development

Same phase, different layer

same stage, distinct levers

Both are active in the same phase of the transition but act on different layers (model vs institution vs culture). They cover different failure modes inside the same window.

Cooperative AI

Same-lever twins: 3

Both use the same lever in the same direction. Usually redundant inside a portfolio: each dollar or effort unit only buys one lever pull, even if two strategies are named.

AI containment · Alignment first · Safe by construction AI


Source note: Counter AI AI strategy.md