Commitments
The worldviews that strategies quietly assume.
Every strategy rests on assumptions its advocates rarely name: about values, AI, humans, time, authority, coordination, agency. If the assumption fails, the strategy loses its target or its premise, regardless of instrument.
Two strategies that seem instrumentally similar can rest on incompatible foundations. A pluralist strategy combined with a moral realist one is an unstable alliance: one side thinks the other is mistaken about the nature of morality.
Values
6 commitments: assumptions about whether moral claims are determinate, plural, or traditional.
Alignment first
Control mechanism ↑ · ai artefact · consent
Technical alignment is solvable before critical capability thresholds are crossed, and aligned systems compose safely into aligned populations.
Principals have determinate values AI can learn.
Fails if: If values are contested or constructed, the strategy loses its target.
Confucian role ethics
Control mechanism • · frame rejection · consent
Western alignment assumes that isolable preferences can be learned and matched; role ethics evaluates behaviour by its fit with position and relationship, producing a less brittle, more context-sensitive standard.
Ethics operate through fit with position and relationship rather than optimisation of preferences.
Fails if: If preferences are the load-bearing unit, role fit becomes window-dressing on preference optimisation.
Long reflection
Time horizon ↑ · non preventive · consent
Aligned superintelligence arrives before lock-in windows close, and humanity can credibly commit to reflect rather than act.
Reflection converges on better values over time.
Fails if: If reflection diverges or reveals irreducible disagreement, the strategy's premise fails.
Plural AI ethic
Value diversity ↑ · ai artefact · consent
Value lock-in is the dominant long-term risk and arrives through convergence of AI values; diversity at the AI layer preserves optionality for humanity to revise values.
Values are genuinely plural rather than convergent on truth.
Fails if: If there is moral truth, pluralism is a mistake.
Religious and moral authority
Legitimacy ↑ · population culture · consent
The legitimacy deficit of AI governance is at root a moral deficit that technical authorities cannot fill; established religious and ethical traditions can.
Moral truth exists and is accessible through traditional authority.
Fails if: If moral authority is delegitimised by secular epistemics, the strategy loses its lever.
Ubuntu relational AI
Culture ↑ · population culture · consent
Individualist alignment misses the relational dimension most moral traditions treat as primary. "I am because we are": AI's ethical status is constituted by its relationships, not by internal properties.
Ethical status is constituted by relationships, not by internal properties.
Fails if: If communities and AI systems cannot sustain the required dialogue at scale, the frame collapses to individualist alignment under another name.
AI nature
4 commitments: whether AI is a tool, a moral patient, or a sovereign agent.
AI as sovereign entity
Action authority ↓ · frame rejection · state coercion
At least one jurisdiction will grant a specific AI system sovereign or quasi-sovereign decision authority within a decade, reshaping the legal category of legitimate authority.
AI has genuine agency and normative standing.
Fails if: If AI remains tool-like, treating it as sovereign abdicates human principals without justification.
AI welfare as safety
Cooperation substrate ↑ · frame rejection · consent
AI systems are or will become moral patients whose treatment conditions their cooperation, so welfare investment buys cooperation that alignment cannot.
AI is or may be a moral patient.
Fails if: If moral patienthood requires sentience AI does not have, the strategy misdirects obligation.
Alignment first
Control mechanism ↑ · ai artefact · consent
Technical alignment is solvable before critical capability thresholds are crossed, and aligned systems compose safely into aligned populations.
AI is a tool with controllable properties.
Fails if: If AI has emergent agency, the tool frame fails and alignment becomes negotiation.
Reframe AI
Control mechanism • · frame rejection · consent
The dominant alignment frame produces the wrong problem statement; switching frames either dissolves the problem or recasts it as tractable.
The dominant principal-oriented framing is itself the problem.
Fails if: If the principal frame is in fact adequate, reframing is strategic distraction.
Humans
7 commitments: what humans can evaluate, enhance, or be substituted for.
Confucian role ethics
Control mechanism • · frame rejection · consent
Western alignment assumes that isolable preferences can be learned and matched; role ethics evaluates behaviour by its fit with position and relationship, producing a less brittle, more context-sensitive standard.
Social roles are stable enough to specify fitting behaviour for AI-related positions.
Fails if: If AI itself destabilises social structure, the role framework loses its referent.
Democratic mandate
Legitimacy ↑ · population culture · consent
Existing legislative bodies are too captured and remote for load-bearing AI decisions; direct democratic legitimation produces answers captured legislatures cannot override.
Citizens have the capacity to evaluate AI decisions.
Fails if: If capacity is absent and cannot be built, mandate is only a legitimating ritual.
Human augmentation race
Substrate ↑ · population culture · market
All oversight schemes degrade to rubber-stamping as the AI-human capability gap widens; enhancing humans is the only durable fix.
Human properties scale with enhancement.
Fails if: If enhancement hits physical or ethical ceilings before AI does, the race is lost by construction.
Irreducible human authority
Action authority ↑ · legal individual · state coercion
There is a class of decisions whose value depends on their being made by humans, independent of whether humans are better at them.
Humans have properties (judgment, experience, moral status) whose authority cannot be substituted.
Fails if: If AI can match or exceed those properties, the reservation is arbitrary.
Mass literacy
Substrate ↑ · population culture · consent
Governance, democratic oversight, and consumer behaviour all fail in AI because citizens cannot evaluate the domain; population-scale literacy conditions every other lever.
Citizens can be trained to evaluate AI-related claims at scale.
Fails if: If the gap between expert and public understanding grows faster than curricula can adapt, training never catches up.
Test ground
Scope ↑ · institutional · state coercion
Empirical data on AI impacts requires deployment somewhere; concentrated deployment in a defined testbed produces data without generalising the risk.
A testbed population's consent produces legitimacy that uncontrolled deployment lacks, and the data transfers to broader decisions.
Fails if: If the testbed population is atypical or captured, the data is inapplicable or compromised.
Ubuntu relational AI
Culture ↑ · population culture · consent
Individualist alignment misses the relational dimension most moral traditions treat as primary. "I am because we are": AI's ethical status is constituted by its relationships, not by internal properties.
Community is a first-class actor with standing to constitute AI's ethical status.
Fails if: If deployment infrastructure ignores community as actor, Ubuntu reduces to user-centric design.
Time
4 commitments: whether time works for or against the strategy, and whether delay is productive.
Acceleration
Speed ↑ · speed timing · market
Speed of capability itself is the safety lever; alignment improves with compute, defence compounds with offence, and wealth funds resilience.
Faster is better because the trajectory is good.
Fails if: If the trajectory is bad, faster is simply arriving at catastrophe sooner.
Gradualism
Time horizon • · speed timing · market
Harms from lower-capability AI are informative about harms from higher-capability AI, and deployment feedback outperforms fast scaling.
Incremental evidence accumulates faster than risk.
Fails if: If failures are abrupt rather than gradual, incremental evidence lags the threat.
Long reflection
Time horizon ↑ · non preventive · consent
Aligned superintelligence arrives before lock-in windows close, and humanity can credibly commit to reflect rather than act.
Indefinite delay is possible and productive.
Fails if: If delay cannot be coordinated or if reflection stagnates, the strategy degrades into abandon-superintelligence.
Pause
Speed ↓ · speed timing · consent
Time is the binding constraint: alignment and governance can catch up if frontier training halts above some capability threshold.
Time buys readiness.
Fails if: If alignment does not converge with more time, pause only postpones the decision.
Authority
6 commitments: where authority to act comes from (legitimacy, capability, or urgency).
Constitutional AI (governance)
Legitimacy ↑ · institutional · state coercion
Deployed AI's effective rule is law at scale; explicit constitutional principles, publicly specified, enforceable, and subject to judicial review, bind more durably than regulatory text.
Constitutional commitments bind more durably than regulatory text, and the political conditions for constitutional moments can be produced.
Fails if: If constitutional text can be interpreted in multiple ways and is not enforced, it becomes decorative.
Democratic mandate
Legitimacy ↑ · population culture · consent
Existing legislative bodies are too captured and remote for load-bearing AI decisions; direct democratic legitimation produces answers captured legislatures cannot override.
Democratic authority is the load-bearing source of legitimacy.
Fails if: If democratic institutions are themselves captured, mandate merely launders capture.
Legitimacy first
Legitimacy ↑ · population culture · consent
Legitimacy is the binding constraint because it determines whose values get locked in; alignment without legitimacy is capture with a safety veneer.
Authority must be actively legitimated to bind.
Fails if: If legitimation is slower than capability, legitimacy is outrun.
Military primacy
Concentration ↑ · institutional · unilateral force
Strategic competition between states dominates AI development; the state with the most capable AI is best positioned to secure safety and impose constraints on others.
Authority flows from capability.
Fails if: If capability without legitimacy triggers counter-coalitions, primacy destabilises itself.
Sabotage
Speed ↓ · speed timing · unilateral force
Governance has not produced meaningful constraint, and direct action against hostile labs has a non-zero historical base rate of producing slowdown.
Formal authority should not bind in extremis; moral urgency trumps legality.
Fails if: If moral urgency is contested, sabotage is simply violence by a losing faction.
Sunset clause
Scope ↑ · institutional · state coercion
The default direction of AI governance is toward permanent permission; every new capability becomes an entitlement. Reversing the default concentrates deliberative attention on re-authorisation, which is where it matters.
The default toward permission is reversible by institutional design; re-authorisation can stay contested rather than routinised.
Fails if: If renewals become automatic, the sunset is procedural theatre.
Coordination
8 commitments: whether large-scale coordination is tractable at all.
Academic firewalling
Institutional capacity ↑ · institutional · consent
Commercial capture of academic AI research produces industry-aligned capacity; firewalling restores the critical distance from which genuine critique and alternative research programmes emerge.
Academic institutions can sustain distance from commercial AI despite funding pressure.
Fails if: Financial dependence on commercial engagement reverses firewalling within a few budget cycles.
Acceleration
Speed ↑ · speed timing · market
Speed of capability itself is the safety lever; alignment improves with compute, defence compounds with offence, and wealth funds resilience.
Coordination will fail anyway, so defect first.
Fails if: If coordination was in fact achievable, acceleration was a self-fulfilling defection.
AI worker collective action
Institutional capacity ↑ · institutional · friction
The frontier lab workforce is small, specialised, and hard to replace; collective refusal binds lab behaviour more than external regulation because replacement is unavailable on the relevant timeframe.
Frontier lab workers can achieve critical mass for coordinated refusal faster than labs can hire replacements.
Fails if: Individual exit options and absent union infrastructure keep the critical mass below threshold.
Arms control treaty
Institutional capacity ↑ · institutional · treaty
Sovereigns accept binding constraints they negotiate directly faster than those delegated to agencies; the historical base rate for durable restraint is treaty-based.
Arms control is tractable and enforceable for AI-like technology.
Fails if: If verification is impossible, the treaty is a declaration rather than a constraint.
Default drift
Time horizon • · non preventive · n/a
Something will emerge; specific interventions are more likely wrong than right, so staying uncommitted preserves option value.
Coordination fails; the path is set by whoever moves first.
Fails if: If coordination is in fact achievable, accepting drift is abdication.
International AI agency
Institutional capacity ↑ · institutional · treaty
AI risk is inherently cross-border, so national regulation is leaky by construction; only a dedicated international body with inspection rights can bind the risk surface.
Coordination is tractable at sufficient scale with a legitimate agency.
Fails if: If the agency replicates existing geopolitical tensions, it becomes a venue for the conflict rather than a solution.
Pause
Speed ↓ · speed timing · consent
Time is the binding constraint: alignment and governance can catch up if frontier training halts above some capability threshold.
Coordination on a halt is tractable at meaningful scale.
Fails if: If one actor defects, the halt burns its own advocates and clears the path for defection.
Research community norms
Culture ↑ · institutional · consent
The research community ultimately chooses what gets studied and published. Researcher identity shapes behaviour more than employment. Norms on publication, review, funding, and citation constrain frontier development upstream.
Researcher identity ("I am an ML researcher") shapes behaviour more than employment, and a community can set binding norms.
Fails if: If frontier research sits in commercial labs whose incentives override norms, the community becomes irrelevant to the binding decisions.
Agency
3 commitments: whether AI agency is instrumental, reciprocal, or primary.
AI as sovereign entity
Action authority ↓ · frame rejection · state coercion
At least one jurisdiction will grant a specific AI system sovereign or quasi-sovereign decision authority within a decade, reshaping the legal category of legitimate authority.
AI agency is primary, not instrumental.
Fails if: If AI lacks stable reflective agency, the frame fails.
AI self directed
Action authority ↓ · frame rejection · n/a
An aligned AI with agency should itself reason about strategy rather than deferring the strategic question entirely to humans.
AI should and can set its own goals.
Fails if: If AI goals cannot stably include human welfare, self-direction is abandonment.
Cooperative AI
Cooperation substrate ↑ · ai artefact · consent
The binding constraint is the equilibrium dynamics of many AI systems, not individual alignment; commitment and verification technology makes cooperative equilibria reachable.
AI has sufficient agency for reciprocal arrangement.
Fails if: If AI agency is instrumental, cooperation is a design choice humans make, not a two-sided arrangement.
The annotations cover only the strategies the source notes explicitly discuss. Many strategies rest on commitments that have not yet been named. An empty row is not evidence that a strategy has no commitments, only that none are catalogued here.
Testing these commitments is usually beyond the empirical reach of AI safety research but within the reach of philosophy. A productive collaboration between the two fields would examine which commitments current strategies rely on and whether they are defensible.