AGI Strategies

Commitments

The worldviews strategies quietly assume.

Every strategy rests on assumptions its advocates rarely name: about values, AI's nature, humans, time, authority, coordination, and agency. If an assumption fails, the strategy loses its target or its premise, regardless of the instruments it uses.

Two strategies that seem instrumentally similar can rest on incompatible foundations. A pluralist strategy combined with a moral realist one makes an unstable alliance: each side holds that the other is mistaken about the nature of morality.

Strategies annotated: 30 of 76
Commitments named: 38 across 7 topics
Most common topic: Coordination (where the field's hidden assumptions cluster)

Values (6 commitments)

Assumptions about whether moral claims are determinate, plural, or traditional.

Alignment first

Principals have determinate values AI can learn.

Fails if: values are contested or constructed; the strategy loses its target.

Confucian role ethics

Ethics operate through fit with position and relationship rather than optimisation of preferences.

Fails if: preferences are the load-bearing unit; role fit becomes window-dressing on preference optimisation.

Long reflection

Reflection converges on better values over time.

Fails if: reflection diverges or reveals irreducible disagreement; the strategy's premise collapses.

Plural AI ethic

Values are genuinely plural rather than convergent on truth.

Fails if: there is moral truth; pluralism is then a mistake.

Religious and moral authority

Moral truth exists and is accessible through traditional authority.

Fails if: moral authority is delegitimised by secular epistemics; the strategy loses its lever.

Ubuntu relational AI

Ethical status is constituted by relationships, not by internal properties.

Fails if: communities and AI systems cannot sustain the required dialogue at scale; the frame collapses into individualist alignment under another name.

AI nature (4 commitments)

Whether AI is a tool, a moral patient, or a sovereign agent.

AI as sovereign entity

AI has genuine agency and normative standing.

Fails if: AI remains tool-like; treating it as sovereign abdicates the role of human principals without justification.

AI welfare as safety

AI is or may be a moral patient.

Fails if: moral patienthood requires sentience that AI lacks; the strategy misdirects obligation.

Alignment first

AI is a tool with controllable properties.

Fails if: AI has emergent agency; the tool frame breaks and alignment becomes negotiation.

Reframe AI

The dominant principal-oriented framing is itself the problem.

Fails if: the principal frame is in fact adequate; reframing becomes strategic distraction.

Humans (7 commitments)

What humans can evaluate, enhance, or be substituted for.

Confucian role ethics

Social roles are stable enough to specify fitting behaviour for AI-related positions.

Fails if: AI itself destabilises social structure; the role framework loses its referent.

Democratic mandate

Citizens have the capacity to evaluate AI decisions.

Fails if: that capacity is absent and cannot be built; the mandate is only a legitimating ritual.

Human augmentation race

Human properties scale with enhancement.

Fails if: enhancement hits physical or ethical ceilings before AI does; the race is lost by construction.

Irreducible human authority

Humans have properties (judgment, experience, moral status) whose authority cannot be substituted.

Fails if: AI can match or exceed those properties; the reservation is arbitrary.

Mass literacy

Citizens can be trained to evaluate AI-related claims at scale.

Fails if: the gap between expert and public grows faster than the curriculum; training never catches up.

Test ground

A testbed population's consent produces legitimacy that uncontrolled deployment lacks, and the resulting data transfers to broader decisions.

Fails if: the testbed population is atypical or captured; the data is inapplicable or compromised.

Ubuntu relational AI

Community is a first-class actor with standing to constitute AI's ethical status.

Fails if: deployment infrastructure ignores the community as an actor; Ubuntu reduces to user-centric design.

Time (4 commitments)

Whether time works for or against the strategy, and whether delay is productive.

Acceleration

Faster is better because the trajectory is good.

Fails if: the trajectory is bad; faster simply means arriving at catastrophe sooner.

Gradualism

Incremental evidence accumulates faster than risk.

Fails if: failures are abrupt rather than gradual; incremental evidence lags the threat.

Long reflection

Indefinite delay is possible and productive.

Fails if: delay cannot be coordinated or reflection stagnates; the strategy degrades into abandoning superintelligence.

Pause

Time buys readiness.

Fails if: alignment does not converge given more time; a pause only postpones the decision.

Authority (6 commitments)

Where the authority to act comes from: legitimacy, capability, or urgency.

Constitutional AI (governance)

Constitutional commitments bind more durably than regulatory text, and the political conditions for constitutional moments can be produced.

Fails if: constitutional text is interpreted multiple ways without enforcement; it becomes decorative.

Democratic mandate

Democratic authority is the load-bearing source of legitimacy.

Fails if: democratic institutions are themselves captured; the mandate merely launders capture.

Legitimacy first

Authority must be actively legitimated to bind.

Fails if: legitimation is slower than capability; legitimacy is outrun.

Military primacy

Authority flows from capability.

Fails if: capability without legitimacy triggers counter-coalitions; primacy destabilises itself.

Sabotage

Formal authority should not bind in extremis; moral urgency trumps legality.

Fails if: the moral urgency is contested; sabotage is simply violence by a losing faction.

Sunset clause

A default toward permission is reversible by institutional design; re-authorisation can stay contested rather than routinised.

Fails if: renewals become automatic; the sunset is procedural theatre.

Coordination (8 commitments)

Whether large-scale coordination is tractable at all.

Academic firewalling

Academic institutions can sustain distance from commercial AI despite funding pressure.

Fails if: financial dependence on commercial engagement reverses firewalling within a few budget cycles.

Acceleration

Coordination will fail anyway, so defect first.

Fails if: coordination was in fact achievable; acceleration was a self-fulfilling defection.

AI worker collective action

Frontier lab workers can achieve critical mass for coordinated refusal faster than labs can hire replacements.

Fails if: individual exit options and absent union infrastructure keep the critical mass below threshold.

Arms control treaty

Arms control is tractable and enforceable for AI-like technology.

Fails if: verification is impossible; the treaty is a declaration rather than a constraint.

Default drift

Coordination fails; the path is set by whoever moves first.

Fails if: coordination is in fact achievable; accepting drift is abdication.

International AI agency

Coordination is tractable at sufficient scale with a legitimate agency.

Fails if: the agency replicates existing geopolitical tensions; it becomes a venue for the conflict rather than a solution.

Pause

Coordination on a halt is tractable at meaningful scale.

Fails if: one actor defects; the halt burns its own advocates and clears the path for the defector.

Research community norms

Researcher identity ("I am an ML researcher") shapes behaviour more than employment, and a community can set binding norms.

Fails if: frontier research sits in commercial labs whose incentives override norms; the community becomes irrelevant to the binding decisions.

Agency (3 commitments)

Whether AI agency is instrumental, reciprocal, or primary.

AI as sovereign entity

AI agency is primary, not instrumental.

Fails if: AI lacks stable reflective agency; the frame collapses.

AI self-directed

AI can and should set its own goals.

Fails if: AI goals cannot stably include human welfare; self-direction is abandonment.

Cooperative AI

AI has sufficient agency for a reciprocal arrangement.

Fails if: AI agency is instrumental; cooperation is a design choice humans make, not a two-sided arrangement.

The annotations cover only the strategies the source notes explicitly discuss. Many strategies rest on commitments that have not yet been named. An empty entry is not evidence that a strategy has no commitments, only that none are catalogued here.

Testing these commitments is usually beyond the empirical reach of AI safety research but within the reach of philosophy. A productive collaboration between the two fields would examine which commitments current strategies rely on and whether those commitments are defensible.