AGI Strategies

person

Paul Christiano

Head of AI safety at the US AI Safety Institute; ex-OpenAI alignment lead

Key architect of RLHF and of much of modern alignment theory. Founded the Alignment Research Center; now runs safety at the US AI Safety Institute inside NIST. Publicly estimates a roughly 46% chance of an extremely bad outcome from AI.

current: Head of AI Safety, US AI Safety Institute (NIST); Founder, Alignment Research Center
past: Alignment team lead, OpenAI

Profile

expertise

Frontier builder

Currently or recently led training, architecture, or safety work on a frontier model. Hands on the loss curve.

Co-developed RLHF (the technique behind ChatGPT's instruction tuning). Founded the Alignment Research Center. Now heads safety at the US AI Safety Institute (NIST).

recognition

Field-leading

Widely known inside the AI and AI-safety community. Appears repeatedly in top venues, podcasts, or governance forums. Not a household name to outsiders.

Universal name in AI safety research. NIST appointment got policy-press coverage. Not a household name.

vintage

Scaling era

Worldview formed during GPT-2/3, scaling laws, Anthropic's founding. Pre-ChatGPT but post-deep-learning. The 'scale is all you need' debate is live.

Joined OpenAI in 2017 and co-introduced RLHF the same year. Founded ARC in 2021. His career is scaling-era applied alignment.

Hand-classified. See the board for the criteria and the full grid.

p(doom)

  • 46% · 2023-04-27

    Definition used: Approximately 46% chance of an extremely bad outcome, in his LessWrong post decomposing takeover and non-takeover catastrophes.

    My views on doom · LessWrong

Strategy positions

Alignment first · endorses

Solve technical alignment before capability thresholds close

Canonical modern alignment researcher; works on debate, RLHF, and eliciting latent knowledge.

I'd guess something like a 20% chance of an AI takeover, with many of the humans dead, and a further 30% chance or so of serious irreversible problems short of takeover.

Context: From his LessWrong post 'My views on doom'.

blog · My views on doom · LessWrong · 2023-04-27 · faithful paraphrase

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Paul Christiano's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

  • Aaron Courville

    shared 1 · J=1.00

    Université de Montréal; Deep Learning textbook co-author

  • Adam Jermyn

    shared 1 · J=1.00

    Anthropic; previously astrophysics

  • Adam Kalai

    shared 1 · J=1.00

    Microsoft Research; AI fairness and safety

  • Agnes Callard

    shared 1 · J=1.00

    University of Chicago philosopher; aspiration theorist

  • Ajeya Cotra

    shared 1 · J=1.00

    Open Philanthropy researcher; 'biological anchors' forecaster

  • Alan Turing

    shared 1 · J=1.00

    Founder of theoretical computer science (1912–1954)
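The "shared 1 · J=1.00" figures above can be read as a Jaccard index over strategy tags: the number of tags two people share, divided by the number of distinct tags across both. The sketch below is a minimal illustration under that assumption, not the site's actual code; the tag names and the `jaccard` helper are hypothetical. It shows why one shared tag yields J=1.00 only when each person carries exactly that one tag.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; treated as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical tag sets, for illustration only.
profile_tags = {"alignment-first"}
single_tag_neighbour = {"alignment-first"}
two_tag_neighbour = {"alignment-first", "pause"}

# One shared tag out of one distinct tag overall.
print(jaccard(profile_tags, single_tag_neighbour))  # 1.0

# One shared tag out of two distinct tags overall.
print(jaccard(profile_tags, two_tag_neighbour))  # 0.5
```

Because the comparison is on tag identity only, two people with opposite stances on the same single tag would still score 1.00 under this reading.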

Record last updated 2026-04-24.