person

Paul Christiano

Head of AI safety at the US AI Safety Institute; ex-OpenAI alignment lead

Key architect of RLHF and of much of modern alignment theory. Founded the Alignment Research Center; now runs safety at the US AI Safety Institute inside NIST. Publicly estimates a roughly 46% chance of an extremely bad outcome from AI.

current: Head of AI Safety, US AI Safety Institute (NIST); Founder, Alignment Research Center

past: Alignment team lead, OpenAI

Profile

expertise

Frontier builder

Currently or recently led training, architecture, or safety work on a frontier model. Hands on the loss curve.

Co-developed RLHF (the technique behind ChatGPT's instruction tuning). Founded the Alignment Research Center. Now heads safety at the US AI Safety Institute (NIST).

recognition

Field-leading

Widely known inside the AI and AI-safety community. Appears repeatedly in top venues, podcasts, or governance forums. Not a household name to outsiders.

Known to virtually everyone in AI safety research. His NIST appointment drew policy-press coverage. Not a household name.

vintage

Scaling era

Worldview formed during GPT-2/3, scaling laws, Anthropic's founding. Pre-ChatGPT but post-deep-learning. The 'scale is all you need' debate is live.

Joined OpenAI in 2017 and introduced RLHF the same year. Founded ARC in 2021. His career is scaling-era applied alignment.

Hand-classified. See the board for the criteria and the full grid.

p(doom)

46% · 2023-04-27
Definition used: Approximately 46% chance of an extremely bad outcome, in his LessWrong post decomposing takeover and non-takeover catastrophes.
My views on doom · LessWrong

Strategy positions

Alignment first · endorses

Solve technical alignment before capability thresholds close

Canonical modern alignment researcher; works on debate, RLHF, and eliciting latent knowledge.

I'd guess something like a 20% chance of an AI takeover, with many of the humans dead, and a further 30% chance or so of serious irreversible problems short of takeover.

Context: From his LessWrong post 'My views on doom'.

✍ blog · My views on doom · LessWrong · 2023-04-27 · faithful paraphrase

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Paul Christiano's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

Aaron Courville
shared 1 · J=1.00
Université de Montréal; Deep Learning textbook co-author
Adam Jermyn
shared 1 · J=1.00
Anthropic; previously astrophysics
Adam Kalai
shared 1 · J=1.00
Microsoft Research; AI fairness and safety
Agnes Callard
shared 1 · J=1.00
University of Chicago philosopher; aspiration theorist
Ajeya Cotra
shared 1 · J=1.00
Open Philanthropy researcher; 'biological anchors' forecaster
Alan Turing
shared 1 · J=1.00
Founder of theoretical computer science (1912–1954)
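
The J values listed above can be reproduced with a standard Jaccard index over tag sets. The following is a minimal sketch, not the site's actual implementation: the record layout (a mapping from tag to stance) and the tag names `alignment-first` and `pause` are hypothetical, chosen only to show why two people with one identical tag score J=1.00 regardless of stance.

```python
def jaccard(tags_a: set[str], tags_b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


# Hypothetical records mapping strategy tag -> stance (assumed layout).
person_a = {"alignment-first": "endorses"}
person_b = {"alignment-first": "opposes"}
person_c = {"alignment-first": "endorses", "pause": "endorses"}

# Only tag identity is compared, so the stance values are ignored.
shared_ab = set(person_a) & set(person_b)
print(len(shared_ab), jaccard(set(person_a), set(person_b)))  # 1 1.0
print(jaccard(set(person_a), set(person_c)))                  # 0.5
```

Under these assumptions, anyone carrying a single tag ties at J=1.00 with every other person carrying only that same tag, so the ordering among such ties has to come from some other rule.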

Record last updated 2026-04-24.