person

Carroll Wainwright
Anthropic; ex-OpenAI; alignment researcher
Alignment researcher at Anthropic; previously at OpenAI. Co-author of foundational papers on RLHF and on summarization from human feedback.
current Member of Technical Staff, Anthropic
past Researcher, OpenAI
Strategy positions
Alignment first: endorses
Solve technical alignment before capability thresholds close. Co-developed early RLHF methods that became the foundation of post-training across the industry; argues these techniques transfer responsibility for behaviour onto whoever sets up the human feedback.
Our results show that for English summarization, RLHF-trained models can outperform much larger fine-tuned models. The technique is powerful and inherits all the strengths and biases of the human raters.
Closest strategy neighbours
by Jaccard overlap. Other people whose strategy tags overlap with Carroll Wainwright's. Overlap is computed on tag identity, not stance, so people with opposing stances can appear if they reference the same tags.
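The neighbour list above is ranked by Jaccard overlap between tag sets. A minimal sketch of that computation, assuming tags are plain strings (the tag names and the `jaccard` helper here are illustrative, not this site's actual code):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two tag sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag sets; real records carry their own strategy tags.
wainwright = {"alignment-first", "rlhf", "human-feedback"}
other = {"alignment-first", "rlhf", "interpretability"}
print(jaccard(wainwright, other))  # 2 shared / 4 total = 0.5
```

Because the measure ignores stance, two people who both tag "alignment first" score as neighbours even if one endorses it and the other rejects it.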
Record last updated 2026-04-25.