AGI Strategies

person

Carroll Wainwright

Anthropic; ex-OpenAI; alignment researcher

Anthropic researcher on alignment; previously at OpenAI. Co-author of multiple foundational papers on RLHF and on summarization with human preferences.

current · Member of Technical Staff, Anthropic
past · Researcher, OpenAI

Strategy positions

Alignment first · endorses

Solve technical alignment before capability thresholds close

Co-developed early RLHF methods that became the foundation of post-training across the industry; argues that these techniques shift responsibility for model behaviour onto whoever designs the human feedback process.

Our results show that for English summarization, RLHF-trained models can outperform much larger fine-tuned models. The technique is powerful and inherits all the strengths and biases of the human raters.
§ paper · Learning to Summarize with Human Feedback · arXiv / OpenAI · 2020 · faithful paraphrase

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Carroll Wainwright's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.
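As a rough illustration of the "shared N · J=…" figures in the list below, here is a minimal sketch of Jaccard overlap on tag sets, assuming each person is represented by a plain set of strategy tags. The tag slugs and variable names are hypothetical and are not the site's actual identifiers or implementation.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical tag sets; stance (endorses / opposes) is deliberately not part of the tag.
wainwright = {"alignment-first"}
same_tags = {"alignment-first"}
extra_tag = {"alignment-first", "pause"}

print(f"shared {len(wainwright & same_tags)} · J={jaccard(wainwright, same_tags):.2f}")
print(f"shared {len(wainwright & extra_tag)} · J={jaccard(wainwright, extra_tag):.2f}")
```

Under these assumptions, two people who each carry exactly the same single tag score J=1.00 with one shared tag, which is consistent with every neighbour listed here. Because only tag identity is compared, someone who opposes the same strategy would score identically.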

  • Aaron Courville

    shared 1 · J=1.00

    Université de Montréal; Deep Learning textbook co-author

  • Adam Jermyn

    shared 1 · J=1.00

    Anthropic; previously astrophysics

  • Adam Kalai

    shared 1 · J=1.00

    Microsoft Research; AI fairness and safety

  • Agnes Callard

    shared 1 · J=1.00

    University of Chicago philosopher; aspiration theorist

  • Ajeya Cotra

    shared 1 · J=1.00

    Open Philanthropy researcher; 'biological anchors' forecaster

  • Alan Turing

    shared 1 · J=1.00

    Founder of theoretical computer science (1912–1954)

Record last updated 2026-04-25.