person

Carroll Wainwright
Anthropic; ex-OpenAI; alignment researcher
Alignment researcher at Anthropic; previously at OpenAI. Co-author of foundational papers on RLHF and on summarization from human feedback.
current Member of Technical Staff, Anthropic
past Researcher, OpenAI
Strategy positions
Alignment first: endorses
Solve technical alignment before capability thresholds close. Co-developed early RLHF methods that became the foundation of post-training across the industry; argues these techniques transfer responsibility for behaviour onto whoever sets up the human feedback.
Our results show that for English summarization, RLHF-trained models can outperform much larger fine-tuned models. The technique is powerful and inherits all the strengths and biases of the human raters.
Closest strategy neighbours
by Jaccard overlap. Other people whose strategy tags overlap with Carroll Wainwright's. Overlap is computed on tag identity, not stance, so people with opposing stances can appear if they reference the same tags.
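The neighbour list above is ranked by Jaccard overlap between tag sets. A minimal sketch of that computation, assuming tags are plain strings (the tag names and the `jaccard` helper here are illustrative, not this site's actual code):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two tag sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag sets; real records carry their own strategy tags.
wainwright = {"alignment-first", "rlhf", "human-feedback"}
other = {"alignment-first", "rlhf", "interpretability"}
print(jaccard(wainwright, other))  # 2 shared / 4 total = 0.5
```

Because the measure ignores stance, two people who both tag "alignment first" score as neighbours even if one endorses it and the other rejects it.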
Record last updated 2026-04-25.