AGI Strategies


Tomek Korbak

UK AI Security Institute; ex-Anthropic; pretraining alignment

Researcher at the UK AI Security Institute (AISI); previously at Anthropic. Doctoral work on pretraining-time alignment via behaviour-cloning and conditional control.

current Research Scientist, UK AI Security Institute
past Member of Technical Staff, Anthropic

Strategy positions

Alignment first · endorses

Solve technical alignment before capability thresholds close

Argues that alignment should begin at pretraining, not only at RLHF fine-tuning, because behaviour shaped during base-model training is far harder to undo later.

“We propose pretraining language models with human preferences. The resulting models follow human preferences much more closely than ones aligned only at fine-tuning time.”
§ paper · Pretraining Language Models with Human Preferences · arXiv · 2023-02 · direct quote

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Tomek Korbak's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

  • Aaron Courville

    shared 1 · J=1.00

    Université de Montréal; Deep Learning textbook co-author

  • Adam Jermyn

    shared 1 · J=1.00

    Anthropic; previously astrophysics

  • Adam Kalai

    shared 1 · J=1.00

    Microsoft Research; AI fairness and safety

  • Agnes Callard

    shared 1 · J=1.00

    University of Chicago philosopher; aspiration theorist

  • Ajeya Cotra

    shared 1 · J=1.00

    Open Philanthropy researcher; 'biological anchors' forecaster

  • Alan Turing

    shared 1 · J=1.00

    Founder of theoretical computer science (1912–1954)
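The J scores above are Jaccard similarities between tag sets: the size of the intersection divided by the size of the union. A minimal sketch of the computation (the tag names below are hypothetical, not taken from this record):

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity of two tag sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0  # convention: two empty tag sets count as no overlap
    return len(a & b) / len(a | b)

# When each person carries exactly one tag and it matches,
# the score is 1/1, i.e. the "shared 1 · J=1.00" pattern above.
print(jaccard({"alignment-first"}, {"alignment-first"}))  # → 1.0
```

Because overlap is computed on tag identity rather than stance, two people who endorse and oppose the same strategy tag still score a full match.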

Record last updated 2026-04-25.