person
Tomek Korbak
UK AI Security Institute; ex-Anthropic; pretraining alignment
Researcher at the UK AI Security Institute (AISI); previously at Anthropic. Doctoral work on pretraining-time alignment via behaviour-cloning and conditional control.
current Research Scientist, UK AI Security Institute
past Member of Technical Staff, Anthropic
Strategy positions
Alignment first (endorses)
Solve technical alignment before capability thresholds close. Argues alignment ought to start at pretraining, not just RLHF, because behaviour shaped during base-model training is far harder to undo later.
“We propose pretraining language models with human preferences. The resulting models follow human preferences much more closely than ones aligned only at fine-tuning time.”
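A minimal sketch of the conditional-training idea behind that quote, under stated assumptions: the control-token names, the scoring function, and the threshold below are illustrative placeholders, not the paper's exact choices.

```python
# Conditional training sketch: prefix each pretraining document with a
# control token reflecting whether it satisfies a human-preference score,
# so the base model learns both behaviours and can be steered at inference
# by prompting with the "good" token. Names and threshold are illustrative.

from typing import Callable, Iterable

GOOD, BAD = "<|good|>", "<|bad|>"  # hypothetical control tokens

def annotate(
    documents: Iterable[str],
    preference_score: Callable[[str], float],  # e.g. a toxicity or PII classifier
    threshold: float = 0.5,                    # illustrative cutoff
) -> Iterable[str]:
    """Yield pretraining documents prefixed with a preference control token."""
    for doc in documents:
        tag = GOOD if preference_score(doc) >= threshold else BAD
        yield f"{tag}{doc}"

# At inference time, condition generation on the desired behaviour:
#   prompt = GOOD + user_prompt
# so the model samples from the preference-compliant distribution it
# learned during pretraining, rather than relying only on fine-tuning.
```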
Closest strategy neighbours
by Jaccard overlap. Other people whose strategy tags overlap with Tomek Korbak's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.
Record last updated 2026-04-25.