strategy tag
Constitutional AI.
Principles-based training for value alignment
People on the record.
Ben Mann
Anthropic co-founder; researcher
Advocates principles-based training as a scalable alignment lever; co-architected Constitutional AI.
Constitutional AI is a way to get alignment without requiring human labellers to review every problematic output.
Yuntao Bai
Anthropic; Constitutional AI co-author
Argues that principles-based training, in which another AI trains the model against an explicit constitution, scales human oversight in a way RLHF alone does not.
“We propose Constitutional AI: a method for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs.”