AGI Strategies

strategy tag

Constitutional AI.

Principles-based training for value alignment

People on the record.

2

Ben Mann

Anthropic co-founder; researcher

endorses

Advocates principles-based training as a scalable alignment lever; co-architected Constitutional AI.

Constitutional AI is a way to get alignment without requiring human labellers to review every problematic output.
§ paperConstitutional AI: Harmlessness from AI Feedback· arXiv· 2022-12-15· faithful paraphrase

Yuntao Bai

Anthropic; Constitutional AI co-author

endorses

Argues principles-based training, where models are trained against an explicit constitution by another AI, scales human oversight in a way RLHF alone does not.

“We propose Constitutional AI: a method for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs.”
§ paperConstitutional AI: Harmlessness from AI Feedback· arXiv / Anthropic· 2022-12· direct quote