strategy tag

Constitutional AI.

Principles-based training for value alignment

Rich strategy page →Compare to another →

People on the record.

Ben Mann

Anthropic co-founder; researcher

endorses

Advocates principles-based training as a scalable alignment lever; co-architected Constitutional AI.

Constitutional AI is a way to get alignment without requiring human labellers to review every problematic output.

§ paperConstitutional AI: Harmlessness from AI Feedback· arXiv· 2022-12-15· faithful paraphrase

Yuntao Bai

Anthropic; Constitutional AI co-author

endorses

Argues principles-based training, where models are trained against an explicit constitution by another AI, scales human oversight in a way RLHF alone does not.

“We propose Constitutional AI: a method for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs.”

§ paperConstitutional AI: Harmlessness from AI Feedback· arXiv / Anthropic· 2022-12· direct quote

People on the record.

Ben MannBMbenBen MannAnthropic co-founder; researcherConstitutional AI

Yuntao BaiYByunYuntao BaiAnthropic; Constitutional AI co-authorConstitutional AI

Ben Mann

Yuntao Bai