AGI Strategies

Scalable oversight.

Human or human+AI oversight scales past human expertise.

People on the record: 2
Ben Shneiderman

UMD emeritus; 'Human-Centered AI' framework

endorses

Argues that 'human-centered AI' design, combining high human control with high automation, is achievable and dissolves the supposed trade-off between control and autonomy.

We can have high levels of human control AND high levels of automation. The two-dimensional HCAI framework rejects the false trade-off.
Book: Human-Centered AI · Oxford University Press · 2022 · faithful paraphrase

Pavel Izmailov

OpenAI; ex-superalignment team

endorses

Argues that weak-to-strong generalization (using weaker, slower-to-improve models to supervise stronger ones) is the structural bet behind scalable alignment of superhuman models.

We study an analogous problem: how can weak teachers supervise much more capable students? This is a simplified empirical analogue of the alignment problem, and we find that strong students naively trained on weak supervision generalize beyond their teachers in important ways.
Paper: Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision · arXiv / OpenAI · 2023-12 · faithful paraphrase
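The weak-to-strong setup above can be illustrated with a minimal toy sketch (not the paper's experiments, which use pretrained language models): a "weak teacher" is simulated as errorful labels, here random flips of ~20% of the training labels, and a "strong student" with access to better features is trained only on those weak labels. The point of the demonstration is that the student can generalize past the accuracy of its supervision. All names and the noise model are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: the true label is sign(x0 + x1); the boundary passes through the origin.
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

# "Weak teacher": simulated by flipping ~20% of the training labels at random,
# standing in for an errorful weaker supervisor (an assumption of this sketch).
flip = rng.random(len(y_tr)) < 0.2
weak_labels = np.where(flip, 1 - y_tr, y_tr)
supervision_acc = (weak_labels == y_tr).mean()  # ceiling if the student merely imitates

def fit_logreg(X, y, steps=500, lr=0.5):
    """Gradient-descent logistic regression (no intercept; boundary is at the origin)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# "Strong student": trained only on the weak (noisy) labels.
w = fit_logreg(X_tr, weak_labels)
student_acc = ((X_te @ w > 0).astype(int) == y_te).mean()

print(f"supervision accuracy: {supervision_acc:.2f}")
print(f"student test accuracy: {student_acc:.2f}")
```

Because the teacher's mistakes are independent of the input, the student's fit averages them out and its test accuracy exceeds the accuracy of the labels it was trained on, a simplified analogue of the "generalize beyond their teachers" finding quoted above.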