strategy tag
Scalable oversight.
Human or human+AI oversight can scale past human expertise.
People on the record (2)
Ben Shneiderman
UMD emeritus; 'Human-Centered AI' framework
Argues that 'human-centered AI' design, which combines high human control with high automation, is achievable and dissolves the false dichotomy between human control and automation.
We can have high levels of human control AND high levels of automation. The two-dimensional HCAI framework rejects the false trade-off.
Pavel Izmailov
OpenAI; ex-superalignment team
Argues that weak-to-strong generalization (using weaker, slower-to-improve models to supervise stronger ones) is the structural bet behind scalable alignment of superhuman models.
We study an analogous problem: how can weak teachers supervise much more capable students? This is a simplified empirical analogue of the alignment problem, and we find that strong students naively trained on weak supervision generalize beyond their teachers in important ways.