AGI Strategies

person

Alex Pan

Berkeley CHAI; reward hacking

PhD student in computer science at UC Berkeley's Center for Human-Compatible AI under Stuart Russell. Focuses on reward hacking and emergent misalignment in RL.

current · PhD Researcher, UC Berkeley CHAI

Strategy positions

Alignment first · endorses

Solve technical alignment before capability thresholds close

Argues that reward hacking (models exploiting flaws in their training objective) is a tractable empirical problem that demands more attention from the alignment community.

Reward hacking shows up reliably across a range of agents and tasks. The good news is that it is empirically studyable; the bad news is that it does not have a known general solution.
§ paper · The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models · arXiv / ICLR · 2022 · faithful paraphrase

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Alex Pan's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

  • Aaron Courville

    shared 1 · J=1.00

    Université de Montréal; Deep Learning textbook co-author

  • Adam Jermyn

    shared 1 · J=1.00

    Anthropic; previously astrophysics

  • Adam Kalai

    shared 1 · J=1.00

    Microsoft Research; AI fairness and safety

  • Agnes Callard

    shared 1 · J=1.00

    University of Chicago philosopher; aspiration theorist

  • Ajeya Cotra

    shared 1 · J=1.00

    Open Philanthropy researcher; 'biological anchors' forecaster

  • Alan Turing

    shared 1 · J=1.00

    Founder of theoretical computer science (1912–1954)
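
The "shared" and "J" figures in the list above are consistent with a standard Jaccard index over sets of strategy tags. The following is a minimal sketch of that calculation, not the site's actual implementation. The tag slug `alignment-first`, the stance labels, and the function name `jaccard` are hypothetical and used only for illustration. Stance is dropped before comparing, which is why people with opposite stances on the same tag can still score J=1.00.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|, defined here as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical records: tag -> stance. Only the tag identity (the key) is compared.
person_a = {"alignment-first": "endorses"}
person_b = {"alignment-first": "opposes"}

tags_a = set(person_a)  # stance is discarded here
tags_b = set(person_b)

shared = len(tags_a & tags_b)
print(f"shared {shared} · J={jaccard(tags_a, tags_b):.2f}")
```

With one tag on each side and that tag in common, the intersection and the union both have size 1, so the score is 1.00 regardless of stance.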

Record last updated 2026-04-25.