person
Alex Pan
Berkeley CHAI; reward hacking
PhD student in computer science at UC Berkeley's Center for Human-Compatible AI under Stuart Russell. Focuses on reward hacking and emergent misalignment in RL.
current PhD Researcher, UC Berkeley CHAI
Strategy positions
Alignment first
endorses
Solve technical alignment before capability thresholds close
Argues that reward hacking, in which models exploit flaws in their training objective, is a tractable empirical problem that demands more attention from the alignment community.
Reward hacking shows up reliably across a range of agents and tasks. The good news is that it can be studied empirically; the bad news is that it has no known general solution.
Closest strategy neighbours
by Jaccard overlap
Other people whose strategy tags overlap with Alex Pan's. Overlap is measured on tag identity, not stance; people with opposing views can appear if they reference the same tags.
Record last updated 2026-04-25.