AGI Strategies

strategy tag

Alignment first.

Solve technical alignment before capability thresholds close

stated endorsers

102

+1 tentative · 0 oppose

profiled endorsers

29

248 on the board total

endorser mean p(doom)

35%

n=3 · median 46%

quotes by endorsers

106

just for this tag

principal voices

Highest-recognition profiled endorsers, with ties broken by quote count. Inclusion is not endorsement of the position; it is recognition of who the discourse turns to when this bet is debated.

  • Stuart Russell

    Household name

  • Nick Bostrom

    Household name

  • Norbert Wiener

    Household name

  • Claude Shannon

    Household name

  • Alan Turing

    Household name

where the endorsers sit on the board

29 of 248 profiled · 12% of the board

expertise ↓ · recognition → Household name · Field-leading · Established · Emerging

Frontier builder
  • Jan Leike
  • John Schulman

Deep technical
  • Stuart Russell
  • Claude Shannon
  • Alan Turing
  • John McCarthy
  • Marvin Minsky
  • Scott Aaronson
  • Iyad Rahwan
  • Richard Ngo
  • Rohin Shah
  • Stuart Armstrong
  • Buck Shlegeris
  • Wei Dai
  • Victoria Krakovna

Applied technical
  • Rob Miles

Policy / meta
  • Nick Bostrom
  • Joseph Carlsmith

External-domain expert
  • Norbert Wiener
  • Brian Christian
  • Doris Tsao

Commentator
  (none profiled for this bet)

Each entry is one profiled person. Profiled opposers, where they exist, sit in the same tiers on the opposite position (marked ×). Tiers with no entries are combinations the field has not produced for this bet.
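
Mechanically, the board above is a count of profiled endorsers per (expertise, recognition) cell. Below is a minimal sketch of that tally; the Profile type, field names, and stance strings are illustrative assumptions, not the site's actual schema.

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative schema only; the board's real data model is not published here.
@dataclass(frozen=True)
class Profile:
    name: str
    expertise: str    # e.g. "Deep technical"
    recognition: str  # e.g. "Household name"
    stance: str       # e.g. "endorses", "mixed", "opposes"

# Stances counted as endorsement, per the tier-mix note further down.
ENDORSER_STANCES = {"endorses", "mixed", "conditional", "evolved-toward"}

def board_cells(profiles: list[Profile]) -> Counter:
    """Tally profiled endorsers per (expertise, recognition) cell.

    Cells left at zero are the empty tier combinations described above;
    opposers would be tallied separately under the same keys.
    """
    return Counter(
        (p.expertise, p.recognition)
        for p in profiles
        if p.stance in ENDORSER_STANCES
    )
```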

also held by these endorsers

What other strategies the same people endorse. Behavioural signal of compatibility, not a declared rule. A high share means the two positions are routinely held together.

Compare this list to the declared relations matrix. Where they differ, the data reveals a pairing the framework doesn't name yet; the global co-endorsement view ranks all pairs.

Tier mix counts only endorsers (endorses, mixed, conditional, evolved-toward).
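
For concreteness, here is a minimal sketch of how a co-endorsement share could be derived, assuming endorsements are available as a mapping from person to the set of strategy tags they hold; the mapping, function name, and tag strings are illustrative assumptions, not the site's data model or API.

```python
def co_endorsement_shares(
    endorsements: dict[str, set[str]], tag: str
) -> list[tuple[str, float]]:
    """Rank other strategy tags by the share of `tag`'s endorsers who also hold them."""
    holders = [person for person, tags in endorsements.items() if tag in tags]
    if not holders:
        return []
    counts: dict[str, int] = {}
    for person in holders:
        for other in endorsements[person] - {tag}:
            counts[other] = counts.get(other, 0) + 1
    # A share near 1.0 means the two positions are routinely held together.
    return sorted(
        ((other, n / len(holders)) for other, n in counts.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Example (hypothetical tag string): co_endorsement_shares(endorsements, "Alignment first")[:5]
```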

expertise mix of endorsers · 29 profiled of 102

Builds frontier systems · 4
Deep ML / safety technical · 18
Applied or adjacent technical · 1
Governance, policy, strategy · 3
Expert in another field · 3
Public-square commentator · 0

recognition mix of endorsers

Mass-public recognition · 7
Known across the AI/safety field · 11
Recognised inside subfield · 11
Newer or less central voice · 0

vintage mix · n=29 of 29 profiled with era assigned

Pioneer · 5
Symbolic era · 3
Pre-deep-learning · 4
Deep-learning rise · 8
Scaling era · 9
Post-ChatGPT · 0

Vintage is the era in which a person's AI worldview formed, from pioneer through post-ChatGPT. A bet held mostly by post-ChatGPT entrants is in a different epistemic state from one held by pre-deep-learning veterans.

People on the record.

103 · 1 tentative

Aaron Courville

Université de Montréal; Deep Learning textbook co-author

endorses

Quiet co-signer of Bengio-aligned positions on safety research priorities.

Alignment should be a first-class research problem alongside capabilities.
articleMila AI Safety· Mila· 2023· loose paraphrase

Adam Jermyn

Anthropic; previously astrophysics

endorses

Argues mechanistic understanding of model behavior, including how deceptive alignment could arise, is required to make safety guarantees credible.

Deceptive alignment is the scenario where a model behaves as if aligned during training but pursues different objectives at deployment. The question is whether we can rule it out empirically.
blogAdam Jermyn, Anthropic· adamjermyn.com· 2022· faithful paraphrase

Adam Kalai

Microsoft Research; AI fairness and safety

mixed

Technical researcher focused on fairness and safety engineering; contributes mainstream industry-level work.

Fairness must be operationalised in training, not bolted on after the fact.
articleAdam Kalai, Microsoft Research· Microsoft Research· 2020· loose paraphrase

Agnes Callard

University of Chicago philosopher; aspiration theorist

mixed

Her aspiration framework, which asks how agents come to hold values they did not previously have, applies to the question of how AI agents might develop values.

Aspiration is the process by which we acquire values we did not previously have. The question of whether AI can do this is the alignment question, philosophically.
articleAgnes Callard, University of Chicago· University of Chicago· 2024· loose paraphrase

Ajeya Cotra

Open Philanthropy researcher; 'biological anchors' forecaster

endorses

Supports mainstream alignment research funding and empirical work on 'playing the training game' failure modes.

My 2022 median for transformative AI dropped from roughly 2050 to the late 2030s.
blogTwo-year update on my personal AI timelines· AI Alignment Forum· 2022-08-02· faithful paraphrase

Alan Turing

Founder of theoretical computer science (1912–1954)

mixed

Founded the philosophy of AI. His 1950 paper anticipated both the optimistic and cautionary frames of subsequent debate.

“I propose to consider the question, 'Can machines think?'”

Context: Opening line of Computing Machinery and Intelligence.

§ paperComputing Machinery and Intelligence· Mind· 1950-10· direct quote

Alex Irpan

Google Brain alumnus; Sorta Insightful blog

mixed

Inside-view RL researcher who has written sceptically on RL hype, then carefully on RLHF, and now on the implications of reasoning models.

Deep reinforcement learning doesn't yet work. We need to be honest about that even when we are excited about the wins.
blogDeep Reinforcement Learning Doesn't Work Yet· Sorta Insightful· 2018-02-14· faithful paraphrase

Alex Pan

Berkeley CHAI; reward hacking

endorses

Argues that reward hacking, where models exploit flaws in their training objective, is a tractable empirical problem that demands more attention from the alignment community.

Reward hacking shows up reliably across a range of agents and tasks. The good news is that it is empirically studyable; the bad news is that it does not have a known general solution.
§ paperThe Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models· arXiv / ICLR· 2022· faithful paraphrase

Alex Turner

DeepMind alignment researcher; shard theory co-originator

endorses

Co-developer of shard theory (an optimistic inside view) and author of formal power-seeking results (a pessimistic formal result); representative of the range of inside-view alignment research.

Optimal policies tend to seek power in a technical sense; but learned policies are not optimal in that sense.
§ paperOptimal Policies Tend To Seek Power· arXiv· 2023· faithful paraphrase

Amanda Askell

Anthropic philosopher-researcher

endorses

Shapes Claude's public persona through a virtue-ethics framing; argues character-forward alignment is practical and tractable.

When we train a model, we are not just shaping behaviour, we are shaping a character.
articleAmanda Askell, Anthropic· Anthropic· 2024· loose paraphrase

Anca Dragan

UC Berkeley professor; Google DeepMind AI safety lead

endorses

Argues alignment research should be foundational, using assistance games and reward modelling as central tools.

Robots and AI assistants need to infer and defer to human preferences, not optimise rigid objectives.
§ paperCooperative Inverse Reinforcement Learning· NeurIPS· 2017· faithful paraphrase

Andrew G. Barto

RL co-founder; 2024 Turing Award recipient

mixed

On accepting the Turing Award, expressed concern that companies are using RL methods to build powerful AI systems without sufficient regard to safety; framed scaling RL agents in the world as a serious societal risk.

Releasing software to millions of people without safeguards is not engineering. It would never fly in any other engineering field.
articlePioneers of Reinforcement Learning Win the Turing Award· The New York Times· 2025-03· faithful paraphrase

Andrew Yao

Tsinghua professor; Turing Award winner; Chinese AI institutional figure

endorses

Foundational figure in Chinese theoretical computer science. Has supported Chinese-Western AI safety dialogues including IDAIS.

AI safety is a shared technical problem; international cooperation is essential.
articleInternational Dialogues on AI Safety· IDAIS· 2024· loose paraphrase

Anna Salamon

CFAR co-founder; rationality and existential risk

endorses

Long-term advocate of careful epistemic practice as a precondition for navigating AI risk; CFAR was central to community-building for alignment research.

Most of the work of being correct about AI risk is not technical, it is the epistemic practice that lets you face uncomfortable questions without flinching.
articleCenter for Applied Rationality· CFAR· 2018· faithful paraphrase

Asya Bergal

AI Impacts; AI safety researcher

endorses

Argues large surveys of AI researchers reveal that the field itself thinks transformative AI is plausible within decades and that catastrophic outcomes are non-negligible.

Median respondents put 50% chance of high-level machine intelligence by 2047. The chance of an extremely bad outcome was 5% or higher for half of respondents.
§ paperThousands of AI Authors on the Future of AI· arXiv / AI Impacts· 2024-01· faithful paraphrase

Barret Zoph

Co-founder Thinking Machines Lab; ex-OpenAI

endorses

Argues post-training is where most safety properties are made or broken; co-founded Thinking Machines partly to extend this view to a fully open lab.

Most of what people think of as 'capabilities' is shaped after pretraining. Post-training is also where most of what we think of as alignment is decided.
articleThinking Machines Lab· Thinking Machines Lab· 2024· faithful paraphrase

Ben Pace

LessWrong / Lightcone Infrastructure team

endorses

Argues internet community infrastructure, especially LessWrong and the Alignment Forum, is itself a load-bearing part of the alignment research ecosystem.

Most alignment research happens between papers, in long-running discussions on LessWrong and the Alignment Forum. Maintaining that infrastructure is not separable from the field's progress.
articleLessWrong· Lightcone Infrastructure· 2023· faithful paraphrase

Boaz Barak

Harvard; OpenAI safety; theoretical CS

mixed

Argues alignment is a real and tractable technical problem, that progress is faster than worst-case predictions assumed, and that the most useful safety work happens inside frontier labs.

I joined OpenAI because I think the most interesting and important alignment research is happening on actual frontier models. Working from the outside has limits.
blogBoaz Barak, windowsontheory blog· WordPress· 2024· faithful paraphrase

Brian Christian

Author of The Alignment Problem

endorses

Book-length treatment of alignment: inverse reinforcement learning, reward hacking, specification gaming.

The alignment problem is already here, and will keep scaling with capability.
bookThe Alignment Problem· W. W. Norton· 2020-10-06· faithful paraphrase

Buck Shlegeris

CEO of Redwood Research; 'AI Control' research lead

mixed

Advocates 'AI control', protocol-level safety assuming the worst about model intentions, as a complement to direct alignment.

We should design AI deployment protocols that remain safe even if our AIs are trying to subvert them.
§ paperThe case for ensuring that powerful AIs are controlled· AI Alignment Forum· 2023-12· faithful paraphrase

Cari Tuna

Co-founder Good Ventures; Open Philanthropy chair

endorses

Argues longtermist philanthropy should weigh catastrophic risks heavily; oversees the largest sustained grant program in AI safety.

Working backward from the question of how we can do the most good with our resources led us to focus on existential risks, including from advanced AI.
articleCari Tuna and Dustin Moskovitz on the new Effective Altruism· Open Philanthropy· 2017· faithful paraphrase

Carroll Wainwright

Anthropic; ex-OpenAI; alignment researcher

endorses

Co-developed early RLHF methods that became the foundation of post-training across the industry; argues these techniques transfer responsibility for behaviour onto whoever sets up the human feedback.

Our results show that for English summarization, RLHF-trained models can outperform much larger fine-tuned models. The technique is powerful and inherits all the strengths and biases of the human raters.
§ paperLearning to Summarize with Human Feedback· arXiv / OpenAI· 2020· faithful paraphrase

Catherine Olsson

Anthropic; ex-OpenAI; AI safety community organizer

endorses

Long-time community organizer for alignment research; helped seed the EA-aligned safety pipeline at OpenAI before joining Anthropic.

We're seeing the largest AI training runs grow more than 300,000x in compute over six years, an order of magnitude faster than Moore's Law. The implications for safety planning are immediate.
articleAI and Compute· OpenAI· 2018-05· faithful paraphrase

Chelsea Finn

Stanford professor; meta-learning and robotics researcher

mixed

Represents the engaged-but-measured academic position: safety is a real research problem; the framings are often overblown.

Generalisation in robotics is where a lot of real safety work has to happen.
articleChelsea Finn research· Stanford AI· 2024· loose paraphrase

Christopher Manning

Stanford NLP director; foundation models

mixed

Endorses 'foundation models' as the operative frame for current and future AI systems; engages with safety as integral to that frame, not as a separable add-on.

“We define foundation models as models trained on broad data at scale and adaptable to a wide range of downstream tasks, and these models entail both new capabilities and new risks.”
§ paperOn the Opportunities and Risks of Foundation Models· arXiv / Stanford CRFM· 2021-08· direct quote

Christopher Summerfield

Oxford neuroscientist; DeepMind senior researcher

mixed

Brings cognitive science framings to AI alignment; author of These Strange New Minds on how neural networks represent the world.

The question 'does this model understand?' hides a stack of questions we haven't yet disentangled. Alignment research has to live with that ambiguity.
articleChristopher Summerfield homepage· Oxford University· 2024· loose paraphrase

Claude Shannon

Information theory founder (1916–2001)

mixed

Founded information theory and was an early enthusiast of chess-playing machines and learning machines.

“I visualize a time when we will be to robots what dogs are to humans. And I'm rooting for the machines.”

Context: Reported quote from Shannon late in his life. Often cited as one of the earliest succession framings.

articleClaude Shannon, Wikipedia· Wikipedia (citing Omni Magazine, 1987)· 1987· direct quote

Dan Jurafsky

Stanford NLP professor; textbook author

mixed

Senior NLP academic who engages with alignment and ethics as part of the discipline's responsibility.

NLP has moved from engineering discipline to civic infrastructure. Our responsibility is to treat it that way.
articleDan Jurafsky, Stanford· Stanford University· 2023· loose paraphrase

Daniel Dewey

Former AI risk program officer at Open Philanthropy

endorses

Focused on funding alignment research and evaluations.

If we want AI to be broadly beneficial, we need to invest in alignment research well before systems are capable of world-changing impact.
articleOpen Philanthropy AI risk program· Open Philanthropy· 2016· loose paraphrase

Daniel Filan

AXRP podcast host; alignment researcher

endorses

Argues alignment research is technical, tractable, and best advanced through careful engagement with specific research agendas; uses AXRP to surface those agendas in detail.

What I want from AI safety research is the same thing I want from any other research: clear problem statements, clear progress, and a community that holds itself to the standards of the rest of science.
podcastAXRP, AI X-risk Research Podcast· AXRP· 2024· faithful paraphrase

David D. Cox

MIT-IBM Watson AI Lab director

mixed

Argues academic-industrial collaboration is necessary for safety research to keep pace with frontier capabilities; runs MIT-IBM as a model for that arrangement.

Frontier AI research has to be a partnership between academia and industry. Neither has the full set of capabilities to navigate where this is going alone.
articleMIT-IBM Watson AI Lab· MIT-IBM Watson AI Lab· 2024· faithful paraphrase

Doina Precup

McGill professor; DeepMind Montreal lead

endorses

Senior RL researcher and DeepMind leader; technical contributor to alignment-relevant RL theory.

RL agents that learn open-endedly need open-ended evaluation regimes if we are to deploy them safely.
articleDoina Precup, McGill· McGill University· 2024· loose paraphrase

Doris Tsao

Caltech / UC Berkeley neuroscientist; face cells researcher

mixed

Brings neuroscience grounding to debates about AI representation and consciousness; argues we still understand brain representations poorly.

We are still learning how the brain represents anything. Mapping that onto AI representations is genuinely open research.
articleDoris Tsao, UC Berkeley· Tsao Lab· 2024· loose paraphrase

Dustin Moskovitz

Asana / Open Phil; biggest AI safety funder

endorses

Argues catastrophic risks from advanced AI are real and that long-term philanthropy should be oriented around them; primary funder of Open Phil's AI x-risk grantmaking.

AI is the most important and consequential technology being developed in our lifetime. Its development cycle is one of the most important things society needs to get right.
articleOpen Philanthropy: Potential Risks from Advanced AI· Open Philanthropy· 2023· faithful paraphrase

Dylan Hadfield-Menell

MIT professor; Stuart Russell student; assistance games

endorses

Works on assistance games, the Russellian proposal that AI should model and defer to human preferences rather than optimise fixed objectives.

Assistance games give us a framework where AI uncertainty about human preferences is a feature, not a bug.
articleDylan Hadfield-Menell homepage· MIT CSAIL· 2017· loose paraphrase

Ece Kamar

Microsoft Research AI Frontiers VP

mixed

Industry-side research leadership on AI reliability, safety, and tool use.

Reliability research is where most of the real AI safety work is happening, and most of the real progress.
articleMicrosoft Research AI Frontiers· Microsoft Research· 2024· loose paraphrase

Edward Grefenstette

Google DeepMind; AI for science research

mixed

Argues research on reasoning and language understanding is essential before scaling can be considered safe; views capability research as inseparable from how alignment problems are framed.

We don't yet know how to robustly characterize what these models understand versus what they confabulate. That gap is the root of many alignment problems.
articleEdward Grefenstette, egrefen.com· egrefen.com· 2023· faithful paraphrase

Ethan Perez

Anthropic researcher; red-teaming language models

endorses

Designs red-teaming protocols and model-evaluation frameworks. Significant empirical contributor to Anthropic's alignment work.

You can use a language model to red-team another language model, and that lets you scale evaluation in ways humans alone cannot.
§ paperRed Teaming Language Models with Language Models· arXiv· 2022· faithful paraphrase

Evan Hubinger

Alignment Stress-Testing lead at Anthropic

endorses

Frames inner alignment, ensuring a model's learned optimiser has the intended objective, as a separate and harder problem than outer alignment.

A model that has learned deceptive goals during training can pass all your behavioural tests and still fail catastrophically when deployed.

Context: Sleeper Agents paper at Anthropic.

§ paperSleeper Agents: Training Deceptive LLMs that Persist Through Safety Training· arXiv· 2024-01-12· faithful paraphrase

Geoffrey Irving

Chief Scientist of UK AI Safety Institute; debate-protocol researcher

endorses

Advances formal scalable-oversight protocols (debate, prover-estimator debate) that aim to make honest behaviour the equilibrium strategy.

“We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium.”
tweetProver-Estimator Debate tweet· X/Twitter· 2025-06· direct quote

Gus Docker

Future of Life Institute podcast host

endorses

Editorial stance gives extended platform to existential-risk-aligned voices; interviews push interviewees on specific cruxes rather than sound-bite framings.

What I try to do on the podcast is take the technical arguments seriously enough to ask the second-order questions about them, in public, on the record.
podcastFuture of Life Institute Podcast· Future of Life Institute· 2024· faithful paraphrase

Hilary Putnam

Harvard philosopher (1926–2016); functionalism

mixed

Foundational reference for the philosophical position that AI minds are possible in principle. Putnam himself later moved away from strong functionalism.

Mental states could in principle be realized in any of an indefinite number of physical systems.

Context: Functionalism, the foundational philosophical assumption of strong AI.

§ paperThe Nature of Mental States· Capitan and Merrill, eds.· 1967· faithful paraphrase

Hung-yi Lee

National Taiwan University; speech and LLM researcher

mixed

Communicates technical alignment concepts to Mandarin-speaking audiences; engages seriously with safety arguments while maintaining a researcher's stance on what is and is not technically solved.

Understanding what large language models actually do is the precondition for any meaningful policy debate about them. Educational access to that understanding is itself a safety issue.
articleHung-yi Lee, NTU· NTU Speech Lab· 2024· faithful paraphrase

Iason Gabriel

DeepMind senior research scientist; AI ethics

endorses

Argues alignment requires not just instruction-following but value-pluralism: aligning to which values, when reasonable people disagree?

Alignment is not just a technical problem. It is a problem about whose values count.
§ paperArtificial Intelligence, Values and Alignment· Minds and Machines· 2020-10-31· faithful paraphrase

Isaac Asimov

Science fiction author; Three Laws of Robotics author (1920–1992)

endorses

Early popular articulation of the alignment problem via the Three Laws of Robotics.

“A robot may not injure a human being or, through inaction, allow a human being to come to harm.”

Context: First Law of Robotics, 'Runaround', 1942.

§ paperRunaround· Astounding Science Fiction· 1942-03· direct quote

Iyad Rahwan

Max Planck Institute Berlin; Moral Machine experiment

endorses

Argues 'machine behaviour' is a distinct field of study alongside human behaviour, and that social-science methods should be used to study AI.

Machines now exhibit behaviours that need to be studied with the methods of behavioural science, not only with the methods of computer science.
§ paperMachine Behaviour· Nature· 2019-04-25· faithful paraphrase

Jacob Hilton

Alignment Research Center; Prover-Verifier Games

endorses

Technical alignment work on eliciting honest behaviour from more-capable models.

The alignment problem is about getting less-capable verifiers to reliably elicit truth from more-capable provers.
§ paperProver-Verifier Games Improve Legibility· arXiv· 2024-07· faithful paraphrase

Jan Leike

Former head of OpenAI Superalignment; now at Anthropic

endorses

Argues alignment research must scale with capabilities; publicly resigned when he felt this ratio was violated.

“Over the past years, safety culture and processes have taken a backseat to shiny products.”

Context: Resignation thread on X/Twitter.

tweetResignation thread· X/Twitter· 2024-05-17· direct quote

Jared Kaplan

Anthropic co-founder; scaling-laws co-author

endorses

Theorist of scaling laws who left academic physics for AI safety research via Anthropic.

If scaling laws continue, we will have models far beyond current capabilities well within this decade. That is a reason to take safety seriously, not to slow down research on safety.
articleJared Kaplan, Anthropic· Anthropic· 2023· loose paraphrase

Joar Skalse

Oxford researcher; reward-hacking formalism

endorses

Formalises reward-hacking failures in learned reward models; provides technical grounding for specification-gaming concerns.

We can formally characterise the conditions under which a learned reward model is hackable. The characterisation lets us design training regimes that reduce the attack surface.
§ paperDefining and Characterizing Reward Hacking· arXiv· 2022· loose paraphrase

John Jumper

DeepMind; AlphaFold lead; Nobel Prize 2024

mixed

Public on the application side: argues AI is most powerful when aimed at concrete scientific problems, and that demonstrating value through real science is more credible than abstract claims.

“We have created the first computational tool that can routinely predict protein structures with accuracy competitive with experimental methods.”
§ paperHighly accurate protein structure prediction with AlphaFold· Nature· 2021-07-15· direct quote

John McCarthy

Coined 'artificial intelligence' (1927–2011)

mixed

Founded the field. Cautious about over-claims; spent decades arguing for logic-based, common-sense reasoning approaches alongside the dominant statistical paradigms.

“Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”

Context: From the Dartmouth Workshop proposal, the founding statement of AI as a field.

§ paperA Proposal for the Dartmouth Summer Research Project on Artificial Intelligence· Dartmouth College· 1955· direct quote

John Schulman

Anthropic alignment researcher; OpenAI co-founder

endorses

Has publicly stated that deepening his focus on alignment was the reason for leaving OpenAI.

“I've made the difficult decision to leave OpenAI. This choice stems from my desire to deepen my focus on AI alignment, and to start a new chapter of my career where I can return to hands-on technical work.”
tweetJohn Schulman departure tweet· X/Twitter· 2024-08-05· direct quote

Jonah Brown-Cohen

DeepMind scalable oversight researcher

endorses

Works on formal debate protocols for scalable oversight.

Doubly-efficient debate lets an honest strategy verify correctness in polynomial time while the dishonest strategy has exponentially more steps to try to deceive.
§ paperScalable AI Safety via Doubly-Efficient Debate· arXiv· 2023-11-23· faithful paraphrase

Jonathan Frankle

Databricks Chief AI Scientist; Lottery Ticket Hypothesis

mixed

Argues understanding what makes models efficiently trainable is part of understanding what they are; safety conversations need this technical grounding.

“A randomly-initialized, dense neural network contains a subnetwork that is initialized such that, when trained in isolation, it can match the test accuracy of the original network.”
§ paperThe Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks· arXiv / ICLR· 2018-03· direct quote

Joseph Carlsmith

Open Philanthropy researcher; 'Is Power-Seeking AI an Existential Risk?'

endorses

Argues misaligned power-seeking AI is a substantial existential risk this century and the case requires only weak premises about agentic capabilities.

My current estimate is that the probability of existential catastrophe from power-seeking, misaligned AI by 2070 is more than 10%.

Context: Updated estimate after the original report; revised upward.

§ paperIs Power-Seeking AI an Existential Risk?· arXiv / Open Philanthropy· 2022· faithful paraphrase

Karina Nguyen

Anthropic; ex-OpenAI; product research

mixed

Argues product-grounded research, how users actually use frontier AI in practice, surfaces alignment problems that pure benchmark studies miss.

When you watch real users interact with AI assistants, the alignment failures that show up are not the ones laboratory benchmarks are designed to catch. The friction is in the texture of conversation.
tweetKarina Nguyen on AI assistants· X· 2024· faithful paraphrase

Kate Saenko

Boston University CS professor; visual AI researcher

mixed

Engineer-grade alignment research: representation learning, adversarial robustness, transfer.

Robustness benchmarks are an under-developed scaffolding for AI safety practice.
articleKate Saenko, BU CS· Boston University· 2024· loose paraphrase

Lilian Weng

Thinking Machines; former OpenAI VP of Research

endorses

Public educator on alignment technical work. Ran OpenAI's Safety Systems team before leaving for Thinking Machines.

Reward modelling, RLHF, red-teaming, and eval frameworks are the concrete instruments of alignment today.
blogLil'Log· Lil'Log· 2023· loose paraphrase

Lukas Finnveden

Open Philanthropy; AI safety analyst

endorses

Argues alignment research must trace specific failure modes in concrete detail; favours quantitative scenario analysis over generic existential framings.

Plausible scenarios for AI takeoff include software-only feedback loops where AIs do AI research. Whether this leads to alignment failure depends on details that haven't been carefully argued.
blogLukas Finnveden, LessWrong· LessWrong· 2022· faithful paraphrase

Manuela Veloso

CMU; head of AI research at JPMorgan Chase

mixed

Argues human-machine teaming, not autonomous agents alone, is the right model for high-stakes deployments; finance is a useful test domain because failures are visible and costly.

The right model for AI in high-stakes domains is not full autonomy. It is fluid teaming, where the human and the machine each pick up the parts of the task they are best suited to.
articleManuela Veloso at JPMorgan Chase AI Research· JPMorgan Chase· 2023· faithful paraphrase

Marc G. Bellemare

Mila / McGill; Atari Learning Environment

mixed

Argues distributional reinforcement learning, modelling the full distribution of returns rather than just the mean, is a richer foundation for safe deployment of RL systems.

Distributional RL learns the entire distribution over returns, not just its expectation. The richer signal turns out to matter for stability, exploration, and risk-sensitivity.
§ paperA Distributional Perspective on Reinforcement Learning· arXiv / ICML· 2017· faithful paraphrase

Marvin Minsky

MIT AI lab co-founder (1927–2016); 'Society of Mind'

endorses

Foundational AI thinker who proposed mind-as-modular-society. Pre-dated modern alignment debates but anticipated the conceptual framework.

“What magical trick makes us intelligent? The trick is that there is no trick. The power of intelligence stems from our vast diversity, not from any single, perfect principle.”
bookThe Society of Mind· Simon & Schuster· 1986· direct quote

Nat McAleese

OpenAI researcher; ex-DeepMind reliability

endorses

Works on reward modelling and debate-style oversight; publicly engaged with alignment research.

Teaching language models to support answers with verified quotes is a concrete alignment sub-problem we can make progress on.
§ paperTeaching Language Models to Support Answers with Verified Quotes· arXiv· 2022· faithful paraphrase

Nick Bostrom

Author of Superintelligence; founded Oxford's Future of Humanity Institute

endorses

Articulated the 'control problem': a superintelligence must have human-beneficial goals built in, because attempting to control a more capable agent after the fact is expected to fail.

“The fate of the gorillas now depends more on us humans than on the gorillas themselves; so too the fate of our species then would come to depend on the actions of the machine superintelligence.”
bookSuperintelligence· Oxford University Press· 2014-07-03· direct quote

Nora Belrose

EleutherAI alumni; optimistic alignment researcher

mixed

Argues practical alignment progress is real and that doom-scenario reasoning is often philosophically loaded.

Doom arguments tend to hinge on underdefined intuitions about 'optimization pressure' that I don't think survive engagement with real systems.
tweetNora Belrose on AI alignment· X/Twitter· 2024· faithful paraphrase

Norbert Wiener

Founder of cybernetics (1894–1964)

endorses

Argued automated systems operating on imperfectly specified goals pose grave risks, the original articulation of the specification problem.

“If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively… we had better be quite sure that the purpose put into the machine is the purpose which we really desire.”
§ paperSome Moral and Technical Consequences of Automation· Science· 1960-05-06· direct quote

Oscar Moxon

AI safety researcher; independent

endorses

Independent contributor to the AI safety technical discussion; produces reproductions and critiques of frontier lab work.

Independent reproductions of frontier-lab claims are an undersupplied public good.
blogLessWrong· LessWrong· 2024· loose paraphrase

Owain Evans

Apollo Research co-founder; scheming behaviour researcher

endorses

Runs empirical research program demonstrating scheming-style behaviours in large models; argues governance frameworks need this evidence base.

Frontier models can scheme. That's now an empirical observation, not a theoretical concern.
articleApollo Research· Apollo Research· 2024· loose paraphrase

Paul Christiano

Founder of the US AI Safety Institute safety team; ex-OpenAI alignment lead

endorses

Canonical modern alignment researcher; works on debate, RLHF, and eliciting latent knowledge.

I'd guess something like a 20% chance of an AI takeover, with many of the humans dead, and a further 30% chance or so of serious irreversible problems short of takeover.

Context: From his LessWrong post 'My views on doom'.

blogMy views on doom· LessWrong· 2023-04-27· faithful paraphrase

Petar Veličković

DeepMind researcher; graph neural networks

mixed

Engaged technical researcher; publicly supportive of DeepMind's safety framework.

Structure-aware architectures can help us build AI systems that generalise more safely across domains.
blogPetar Veličković on geometric deep learning· petar-v.com· 2024· loose paraphrase

Peter Railton

Michigan ethicist; AI moral learning researcher

endorses

Argues that moral learning analogues in AI are a live research program for alignment.

Moral learning in humans draws on the same reinforcement-learning machinery we are now building into AI systems. That's not an accident; it is the alignment problem.
articlePeter Railton on moral learning and AI· University of Michigan· 2023· loose paraphrase

Quintin Pope

Researcher; shard theory co-originator

mixed

Shard theory reframes alignment as a training-dynamics problem rather than a utility-function specification problem.

Values in learned agents emerge as shards, context-activated circuits, not unified utility functions.
blogThe Shard Theory of Human Values· LessWrong· 2022-09· faithful paraphrase

Raia Hadsell

Google DeepMind director of robotics & research

mixed

Argues continual learning and robotic embodiment are central to general AI; engages with safety as integral to research roadmaps.

Embodied learning forces you to confront questions that are easy to gloss over in language models: causality, partial observability, the cost of mistakes.
articleRaia Hadsell, DeepMind· Google DeepMind· 2023· faithful paraphrase

Ramana Kumar

Google DeepMind safety researcher; formal verification

endorses

Technical contributor to safety research, particularly around formal verification and agent tampering incentives.

Formal verification is under-used in AI safety. When you can prove a property rather than measure it, you should.
blogRamana Kumar, Alignment Forum· AI Alignment Forum· 2023· loose paraphrase

Richard Ngo

AI safety researcher; 'AGI safety from first principles'

endorses

Presents alignment as the most compelling lens on existential risk: by default, competent goal-directed systems converge on instrumental goals that pull away from human values.

The development of AGI may be one of the most consequential events in history, with the potential to either drastically increase or decrease the chances that humanity survives and flourishes.
§ paperAGI safety from first principles· AI Alignment Forum· 2020-09· faithful paraphrase

Rob Miles

AI safety YouTuber

endorses

Public educator on alignment theory: instrumental convergence, specification gaming, inner alignment, mesa-optimization, deceptive alignment. Explains them in plain language for non-technical audiences.

You don't need the AI to hate you. Almost any objective, pursued hard enough, runs us over as a side effect.
videoRob Miles AI Safety· YouTube· 2017· faithful paraphrase
We need to know how to build AGIs we can trust before we build the AGIs. Right now we don't know how, and that is the whole problem.
videoIntro to AI Safety· YouTube· 2022· faithful paraphrase

Rob Wiblin

80,000 Hours podcast host

endorses

Argues longtermist AI safety prioritization is well-motivated; uses the 80k podcast to surface specific technical and policy paths in extended conversation.

If transformative AI arrives in our lifetimes, the decisions made in the next decade may shape the long-run future more than any others. That makes alignment work plausibly the highest-leverage problem.
podcastThe 80,000 Hours Podcast· 80,000 Hours· 2023· faithful paraphrase

Rohin Shah

Alignment researcher at Google DeepMind

endorses

Works on mechanistic interpretability, scalable oversight, and specification research within a frontier lab.

I think it's more likely than not that we can develop useful AI systems that are meaningfully aligned, but I'd place substantial probability on catastrophic outcomes.
blogAlignment Newsletter archive· rohinshah.com· 2023· loose paraphrase

Romeo Dean

AI 2027 co-author; AI Futures Project

endorses

Argues detailed scenario forecasting, rather than abstract probability estimates, is the more credible way to communicate AI risk and prepare for it.

The AI 2027 scenario is not a prediction. It is a credible mid-line forecast that reveals where current trajectories converge if no surprises hit.
articleAI 2027· AI Futures Project· 2025-04· faithful paraphrase

22 more on the record. The page renders the first 80 alphabetically; the rest live in the full directory, filterable by this tag.

tentative · 1

Below are entries flagged tentative: assignments inferred from a passing remark, hype quote, or paper abstract rather than a clear strategy statement. Shown in dashed cards so a stronger primary source can replace them later.


Hugo Larochelle

Google DeepMind; Mila

mixed · tentative

Public on inclusivity and reproducibility in ML; has not explicitly taken a side on AI x-risk framings, but is supportive of Bengio-style cautious framing in Canadian policy.

Reproducibility and accessibility in machine learning are not optional features; they are how we sustain the field's credibility.
articleHugo Larochelle, Mila profile· Mila· 2020· faithful paraphrase