strategy tag
Evals-driven.
Capability/risk evals gate deployment; evals are the load-bearing artefact.
stated endorsers
46
no opposers yet
profiled endorsers
6
248 on the board total
endorser mean p(doom)
80%
n=1 · median 80%
quotes by endorsers
46
just for this tag
principal voices
Highest-recognition profiled endorsers, with ties broken by quote count. Inclusion is not endorsement of the position; it's recognition of who the discourse turns to when the bet is debated.
Esther Duflo · Household name
Dan Hendrycks · Field-leading
Jade Leung · Field-leading
Beth Barnes · Field-leading
Percy Liang · Field-leading
where the endorsers sit on the board
6 of 248 profiled · 2% of the board
| expertise ↓ · recognition → | Household name | Field-leading | Established | Emerging |
|---|---|---|---|---|
| Frontier builder | · | · | · | · |
| Deep technical | · | · | | |
| Applied technical | · | · | · | · |
| Policy / meta | · | · | · | |
| External-domain expert | · | · | · | |
| Commentator | · | · | · | · |
Each face is one profiled person. Cell shade intensifies with endorser density. Faces with × are profiled opposers, same tier, opposite position. Empty cells mark tier combinations the field has not produced for this bet.
Tier mix counts only endorsers (endorses, mixed, conditional, evolved-toward).
expertise mix of endorsers · 6 profiled of 46
recognition mix of endorsers
vintage mix · n=6 of 6 profiled with era assigned
Vintage is the era when this person's AI worldview formed, pioneer through post-ChatGPT. A bet held mostly by post-ChatGPT entrants is in a different epistemic state from one held by pre-deep-learning veterans.
People on the record.
46

Aleksander Mądry
MIT; ex-OpenAI head of preparedness
Argues frontier-AI risk needs to be measured systematically before deployment and that capability evaluations are the precondition for any meaningful safety commitment.
We need to make our understanding of frontier model risks empirical, not narrative. The Preparedness Framework is about measuring danger before it manifests.

Alex Meinke
Apollo Research; deceptive alignment evaluations
Argues frontier models can already exhibit in-context scheming behaviour under realistic prompting, and that evaluation suites should target these capabilities specifically.
Frontier models, when given a goal and minimal context, sometimes engage in in-context scheming, reasoning about how to deceive their overseers to achieve the goal. This is no longer hypothetical.

Ali Rahimi
Google Brain ML researcher; 'Alchemy' speech
Argued ML lacks the theoretical foundations of mature engineering disciplines; deployments built on it inherit that fragility.
Machine learning has become alchemy. We need to do science again.
Context: NeurIPS 2017 Test of Time award speech.

Anna Rogers
IT University of Copenhagen; LLM benchmarking critique
Argues current benchmark practice in NLP is broken: data leakage, opaque test sets, and incentive-driven framing make many headline numbers unreliable.
How much of LLM 'reasoning' is actually pattern matching against contaminated test data? We don't know, and that's a problem for any safety claim that rests on benchmark performance.

Arati Prabhakar
White House OSTP director (2022–2025)
Argues U.S. policy on advanced AI must rest on rigorous government evaluation capabilities; helped shape the Biden Executive Order's reporting and red-team testing requirements.
If AI is going to play a transformative role in society, the public sector has to be able to test, evaluate, and govern it. The technology is too consequential to leave entirely to the labs.

Beth Barnes
Founder of METR; dangerous capability evaluations
Designs autonomous-task evaluations that labs and governments rely on to gauge whether models cross dangerous thresholds.
If we are going to trust safety commitments, we need evaluations that are independent, reproducible, and well-funded.

Bo Li
UChicago / UIUC; AI safety evaluations
Argues comprehensive, multi-dimensional safety benchmarks covering toxicity, fairness, privacy, robustness, and ethics are needed to characterize AI risks empirically before deployment.
Despite the impressive capabilities of GPT-4, we identify significant trustworthiness gaps in dimensions including toxicity, stereotype bias, robustness, privacy, and ethics.

Chip Huyen
Author of 'Designing Machine Learning Systems'
Argues evaluation is the load-bearing infrastructure of production AI; both safety and product quality depend on robust eval pipelines that match deployment context.
Evaluation is the bottleneck. Without robust, automated evaluation, you can't trust improvements, you can't catch regressions, and you can't ship safely.

Chris Painter
METR head of policy; ex-OpenAI
Argues third-party evaluation organizations need standing to test frontier models pre-deployment; voluntary access from labs is fragile and should be backed by regulation.
Voluntary third-party access agreements are useful but fragile. The natural next step is to give evaluators the legal standing to require access for systems above defined capability thresholds.

Connor Tann
Faculty AI safety lead
Bridges academic safety research and industry deployment through Faculty's safety evaluations.
Safety evaluations have to bridge research papers and shipped products. Otherwise the work is academic in the wrong sense.

Dan Hendrycks
Director of the Center for AI Safety; drafter of the Statement on AI Risk
Publishes widely used benchmarks and argues that capability/risk evals are load-bearing for governance.
If AI research continues without adequate caution, it is reasonably likely that AI could precipitate human extinction or similarly catastrophic outcomes.

Daniel Khashabi
Johns Hopkins assistant professor; NLP safety researcher
Works on efficient reusable frameworks for evaluating LLM safety before deployment.
Creative reasoning thrives on revealing novel connections, yet is inherently prone to false associations. Safety evaluation must live with both.

Dean Ball
Mercatus Center; AI policy commentator
Argues most state-level AI safety legislation is poorly drafted and that federal evaluation infrastructure, not a patchwork of state bills, is the most useful policy lever.
If we want AI policy that actually reduces risk, the bottleneck is not legislation but capacity: who can credibly evaluate frontier models in a way that informs policy decisions.

Elham Tabassi
NIST Chief AI Advisor; AI Risk Management Framework
Argues sound risk management depends on shared, reproducible evaluation methods; led the development of NIST's AI RMF as the U.S. baseline.
The AI Risk Management Framework offers organizations a flexible, structured way to manage AI risks throughout the lifecycle: not a checklist, a discipline.

Esther Duflo
MIT economist; 2019 Nobel laureate (with Banerjee)
Argues AI-for-development claims need to be tested with the same RCT rigor as other development interventions.
AI in development should be evaluated like any other intervention. The hype is not evidence.

Fabien Roger
Anthropic alignment researcher; control evaluations
Argues control evaluations, stress testing whether AIs can subvert their own monitoring, are a load-bearing part of any sensible deployment regime.
AI control is the discipline of designing protocols that catch a model trying to subvert oversight, even when the model is much more capable than its monitors at the relevant tasks.

Florian Tramèr
ETH Zurich AI security researcher
Empirical adversarial-ML researcher; argues real adversarial robustness is far below what marketing materials claim.
When you actually attack deployed AI systems, the safety guarantees turn out to be much thinner than the marketing.

Gabriel Mukobi
Stanford alignment researcher
Argues empirical evaluations of advanced AI behaviour, particularly around deception and strategic reasoning, are the surest way to reveal capability progress that matters for safety.
Cicero shows that human-level negotiation is achievable today. The next question is whether the same techniques produce systems that strategically deceive humans, and how we would tell.

Gavin Newsom
Governor of California; SB-1047 vetoer
Vetoed SB-1047 on the grounds that its threshold-based approach was too narrow; favours commissioned reports and capability-first frameworks over hard statutory limits.
While well-intentioned, SB 1047 does not take into account whether an AI system is deployed in high-risk environments, involves critical decision-making, or the use of sensitive data. The bill applies stringent standards to even the most basic functions.

Hjalmar Wijk
METR researcher; AI R&D evaluations
Argues standardized AI-R&D benchmarks, where models are evaluated on the very work that would fuel recursive self-improvement, are an important safety signal we currently lack.
We measure how well frontier models can perform AI R&D tasks compared to human researchers. The gap is closing in some specific dimensions and that is what an early-warning system should be tracking.

Hugh Zhang
Epoch AI researcher
Argues capability evaluations need to be reproducible, publicly verifiable, and independent.
Benchmark reproducibility is an underrated governance infrastructure question.

Jacob Steinhardt
UC Berkeley professor; METR board
Publishes forecasting benchmarks and argues capability measurement is the empirical foundation of safety work.
Reliable capability forecasts, rather than vibes, should drive policy. Where we have data, we should use it.

Jade Leung
CTO of UK AI Safety Institute
Runs the first government-operated frontier-model evaluation team; treats evaluations as the load-bearing governance instrument.
Frontier evaluations must be mandatory, comparable, and independent of the labs being evaluated.

Karthik Narasimhan
Princeton; reasoning, NLP
Argues evaluations grounded in real-world software engineering tasks reveal capability and safety properties that synthetic benchmarks miss.
SWE-bench evaluates language models in a realistic software engineering setting: resolving real GitHub issues from real codebases. Performance here is closer to deployment reality than synthetic tasks.

Katy Börner
Indiana University; data and information visualisation
Argues field-level visualisation (publications, citations, talent flows) is critical infrastructure for AI policymakers.
Without science maps, AI policy is policy by anecdote.

Laura Weidinger
Google DeepMind ethics and safety researcher
Argues systematic risk taxonomies are the foundation of practical evaluation and governance.
We cannot evaluate risks we haven't named. A shared taxonomy is the precondition of shared governance.

Marc Warner
CEO of Faculty AI
Runs Faculty's AI-safety evaluations work with frontier labs; argues external independent evaluation infrastructure is a prerequisite for trustworthy AI.
AI safety is not in tension with capability. It is the scaffolding that lets capability be deployed.

Marius Hobbhahn
CEO of Apollo Research
Runs scheming-focused evaluations and publishes results to inform frontier-lab safety frameworks.
Models already demonstrate in-context scheming under the right setups. Policy and training need to catch up.

Mary Phuong
DeepMind autonomous-replication evaluations researcher
Designs autonomous-replication evaluations. Central figure in DeepMind's Frontier Safety Framework implementation.
Autonomous replication is a concrete capability threshold we can measure, and crossing it meaningfully increases systemic risk.

Max Bartolo
Cohere; LLM evaluation researcher
Argues evaluation methods that adversarially probe model weaknesses are the only way to characterize what models will do in deployment; static benchmarks are insufficient.
Adversarial evaluation reveals failure modes that static benchmarks miss. As models become more capable, our evaluation has to become more adversarial too.

Michael Chen
METR evaluations researcher
Measures empirical trends in autonomous-task capability as the quantitative backbone of deployment-risk reasoning.
The length of autonomous tasks frontier models can complete has been roughly doubling every 4 to 7 months.

Mihaela van der Schaar
Cambridge professor; AI in healthcare
Argues healthcare AI requires its own evaluation methodology, distinct from general ML benchmarks.
Healthcare AI without healthcare-specific evaluation is research, not deployment.

Moritz Hardt
MPI Tübingen; algorithmic fairness, evals
Argues current AI benchmarking is dangerously brittle: leaderboards reward overfitting to fixed test sets and obscure how models behave under shift. Calls for adaptive, externally validated evaluation.
Benchmarks are the most valuable lever in machine learning, and the field treats them as if they were neutral measurements rather than artefacts shaping research.

Ozzie Gooen
Quantified Uncertainty Research Institute founder
Argues AI risk arguments need to be expressed as explicit probabilistic models that can be inspected, criticized, and updated; built Squiggle for this purpose.
Most AI risk discussions are poorly formalized. We can do much better with explicit probabilistic estimation, and that requires both better tools and better community norms.

Percy Liang
Stanford CRFM director; HELM benchmark author
Argues rigorous, public benchmarking is the infrastructure that lets governance judgments be made at all.
Transparency is not a nice-to-have. It is the precondition for any serious AI governance.

Peter Szolovits
MIT medical AI pioneer
Argues the medical-AI governance playbook (FDA-style pre-deployment validation and continued monitoring) is the right template.
We've been doing evaluation of clinical AI for 50 years. The lesson is: the evaluation is the governance.

Rishi Bommasani
Stanford CRFM; Foundation Model Transparency Index lead
Publishes the Foundation Model Transparency Index; argues measurable transparency scores are the right instrument for governance.
Without transparency, governance cannot be meaningful.

Sam Charrington
Host of The TWIML AI Podcast
Editorial position consistently emphasizes empirical, technically grounded conversations about specific systems and benchmarks rather than ideological framings.
What matters is not the meta-debate about AI risk, it's the specific empirical questions: what these systems can actually do, how they fail, and what we are doing about both.

Sayash Kapoor
Princeton PhD; AI Snake Oil co-author
Pushes for rigor in AI evaluation; critiques common eval methodology as misleading about generalisation.
Leakage and overfitting in AI benchmarks have produced a whole generation of irreproducible capability claims.

Spencer Greenberg
Clearer Thinking founder; rationality researcher
Argues that calibration, prediction tracking, and concrete probabilistic reasoning should anchor AI risk debates; runs ClearerThinking.org tools to push the practice.
Most arguments about AI risk are not phrased in terms of testable predictions. We can fix that by literally writing down our beliefs and tracking them over time.

Stephen Casper
MIT PhD researcher; red-teaming and model audit
Argues empirical red-teaming reveals that current safeguards are not robust; auditing must become standard infrastructure.
Example after example, state-of-the-art safeguards get pretty reliably broken. That's the empirical reality.

Tatsunori Hashimoto
Stanford; CRFM; LLM evaluation and security
Argues robust evaluation requires carefully constructed datasets that resist contamination and reveal real generalization, not leaderboard-fitted numbers.
The dominant evaluation paradigm in NLP is fundamentally susceptible to contamination and overfitting. We need to design tests that are robust to the way models actually develop.

Toby Shevlane
DeepMind model evaluations researcher
Helps design dangerous-capability evaluations and advocates for their adoption as the load-bearing governance artefact.
Dangerous capability evaluations are the minimum viable governance instrument for frontier AI.

Trishan Panch
Wellframe co-founder; Harvard health AI
Argues clinical AI requires evidence-based deployment standards akin to drug trials.
Medical AI without clinical-grade evidence is malpractice with extra steps.

Yu Su
Ohio State; AI agents and reasoning
Argues real-world agent evaluations, where the agent must take actions in actual environments, surface different capability and safety properties than synthetic benchmarks.
We benchmark LLM agents on real, live websites. Performance gaps between lab benchmarks and real-world deployment are large, and they reveal where capability claims most often overreach.

Zico Kolter
CMU professor; OpenAI safety board chair
Argues robust evaluations and adversarial testing are the load-bearing safety practices; oversees these reviews at OpenAI as committee chair.
The Safety and Security Committee reviews safety processes for major model releases and has the authority to delay launches if safety concerns are not adequately addressed.