AGI Strategies

Cynthia Rudin

Duke professor; interpretable ML pioneer

Computer scientist who has been among the most consistent public voices against black-box ML in high-stakes domains. Argues that interpretable models should always be preferred to post-hoc explanations of black boxes.

current

Professor of Computer Science, Duke University

Profile

expertise

Deep technical

Sustained peer-reviewed contribution to ML, alignment, interpretability, or safety techniques. Could review a frontier paper.

Duke professor. Pioneered inherently interpretable ML (as opposed to post-hoc explanation of black boxes). Won the 2022 AAAI Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity.

recognition

Field-leading

Widely known inside the AI and AI-safety community. Appears repeatedly in top venues, podcasts, or governance forums. Not a household name to outsiders.

Recognised in the interpretable-ML community; attracts less mainstream press than DeepMind/Anthropic interpretability leads.

vintage

Deep-learning rise

Came up post-AlexNet. ImageNet, AlphaGo, transformer paper. DeepMind, Google Brain, FAIR establish the modern lab template.

PhD 2004 (Princeton). Her interpretable-ML programme matures in the 2010s as a counter to deep-learning opacity.

Hand-classified. See the board for the criteria and the full grid.

Strategy positions

Interpretability bet · mixed

Mechanistic interpretability is necessary and sufficient to know models are safe

Argues for inherently interpretable models over post-hoc explanations, a different flavour of interpretability from the mechanistic-interpretability school; a code sketch of the distinction follows the citation below.

“Stop explaining black box machine learning models for high-stakes decisions and use interpretable models instead.”
§ paper · Stop Explaining Black Box Machine Learning Models for High Stakes Decisions · Nature Machine Intelligence · 2019-05 · direct quote
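
A minimal sketch of what "inherently interpretable" means in practice. Rudin's own systems are rule lists and scoring systems (e.g., CORELS, RiskSLIM); the depth-limited decision tree, dataset, and hyperparameters below are stand-in assumptions chosen so the example runs with scikit-learn alone, not her actual method.

```python
# Instead of fitting a black box and bolting on a post-hoc explainer,
# fit a model whose decision function can be read directly.
# Illustrative stand-in: a depth-limited decision tree (max_depth=3
# and the breast-cancer dataset are assumptions, not from the record).
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(data.data, data.target)

# The printed tree IS the explanation: every prediction is an
# auditable chain of feature thresholds, with no surrogate model
# or saliency map needed afterwards.
print(export_text(model, feature_names=list(data.feature_names)))
```

The design point: with a black box, a post-hoc explainer only approximates the model and can be unfaithful to it; here the readable object and the deployed model are one and the same.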

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Cynthia Rudin's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags. A worked sketch of the overlap measure follows the list.

  • Asma Ghandeharioun

    shared 1 · J=1.00

    Google DeepMind; 'Patchscopes' for LLM interpretability

  • Chris Olah

    shared 1 · J=1.00

    Anthropic co-founder; interpretability lead; pioneer of modern mechanistic interpretability

  • David Bau

    shared 1 · J=1.00

    Northeastern; mechanistic interpretability of LLMs

  • Fernanda Viégas

    shared 1 · J=1.00

    Harvard; ex-Google PAIR; data visualisation

  • Jacob Andreas

    shared 1 · J=1.00

    MIT NLP; language models as belief reports

  • John Wentworth

    shared 1 · J=1.00

    Independent alignment researcher; natural abstractions
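
For concreteness, a minimal sketch of the Jaccard ranking described above. The tag strings and the jaccard() helper are hypothetical illustrations, not the board's actual tag vocabulary or data model.

```python
# Rank neighbours by Jaccard overlap J = |A ∩ B| / |A ∪ B| between
# strategy-tag sets. All names and tags here are illustrative.

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two sets; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

tags = {
    "Cynthia Rudin": {"interpretability-bet"},
    "Chris Olah": {"interpretability-bet"},
    "Example Person": {"interpretability-bet", "compute-governance"},
}

target = tags["Cynthia Rudin"]
neighbours = sorted(
    ((name, jaccard(target, t), len(target & t))
     for name, t in tags.items() if name != "Cynthia Rudin"),
    key=lambda row: row[1],
    reverse=True,
)
for name, j, shared in neighbours:
    print(f"{name}: shared {shared} · J={j:.2f}")
# Chris Olah: shared 1 · J=1.00
# Example Person: shared 1 · J=0.50
```

This also explains the uniform "shared 1 · J=1.00" rows above: when both people carry exactly one tag and it matches, the intersection equals the union, so J is necessarily 1.00.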

Record last updated 2026-04-24.