AGI Strategies

person

Tri Dao

Princeton; Together AI; FlashAttention and Mamba

Princeton assistant professor and chief scientist at Together AI. Lead author of FlashAttention (2022) and co-author of Mamba (2023). Foundational contributor to efficient transformer training.

current Assistant Professor of Computer Science, Princeton University; Chief Scientist, Together AI

Strategy positions

Acceleration · endorses · tentative

Build faster; delay costs more than capability

Argues that throughput and efficiency improvements, not new architectures alone, do most of the heavy lifting in capability progress; Together AI's open infrastructure is positioned around this thesis.

FlashAttention computes exact attention, with no approximation, in memory linear in sequence length by exploiting the GPU memory hierarchy. The same careful engineering can keep delivering capability gains for years.
§ paper · FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness · arXiv / NeurIPS · 2022-05 · faithful paraphrase
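
A minimal NumPy sketch of the mechanism behind that paraphrase, under assumed toy shapes and block size: tiled, IO-aware attention computes the exact softmax block by block (the online-softmax trick), so the full N×N score matrix is never materialized. The function name and every constant here are illustrative assumptions, not the paper's fused CUDA kernel.

```python
import numpy as np

def tiled_attention(Q, K, V, block=64):
    # Process K/V in blocks; keep one running max and one running
    # softmax denominator per query row (online softmax).
    n, d = Q.shape
    out = np.zeros_like(V)
    row_max = np.full(n, -np.inf)   # running max of logits per row
    row_sum = np.zeros(n)           # running softmax denominator per row
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = (Q @ Kb.T) / np.sqrt(d)          # logits for this block only
        new_max = np.maximum(row_max, scores.max(axis=1))
        scale = np.exp(row_max - new_max)         # rescale old partial sums
        p = np.exp(scores - new_max[:, None])
        row_sum = row_sum * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ Vb
        row_max = new_max
    return out / row_sum[:, None]

# Exactness check against naive attention on random data.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(16)
P = np.exp(S - S.max(axis=1, keepdims=True))
assert np.allclose(tiled_attention(Q, K, V), (P / P.sum(axis=1, keepdims=True)) @ V)
```

The closing assert is the "no approximation" property: the tiled result matches naive attention up to floating-point rounding, while only one block of scores is ever held at a time.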

Closest strategy neighbours

by Jaccard overlap

Other people whose strategy tags overlap with Tri Dao's. Overlap is computed on tag identity, not stance, so people with opposing positions can appear if they reference the same tags; a sketch of the metric follows.
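
For concreteness, a minimal sketch of the metric, assuming each person reduces to a set of strategy tags with J(A, B) = |A ∩ B| / |A ∪ B|. The tag names below are hypothetical illustrations, not the record's actual data.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two tag sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Two single-tag profiles that share their one tag give J = 1.00,
# matching the "shared 1 · J=1.00" entries listed below.
dao = {"acceleration"}        # hypothetical tag set for Tri Dao
neighbour = {"acceleration"}  # hypothetical tag set for a neighbour
print(jaccard(dao, neighbour))  # 1.0
```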

  • Aditya Ramesh

    shared 1 · J=1.00

OpenAI; creator of DALL·E

  • Albert Gu

    shared 1 · J=1.00

    CMU; Mamba and structured state-space models

  • Alec Radford

    shared 1 · J=1.00

    OpenAI; lead author of GPT, Whisper, CLIP

  • Ashish Vaswani

    shared 1 · J=1.00

Co-founder, Essential AI; lead author of 'Attention Is All You Need'

  • Brian Chau

    shared 1 · J=1.00

    Executive Director of Alliance for the Future

  • Charlie Snell

    shared 1 · J=1.00

    UC Berkeley; LLM efficiency and inference compute

Record last updated 2026-04-25.