Grant Sanderson (@3blue1brown) – AI and the future of math
Grant Sanderson discusses AI's rapid progress in mathematics, exploring why benchmarks like IMO gold medals don't signal AGI, the importance of grindability and verifiability in AI training, and how mathematical progress will likely shift from theorem-proving toward conjecture generation, definition-making, and knowledge distillation. He argues that mathematics offers unique advantages for AI development because it's containerizable and verifiable, making it fundamentally different from other domains.
Summary
In this extended conversation about AI and mathematics, Grant Sanderson addresses the evolution of AI capabilities in mathematical problem-solving. He begins by explaining why achieving gold in the International Math Olympiad—once thought to be a harbinger of AGI—turned out to be merely another benchmark, with AI systems showing uneven performance across problem categories (excelling at geometry through brute force, struggling with combinatorics). The discussion then pivots to what matters most for AI progress in mathematics beyond solving existing problems: the ability to generate interesting conjectures, propose new definitions, and make conceptual connections across fields—skills Sanderson characterizes as distinctly different from and more valuable than pure theorem-proving.
A major theme involves the technical conditions enabling AI progress in math. Sanderson argues that grindability—the ability to run parallel simulations in containerized environments with deterministic outcomes—matters more than verifiability alone. Mathematics benefits from this property because code is containerizable and reproducible in ways that real-world domains aren't. This is why math and coding have seen faster AI progress than domains like web browsing or business strategy, which involve interacting with changing real-world systems. He notes that formal systems like Lean, while potentially useful for long-term unfettered exploration, may be overrated as a training signal compared to natural language verification with process-based supervision, as demonstrated by recent models using LLM judges to evaluate mathematical reasoning.
The conversation explores historical examples of mathematical breakthroughs that resisted immediate verification, using Galois theory as the primary case study. Sanderson details how Lagrange identified symmetry as the right lens for studying polynomials, Abel proved the quintic unsolvable, but Galois developed the abstract framework that revealed the deeper structure—yet this framework wasn't recognized as valuable for roughly 100 years until applications emerged in physics and cryptography. He uses this to illustrate that verification loops for mathematical breakthroughs can be extremely long, making it difficult to train AI systems with immediate reward signals.
On the question of whether AI solving the Riemann Hypothesis would advance human understanding, Sanderson identifies three distinct scenarios: (1) connecting existing ideas from different fields ("lightning bolts"), which would be readily interpretable to humans; (2) building new mathematical frameworks ("mountains"), which would require humans to learn the new abstractions; (3) raw computational brute force without new insights, which would leave understanding untouched. He argues that if the solution involves mountain-building, it becomes crucial whether those mountains are built in human-intelligible or "alien" ways.
The transcript includes significant discussion of how mathematicians actually work and what constitutes a valuable mathematical contribution. Most mathematicians do not target specific open problems; many instead engage in broader pattern-seeking aligned with research programs like the Langlands Program, trying to understand the landscape of mathematical connections. Sanderson suggests AI could become exceptional at finding these connections when trained in environments that incentivize multi-field expertise, though he acknowledges this is harder to formalize as a training objective than discrete problem-solving.
On learning with AI, Sanderson emphasizes that LLMs serve better as pointers to human-written resources than as primary instructors. He describes productive learning approaches combining human-curated sources (textbooks, lectures, videos) with LLM assistance for clarification on branches of understanding, noting that LLMs often mimic Wikipedia-like correctness while lacking the deliberate pedagogical misdirection and motivation-building that characterize excellent exposition. He also identifies a gap in LLM capabilities: they struggle to recognize when a student's mental model differs fundamentally from the expert's, and thus cannot productively redirect that thinking.
Final sections address career implications for mathematicians in an AI-accelerated world. Sanderson argues that prospective mathematicians should understand where funding flows and what value they're actually providing—whether that's institutional prestige, grant-funded basic research, or teaching. He suggests that in a world where AI advances mathematics dramatically, the most stable and valuable roles would involve curating which mathematical advances matter, explaining complex AI-discovered mathematics to broader audiences, and directing mathematical progress toward practically useful applications. The conversation acknowledges deep uncertainty about whether accelerated pure mathematics will unlock corresponding real-world progress, with some specialized domains (like PDE-based engineering simulation) more likely to benefit than others.
Key Insights
- IMO gold and other mathematical benchmarks are merely checkpoints in AI progress, not a threshold to AGI, because AI capabilities remain spiky across different mathematical domains (geometry is solved via brute force, combinatorics remains challenging)
- Grindability—the ability to run parallel, deterministic, containerized simulations—matters more than pure verifiability for AI progress, which is why math and coding advance faster than domains requiring real-world interaction
- Galois theory's century-long journey from proposal to recognition as valuable demonstrates that verification loops for mathematical breakthroughs can be extremely long, making immediate reward-based training difficult
- Most mathematicians don't primarily target specific open problems but rather engage in broader landscape exploration aligned with research programs like the Langlands Program, seeking connections between different mathematical domains
- LLMs struggle with mentalizing and theory of mind in ways fundamental to good pedagogy (recognizing when a student's mental model differs from the expert's and redirecting accordingly) because they lack embodied experience and facial muscles to mirror human understanding
Transcript
[0:00] Today I'm chatting with Grant Sanderson, who runs 3Blue1Brown and is now working on a new project documenting the progress AI is making in math. I wanted to talk to you about this because AI has been making faster progress in mathematics than in any other field. So whatever is happening here, and whatever way we're seeing AI progress happen or not happen, would tell us about what will happen to the rest of the world as AI gets better and better. So, I wanted to start with this question I asked you when I first interviewed you three years ago. And I asked you once we have AIs that can get gold…
More from Dwarkesh Patel
The reason Russia and China can't win at sea - Sarah Paine
Sarah Paine argues that Russia and China lack the necessary prerequisites for maritime dominance, including protection from invasion, dense internal transportation networks, reliable sea access, dense coastal populations, commerce-driven economies, and stable democratic institutions. Despite their maritime ambitions, neither country possesses the full set of conditions required for a successful maritime paradigm.
The One Job AI Can't Replace, According to @3blue1brown
3Blue1Brown argues that teaching is one of the most stable careers in a post-AGI world because it is fundamentally relational and social rather than purely explanatory. Even if AI becomes proficient at explaining concepts, the coaching and mentoring aspects of teaching—which go far beyond information delivery—will remain valuable and irreplaceable.
Renaissance art was a weapon - Ada Palmer
Ada Palmer explains that Renaissance art was not a luxury made possible by military surplus, but rather a strategic diplomatic tool cheaper than warfare. Rulers invested heavily in art, architecture, and cultural gifts to influence rivals like the King of France, similar to how modern diplomacy functions as a cost-effective alternative to military spending.
What sanctions are actually designed to do - Sarah Paine
Sarah Paine argues that sanctions function like economic chemotherapy — not to eliminate rogue states, but to suppress their growth over generations. Using North Korea as an example, she contends that the goal of geopolitical strategy is containment at acceptable cost, not total elimination of a threat.
The historical trap Putin can't escape - Sarah Paine
Sarah Paine argues that continental powers like Imperial China and Imperial Russia face catastrophic and irreversible consequences when they botch strategy. She uses the Bolshevik Revolution and its aftermath as a case study in how entire social classes and civilizations can be permanently erased. Continental powers, unlike maritime ones, operate without insurance policies.