Google DeepMind’s powerful AI co-mathematician
Google DeepMind released an AI co-mathematician built on Gemini 3.1 that uses agentic pipelines to assist researchers with unsolved math problems, achieving a 48% score on FrontierMath Tier 4. The newsletter also covers AI discoveries in exoplanet detection, practical AI use cases from staff, and various AI industry news. A key highlight is Oxford professor Marc Lackenby solving an open mathematical problem using a strategy found in a rejected AI output.
Summary
Google DeepMind published research on its AI co-mathematician, an agentic system based on Gemini 3.1 and modeled after AI coding environments like Claude Code. The system uses a coordinator agent to break research into parallel workstreams, with sub-agents handling code writing, literature search, and proof attempts. It set a new high of 48% on Epoch AI's FrontierMath Tier 4 benchmark, more than doubling Gemini 3.1 Pro's raw score of 19%. Notably, Oxford professor Marc Lackenby used the system to resolve an open problem from the Kourovka Notebook after spotting a clever proof strategy buried in an output the system's own review agents had rejected.
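The coordinator-and-sub-agents layout described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the stub sub-agents, and the thread-pool fan-out stand in for whatever orchestration DeepMind actually uses, which the article does not detail.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-agents: stand-ins for the workstreams the article
# names (code writing, literature search, proof attempts).
def write_code(problem: str) -> str:
    return f"code sketch for: {problem}"

def search_literature(problem: str) -> str:
    return f"references for: {problem}"

def attempt_proof(problem: str) -> str:
    return f"proof attempt for: {problem}"

def coordinator(problem: str) -> list[str]:
    """Fan the problem out to sub-agents in parallel and collect results."""
    subagents = [write_code, search_literature, attempt_proof]
    with ThreadPoolExecutor(max_workers=len(subagents)) as pool:
        futures = [pool.submit(agent, problem) for agent in subagents]
        return [f.result() for f in futures]

results = coordinator("open problem from the Kourovka Notebook")
```

In a real system each sub-agent would itself be a model call, and the coordinator would review and merge the parallel results rather than simply concatenating them; the sketch only shows the fan-out/fan-in shape.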
The newsletter's Rundown Roundtable section featured staff AI use cases: a developer built an async Magic: The Gathering app using OpenAI Codex's /goal command, and a partnerships team member used Claude to plan an entire Greece itinerary, claiming it rivaled professional travel agents. An AI training guide detailed how to use Codex's Computer Use plugin to automate repetitive local tasks like Photoshop exports and file renaming.
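Repetitive local tasks like the file renaming mentioned in the guide reduce to short scripts of the kind such a plugin might generate. This is a generic illustrative sketch, not anything from the guide itself; the function name, the PNG filter, and the zero-padded numbering scheme are all assumptions.

```python
from pathlib import Path

def batch_rename(folder: str, prefix: str) -> list[str]:
    """Rename every .png in `folder` to `prefix_NNN.png`, in sorted order.

    Returns the new file names. Illustrative only: a real automation
    would add collision checks and a dry-run mode before touching files.
    """
    renamed = []
    for i, path in enumerate(sorted(Path(folder).glob("*.png")), start=1):
        target = path.with_name(f"{prefix}_{i:03d}.png")
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

For example, a folder containing `a.png` and `b.png` would end up with `export_001.png` and `export_002.png` after `batch_rename(folder, "export")`.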
In astronomy, University of Warwick researchers confirmed 100+ exoplanets using an AI system called RAVEN, which scanned 4 years of NASA TESS data covering 2.2 million stars. RAVEN also identified 2,000+ additional candidates, including 31 never-before-spotted exoplanets and planets in the "Neptunian Desert", a region where Neptune-sized planets were thought unable to survive. The system measures the frequency of different planet types with 10x greater precision than previous methods.
Additional news covered includes Google's Isomorphic Labs reportedly raising $2B+ for its Drug Design Engine, Greece proposing AI protections in its constitution, Baidu releasing ERNIE 5.1 at 6% of rival training costs, and OpenRouter launching Pareto Code for cost-optimized AI routing. A reader submission highlighted using ChatGPT to train four dogs, avoiding thousands in professional trainer costs.
Key Insights
- Oxford professor Marc Lackenby solved an open problem in the Kourovka Notebook not from a successful AI output, but by extracting a proof strategy from a proof the AI's own review system had rejected — suggesting value exists even in AI failures.
- DeepMind's co-mathematician achieved 48% on FrontierMath Tier 4 by adopting the agentic pipeline architecture used in AI coding environments, more than doubling the raw model score of 19%, indicating that architectural design matters as much as raw model capability.
- RAVEN achieves 10x greater precision in measuring the frequency of planet types through smarter AI integration alone, not new telescope hardware, implying existing astronomical datasets contain far more discoverable knowledge than previously extracted.
- The newsletter frames the co-mathematician's value as augmenting expert researchers rather than replacing them, pointing to Lackenby's discovery as evidence that the most significant near-term AI math contribution may be accelerating human insight rather than autonomous problem-solving.
- Baidu claims ERNIE 5.1 cost just 6% as much to train as rival models while ranking No. 4 on Arena's Search Leaderboard, suggesting that training efficiency gaps between frontier labs and challengers are narrowing significantly.