Google DeepMind’s powerful AI co-mathematician
Google DeepMind released an AI co-mathematician built on Gemini 3.1 that uses agentic pipelines to assist researchers with unsolved math problems, achieving a 48% score on FrontierMath Tier 4. The newsletter also covers AI discoveries in exoplanet detection, practical AI use cases from staff, and various AI industry news. A key highlight is Oxford professor Marc Lackenby solving an open mathematical problem using a strategy found in a rejected AI output.
Summary
Google DeepMind published research on its AI co-mathematician, an agentic system based on Gemini 3.1 modeled after AI coding environments like Claude Code. The system uses a coordinator agent to break research into parallel workstreams, with sub-agents handling code writing, literature search, and proof attempts. It set a new high on Epoch AI's FrontierMath Tier 4 benchmark at 48%, more than doubling Gemini 3.1 Pro's raw score of 19%. Notably, Oxford professor Marc Lackenby used the system to resolve an open problem from the Kourovka Notebook after identifying a clever proof strategy buried within a proof the system's own reviewers had rejected.
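The coordinator-and-sub-agent layout described above can be sketched roughly as follows. This is a minimal Python illustration, not DeepMind's implementation: the names `Result`, `run_subagent`, and `coordinate`, the three workstream labels, and the placeholder review rule are all assumptions made for the example, and no real model is called.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    workstream: str
    output: str
    accepted: bool


async def run_subagent(workstream: str, problem: str) -> Result:
    # Hypothetical stand-in for a model call. A real sub-agent would write
    # code, search the literature, or attempt a proof at this point.
    await asyncio.sleep(0)
    output = f"[{workstream}] notes on: {problem}"
    # Placeholder review rule (assumption): proof attempts are marked as
    # rejected so the example shows rejected work being kept, not dropped.
    accepted = workstream != "proof_attempt"
    return Result(workstream, output, accepted)


async def coordinate(problem: str) -> list[Result]:
    # The coordinator splits the problem into workstreams and runs them
    # concurrently; gather returns results in the order tasks were given.
    workstreams = ["code_writing", "literature_search", "proof_attempt"]
    tasks = [run_subagent(w, problem) for w in workstreams]
    return await asyncio.gather(*tasks)


results = asyncio.run(coordinate("open problem"))
rejected = [r.workstream for r in results if not r.accepted]
```

Returning rejected results alongside accepted ones, rather than discarding them, mirrors the Lackenby anecdote, where the useful strategy sat inside a proof the reviewers had turned down.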
The newsletter's Rundown Roundtable section featured staff AI use cases: a developer built an async Magic: The Gathering app using OpenAI Codex's /goal command, and a partnerships team member used Claude to plan an entire Greece itinerary, claiming it rivaled professional travel agents. An AI training guide detailed how to use Codex's Computer Use plugin to automate repetitive local tasks like Photoshop exports and file renaming.
In astronomy, University of Warwick researchers confirmed 100+ exoplanets using an AI system called RAVEN, which scanned 4 years of NASA TESS data covering 2.2 million stars. RAVEN also identified 2,000+ additional candidates, including 31 never-before-spotted exoplanets and planets in the 'Neptunian Desert' — a region where Neptune-sized planets were thought unable to survive. The system achieves 10x greater precision in measuring planet-type frequency compared to previous methods.
Other news covered: Google's Isomorphic Labs reportedly raising $2B+ for its Drug Design Engine; Greece proposing AI protections in its constitution; Baidu releasing ERNIE 5.1, which it claims cost just 6% as much to train as rival models; and OpenRouter launching Pareto Code for cost-optimized AI routing. A reader submission highlighted using ChatGPT to train four dogs, avoiding thousands in professional trainer costs.
About this episode
PLUS: Automate any manual task with Codex
Key Insights
- Oxford professor Marc Lackenby solved an open problem in the Kourovka Notebook not from a successful AI output, but by extracting a proof strategy from a proof the AI's own review system had rejected — suggesting value exists even in AI failures.
- DeepMind's co-mathematician achieved 48% on FrontierMath Tier 4 by adopting the agentic pipeline architecture used in AI coding environments, more than doubling the raw model score of 19%, indicating that architectural design matters as much as raw model capability.
- RAVEN's exoplanet detection achieves 10x greater precision in measuring planet-type frequency using smarter AI integration alone — not new telescope hardware — implying existing astronomical datasets contain far more discoverable knowledge than previously extracted.
- The newsletter frames the co-mathematician's value as augmenting expert researchers rather than replacing them, pointing to Lackenby's discovery as evidence that the most significant near-term AI math contribution may be accelerating human insight rather than autonomous problem-solving.
- Baidu claims ERNIE 5.1 cost just 6% as much to train as rival models while ranking No. 4 on Arena's Search Leaderboard, suggesting that training efficiency gaps between frontier labs and challengers are narrowing significantly.
Transcript
Good morning, AI enthusiasts. Google DeepMind just took AI’s coding strategy and applied it to math: don't ask a model for the answer, give a team of agents the workspace.

The company’s AI co-mathematician just scored a new high on a benchmark built to stump AI for decades, with one professor even cracking an unsolved problem using a strategy buried inside a proof the system's own reviewers had rejected.

- Google DeepMind’s AI co-mathematician
- The Rundown Roundtable: Our AI use cases
- Automate any manual task with Codex
- AI finds 100+ new exoplanets from NASA data
- 4 new AI tools, community workflows, and more

GOOGLE DEEPMIND
Image source: Pushmeet Kohli (@pushmeet on X)
The Rundown: Google DeepMind just…
More from The Rundown AI
Jeff Bezos' $41B 'artificial general engineer'
Jeff Bezos revealed more details about his AI startup Prometheus, which raised $12B at a $41B valuation with a goal of building an 'artificial general engineer' to accelerate physical product design. Anthropic faced backlash over its Fable model's invisible safety filters that downgraded answers without user notification. The 2026 FIFA World Cup debuted as the first AI-integrated tournament, with optical tracking, 3D body scans, and AI analytics wired into nearly every layer.
Anthropic writes Washington an AI regulation playbook
This newsletter covers Anthropic CEO Dario Amodei's new AI policy essay urging faster regulation, SpaceX's reveal of its orbital AI datacenter satellite AI1, and OpenAI's IPO plans tied to self-improving AI timelines. Additional stories include new AI tools, industry drama around model restrictions, and a community workflow from a teacher using AI to help refugees navigate legal documents.
Anthropic hands the public Mythos-class AI
Anthropic released Claude Fable 5, a restricted public version of its Mythos-class AI that tops nearly all major benchmarks, with access limits and pricing changes coming June 22. The newsletter also covers a Perplexity/Harvard study on AI agents shifting knowledge work patterns, and profiles a self-taught Japanese farmer using AI to build his own farm automation systems.
Apple’s new Siri AI overhaul is here (sort of)
Apple unveiled its Siri AI overhaul at WWDC 2026, but analysts found it underwhelming compared to frontier models. OpenAI published a blog declaring a 'third phase' of AI development, while Argentina introduced legislation creating 'non-human corporations' run by AI systems.
Washington wants a piece of OpenAI
The Rundown newsletter covers the U.S. government's reported talks with OpenAI about taking a 1-5% equity stake to fund a public wealth fund for Americans. It also covers OpenAI's planned ChatGPT overhaul into an agentic 'superapp' centered on Codex, plus staff AI use cases and community workflows.