
Yao Shunyu: Let Me Go a Little Crazy! Training Models at Anthropic & Gemini, Heroism Is Over

Zhang Xiaojun Podcast

Yao Shunyu, a researcher who moved from Anthropic to Google DeepMind, discusses the current state of AI model development, the competitive landscape between major AI labs, and his personal journey from theoretical physics to AI research. He shares candid views on why individual heroism has ended in AI, the importance of reliability over brilliance, and his technical perspectives on pre-training, post-training, and long-horizon tasks.

Summary

The interview features Yao Shunyu, a researcher at Google DeepMind who previously worked at Anthropic, discussing the current AI landscape and his personal career journey. He begins by clarifying the difference between himself and the other famous Yao Shunyu (now at Tencent), noting that his own background is in theoretical physics rather than computer science.

On the state of AI models, Yao argues that the major labs (Anthropic, OpenAI, Gemini) have largely converged in capabilities, with benchmark differences now representing mostly noise rather than signal. He observes that the harder problem has shifted from 'can AI do this?' to 'what should we actually build?' Claude maintains an edge in agentic tool use, Gemini in pure reasoning, while coding remains competitive across all three.

Regarding the AI startup ecosystem, he discusses how wrappers like Manus and OpenClaw ultimately sold to larger companies because model moats remain dominant. He identifies two survival strategies: grow fast enough to build user mindshare before model companies copy you (Cursor's approach, though increasingly precarious), or stay small enough that big companies won't bother competing (Midjourney's approach). He describes the Cursor-Anthropic relationship as having entered a 'delicate competitive phase' now that Claude Code has launched.

On technical progress, Yao pushes back against the narrative that scaling laws have plateaued, attributing apparent plateaus mostly to bugs or flawed experimental assumptions rather than fundamental limits. He emphasizes that pre-training has continued to improve in recent months and that the primary drivers remain compute and data. He predicts that within 6-12 months, AI will begin completing full research cycles autonomously.

His career narrative traces from condensed matter physics at Tsinghua (where he co-discovered the non-Hermitian skin effect), to theoretical high-energy physics at Stanford, to a brief Berkeley postdoc, and then to Anthropic's reinforcement learning team, where he worked on scaling post-training for what became Claude 3.5 (new) and 3.7. He joined Gemini in late September 2024, partly due to disagreement with Dario Amodei's anti-China stance (to which he assigns roughly 40% of the weight), but primarily to broaden his learning beyond Anthropic's narrower focus on coding and agents.

He reflects that the era of individual heroism in AI has passed: the Transformer moment was the last truly heroic discovery, and progress is now fundamentally collective. He argues AI is 'essentially simple' compared to physics because every experiment is runnable and there's no fundamental energy-scale barrier to understanding. The most important trait in the field, he claims, is reliability and responsibility rather than brilliance.

For the future, he highlights two key research directions: ML coding (enabling AI to run complete research cycles) and long-horizon context (training with finite context but operating with effectively infinite context). He is skeptical of the chatbot as the ultimate AI interface, suggesting the form factor needs a product manager to unlock the model's true capabilities.

Key Insights

  • Yao argues that the most important trait in AI research is reliability and being detail-oriented, not intelligence — claiming the field 'doesn't really require much brains' and that 'doing simple things cleaner than anyone else is the most critical thing,' because anyone can think of the ideas but few can execute them stably.
  • Yao claims that the majority of apparent scaling law plateaus he has observed in the industry are caused by bugs or flawed experimental assumptions rather than fundamental limits, stating 'the vast majority of people who hit a wall, it's because of the third reason — there's a bug,' and that fixing a single bug often brings more progress than any fancy technique.
  • Yao describes Anthropic's key organizational advantage as having its top technical decision-makers (Jared Kaplan, Sam McCandlish) also be cofounders with full authority, enabling fast top-down bets — something he says OpenAI lost when Ilya departed and which Google DeepMind structurally cannot replicate as a large company.
  • Yao reveals he left Anthropic partly (roughly 40% weight) due to disagreement with Dario Amodei's anti-China stance, which he characterizes as 'a very emotional reaction' that was inappropriate for a company CEO to push to such an extreme, though he frames his primary motivation as wanting to learn different things like multimodal generation that Anthropic doesn't pursue.
  • Yao predicts that within 6-12 months AI will complete full autonomous research cycles — not just writing code but also running experiments, analyzing results, forming new hypotheses, and designing follow-up experiments — describing this chain as 'the next thing to gradually become complete' and noting it is already partially happening.

Topics

  • AI model capability convergence and benchmarks
  • Wrapper startups vs model company moats
  • Scaling laws and pre-training progress
  • Anthropic vs Google DeepMind organizational culture
  • Post-training reinforcement learning
  • Long-horizon AI tasks
  • Career transition from physics to AI
  • Chinese vs US AI development gap
  • Individual heroism ending in the AI era
  • OpenClaw and Manus analysis

Full transcript available for MurmurCast members
