Claude Sonnet 5 is HERE!
Claude Sonnet 5 has been released as Anthropic's most agentic model, but the speaker argues it is a disappointing release: it underperforms Opus 4.8 while costing more, which makes it an unattractive option for most users. The reviewer supports this with benchmark comparisons and test outputs, and concludes that users should stick with Opus 4.8 or wait for the upcoming Fable 5 model.
Summary
The video is a critical review of Claude Sonnet 5, which Anthropic announced as its most agentic model to date. The speaker acknowledges the release but immediately tempers expectations: Sonnet 5 is an improvement over Sonnet 4.6, yet it clearly underperforms Opus 4.8 across nearly every benchmark. On agentic coding, Sonnet 5 scores 63% against 69% for Opus 4.8, and the speaker says Opus wins on "literally every single benchmark."
The speaker then shows Sonnet 5's practical output on Goldy Bench tests, including a raycaster maze, an orbit simulation, a synthwave background, and a crypt game. Some outputs are attractive and functional, while others are completely broken, most notably the orbit simulation, which fails entirely. In a side-by-side comparison with GLM 5.2, Sonnet 5 shows mixed results: it renders smoother maze graphics but fails the orbit task, which GLM 5.2 completes.
A central issue is pricing: Sonnet 5 costs about 1.2 times as much as Opus 4.8, so it offers objectively worse value, with a higher price for lower benchmark scores. The video quotes several tweets from industry figures (Lisa, Bridge Mind) that criticize Anthropic's token efficiency and question the decision to release the model. The speaker also compares Sonnet 5 unfavorably with other models such as Sekana Fugu Ultra, whose test outputs have noticeably better visual quality.
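One rough way to make the value argument concrete is to divide benchmark score by relative price. The sketch below uses only the two figures from the review (69% vs 63% on agentic coding, and Sonnet 5 at about 1.2 times Opus 4.8's price). Normalizing Opus 4.8's price to 1.0, treating the 1.2x figure as a single blended price ratio, and the `score_per_price` helper are illustrative assumptions. The sketch also ignores the token-efficiency criticism: a model that spends more tokens per task costs more in practice than its per-token rate suggests.

```python
# Illustrative value comparison using only the figures quoted in the review.
# Assumption: Opus 4.8's price is normalized to 1.0 and "1.2x" is one blended ratio.
models = {
    "Opus 4.8": {"agentic_coding": 69.0, "relative_price": 1.0},
    "Sonnet 5": {"agentic_coding": 63.0, "relative_price": 1.2},
}


def score_per_price(entry: dict) -> float:
    """Benchmark points per unit of relative price (higher means better value)."""
    return entry["agentic_coding"] / entry["relative_price"]


for name, entry in models.items():
    print(f"{name}: {score_per_price(entry):.1f} points per unit of relative price")
```

Under these assumptions, Opus 4.8 comes out at 69.0 and Sonnet 5 at 52.5, which matches the speaker's point that Sonnet 5 is both weaker and pricier.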
The speaker's conclusion is unambiguous: there is no compelling reason to use Sonnet 5 over Opus 4.8, and Anthropic's honest benchmarking (rather than inflating numbers) is appreciated but doesn't change the underlying problem. The speaker anticipates Fable 5 will overshadow this release entirely. The final recommendation pivots to a systems-based approach rather than chasing individual model releases—building flexible agent systems that can swap models as needed rather than depending on any single "hot model."
Key Insights
- Opus 4.8 outperforms Claude Sonnet 5 on nearly every benchmark, scoring 69% versus 63% on agentic coding, even though Sonnet 5 costs about 1.2 times as much as Opus 4.8
- Sonnet 5 performs inconsistently across tasks: it succeeds on the maze but completely fails the orbit simulation, which GLM 5.2 completes
- Multiple industry figures criticized the release as fundamentally flawed: the point of Sonnet is to be faster and cheaper than Opus, yet Sonnet 5 is more expensive than the stronger Opus 4.8
- In visual quality comparisons, Sonnet 5 produces darker, less interesting results than Sekana Fugu Ultra and noticeably worse liquid-simulation graphics than competing models
- The speaker recommends building flexible systems with pluggable models rather than optimizing for any single model release, so that regardless of which model performs best, the underlying architecture remains valuable
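The "pluggable models" recommendation can be sketched as an agent that depends on a small interface instead of a specific model, so switching from one model to another is a configuration change. This is a minimal sketch under stated assumptions: `ChatModel`, `EchoModel`, `ModelRegistry`, `Agent`, and the keys `"opus"` and `"glm"` are hypothetical names invented for illustration, and `EchoModel` only echoes the prompt where a real adapter would call a provider's SDK.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class ChatModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into a completion."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in backend; a real adapter would call a provider API here."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class ModelRegistry:
    """Maps short keys to interchangeable model backends."""

    def __init__(self) -> None:
        self._models: Dict[str, ChatModel] = {}

    def register(self, key: str, model: ChatModel) -> None:
        self._models[key] = model

    def get(self, key: str) -> ChatModel:
        return self._models[key]


@dataclass
class Agent:
    registry: ModelRegistry
    model_key: str

    def run(self, task: str) -> str:
        # The agent only knows a registry key, so the backend can be swapped freely.
        return self.registry.get(self.model_key).complete(task)


registry = ModelRegistry()
registry.register("opus", EchoModel("Opus 4.8"))
registry.register("glm", EchoModel("GLM 5.2"))

agent = Agent(registry, model_key="opus")
print(agent.run("build a maze"))  # [Opus 4.8] build a maze
agent.model_key = "glm"           # swap models without touching agent logic
print(agent.run("build a maze"))  # [GLM 5.2] build a maze
```

The design choice is that the agent's logic, prompts, and memory stay the same whichever model currently performs best, which is the speaker's argument for investing in the system rather than in a single "hot model."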
Transcript
[0:00] So, today we have the release of Claude Sonnet 5, apparently the most agentic model from Claude and Anthropic yet. And apparently, you can see the announcement here, just dropped a few hours ago. It can make plans, use tools like browser terminals, run autonomously at a level just a few months ago required larger and more expensive models. We'll come on to this in a minute. I'm not going to hype it up here because you'll see from my test. I'm just going to tell you the honest [0:30] truth. So, here you can see you got Sonnet 5, you got Sonnet 4.6. So, it is a step up from Sonnet 4.6. If you actually look, like Sonnet…
More from Julian Goldie SEO
Claude Sonnet 5 VS GLM 5.2: Who Wins?
A detailed comparison of Claude Sonnet 5 versus GLM 5.2 AI models across game development, coding benchmarks, and UI creation tasks. The reviewer concludes that GLM 5.2 generally outperforms Sonnet 5 while being significantly cheaper, though Opus 4.8 and the forthcoming Fable 5 remain superior options.
Gamma Just Got Better With ChatGPT
Gamma, an AI design tool used by nearly 100 million people, is now integrated into ChatGPT as a native app, allowing users to create professional presentations, documents, and web pages without leaving the chat. The integration enables users to transform rough notes, training documents, and ideas into polished decks by simply conversing with ChatGPT, which handles the writing while Gamma handles the design.
NEW Qwythos 9B Runs Locally for FREE
Julian Goldie demonstrates how to run Qwythos 9B, a free 5.6GB local AI model on your Mac using Ollama, which can be integrated into an agentic operating system for private, offline AI tasks. While smaller than frontier models, it can effectively write, reason, and build applications locally without cloud connectivity or token costs.
GLM 5.2 + Claude Code is INSANE!
The speaker demonstrates how to integrate GLM 5.2 into Claude Code using Ollama to create a cost-effective alternative AI development setup. This system combines Claude Code's agentic capabilities with GLM 5.2 as the underlying model, syncs with Obsidian for memory management, and enables building apps, games, and websites at a fraction of the cost of standard Claude subscriptions.
This NEW AI AGENT is INSANE! 🤯
Ornith 1.0, a new open-source AI agent from Deep Reinforce, has achieved a score of 82.4 on SWE-Bench, surpassing Claude Opus 4.7. The model introduces self-scaffolding reinforcement learning, allowing it to build its own problem-solving framework without human-built instructions, and is available in four versions ranging from 9B to 397B parameters.