AI News: The Scariest AI Model Ever!
This weekly AI news roundup covers Claude Mythos, an unreleased AI model from Anthropic that is so effective at finding software vulnerabilities that the company won't release it publicly. Instead, Anthropic is giving limited access to major tech companies through Project Glass Wing so they can patch vulnerabilities before similar models become widely available.
Summary
The video begins with extensive coverage of Claude Mythos and Project Glass Wing, which dominated AI discussions this week. Mythos is Anthropic's most powerful model, still unreleased, and it demonstrates unprecedented coding capabilities, particularly in finding and exploiting software vulnerabilities. It scored 83.1% on cybersecurity vulnerability reproduction benchmarks, significantly outperforming previous models, and found vulnerabilities in major operating systems and browsers, including a 27-year-old vulnerability in OpenBSD and a 16-year-old vulnerability in FFmpeg. Citing security concerns, Anthropic chose not to release the model publicly and instead launched Project Glass Wing, which gives selected major tech companies such as Apple, Microsoft, and Nvidia access so their cybersecurity teams can identify and patch vulnerabilities before similarly powerful models become widely available.
The video then covers two major language model releases: Muse Spark from Meta's new Superintelligence Labs, the company's first significant model since restructuring its AI team, and GLM 5.1 from Z.ai, an open-source model released under the MIT license that achieves near state-of-the-art coding performance. Muse Spark performs well across benchmarks, ranking fourth overall, and is highly token-efficient, though it does not lead in any particular category. GLM 5.1 is particularly noteworthy for matching GPT-4's coding performance while being fully open-source and available for download.
Google introduced new features for Gemini, including interactive simulations and models similar to recent releases from OpenAI and Anthropic, plus a new Notebooks feature that organizes chats and files the way projects do on other platforms. The video also covers the US rollout of the Seedance 2.0 video model through Runway and CapCut, HeyGen's new Avatar 5 model, which requires only 15 seconds of recording, and various other updates, including OpenAI's new $100/month Pro tier, Anthropic's managed agents feature, and several smaller tool and feature releases across the AI landscape.
Key Insights
- Anthropic built Claude Mythos as a general-purpose model for coding that accidentally became extremely powerful at cybersecurity, finding vulnerabilities in major operating systems without specific training for cyber tasks
- The speaker argues there's a pattern of companies claiming their models are 'too dangerous to release' for marketing purposes, comparing current Mythos headlines to similar claims about GPT-2 in 2019
- Meta's new Superintelligence Labs released Muse Spark, the company's first major model since restructuring its AI team, which ranks fourth overall in AI benchmarks and is highly token-efficient
- GLM 5.1 achieves near state-of-the-art coding performance comparable to GPT-4 and Claude while being completely open-source under MIT license, which the speaker finds mind-blowing
- Anthropic announced that, starting April 4th, Claude subscriptions will no longer cover usage in third-party tools like OpenClaw, likely because these tools burn through tokens faster than subscription revenue can cover