Claude Mythos 5: Most Powerful Model Ever! AGI, GLM 5.1, Claude Code Update & Codex Plugins! AI NEWS
This video covers major AI developments including a leaked 10 trillion parameter Claude Mythos model from Anthropic, GLM 5.1's open-source agentic capabilities, and various updates from Google DeepMind, OpenAI, and other AI companies. The speaker predicts white-collar job automation within 2 years and identifies 2026 as a pivotal year for AI advancement.
Summary
The video begins with discussion of a major leak from Anthropic revealing two upcoming models: Claude Mythos and Capabara, with Mythos representing a completely new tier above their current flagship Opus model. Early testers report the model is not comparable to current versions and may pose security risks, leading Anthropic to plan a slow rollout. The speaker believes this model will contribute to white-collar job automation within 2 years. GLM 5.1 was released as an open-source agentic model that scored 45.3 on coding benchmarks compared to Opus's 47.9, demonstrating how open-source models are closing the gap with proprietary ones. Google DeepMind introduced Gemini 3.1 Flash Live, a real-time multimodal model designed for voice and vision agents after over a year of development. OpenAI transformed Codex into a full plugin ecosystem with pre-built workflows, positioning it as a direct competitor to Claude Code. ARC AGI 3 was introduced as a new benchmark where current best models score under 1%, designed to test real intelligence rather than memorization. Various other updates include Claude Code's autofix feature and session limits, 11 Labs' agent-first CLI, Mistral's Boxtrol TTS model, and Cursor's Composer 2 controversy where users discovered it was actually a fine-tuned Kimik K 2.5 model rather than an original creation.
Key Insights
- The speaker believes white collar jobs will be automated in the next 2 years, with the leaked Claude Mythos model being one that will contribute to that automation
- The speaker argues that 2026 is going to be the year that everything changes in the AI space, with the gap between AI tools and AI systems collapsing really fast
- GLM 5.1 scored 45.3 versus 47.9 for Opus on coding benchmarks, demonstrating that open-source models are rapidly closing the gap with Frontier AI
- ARC AGI 3 represents a breakthrough in AI benchmarking because for the first time we have a benchmark that can properly track real progress and intelligence, not just memorization
- A user discovered that Cursor's Composer 2 was actually a fine-tuned Kimik K 2.5 model despite Cursor not specifying this and claiming it was their own frontier-level model
Topics
Full transcript available for MurmurCast members
Sign Up to Access