Sam Altman Just Beat Claude With OpenAI's Biggest Model Yet
OpenAI released GPT 5.5 and a desktop app called Codex that can perform real work across desktop applications. The creator tested GPT 5.5 against Claude Opus 4.7 on practical tasks and built a complete content management app overnight using only AI.
Summary
OpenAI launched GPT 5.5 alongside a desktop application called Codex that represents a significant advancement in AI capabilities. Codex introduces four major capabilities: building real files in Microsoft Office and Google Drive with working formulas, using actual desktop applications like Chrome and Slack without API connections, operating browsers independently to test user flows, and generating images while building functional apps in the same session.
The creator ran head-to-head tests between GPT 5.5 (via Codex) and Claude Opus 4.7 (via Claude Code) across four real-world tasks: analyzing YouTube videos, creating podcast clips, recreating Apple keynote slides as HTML presentations, and building a 3D UFO shooter game. In most tests, Codex significantly outperformed Claude Code, delivering more accurate, complete, and functional results.
The most impressive demonstration was the overnight creation of 'Content OS,' a complete Mac application that manages content across Instagram, YouTube, X, LinkedIn, and newsletters. Built entirely by Codex in autonomous mode over nine hours, the app features live data from multiple APIs, content performance analytics, audience insights, and an AI-powered copilot for content strategy.
GPT 5.5 represents OpenAI's biggest model improvement in over a year, with three key enhancements: reduced overthinking (using fewer tokens for the same tasks), dramatically improved long-context handling (5x better on some tests), and superior multi-step task execution without human intervention. On the GDPval professional knowledge benchmark, GPT 5.5 achieved 84.9%, the highest score ever recorded. While Claude Opus 4.7 still edges out GPT 5.5 in pure code editing (64% vs 58%), GPT 5.5 excels at messy, multi-tool work that requires system-wide thinking and context retention.
Key Insights
- OpenAI's Codex can now use desktop applications directly without API connections, operating them the way a human would rather than requiring technical integrations
- The creator built a complete content management application overnight using only AI, without writing any code; it manages content for an audience of 5 million followers across five platforms
- GPT 5.5 achieved 84.9% on a professional knowledge benchmark, the highest score any AI model has recorded on that test
- While Claude Opus 4.7 still outperforms GPT 5.5 in isolated code editing tasks, GPT 5.5 excels at complex multi-step work that requires maintaining context across entire systems
- The biggest advancement in GPT 5.5 is its ability to perform multi-step tasks autonomously without constant human supervision, making independent decisions and continuing work when faced with ambiguous situations