9 Codex Tips From the Codex Team
The AI Daily Brief covers three headline stories: Cursor's launch of Composer 2.5 (a competitive coding model that Cursor says costs roughly 10x less per task than rivals), Cloudflare's findings on Anthropic's Mythos security model, and Elon Musk losing his lawsuit against OpenAI. The main episode breaks down nine tips from Jason Liu of OpenAI's Codex team on using Codex as a persistent work system rather than a simple chat interface.
Summary
The episode opens with three headline stories before diving into a practical guide on using Codex effectively.
The first headline covers Cursor's release of Composer 2.5, a significant upgrade to their in-house coding model built on Moonshot's Kimi 2.5 base with improved reinforcement learning. The model scores competitively against Claude Opus 4.7 and GPT-5.5 on key benchmarks (69.3% on Terminal-Bench 2.0, 79.8% on SWE-bench Multilingual) while costing half as much as rivals, at 50 cents per million input tokens. Cursor also claims 10x token efficiency, with benchmark runs costing under $1 per task compared to $5-$11 for competitors. Cursor is simultaneously training a new model from scratch on xAI's Colossus 2 cluster. The release is framed within the broader competitive squeeze Cursor faces from both model labs (entering the harness space) and harness labs (building their own models).
The second headline covers Cloudflare's review of Anthropic's Mythos model for security research. Cloudflare found Mythos represents a qualitative leap: unlike previous models that only detected individual bugs, Mythos can chain multiple vulnerabilities into functional exploits, behaving more like a senior security researcher. It can also test and refine exploits iteratively, making it far more useful than models that generate lists of unverified potential vulnerabilities.
The third headline covers the conclusion of Elon Musk's lawsuit against OpenAI and Sam Altman. The jury returned a unanimous verdict in just two hours, finding that Musk's claims were barred by the statute of limitations because he had waited too long to file. The trial surfaced internal OpenAI history, including a 2017 proposal by Musk to fold OpenAI into Tesla and a 2018 term sheet describing the for-profit structure Musk later claimed was illegitimate.
The main episode extracts nine tips from Codex team member Jason Liu's 'Codex Maxing' post:
- Tip 1 advocates using long-running, durable 'monothreads' per workstream, relying on Codex's improved context compaction to maintain continuity.
- Tip 2 champions voice input, arguing that rambling verbally gives the model access to the messy, uncertain version of one's thinking rather than a polished prompt, leading to better outputs.
- Tip 3 covers Codex's 'Steer' feature, which lets users inject feedback mid-task without stopping execution, enabling human-agent parallel work.
- Tip 4 is about externalizing memory into structured file systems (like an Obsidian vault synced to GitHub) so that insights from threads survive beyond any single conversation.
- Tip 5 covers tool use (computer use, browser use, and connectors) as the mechanism by which Codex becomes an evidence gatherer across live environments.
- Tip 6 addresses mobile and remote control, enabling users to steer long-running tasks without being at a desktop.
- Tip 7 introduces 'heartbeats,' scheduled or triggered check-ins that keep threads active and cross tool boundaries (e.g., checking Slack, re-rendering video, uploading via computer use).
- Tip 8 briefly touches on the 'slash goal' feature for projects with verifiable success criteria, noted as deserving its own dedicated episode.
- Tip 9 highlights the side panel as the space where Codex transitions from a chat app to a work environment, allowing artifact inspection and annotation without interrupting the agent's workflow.
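The external-memory idea in Tip 4 can be illustrated with a minimal Python sketch, assuming a plain folder of Markdown notes stands in for the Obsidian vault. The function name `append_insight`, the one-file-per-workstream layout, and the example workstream name are hypothetical and are not part of Codex or Obsidian; syncing the folder to GitHub is left out.

```python
from datetime import date
from pathlib import Path


def append_insight(vault_dir: Path, workstream: str, insight: str, day: date) -> Path:
    """Append one dated bullet to a workstream's note, creating the note if needed.

    Hypothetical helper: the vault is just a directory of Markdown files.
    """
    vault_dir.mkdir(parents=True, exist_ok=True)
    note = vault_dir / f"{workstream}.md"
    if not note.exists():
        # Start each note with a heading so it stays readable on its own.
        note.write_text(f"# {workstream}\n\n", encoding="utf-8")
    with note.open("a", encoding="utf-8") as fh:
        fh.write(f"- {day.isoformat()}: {insight}\n")
    return note


if __name__ == "__main__":
    path = append_insight(
        Path("vault"), "billing-migration",
        "Retry logic lives in the worker, not the API layer.", date.today(),
    )
    print(path.read_text(encoding="utf-8"))
```

Because each insight lands in an ordinary file, it can be inspected, edited, and versioned independently of any single thread, which is the property the tip emphasizes.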
Key Insights
- Jason Liu argues that voice input is valuable not just for speed but because it gives the model access to the 'messy version' of one's thinking, including uncertainty, trade-offs, and half-formed ideas, which leads to better outputs than polished typed prompts.
- The host frames Cursor's competitive challenge as a two-sided squeeze: model labs like Anthropic are building their own coding harnesses (Claude Code), while Cursor can't afford to keep subsidizing Anthropic model costs. That makes building its own model an existential priority, not just a strategic one.
- Cloudflare's review found that what distinguishes Mythos from other models isn't bug detection (which many models can do) but the ability to synthesize multiple vulnerabilities into functional, iteratively refined exploits. That is a qualitative shift from automated-scanner behavior to senior-researcher behavior.
- Jason Liu argues that native Codex memory features are insufficient for serious workflows, and that structured external file systems (like an Obsidian vault) are necessary because they force the agent to compress experience into inspectable, editable artifacts that survive thread death or compaction failures.
- The Musk vs. OpenAI trial was decided purely on technical grounds (the statute of limitations), without the jury ever considering the substantive merits of the breach of charitable trust claim. The deep questions about OpenAI's for-profit conversion therefore remain legally unresolved.