Codex Just 10x’d Claude Code Projects
OpenAI released an official Codex plugin for Claude Code that lets developers use GPT-4o for code reviews within their Claude workflows. The speaker tested both tools and found that they complement each other well: Claude Code excels at planning and creative output, while Codex is better at code reviews and catching edge cases.
Summary
OpenAI has released an official Codex plugin for Claude Code, making it easier for developers to incorporate GPT-4o into their existing Claude workflows for code reviews and additional oversight. Developers were already combining the two tools by hand; the plugin streamlines the process, and it can be used for free with a ChatGPT subscription.

The speaker analyzed benchmarks comparing Claude Opus 3.5 with GPT-4o. Opus leads slightly on one benchmark (SWE-bench Verified), but GPT-4o outperforms it across most other coding benchmarks, by 10-13 points in some cases, while being more cost-effective.

Through research on social platforms, the speaker identified complementary strengths: Claude Code tends to over-engineer, burn through tokens, and miss edge cases in long runs, while Codex struggles with planning, asking good questions, and creative output. This makes them natural complements: many users plan and build initial versions with Claude Code, then hand off to Codex for execution, production deployment, and reviews.

As a practical test, the speaker gave both tools identical prompts to build a dungeon crawler game. Codex took longer but produced a more polished, less pixelated result that felt more like a professional application. The speaker then used Codex's adversarial review feature to analyze the Claude-built game; the review identified critical bugs, including a soft-lock scenario where players could become permanently stuck, as well as data-loss issues. After the speaker implemented Codex's suggested fixes, the Claude-built game's functionality improved significantly.

The plugin offers several functions, including standard reviews, adversarial reviews, and rescue operations, essentially acting as additional skills that can run in the background.
Key Insights
- GPT-4o outperforms Claude Opus 3.5 across most coding benchmarks by significant margins (10-13 points in some cases) while being more cost-effective than Opus
- Claude Code's weaknesses include over-engineering, burning through tokens, and missing edge cases in long runs, while Codex struggles with planning, asking good questions, and creative output
- Many developers use a hybrid approach where they plan and build initial versions with Claude Code, then bring in Codex to execute the rest and push to production
- In a direct comparison building the same dungeon crawler game, Codex produced a more polished, less pixelated result that felt more like a professional application, despite taking longer to complete
- Codex's adversarial review identified critical bugs in the Claude-built game including a soft-lock scenario where players could become permanently stuck and data loss issues
Full transcript available for MurmurCast members