AI shows its skills in the emergency room
A Harvard study published in Science found that OpenAI's o1-preview model outperformed two attending ER physicians at initial triage across 76 real emergency cases, reaching the correct diagnosis 67.1% of the time versus 55.3% and 50.0% for the doctors. The newsletter also covers the Pentagon's addition of 8 AI companies to classified networks while excluding Anthropic, and shares staff and reader AI use cases ranging from medical research to real estate investment tools.
Summary
The newsletter opens with coverage of a Harvard study published in Science that tested OpenAI's o1-preview model against two attending emergency room physicians across 76 real ER cases. The AI outperformed both doctors at the initial triage stage, achieving a 67.1% correct diagnosis rate compared to 55.3% and 50.0% for the two physicians. Notably, physician reviewers scoring the diagnoses could not distinguish AI-generated outputs from human ones. In one highlighted case, the AI identified a rare flesh-eating infection in a transplant patient approximately 12–24 hours before the treating physician did. The newsletter frames this as evidence that AI is ready for a formal role alongside doctors, especially given that newer frontier models would likely perform even better.
The Rundown Roundtable section features staff members sharing personal AI use cases. One writer describes using Gemini and ChatGPT to navigate her daughter's rare autoimmune disease diagnosis, using AI to parse medical literature, identify leading specialists, and understand treatment options across countries. An editor shares how he uses ChatGPT to analyze packaged food labels, uploading product photos to flag unhealthy ingredients and decode technical food-industry jargon like INS numbers and emulsifiers.
The newsletter includes a tutorial on using Claude Design, Anthropic's new AI design tool, to generate high-converting landing page mockups. The guide walks through a multi-step process involving wireframe creation, screenshot references from high-traffic pages, and iterative refinement using comments — with a final option to hand off the design to Claude Code for deployment.
On the geopolitical and defense front, the newsletter reports that the Pentagon added 8 AI companies — including OpenAI, Google, SpaceX, Nvidia, Microsoft, AWS, Oracle, and Reflection — to its classified networks, while notably excluding Anthropic due to a standing supply-chain risk designation. The newsletter highlights the apparent contradiction in the White House simultaneously blacklisting Anthropic while seeking priority access to its Mythos model. It also flags Reflection's inclusion as notable, given its $2B funding from a Donald Trump Jr.-backed venture fund.
Additional news briefs cover OpenAI shipping Codex Pets (animated desktop companions for tracking agent tasks), Maryland passing the first U.S. ban on AI-driven personalized grocery pricing, SAG-AFTRA securing AI protections in a new studio deal, and a Chinese court ruling that replacing a worker with AI does not legally justify termination. A reader workflow from Finland describes using Gemini Pro and other free AI tools to build a real estate market analysis tool in a country with limited data transparency.
Key Insights
- The Harvard study found that OpenAI's o1-preview, a model already a generation behind current frontier models, diagnosed more accurately than two attending ER physicians at triage. The newsletter argues this suggests even greater potential for newer AI in clinical settings.
- The newsletter highlights that physician reviewers scoring the ER cases could not tell which diagnoses came from the AI and which from humans, suggesting AI-generated medical reasoning has reached a level of qualitative parity with experienced doctors.
- The newsletter frames the Pentagon's simultaneous blacklisting of Anthropic and pursuit of access to its Mythos model as a political contradiction, suggesting the White House wants strategic AI leverage without granting Anthropic formal defense contractor status.
- A staff writer argues that AI tools like ChatGPT and Gemini proved valuable not by replacing doctors, but by helping a patient's family understand complex medical literature, evaluate treatment options across countries, and identify top specialists — describing it as a demystifying resource during a health crisis.
- The newsletter notes that Reflection, one of the eight companies added to Pentagon classified networks, raised $2 billion from a fund backed by Donald Trump Jr., raising implicit questions about the political dimensions of defense AI contracting decisions.