Technical, Insightful

How to Use /Goal to Do More With AI

The episode introduces the /goal primitive in Codex and Claude Code, explaining how it shifts AI interaction from turn-based prompting to autonomous, self-evaluating loops. The host covers what makes a good goal, how to write one, and explores how knowledge workers—not just developers—might apply this feature across tasks like literature reviews, claim audits, and vendor evaluations.

Summary

The episode opens by placing /goal within a broader trend away from turn-based interaction with AI. The host references a prior 'Codex Maxing' episode based on techniques from OpenAI's Jason Liu, which explored patterns like durable monothreads, voice input, and steering to make AI work more parallel and less dependent on constant human feedback. The /goal feature is presented as a natural evolution of these ideas: a primitive that lets users define a finish-line outcome, after which the AI loops, self-evaluates, and stops only when the stated completion criteria are met.

The host traces the feature's emergence: Codex shipped it first, with team members like Tebow and Pavel Hurin hyping it as potentially the most consequential thing Codex had shipped. Andrej Karpathy's 'auto-research loop' and the earlier 'Ralph Wiggum loop' are cited as conceptual predecessors. Claude Code subsequently adopted the same feature and the same name (/goal), which the host frames as a mature recognition that participating in a new primitive is smarter than trying to own it. Microsoft's Nicholas Bustamante described the architecture as an initializer agent turning fuzzy intent into a structured plan.md, worker agents making bounded progress, and a judge agent evaluating whether the completion condition is genuinely met.
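
The initializer, worker, and judge split described above can be sketched in a few lines. This is only an illustration of the shape of the loop, not how Codex or Claude Code implement /goal: the names `Plan`, `initializer`, `worker`, `judge`, and `run` are hypothetical, the plan steps are passed in by hand rather than drafted by a model, and the `max_iterations` budget is an added assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    """Structured plan derived from a fuzzy intent (stands in for plan.md)."""
    intent: str
    steps: List[str]
    done: List[str] = field(default_factory=list)


def initializer(intent: str, steps: List[str]) -> Plan:
    # Hypothetical initializer: a real one would ask a model to draft the steps.
    return Plan(intent=intent, steps=list(steps))


def worker(plan: Plan) -> None:
    # Hypothetical worker: makes one bounded unit of progress per call.
    if plan.steps:
        plan.done.append(plan.steps.pop(0))


def judge(plan: Plan, is_complete: Callable[[Plan], bool]) -> bool:
    # Hypothetical judge: checks the completion condition against the plan state.
    return is_complete(plan)


def run(plan: Plan, is_complete: Callable[[Plan], bool], max_iterations: int = 10) -> str:
    for _ in range(max_iterations):
        if judge(plan, is_complete):
            return "complete"
        if not plan.steps:
            # Nothing left to try and the condition is still unmet.
            return "blocked"
        worker(plan)
    return "complete" if judge(plan, is_complete) else "budget_exhausted"
```

The point of keeping the judge separate from the worker is that the loop ends on evidence of completion rather than on the worker's own claim that it is finished.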

The host then defines what distinguishes a goal from a prompt: a goal is a 'finish line contract' specifying what should be true, how success is verified, and what must remain intact. Unlike a prompt that produces a single output for human review, a goal runs a continuous loop—working, checking evidence against the finish line, and deciding whether to continue, complete, or stop because no defensible path remains. The OpenAI guide identifies six components of a strong goal: the outcome, the verification surface, the constraints, the boundaries, the iteration policy, and the block-stop condition.
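
One way to keep the six components in view while drafting a goal is to treat them as fields of a small record. The sketch below is a drafting aid under assumptions: the class name `Goal`, its field names, and the rendered text layout are all made up for illustration and are not the syntax that /goal expects in either tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Goal:
    outcome: str             # what should be true at the end
    verification: str        # how success is checked (the verification surface)
    constraints: List[str]   # what must remain intact
    boundaries: List[str]    # what is in or out of scope
    iteration_policy: str    # how to proceed between attempts
    blocked_stop: str        # when to stop because no defensible path remains

    def missing(self) -> List[str]:
        """Names of components that were left empty."""
        return [name for name, value in vars(self).items() if not value]

    def render(self) -> str:
        """Plain-text draft of the goal (hypothetical layout)."""
        lines = [
            f"Outcome: {self.outcome}",
            f"Verification: {self.verification}",
            "Constraints: " + "; ".join(self.constraints),
            "Boundaries: " + "; ".join(self.boundaries),
            f"Iteration policy: {self.iteration_policy}",
            f"Stop if blocked: {self.blocked_stop}",
        ]
        return "\n".join(lines)
```

Calling `missing()` before handing the text to an agent is a cheap check that no part of the contract was skipped.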

The episode addresses scope, noting a Goldilocks zone between goals that are too narrow (preventing discovery of upstream issues) and too broad (making it hard to define inspectable evidence of success). The quality of the output artifact is also flagged as critical—a vague artifact like 'write docs' produces weak evidence surfaces, while a specific artifact with defined structure and verifiable properties gives the AI something concrete to judge against.

The second half of the episode focuses on extending /goal beyond software engineering into knowledge work. The host argues that the key signal for a good goal candidate is whether the output is an audit rather than just an answer—a ledger of what was checked, supported, contradicted, and unknown. Ten knowledge work domains are identified as potential fits: literature reviews, market landscapes, vendor evaluations, due diligence, claim audits, policy research, interview synthesis, timeline reconstruction, spreadsheet audits, and strategy memos. Three are explored in depth with example goal prompts: claim audits (verifying memo claims against sources with a labeled evidence table), market landscapes (building a comparison table with confidence levels and evidence gaps), and literature reviews (producing a source matrix with confirmed themes, disputed findings, and open questions).
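
The audit-style output described above, a ledger rather than a single answer, can be pictured as a small data structure. This is a minimal sketch under assumptions: `Status`, `LedgerRow`, `summarize`, and `unsourced` are hypothetical names, and the three labels are taken from the summary's "supported, contradicted, and unknown".

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNKNOWN = "unknown"


@dataclass
class LedgerRow:
    claim: str
    status: Status
    source: Optional[str] = None  # where the evidence was found, if anywhere


def summarize(rows: List[LedgerRow]) -> Dict[str, int]:
    """Count rows per label, reporting every label even when its count is zero."""
    counts = Counter(row.status.value for row in rows)
    return {status.value: counts.get(status.value, 0) for status in Status}


def unsourced(rows: List[LedgerRow]) -> List[str]:
    """Claims labeled supported or contradicted without a cited source."""
    return [
        row.claim
        for row in rows
        if row.status is not Status.UNKNOWN and not row.source
    ]
```

A judge step could treat a non-empty `unsourced` list as evidence that the audit is not yet finished, since a label without a source cannot be checked by the reader.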

The host draws a distinction between externally definable rubrics and user-provided rubrics, arguing the latter will be more common in knowledge work—examples include hiring criteria, vendor scorecards, editorial standards, and investment diligence priorities. The episode closes by acknowledging that /goal is not always the right tool; many tasks are better served by traditional prompting, and the full spectrum of interaction autonomy remains relevant. The host encourages experimentation and promises a follow-up episode as more real-world non-coding use cases emerge.
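
A user-provided rubric, such as a vendor scorecard, can serve as the completion check once it is written down as weights and a passing bar. The following is a sketch under assumptions: the function names, the weighted-average scoring rule, and the threshold are illustrative choices, not something the episode or either tool prescribes.

```python
from typing import Dict


def weighted_score(weights: Dict[str, float], scores: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores under a user-supplied rubric."""
    missing = [criterion for criterion in weights if criterion not in scores]
    if missing:
        # An unscored criterion means the evidence is incomplete, so refuse to judge.
        raise ValueError(f"unscored criteria: {missing}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def meets_bar(weights: Dict[str, float], scores: Dict[str, float], threshold: float) -> bool:
    """True when the weighted score reaches the user's passing threshold."""
    return weighted_score(weights, scores) >= threshold
```

Raising an error on an unscored criterion, rather than counting it as zero, keeps "not yet evaluated" distinct from "evaluated and poor".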

Key Insights

  • The host argues that /goal is not a larger or better prompt but a fundamentally different interaction type—a 'finish line contract' that shifts the user's role from directing steps to defining completion conditions, after which the AI loops and self-evaluates without human steering.
  • Pavel Hurin and the Codex team framed the key skill in the /goal paradigm as 'engineering the intent'—articulating why the goal matters, what strategic context surrounds it, and how success will be measured, so the agent can make autonomous decisions with less guidance.
  • The host identifies a Goldilocks zone for goal scope: goals too narrow (e.g., 'fix this one line') prevent the AI from discovering upstream root causes, while goals too broad (e.g., 'improve the whole system') make it impossible to define the concrete evidence needed for the AI to self-judge completion.
  • The host argues that the strongest signal a knowledge work task is suited for /goal is when the desired output is an audit—a traceable ledger of what was checked, supported, contradicted, and unknown—rather than a single synthesized answer.
  • Claude Code adopting the /goal name rather than branding its own version is cited by the host as a strategically mature move, reflecting that participating in an emerging primitive across the ecosystem is more valuable than trying to own it through differentiation.

Topics

  • /goal primitive in Codex and Claude Code
  • Shifting from turn-based prompting to autonomous AI loops
  • Anatomy of a well-formed goal prompt
  • Applying /goal to knowledge work beyond software engineering
  • User-provided rubrics as success criteria
