Stop Asking Which Agent Is Best. Ask These 5 Questions Instead.
The speaker argues that teams should stop evaluating AI agents by benchmarks and instead use a five-question infrastructure filter. The key insight is that the agent market is shifting from model quality competition to infrastructure layering, and the most valuable launches are those that plug into existing tools, expose data, and allow other agents to build on top of them.
Summary
The video opens by acknowledging the overwhelming pace of AI agent launches — from OpenAI's workspace agents to Salesforce's Headless 360 to Kimi K 2.6 — and notes that the dominant reaction among team leaders is exhaustion rather than excitement. The speaker argues that the real question is not which agent is best, but which launches deserve a team's attention. The answer, he proposes, is a five-question infrastructure filter.
The filter asks five questions of any launch:
1. Does it plug into tools the team already uses, or does it require migration to a new environment?
2. Does it let other agents build on top of it, or is it a closed product?
3. Does it own or access data the team actually cares about?
4. Is there a real ecosystem forming around it (SDKs, marketplaces, consistent shipping)?
5. Can agents be stacked on top of it?

The speaker argues that most launches fail these tests, and the ones that pass are worth a deeper look.
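To make the filter concrete, here is a minimal sketch of how a team might encode the five questions as a quick yes/no checklist. The `LaunchAssessment` fields, the scoring, and the threshold of four are illustrative assumptions, not anything the speaker prescribes.

```python
from dataclasses import dataclass, fields

@dataclass
class LaunchAssessment:
    """One answer per filter question; field names are illustrative, not the speaker's."""
    plugs_into_existing_tools: bool   # (1) works with tools the team already uses
    lets_agents_build_on_it: bool     # (2) open to other agents, not a closed product
    owns_or_accesses_our_data: bool   # (3) reaches data the team actually cares about
    has_real_ecosystem: bool          # (4) SDKs, marketplace, consistent shipping
    supports_agent_stacking: bool     # (5) other agents can be layered on top of it

def worth_a_deeper_look(launch: LaunchAssessment, threshold: int = 4) -> bool:
    """Count 'yes' answers; the threshold of 4 is an assumption for illustration."""
    score = sum(1 for f in fields(launch) if getattr(launch, f.name))
    return score >= threshold

# Example: a hypothetical launch that integrates, opens up, and exposes data,
# but has no real ecosystem yet.
example = LaunchAssessment(True, True, True, False, True)
print(worth_a_deeper_look(example))  # True: 4 of 5 questions answered yes
```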
The speaker then runs five current launches through the filter. ChatGPT Workspace Agents represent a shift from personal assistant to shared team work units — recurring, schedulable, cross-tool workflows run from ChatGPT or Slack. They pass the filter for that specific use case but lose their edge when work is deeply native to Salesforce or Microsoft 365, or when the primary need is frontier coding.
Salesforce Headless 360, described as the most underrated launch, exposes the entire Salesforce platform as APIs, MCP tools, and CLI commands — meaning agents can now reach into Salesforce directly without a human clicking through the interface. With 60+ new MCP tools, a developer marketplace, and support for Claude Code, Cursor, and Codex, Salesforce is positioning itself as infrastructure under the agent economy. It scores high on all five filter questions. A notable detail: AgentForce 5 uses Claude Sonnet 4.5 as its default coding model, reflecting Anthropic's strategy of embedding into other companies' stacks.
Microsoft Copilot Wave 3's key components, Copilot Co-work and Work IQ, bring long-running, multi-step agent execution and deep organizational graph access (email, meetings, files, SharePoint, identity) into Microsoft 365. Built in collaboration with Anthropic, the wave is strongest for Microsoft-native enterprises but weaker on openness to external agents and ecosystem energy, and it is largely irrelevant for engineering-heavy workflows.
Kimi K 2.6 is technically impressive — an open-weights multimodal agentic model with a 300-agent swarm architecture capable of 4,000-step execution — but fails the infrastructure filter for most enterprise teams. It doesn't own enterprise workflow data, lacks Western connector stories, and is best suited for dev teams capable of self-hosting and building their own agent infrastructure. The speaker explicitly warns against using hosted Kimi products with sensitive company data.
Perplexity Personal Computer's Mac rollout adds local file editing, local browsing, voice orchestration, and background task execution, with Claude Opus 4.7 as the default orchestrator. It passes the filter for research-heavy work that produces deliverables — competitive intelligence, market research, document review — but fails for shared recurring team processes that need governance and repeatability.
The speaker then addresses the 'when should I switch?' question directly, arguing it is the wrong frame. The market is moving toward layering, not switching. Claude, for example, now appears in three forms: as a direct product, embedded inside other vendors' products (Copilot, Perplexity, Salesforce), and as managed agent infrastructure via Anthropic's managed agents offering. The right questions are: when to stay in a direct model product, when to use a wrapper that provides data access you can't replicate yourself, and when to use a different underlying model because the surrounding product matters more than marginal model quality differences.
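As a rough illustration of that layering logic, the sketch below maps the speaker's three questions onto three routing choices. The enum values and the two boolean inputs are assumptions layered on his framing, not a decision procedure he specifies.

```python
from enum import Enum

class Route(Enum):
    DIRECT_MODEL_PRODUCT = "stay in the model vendor's own product"
    WRAPPER_WITH_DATA = "use a wrapper that owns data access you can't replicate"
    DIFFERENT_MODEL = "pick the surrounding product, accept a different underlying model"

def route_work(needs_proprietary_data_access: bool,
               surrounding_product_matters_most: bool) -> Route:
    """Illustrative routing judgment: match the shape of the work to the shape of the tool."""
    if needs_proprietary_data_access:
        # e.g. workflows living inside a CRM or the Microsoft 365 organizational graph
        return Route.WRAPPER_WITH_DATA
    if surrounding_product_matters_most:
        # marginal model quality differences matter less than the product around the model
        return Route.DIFFERENT_MODEL
    return Route.DIRECT_MODEL_PRODUCT

print(route_work(needs_proprietary_data_access=True,
                 surrounding_product_matters_most=False).value)
```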
The video closes with a call to build routing judgment — matching the shape of the work to the shape of the tool — rather than chasing the loudest launch. Teams that layer deliberately will compound faster than those that chase benchmarks.
Key Insights
- The speaker argues that the most important agent launches are not those with the best benchmarks or loudest demos, but those that change what existing tools can reach and how easily agent systems can be stacked together — framing the shift as one from model quality to infrastructure.
- The speaker contends that Salesforce Headless 360 scores higher on the infrastructure filter than any other launch covered, because it exposes the entire Salesforce platform as APIs and MCP tools, transforming Salesforce from a destination into infrastructure that any compatible agent — Claude Code, Cursor, Codex — can act inside.
- The speaker identifies a pattern in Anthropic's enterprise strategy: Claude is no longer primarily a standalone chat product but is increasingly embedded as the default model layer inside other companies' products — including Salesforce AgentForce 5 (Claude Sonnet 4.5), Perplexity Computer (Claude Opus 4.7), and Microsoft Copilot Co-work.
- The speaker argues that Kimi K 2.6, despite its technically impressive 300-agent swarm architecture and open-weights license, fails the infrastructure filter for most enterprise teams because it doesn't own workflow data, lacks Western enterprise connectors, and is only genuinely valuable for dev teams capable of self-hosting — warning that for hosted use cases, the deciding variable is trust and data governance, not benchmark scores.
- The speaker reframes the common 'when should I switch?' question as fundamentally wrong, arguing instead that the agent market is moving toward layering — where the right question is which wrapper around a given model fits the job, based on data access, workflow integration, permissions, and ecosystem fit rather than model-vs-model comparisons.