Building Realistic Voice Agents Has Never Been Easier
The video demonstrates how to build a voice agent using Claude Code and ElevenLabs without manually reading API documentation or configuring platform dashboards. The creator walks through building a sales voice agent that integrates with Cal.com for booking, iterating through several rounds of debugging. The main argument is that natural language prompting via Claude Code dramatically reduces the complexity of building production-ready voice agents.
Summary
The video opens with the creator showcasing a previously built voice agent trained on 400 YouTube transcripts, demonstrating how users can query it conversationally about tools like Firecrawl and Claude Code workflows. This serves as proof-of-concept that complex voice agents can be built quickly using Claude Code.
The creator then explains the core architecture of any voice agent, identifying four key components: persona (system prompt), voice, knowledge base, and tools. He illustrates these using the ElevenLabs dashboard, showing how each piece is normally configured manually — a process he argues is error-prone and time-consuming.
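The four components above can be pictured as a single configuration object. The sketch below is illustrative only: the field names and values are hypothetical and do not reflect the actual ElevenLabs agent schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the four building blocks of a voice agent.
# Field names are illustrative, not the ElevenLabs API schema.
@dataclass
class VoiceAgentConfig:
    persona: str                    # system prompt defining role and tone
    voice_id: str                   # which synthesized voice to use
    knowledge_base: list[str] = field(default_factory=list)  # reference docs
    tools: list[str] = field(default_factory=list)           # external actions

# Example instance mirroring the sales agent built in the video.
agent = VoiceAgentConfig(
    persona="You are a concise, friendly sales rep for an AI consultancy.",
    voice_id="calm-professional",
    knowledge_base=["company_faq.md"],
    tools=["check_availability", "book_appointment"],
)
```

In the ElevenLabs dashboard each of these is a separate manual setting; the point of the video is that Claude Code fills them all in from a plain-English description.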
The live demo centers on building a sales voice agent for a fictional AI consultancy called 'Neural.' The agent's goal is to qualify inbound visitors and book discovery calls via Cal.com. Using Claude Code's 'plan mode,' the creator dictates his requirements in natural language, and Claude asks clarifying questions about ElevenLabs setup, Cal.com configuration, voice persona, widget appearance, and data fields to capture. Claude then generates a complete architectural plan before writing any code.
After providing API keys for both ElevenLabs and Cal.com, Claude autonomously creates the ElevenLabs agent, configures two tools (check availability and book appointment), writes the system prompt, and embeds the widget into the local website. The first test reveals issues: the voice is too enthusiastic, the greeting message doesn't trigger automatically, and the agent queries Cal.com availability in UTC rather than Central time, causing it to report incorrect slots.
Through iterative natural language feedback to Claude Code, the creator fixes the voice selection, adjusts the system prompt for conciseness and proper email/name spelling confirmation, and resolves the timezone bug in the tool call parameters. A final successful demo shows the agent correctly identifying available slots and booking a 7:00 p.m. Central appointment, with a confirmation email delivered to the correct address.
The creator closes by addressing security and cost concerns: locking the widget to specific domains to prevent credential theft, setting conversation duration caps, implementing rate limits, and considering authentication for public-facing widgets. He also notes the agent could be connected to a phone number via Twilio for the same functionality through a different channel.
Key Insights
- The creator argues that Claude Code's ability to read API documentation and reason about tool integration means users never need to manually inspect endpoints or configure platform dashboards — natural language description of the goal is sufficient to produce working integrations.
- The creator found that the timezone bug causing incorrect availability results was discovered by Claude Code analyzing the conversation transcript, which showed the check-availability tool was constructing its query window in UTC instead of Central time — a subtle error that would be difficult to spot without reading raw API parameters.
- The creator warns that embedding an ElevenLabs widget on a public website means all usage costs fall on the account owner, and that the widget's HTML snippet can be easily stolen via browser inspection and reused on another domain, making domain allowlisting in ElevenLabs security settings a critical safeguard.
- The creator explains that the same configured ElevenLabs agent — with its tools, prompting, and voice — can be surfaced through multiple channels (website widget, phone number via Twilio) without any additional configuration, calling it 'the same engine behind the scenes, just a different door.'
- The creator observes that latency in voice agents is directly tied to model and voice quality choices, and notes that testing on localhost produces worse latency than a live deployed widget, which can cause misleading impressions during development.
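The domain-allowlisting safeguard from the security discussion above amounts to an origin check. ElevenLabs enforces this in its own settings; the sketch below shows the same idea server-side, with hypothetical allowlist entries.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of domains permitted to load the widget.
ALLOWED_HOSTS = {"neural.example.com", "www.neural.example.com"}

def origin_allowed(origin_header: str) -> bool:
    """Accept widget requests only from allowlisted origins."""
    host = urlparse(origin_header).hostname
    return host in ALLOWED_HOSTS
```

A copied embed snippet served from any other domain then fails this check, so stolen HTML alone can't run up the account owner's bill.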