Gemini 3.1 Flash Live Just Changed Voice Agents Forever

Nate Herk | AI Automation · March 28, 2026 · 18m 42s

Google released Gemini 3.1 Flash Live, a new speech-to-speech voice AI model with improved latency, accuracy, and multimodal capabilities. The presenter demonstrates building voice agents using the model with Google AI Studio and shows how to integrate it with external tools using Claude for coding assistance.

Summary

The video covers Google's new Gemini 3.1 Flash Live voice model, which moves from the earlier text-based pipeline to direct speech-to-speech processing. The model offers improved precision, lower latency, and more natural interactions, with a 19% benchmark improvement in multi-step function calling over previous Gemini models. Key features include better performance in noisy environments, better accuracy with alphanumeric strings, and contextual awareness of tone, such as sarcasm or frustration. The presenter demonstrates the model in Google AI Studio, showing how users can create custom voice agents with system instructions, voice options, and tool integrations. Two practical examples are showcased: a customer service agent for a keyboard website and a personal assistant that can access ClickUp tasks and calendar functions. The model supports over 70 languages and is offered in free and paid tiers; the free tier allows experimentation but has usage limits and shares data with Google. While the technology is impressive, the presenter notes current limitations: function calling is synchronous, which creates awkward pauses, and deploying to production is more technically complex than with simpler solutions like ElevenLabs.
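The pause caused by synchronous function calling can be illustrated with a toy simulation. This is only a sketch and does not use the Gemini Live API: `slow_tool`, `blocking_turn`, and `overlapped_turn` are hypothetical names, and the "3 open tasks" result is made up for illustration. The first turn stays silent until the tool returns; the second starts the tool in the background and speaks a filler line while it runs, which is roughly what the presenter says other voice agents do.

```python
import asyncio


async def slow_tool(delay: float) -> str:
    """Hypothetical stand-in for an external tool call (e.g. fetching tasks)."""
    await asyncio.sleep(delay)
    return "3 open tasks"


async def blocking_turn(delay: float) -> list[str]:
    """The agent says nothing until the tool has returned."""
    events = ["silence"]
    result = await slow_tool(delay)
    events.append(f"say: You have {result}.")
    return events


async def overlapped_turn(delay: float) -> list[str]:
    """The agent speaks a filler line while the tool runs in the background."""
    task = asyncio.create_task(slow_tool(delay))
    events = ["say: One moment, let me check."]
    result = await task
    events.append(f"say: You have {result}.")
    return events


if __name__ == "__main__":
    print(asyncio.run(blocking_turn(0.1)))
    print(asyncio.run(overlapped_turn(0.1)))
```

Both turns take about the same wall-clock time; the difference is only whether the user hears something while waiting.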

About this episode

Full courses + unlimited support: https://www.skool.com/ai-automation-society-plus/about
All my FREE resources: https://www.skool.com/ai-automation-society/about
Apply for my YT podcast: https://podcast.nateherk.com/apply
Work with me: https://uppitai.com/

My Tools💻
14 day FREE n8n trial: https://n8n.partnerlinks.io/22crlu8afq5r
Code NATEHERK to Self-Host Claude Code for 10% off (annual plan): https://www.hostinger.com/vps/claude-code-hosting
Voice to text: https://ref.wisprflow.ai/nateherk

Google just dropped Gemini 3.1 Flash Live, their new speech-to-speech voice model. In this video, I break down what makes it different, try it out for free in Google AI Studio, and then use Claude Code to build two working demos: a voice agent embedded on a website and a personal assistant that connects to my calendar and ClickUp. I also cover pricing, current limitations, and what it takes to actually deploy something like this.

Sponsorship Inquiries: 📧 [email protected]

TIMESTAMPS
0:00 Intro
1:01 What Is Gemini 3.1 Flash Live
3:14 Trying It Free in Google AI Studio
4:56 Custom Voice Agents
6:05 Webcam & Vision Demo
8:01 Function Calling & Tools
10:02 Building Two Apps With Claude Code
15:20 Pricing & Deployment
18:30 Final Thoughts

Key Insights

Gemini 3.1 Flash Live uses direct speech-to-speech processing instead of the traditional speech-to-text-to-speech pipeline, enabling more natural interactions and better contextual understanding of emotions like sarcasm
The new model outperformed previous Gemini models by 19% in multi-step function calling benchmarks and shows significant improvements in noisy environment performance
The model currently has a limitation where it stops speaking entirely during function calls and waits for responses, creating awkward silences unlike other voice agents that can talk while processing
Google offers the model free with data sharing for product improvement, or paid tiers starting around 14 cents for a 10-minute call with enterprise-grade privacy and higher rate limits
Deploying Gemini Live to production websites requires managing persistent websocket connections and server processes, making it more technically complex than plug-and-play solutions like ElevenLabs
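To get a feel for the paid-tier figure above, here is a rough back-of-the-envelope estimate. It takes the "around 14 cents for a 10-minute call" number from the video and assumes cost scales linearly with call length, which is a simplification and may not match how usage is actually billed. The function name and the example volumes are hypothetical.

```python
# Approximate figure quoted in the video: about 14 cents per 10-minute call.
CENTS_PER_10_MIN_CALL = 14


def estimated_cost_usd(minutes_per_call: float, calls: int) -> float:
    """Estimate total cost in USD, assuming cost grows linearly with minutes."""
    total_cents = CENTS_PER_10_MIN_CALL * minutes_per_call * calls / 10
    return round(total_cents / 100, 2)


if __name__ == "__main__":
    # One 10-minute call, then a hypothetical 100 calls of 5 minutes each.
    print(estimated_cost_usd(10, 1))
    print(estimated_cost_usd(5, 100))
```

Treat the result as an order-of-magnitude guide only; check the current pricing page before budgeting.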

Topics

Gemini 3.1 Flash Live voice model
Speech-to-speech AI technology
Voice agent development
Google AI Studio
Function calling and tool integration

Transcript

[0:00] Hey, so I think that these scissors are broken. These two different pieces came apart. Do you know how I might be able to fix this? >> I see. It looks like these are designed to come apart for cleaning. If you align the rivet on one blade with the slot on the other, they should slide back together securely. Can you give that a try? >> There we go. I think I got it working. >> Perfect. That's exactly how they connect. They look like they're working just fine now. Hey, so I was just scrolling through my Skool community and I was curious if you know you were [0:32] able to see these different posts and specifically…
