NEW Gemini Features Explained — How to Use Google’s Latest AI Upgrade
The video reviews Google's Gemini 3.5 Flash model across multiple use cases including multimodal vision, video analysis, document processing, and agentic workflows. The presenter argues Flash delivers Pro-tier performance at a significantly lower price point than competitors like Claude and GPT. However, the review also surfaces three under-reported weaknesses: degraded long-context retrieval, increased verbosity, and a silent downgrade of the default thinking level.
Summary
The presenter spent a week testing the new Gemini 3.5 Flash model across the Gemini app, AI Studio, Workspace extensions, and native video upload, and concludes that Flash has quietly surpassed Pro-tier models in practical performance. The review is structured around a series of live demos.
In the multimodal vision demo, Flash correctly identified two partially obscured jars in a fridge photo and generated a complete recipe with a shopping list limited strictly to the missing ingredients. The presenter notes that most models fail this task, either by missing obscured items or by hallucinating items that are not in the photo.
The native video understanding demo involved dropping a long video directly into chat and requesting timestamped insights plus a Python chart reconstructed from a data table at the 23-minute mark. Flash returned accurate timestamps (verified within 20 seconds) and rendered the Python chart inline, all within a single 64K-token response without truncation.
For long document analysis, the presenter tested a 40-page B2B contract PDF in AI Studio, demonstrating the practical difference between the low and high thinking level settings. The high thinking mode caught two penalty clauses and an auto-renewal trigger that the low thinking mode missed entirely, leading the presenter to recommend always using high thinking when the cost of a wrong answer is significant.
The vibe coding demo showed Flash generating a complete React and Tailwind component from a hand-drawn photo of an app layout, streaming hundreds of lines without truncation and rendering live inside AI Studio's built-in preview panel.
Additional demos covered voice memo structuring, error message diagnosis from screenshots, and JSON structured output extraction from 15 multilingual receipt photos — all without writing any API code.
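As an illustration of what consuming that kind of structured output might look like downstream, here is a minimal sketch that validates a JSON array of receipts before use. The review does not show the schema Flash produced, so the field names (merchant, date, currency, total), the Receipt class, and the sample data are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Receipt:
    # Hypothetical fields; the review does not specify the extracted schema.
    merchant: str
    date: str
    currency: str
    total: float


# Expected type for each required field in a receipt record.
REQUIRED_FIELDS = {"merchant": str, "date": str, "currency": str, "total": (int, float)}


def parse_receipts(raw_json: str) -> list[Receipt]:
    """Parse a JSON array of receipt objects, rejecting malformed records."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of receipts")
    receipts = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"receipt {index}: expected a JSON object")
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in record:
                raise ValueError(f"receipt {index}: missing field '{field}'")
            value = record[field]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected_type):
                raise ValueError(f"receipt {index}: bad type for '{field}'")
        receipts.append(
            Receipt(record["merchant"], record["date"], record["currency"], float(record["total"]))
        )
    return receipts


# Made-up sample standing in for model output from multilingual receipts.
SAMPLE = """[
  {"merchant": "Boulangerie Dupont", "date": "2025-01-14", "currency": "EUR", "total": 12.4},
  {"merchant": "セブンイレブン", "date": "2025-01-15", "currency": "JPY", "total": 980}
]"""

receipts = parse_receipts(SAMPLE)
print(len(receipts), sum(r.total for r in receipts if r.currency == "EUR"))
```

Validating before use matters more as the batch grows: a single record with a missing or mistyped total is caught at parse time instead of surfacing later in a spreadsheet or report.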
The most significant demo was the Workspace agentic chain, where a single prompt caused Gemini to find a file in Google Drive, create a new summary document in Google Docs, and draft a Gmail message with a link — chaining three Google products autonomously.
On pricing, Flash costs $1.50 per million input tokens and $9 per million output tokens, compared to $5/$25 for Claude Opus 4.7 and $5/$30 for GPT 5.5, placing it in a fundamentally different cost bracket.
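To make the price gap concrete, the sketch below computes the cost of one workload at the per-million-token prices quoted in the review. The prices are taken from the review as stated and are not checked against vendor price lists; the workload size (2 million input tokens, 1 million output tokens) and the output_multiplier parameter are assumptions added for illustration. The multiplier models the review's claim that Flash outputs are roughly twice as verbose on reasoning-heavy tasks.

```python
# USD per million tokens as (input, output), as quoted in the review.
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT 5.5": (5.00, 30.00),
}


def request_cost(model, input_tokens, output_tokens, output_multiplier=1.0):
    """Return the cost in USD for the given token counts.

    output_multiplier scales the output token count, e.g. 2.0 to model
    a response that is twice as verbose.
    """
    input_price, output_price = PRICES[model]
    billed_output = output_tokens * output_multiplier
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


# Hypothetical workload, chosen only to illustrate the arithmetic.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 1_000_000

for model in PRICES:
    print(f"{model}: ${request_cost(model, INPUT_TOKENS, OUTPUT_TOKENS):.2f}")

# Flash again, assuming its output is twice as long for the same task.
verbose_flash = request_cost("Gemini 3.5 Flash", INPUT_TOKENS, OUTPUT_TOKENS, output_multiplier=2.0)
print(f"Gemini 3.5 Flash (2x output): ${verbose_flash:.2f}")
```

Under these assumed token counts, Flash comes to $12.00 against $35.00 and $40.00. If Flash's output doubles in length, its cost rises to $21.00, which narrows the gap but does not close it for this workload.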
The presenter closes with three issues Google is not publicizing: Flash scores 7.6 points lower than Gemini 3.1 Pro on the MRCR V2 long-context retrieval benchmark; outputs are roughly twice as verbose on reasoning-heavy tasks; and the default thinking level was silently downgraded from high to medium when users migrated from 2.5 Pro, with no announcement from Google.
Key Insights
- The presenter found that Flash's high thinking mode caught two penalty clauses and an auto-renewal trigger in a contract PDF that the low thinking mode missed entirely, arguing this represents a meaningful — not marginal — quality difference for high-stakes documents.
- The presenter argues Flash is not marginally cheaper than competitors but operates in a fundamentally different cost bracket: $9 per million output tokens versus $25 for Claude Opus 4.7 and $30 for GPT 5.5, making the output token gap especially significant at production scale.
- Google silently changed the default thinking level in the Gemini app from high to medium when users migrated from 2.5 Pro to 3.5 Flash, without any public announcement — causing many users to experience degraded outputs without understanding why.
- Flash scores 7.6 points lower than Gemini 3.1 Pro on the MRCR V2 benchmark at 128K token context, meaning long-context retrieval accuracy actually regressed in the newer model despite other capability improvements.
- The presenter demonstrated that a single Gemini prompt using app-mention syntax chained Google Drive, Docs, and Gmail together autonomously — finding a file, creating a summary document, and drafting an email with a link — without the user touching any of those apps directly.