How to Run Hermes FREE Forever!
The video demonstrates how to run the Hermes AI agent for free using Gemma 4, a local open-source model from Google, with significant speed improvements through MLX optimization. The setup works on Apple Silicon Macs or via free APIs on Open Router, enabling autonomous agents to work offline and privately without subscription costs.
Summary
Julian presents a complete system for running Hermes, one of the most powerful AI agents, without ongoing costs. The core update involves Gemma 4, Google's free local model, which is now 90% faster when run through Olama on Apple Silicon using MLX technology. For those without Apple Silicon, Open Router provides free API access to the 31B Gemma model.
The key advantage of local models is privacy and offline capability—all work stays local without requiring internet connectivity or API subscriptions. Julian demonstrates practical applications including building a to-do list app, using the forward slash learn feature to add skills to Hermes, reading and analyzing emails, and monitoring AI news automatically.
Hermes now includes a sub-agents update that allows multiple Gemma 4 workers to handle parallel tasks simultaneously. The system uses agentic loops, where users set a goal and the agent autonomously completes it, checking its own work and retrying as needed, without expensive token consumption. Julian contrasts this with the old approach where users had to manually prompt the agent repeatedly.
Setup is straightforward: download the latest Olama, select Gemma 4, choose the new MLX version, and connect it to Hermes with a single command. Julian emphasizes that technical expertise isn't required and provides access to the AI Profit Boardroom community with 194 pages of user testimonials, full training courses, playbooks for token optimization, and weekly coaching calls.
Key Insights
- The new MLX-optimized version of Gemma 4 runs 90% faster than previous versions, making local model execution fast enough for practical use with Hermes, whereas it was previously too slow to be usable.
- Hermes agents can run autonomous loops where they set a goal, execute tasks, check their own work, and retry automatically without manual intervention or token cost concerns, fundamentally changing how agentic workflows operate compared to traditional prompting.
- The forward slash learn feature allows Hermes to read tutorials and documentation, then add that knowledge as a permanent skill that it never forgets, enabling continuous skill accumulation from local training materials.
- Open Router provides a free API alternative to local models for users without Apple Silicon, ensuring Gemma 4 deployment is accessible regardless of hardware limitations.
- The AI Profit Boardroom community contains 194 pages of documented testimonials from non-technical users successfully implementing agent operating systems, demonstrating that no technical expertise is required to set up and use these systems.
Topics
Transcript
[0:00] Today I'm going to show you how to run Hermes for free forever with a new update to Gemma 4 that actually makes it 90% faster. So this is a new update that just dropped for Gemma 4 with Olama on Apple Silicon using MLX. And so with MLX you can run models 95% faster using Gemma 4. Gemma 4 is a local free model from Google and you can now plug it into Hermes in like one single click and run [0:30] free models forever. So let me show you an example of this. We already plugged it into the agent OS over here and then if we go to the different profiles that we have. I usually…
Full transcript available for MurmurCast members
Sign Up to AccessMore from Julian Goldie SEO
This NEW Chinese AI is INSANE! (FREE + Open Source!)
Long Cap 2.0 is a new open-source Chinese AI model from a food delivery app company that offers 1 million tokens of free context memory, beats GPT-4.5 on SWE bench pro benchmarks, and uses efficient parameter activation to reduce computational overhead while maintaining high performance.
Claude Code is now FREE: Here’s how…
Google's new Gemma 4 model running on Ollama is 90% faster on Apple Silicon, enabling free Claude Code usage locally without token costs. The setup requires three simple steps: downloading Ollama, Gemma 4, and installing into Claude Code, with alternatives available via OpenRouter API for non-Mac users.
X AI MCP Server Just Changed AI Agents
X has launched a hosted MCP (Model Context Protocol) server that gives AI agents direct access to real-time data from X's platform through a standardized connection, eliminating the need for custom API integration work. The setup involves OAuth authentication, the XRL token manager, and access to 200+ X API tools for research, content creation, and trend tracking.
New NotebookLM Update is INSANE!
Google's NotebookLM now features short video overviews that convert documents into engaging 60-second vertical videos using the new Nano Banana 2 Light image model. The feature represents rapid iteration in AI tools and offers practical applications for students, creators, and businesses seeking to transform static documents into shareable video content.
How to Rank #1 with Claude Fable 5 AI SEO!
The speaker demonstrates how to use Claude Fable 5 AI for SEO automation to rank websites, showing real examples of sites growing from zero to hundreds of daily clicks. The strategy emphasizes using Fable 5 for planning and building automation systems, while deploying content creation with cheaper alternative models due to Fable 5's token limitations.