NEW Hermes AI Voice Agent is INSANE!
The presenter demonstrates a voice-enabled AI agent called Hermes, powered by MiniMax's M3 model and built into an 'Agent Operating System.' The system lets users control AI agents through natural voice conversation without coding. It supports image and video generation, live Twitter search (when Grok is the underlying model), and multi-agent management. The video concludes with a pitch for the AI Profit Boardroom community, where the full system is available.
Summary
The video showcases a voice agent called Hermes, built on MiniMax's M3 frontier model, embedded within a custom 'Agent Operating System' (Agent OS). The presenter demonstrates real-time voice interaction, showing the agent responding to jokes, teaching basic Japanese phrases, and discussing automation capabilities — all through spoken conversation rather than typed prompts.
A key selling point emphasized throughout is that the entire Agent OS, including the voice interface, was built by MiniMax/Hermes itself with no manual coding by the presenter. He simply described the features he wanted, and the AI built it out. This is framed as a major shift from the 'old way,' which required teams of developers and engineers.
The presenter contrasts Hermes with conventional AI tools like ChatGPT, arguing that ChatGPT is merely a chat interface, whereas Hermes functions as a true agent that can be controlled by voice. The system is described as running locally on the presenter's machine, giving him hands-free, real-time access to all of his configured agents at once.
MiniMax M3 is highlighted for its million-token context window, native multimodality (text, images, and cinematic video), strong coding ability, and a planned open-source release on Hugging Face that would allow free local use. The presenter also notes that using Grok as the underlying model (via Open Claw or Hermes) adds live Twitter search, making that setup stronger for research, while he prefers MiniMax M3 for generative media tasks.
Additional features demonstrated include switchable voice styles and accents (e.g., American, Manchester English, presenter mode), a workspace for storing and reviewing past sessions, and integration with tools like Obsidian and NotebookLM. The video ends with a promotional pitch for the AI Profit Boardroom, a paid community offering the full Agent OS zip file, tutorials, weekly coaching calls, and peer networking.
Key Insights
- The presenter claims the entire Agent Operating System, including the voice interface, was built by MiniMax/Hermes itself — he did not write any code himself, only described the features he wanted.
- The presenter argues that Hermes differs fundamentally from ChatGPT because ChatGPT is just a chat interface, whereas Hermes is a true agent that can be commanded and controlled through live voice conversation.
- The presenter states that MiniMax M3 is planned to be released as open-source on Hugging Face, meaning users will eventually be able to run it locally for free.
- The presenter notes that using Grok as the underlying model (via Open Claw or Hermes) provides live Twitter search, making it better suited to research than MiniMax M3; he prefers MiniMax M3 for generative video and image tasks.
- The presenter demonstrates that the Hermes voice agent can switch between distinct accents and speaking styles on command, including American, Manchester English, and a 'presenter mode,' illustrating voice persona flexibility.