Technical · Opinion

I Rebuilt Hermes in Claude Code (It’s Ridiculously Good)

Simon Scrapes

The creator rebuilds key features of the Hermes agentic AI system inside their own Claude Code setup rather than installing it off the shelf, arguing that understanding the architecture provides more long-term leverage. They identify three hidden costs of pre-built systems: inherited assumptions, the inability to fix code you do not understand, and poor scalability. Their custom build addresses Hermes's limitations in multi-client identity management, semantic memory recall, and skill system maintainability.

Summary

The video opens by noting Hermes reached 40,000 GitHub stars in 46 days — faster than OpenClaw — making it the fastest-adopted agentic system on GitHub. Rather than installing Hermes directly, the creator read through its GitHub issues first and decided to rebuild the desired features within their own Claude Code setup, prioritizing understanding over speed of adoption.

The creator outlines three hidden costs of off-the-shelf agentic systems. First, inherited assumptions: Hermes's self-learning loop has no external guardrails, meaning the same model that writes a skill also validates it, creating a self-validation problem with no version control or audit log. Second, inability to fix unknown code: OpenClaw, a comparable product, had over 200 vulnerabilities filed since February, including 386 malicious packages from a single threat actor on its skills marketplace — and users can't debug what they don't understand. Third, scalability problems: a non-technical CEO spent over 100 hours and $1,000 testing OpenClaw over two months before concluding its bugs and security gaps made it unusable for business.
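To make the missing "version control or audit log" concrete, here is a minimal sketch of an append-only skill store. It is not how Hermes or the creator's build works; `SkillStore`, its methods, and the `author` labels are hypothetical names chosen for illustration. The point is only that every write keeps the previous version and records who made it, so an agent rewrite can be traced and undone.

```python
from dataclasses import dataclass, field


@dataclass
class SkillStore:
    """Hypothetical append-only store: writes never destroy earlier versions."""
    versions: dict = field(default_factory=dict)   # skill name -> list of texts
    audit_log: list = field(default_factory=list)  # (name, version number, author)

    def write(self, name: str, text: str, author: str) -> int:
        """Append a new version and record who wrote it."""
        history = self.versions.setdefault(name, [])
        history.append(text)
        self.audit_log.append((name, len(history), author))
        return len(history)

    def latest(self, name: str) -> str:
        return self.versions[name][-1]

    def rollback(self, name: str, author: str) -> int:
        """Restore the previous version by re-appending it as a new version."""
        history = self.versions[name]
        if len(history) < 2:
            raise ValueError(f"no earlier version of {name!r} to restore")
        return self.write(name, history[-2], author)
```

With a store like this, a user-tuned skill that an agent later overwrites is still in the history, and the log shows which writer produced each version.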

On the identity layer, Hermes uses memory.md and user.md files injected at the start of every conversation, but assumes a single user working on a single set of projects. The creator's solution adds shared brand context per client — voice, ICP, visual identity — while allowing skills and procedures to be shared across clients from a single installation, avoiding the maintenance overhead of multiple separate Hermes installs.
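The split between per-client brand context and shared skills can be sketched roughly as follows. This is an illustration under assumptions, not the creator's actual layout: the client names, field values, and the `build_context` helper are all hypothetical. Only the three brand fields (voice, ICP, visual identity) come from the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrandContext:
    """Per-client identity, using the three fields named in the summary."""
    voice: str
    icp: str
    visual_identity: str


# Hypothetical clients, for illustration only.
BRANDS = {
    "acme": BrandContext("plain and direct", "operations leads at SMBs", "navy, sans-serif"),
    "globex": BrandContext("playful", "indie game studios", "neon, pixel art"),
}

# Skills are stored once and shared by every client in the installation.
SHARED_SKILLS = {
    "linkedin_post": "Write a LinkedIn post in the brand voice for the ICP.",
}


def build_context(client: str, skill: str) -> str:
    """Combine one client's brand context with a shared skill definition."""
    brand = BRANDS[client]
    return "\n".join([
        f"Client: {client}",
        f"Voice: {brand.voice}",
        f"ICP: {brand.icp}",
        f"Visual identity: {brand.visual_identity}",
        f"Skill: {SHARED_SKILLS[skill]}",
    ])
```

Calling `build_context("acme", "linkedin_post")` and `build_context("globex", "linkedin_post")` yields two different brand sections around the same skill text, which is the property that avoids one install per client.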

On memory, Hermes autosaves and summarizes conversations and injects recent context capped at roughly 1,300 tokens, but it relies on keyword-based search for long-term recall. That is a significant limitation when users cannot remember the exact words used in past sessions. The creator's system keeps the injected recent-memory pattern but replaces keyword search with semantic search (via a system like MemSearch) for deeper recall, making long-term memory practically useful.
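The difference between the two recall styles can be shown with a toy example. This sketch does not use MemSearch or any real embedding model: the `CONCEPTS` table is a hand-made stand-in for an embedding, and the stored memories are invented. It only demonstrates why matching on meaning can find what matching on exact words misses.

```python
import math

# Toy stand-in for an embedding model (an assumption for illustration):
# words with related meaning map to the same vector dimension.
CONCEPTS = {
    "pricing": 0, "price": 0, "cost": 0, "fees": 0,
    "onboarding": 1, "welcome": 1, "signup": 1,
    "email": 2, "newsletter": 2,
}


def embed(text: str) -> list:
    """Count how often each concept dimension appears in the text."""
    vec = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_search(query: str, memories: list) -> list:
    """Return memories that share at least one exact word with the query."""
    terms = set(query.lower().split())
    return [m for m in memories if terms & set(m.lower().split())]


def semantic_search(query: str, memories: list) -> str:
    """Return the memory whose vector is closest to the query vector."""
    q = embed(query)
    return max(memories, key=lambda m: cosine(q, embed(m)))


# Invented memories, for illustration only.
MEMORIES = [
    "Agreed new fees for the retainer",
    "Drafted the welcome newsletter",
]
```

Here `keyword_search("pricing", MEMORIES)` returns an empty list because no stored memory contains the word "pricing", while `semantic_search("pricing", MEMORIES)` returns the retainer note because "fees" and "pricing" share a dimension in the toy embedding. A real system would replace `embed` with a trained embedding model.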

On the self-learning loop, the creator argues that Hermes's automatic skill-writing creates compounding maintenance problems: over time, similar tasks generate near-duplicate skills with overlapping descriptions, making it unclear which skill to use and requiring updates in multiple places when context changes. Their alternative is a modular 'skill systems' architecture where each skill does one job, lives in one place, and is referenced by skill systems that chain components together. When brand voice or client positioning changes, only one file needs updating, and all dependent skill systems reflect the change automatically.
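The "one job, one place, referenced by skill systems" idea can be sketched as a registry of components plus skill systems that hold names rather than copies. The component names, texts, and the `render` helper below are hypothetical; the summary does not describe the creator's file layout.

```python
# Each component does one job and is defined in exactly one place.
# Names and texts are hypothetical, for illustration only.
COMPONENTS = {
    "brand_voice": "Voice: plain and direct.",
    "hook": "Open with a one-line hook.",
    "cta": "Close with a single call to action.",
}

# Skill systems chain components by name; they never copy the text.
SKILL_SYSTEMS = {
    "linkedin_post": ["brand_voice", "hook", "cta"],
    "newsletter": ["brand_voice", "cta"],
}


def render(system: str) -> str:
    """Resolve a skill system into instructions at call time."""
    return "\n".join(COMPONENTS[name] for name in SKILL_SYSTEMS[system])
```

Because `render` looks up components when it is called, reassigning `COMPONENTS["brand_voice"]` once changes the output of both `render("linkedin_post")` and `render("newsletter")`. With auto-generated near-duplicate skills, the same change would have to be repeated in every copy.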

The video concludes that Hermes is faster to start but custom builds are faster to scale. The creator positions their 'Agentic OS' — available in their Agentic Academy — as a middle ground: installable in one line but with full architectural transparency, allowing users to understand and modify every assumption.

Key Insights

  • The creator argues that Hermes's self-learning loop has a self-validation problem: the same model that writes a skill is also the sole judge of its correctness, meaning it can silently overwrite user-improved skills with worse versions and has no version control or audit log.
  • The creator points out that Hermes's single-install architecture assumes one user working on one project, meaning agencies or multi-brand owners must spin up entirely separate Hermes installations per client — each with its own isolated memory and skills — creating a compounding maintenance problem where shared procedures must be duplicated across installs.
  • The creator identifies that Hermes's long-term memory recall is keyword-based rather than semantic, making it practically useless for retrieving conversations from months ago when users cannot remember the exact words they originally used with Claude.
  • The creator argues that Hermes's automatic skill-generation loop leads to skill sprawl over time — potentially 15 near-duplicate skills for similar tasks like LinkedIn posts — with overlapping descriptions that make it unclear which skill to invoke, and no single update point when brand voice or positioning changes.
  • A non-technical CEO spent over 100 hours and $1,000 testing OpenClaw over two months and ultimately concluded its bugs and security gaps disqualified it from any usable business application, but has since replicated roughly 30% of OpenClaw's features in Claude within a couple of months.

Topics

  • Hermes agentic system architecture analysis
  • Custom Claude Code agentic OS build
  • Multi-client identity and brand context management
  • Semantic vs. keyword-based memory recall
  • Modular skill systems vs. auto-generated skills

