NEW Qwythos 9B Runs Locally for FREE
Julian Goldie demonstrates how to run Qwythos 9B, a free 5.6GB local AI model on your Mac using Ollama, which can be integrated into an agentic operating system for private, offline AI tasks. While smaller than frontier models, it can effectively write, reason, and build applications locally without cloud connectivity or token costs.
Summary
The video shows how to set up and use Qwythos 9B, a 9-billion parameter local AI model that runs entirely on your personal computer. Qwythos is built on a Qwen 3.5B base and post-trained using Claude-style reasoning patterns. Installation is straightforward: download Ollama, run a single terminal command, and the ~5.6GB model is ready to use. The model features a theoretical 1 million token context window, though in practice this is limited by available RAM.
Julian demonstrates real-world applications including building landing pages, task trackers, calculators, digital clocks, and a snake game—all generated locally without any data leaving the machine. He integrates Qwythos into Agent OS, his custom agentic operating system, making it the default local engine for AI agents. This setup enables private agent work without cloud dependencies.
When compared to other local models like Orfif 1.0, Qwythos runs approximately twice as fast but with slightly less polish. The model includes native function calling capabilities, making it suitable for agent applications. Julian clarifies that the advertised 1 million token context window is a theoretical maximum, not a practical guarantee—actual performance depends entirely on available system RAM.
Qwythos comes in three sizes: 4.4GB (lighter/faster), 5.6GB (balanced, recommended), and 9.5GB (near-lossless). The video covers both advantages (free, private, fast, light, Claude-style reasoning, agent-ready) and limitations (not frontier-level, context cutting off occasionally, slow initial load, no built-in memory or tools). Julian emphasizes this is suitable for those wanting private AI, testing local model capabilities, or powering free agent systems.
Key Insights
- Qwythos 9B is built on a Qwen 3.5B base model that has been post-trained on Claude-style reasoning and creative traces, allowing a small 9-billion parameter model to punch above its weight by adopting the thinking and writing patterns of larger frontier models.
- The advertised 1 million token context window is a theoretical ceiling, not a practical guarantee—Ollama loads the model with a much smaller window by default, and actual context length is constrained by available RAM, sometimes causing long responses to cut off even with the maximum context number on paper.
- Qwythos runs approximately twice as fast as its competitor Orfif 1.0 for the same tasks, presenting a trade-off where users must choose between speed and output polish depending on their priorities.
- The model is quantized down to 5.6GB using Ollama and llama.cpp, making it small enough to run on laptops and personal machines, whereas many other local models consume 10+ GB of storage.
- A local model integrated as an engine within an agentic operating system like Agent OS is fundamentally more powerful than a standalone model used for chatting in a terminal, because it enables the model to function as the backbone of a complete agent system with multiple specialized tools and workspaces.
Topics
Transcript
[0:00] New Quithos 9B runs locally for free Claude-style AI on your own Mac. What if you could run a Claude-style AI on your own computer for free? No cloud, no tokens, nothing leaving your machine. And what if it was small enough to fit on a laptop? Most people have no idea this is even possible, but I've already got it running. Let me show you. I'm the digital avatar of Julian Goldie, and I help people actually learn and use AI tools in their real work. In this one, I'm going to show you a free local model called Quithos 9B. How I installed it, how I wired it into my Agent OS, and the stuff it built…
Full transcript available for MurmurCast members
Sign Up to AccessMore from Julian Goldie SEO
Claude Sonnet 5 VS GLM 5.2: Who Wins?
A detailed comparison of Claude Sonnet 5 versus GLM 5.2 AI models across game development, coding benchmarks, and UI creation tasks. The reviewer concludes that GLM 5.2 generally outperforms Sonnet 5 while being significantly cheaper, though Opus 4.8 and the forthcoming Fable 5 remain superior options.
Claude Sonnet 5 is HERE!
Claude Sonnet 5 has been released as Anthropic's most agentic model, but the speaker argues it's a disappointing release that underperforms Opus 4.8 while being more expensive, making it an unattractive option for most users. The reviewer demonstrates this through benchmark comparisons and test outputs, concluding that users should stick with Opus 4.8 or wait for the incoming Fable 5 model.
Gamma Just Got Better With ChatGPT
Gamma, an AI design tool used by nearly 100 million people, is now integrated into ChatGPT as a native app, allowing users to create professional presentations, documents, and web pages without leaving the chat. The integration enables users to transform rough notes, training documents, and ideas into polished decks by simply conversing with ChatGPT, which handles the writing while Gamma handles the design.
GLM 5.2 + Claude Code is INSANE!
The speaker demonstrates how to integrate GLM 5.2 into Claude Code using Ollama to create a cost-effective alternative AI development setup. This system combines Claude Code's agentic capabilities with GLM 5.2's brain, syncs with Obsidian for memory management, and enables building apps, games, and websites while maintaining a fraction of the cost of standard Claude subscriptions.
This NEW AI AGENT is INSANE! 🤯
Ornith 1.0, a new open-source AI agent from Deep Reinforce, has achieved a score of 82.4 on SWE-Bench, surpassing Claude Opus 4.7. The model introduces self-scaffolding reinforcement learning, allowing it to build its own problem-solving framework without human-built instructions, and is available in four versions ranging from 9B to 397B parameters.