TechnicalOpinion

NEW Qwythos 9B Runs Locally for FREE

Julian Goldie SEOJune 30, 2026

Julian Goldie demonstrates how to run Qwythos 9B, a free 5.6GB local AI model on your Mac using Ollama, which can be integrated into an agentic operating system for private, offline AI tasks. While smaller than frontier models, it can effectively write, reason, and build applications locally without cloud connectivity or token costs.

Summary

The video shows how to set up and use Qwythos 9B, a 9-billion parameter local AI model that runs entirely on your personal computer. Qwythos is built on a Qwen 3.5B base and post-trained using Claude-style reasoning patterns. Installation is straightforward: download Ollama, run a single terminal command, and the ~5.6GB model is ready to use. The model features a theoretical 1 million token context window, though in practice this is limited by available RAM.

Julian demonstrates real-world applications including building landing pages, task trackers, calculators, digital clocks, and a snake game—all generated locally without any data leaving the machine. He integrates Qwythos into Agent OS, his custom agentic operating system, making it the default local engine for AI agents. This setup enables private agent work without cloud dependencies.

When compared to other local models like Orfif 1.0, Qwythos runs approximately twice as fast but with slightly less polish. The model includes native function calling capabilities, making it suitable for agent applications. Julian clarifies that the advertised 1 million token context window is a theoretical maximum, not a practical guarantee—actual performance depends entirely on available system RAM.

Qwythos comes in three sizes: 4.4GB (lighter/faster), 5.6GB (balanced, recommended), and 9.5GB (near-lossless). The video covers both advantages (free, private, fast, light, Claude-style reasoning, agent-ready) and limitations (not frontier-level, context cutting off occasionally, slow initial load, no built-in memory or tools). Julian emphasizes this is suitable for those wanting private AI, testing local model capabilities, or powering free agent systems.

Key Insights

Qwythos 9B is built on a Qwen 3.5B base model that has been post-trained on Claude-style reasoning and creative traces, allowing a small 9-billion parameter model to punch above its weight by adopting the thinking and writing patterns of larger frontier models.
The advertised 1 million token context window is a theoretical ceiling, not a practical guarantee—Ollama loads the model with a much smaller window by default, and actual context length is constrained by available RAM, sometimes causing long responses to cut off even with the maximum context number on paper.
Qwythos runs approximately twice as fast as its competitor Orfif 1.0 for the same tasks, presenting a trade-off where users must choose between speed and output polish depending on their priorities.
The model is quantized down to 5.6GB using Ollama and llama.cpp, making it small enough to run on laptops and personal machines, whereas many other local models consume 10+ GB of storage.
A local model integrated as an engine within an agentic operating system like Agent OS is fundamentally more powerful than a standalone model used for chatting in a terminal, because it enables the model to function as the backbone of a complete agent system with multiple specialized tools and workspaces.

Topics

Qwythos 9B local AI model setup and installationRunning AI models completely offline on personal machinesIntegration with Agent OS agentic operating systemComparison between local models (Qwythos vs Orfif 1.0)Context window limitations and practical RAM constraintsReal-world applications: landing pages, task trackers, codingModel quantization and parameter efficiencyPrivacy and cost advantages of local modelsAI Profit Boardroom membership and coaching

Transcript

[0:00] New Quithos 9B runs locally for free Claude-style AI on your own Mac. What if you could run a Claude-style AI on your own computer for free? No cloud, no tokens, nothing leaving your machine. And what if it was small enough to fit on a laptop? Most people have no idea this is even possible, but I've already got it running. Let me show you. I'm the digital avatar of Julian Goldie, and I help people actually learn and use AI tools in their real work. In this one, I'm going to show you a free local model called Quithos 9B. How I installed it, how I wired it into my Agent OS, and the stuff it built…

Full transcript available for MurmurCast members

View original source →

More from Julian Goldie SEO

Get AI summaries like this delivered to your inbox daily

NEW Qwythos 9B Runs Locally for FREE

Summary

Key Insights

Topics

Transcript

More from Julian Goldie SEO

Claude Sonnet 5 VS GLM 5.2: Who Wins?

Claude Sonnet 5 is HERE!

Gamma Just Got Better With ChatGPT

GLM 5.2 + Claude Code is INSANE!

This NEW AI AGENT is INSANE! 🤯

Get AI summaries delivered to your inbox