NEW Ollama 0.19 Update is INSANE!
Ollama 0.19 introduces a massive speed improvement for local AI on Apple silicon by integrating with Apple's MLX framework, achieving nearly 2x faster response generation and 1.6x faster input processing. The update also includes smarter caching across conversations and support for Nvidia's NVFP4 format, making local AI competitive with cloud services for the first time.
Summary
Ollama 0.19 represents a major breakthrough in local AI performance, specifically for Apple silicon devices. The update integrates with Apple's MLX machine learning framework, which takes advantage of the unified memory architecture where the CPU and GPU share the same memory pool with no transfer overhead. Benchmark results using Alibaba's Qwen 3.5 35B model show prefill speed increasing 1.6x to 1,110 tokens per second and decode speed nearly doubling from 58 to 112 tokens per second. With INT4 quantization, decode speeds can reach up to 134 tokens per second. The update also introduces intelligent caching that preserves context across conversations, eliminating the need to reprocess project files and instructions from scratch each session; this particularly benefits coding agents and daily assistant tools. Additionally, Ollama 0.19 supports Nvidia's NVFP4 format for model compression, allowing larger models to run on the same hardware while maintaining accuracy. The update requires a Mac with Apple silicon and more than 32GB of unified memory. This represents a fundamental shift in the local-vs.-cloud AI trade-off, making local AI genuinely fast rather than just a privacy-focused compromise.
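To make the compression claim concrete: the core idea behind 4-bit formats like NVFP4 is storing each weight as a tiny code plus a shared per-block scale factor. The sketch below is a simplified illustration only, not the actual NVFP4 format (which uses 4-bit floating-point values with dedicated block scales); it uses signed 4-bit integer codes to show why a block of floats can shrink roughly 4x versus 16-bit storage while staying close to the original values.

```python
# Toy block quantization sketch (illustrative, NOT the real NVFP4 format):
# each block of weights is stored as small signed 4-bit codes plus one scale.

def quantize_block(values, levels=7):
    """Map floats to signed 4-bit codes (-7..7) with a shared scale."""
    scale = max(abs(v) for v in values) / levels or 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate floats from codes and the block scale."""
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44, -0.09, 0.5]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"codes: {codes}")
print(f"max reconstruction error: {max_err:.3f}")
```

The worst-case error per weight is half a quantization step (scale / 2), which is why 4-bit formats can preserve accuracy well when blocks are small and scales are chosen per block.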
Key Insights
- The creator states that Apple silicon chips use unified memory where CPU and GPU share one memory pool with no copying or transfer overhead, unlike traditional computers where CPU and GPU have separate memory pools
- Ollama's own testing shows that version 0.19 with MLX achieves 1,110 tokens per second on prefill (1.6x increase) and 112 tokens per second on decode (nearly double) compared to version 0.18
- The speaker explains that Ollama 0.19 can now reuse cache across conversations by storing intelligent checkpoints, so when branching into new conversations the model picks up from where it left off instead of reprocessing everything
- The creator argues that local AI has had a perception problem where people assumed cloud was for performance and local was only for privacy purists or tinkerers, but Ollama 0.19 is shifting that narrative
- The speaker claims that Apple's MLX framework has been shown in independent research to achieve some of the highest throughput numbers for AI inference on Apple silicon, outperforming older backends by 20 to 30% in sustained generation speed
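The checkpoint idea described above can be sketched as a prefix cache: if a new conversation starts with tokens the model has already processed (system prompt, project files, instructions), only the unseen suffix needs fresh prefill. All names below are hypothetical illustrations of the concept, not Ollama's actual API.

```python
# Minimal prefix-cache sketch of cross-conversation checkpoint reuse.
# A real implementation would cache KV-state, not just token prefixes.

class PrefixCache:
    def __init__(self):
        self._checkpoints = []  # token sequences already processed

    def save(self, tokens):
        self._checkpoints.append(list(tokens))

    def reusable_prefix_len(self, tokens):
        """Length of the longest cached prefix matching `tokens`."""
        best = 0
        for checkpoint in self._checkpoints:
            matched = 0
            for cached_tok, new_tok in zip(checkpoint, tokens):
                if cached_tok != new_tok:
                    break
                matched += 1
            best = max(best, matched)
        return best

cache = PrefixCache()
session1 = ["<system>", "<project-files>", "<instructions>", "question-1"]
cache.save(session1)

# A branched conversation shares everything but its final question,
# so only 1 of 4 tokens needs reprocessing instead of all 4.
session2 = ["<system>", "<project-files>", "<instructions>", "question-2"]
reused = cache.reusable_prefix_len(session2)
print(f"reused {reused} of {len(session2)} tokens")  # reused 3 of 4 tokens
```

This is why the feature matters most for coding agents and daily assistants: their prompts share long, stable prefixes across sessions, so most of the prefill cost is paid once.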