
How To Run Open-Source AI Models on Your Phone (For Free)

Matt Wolfe, March 4, 2026

The video demonstrates how to run open-source AI models locally on an iPhone using a free app called Locally AI, without requiring an internet connection. The presenter showcases the newly released Qwen 3.5 model in various sizes and tests its capabilities for brainstorming, image recognition, and parenting advice. The key appeal is privacy and offline functionality, with performance comparable to state-of-the-art models from about two years ago.

Summary

The presenter introduces the concept of running AI models directly on a smartphone without internet connectivity, positioning this as a privacy-friendly alternative to cloud-based services from OpenAI, Anthropic, and Google. The motivation came from a post by Adrian Grondin, who was running the new Qwen 3.5 model on his phone in airplane mode, which prompted the presenter to investigate further.

The presenter explains the Qwen 3.5 model family, which was released on March 2nd and comes in four sizes: 800 million, 2 billion, 4 billion, and 9 billion parameters. He notes that while benchmarks show it performing on par with or better than GPT-5 Nano in most tests, he doesn't place excessive weight on benchmarks. The practical use case is everyday tasks like brainstorming or getting parenting advice rather than complex logic or math problems.
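To make the size differences concrete, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need at each parameter count. The parameter counts come from the summary; the bit widths are illustrative assumptions, since the video summary does not say which quantization Locally AI uses, and the figures ignore runtime memory such as the conversation context.

```python
# Parameter counts are taken from the summary above.
# Bit widths are illustrative assumptions, not settings confirmed for the app.
QWEN_3_5_PARAMS = {
    "800M": 800_000_000,
    "2B": 2_000_000_000,
    "4B": 4_000_000_000,
    "9B": 9_000_000_000,
}


def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, params in QWEN_3_5_PARAMS.items():
        sizes = ", ".join(
            f"{bits}-bit: {weight_footprint_gb(params, bits):.1f} GB"
            for bits in (4, 8, 16)
        )
        print(f"{name:>4}  {sizes}")
```

Under these assumptions, each step up in model size roughly doubles the footprint, which is one plausible reason model recommendations depend on the device.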

The app used is called Locally AI, created by Adrian Grondin, available on the App Store with a 4.8-star rating from 579 reviews. Upon opening, it offers several model options, including Apple's built-in Foundation model, Gemma 2, Qwen 3, and Llama 3.2; Qwen 3.5 becomes available after skipping the initial screen. Model recommendations are tied to device generation: the 4B model requires an iPhone 15 Pro or newer, the 2B model needs an iPhone 15, and the 800M model works on an iPhone 14 or newer.
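The device recommendations above can be sketched as a small lookup. This is only an illustration of the mapping reported in the summary, not how the app decides internally: the names `DEVICE_TIERS`, `MIN_TIER`, and `largest_supported_model` are hypothetical, the tier list covers only the devices named here, and the 9B model is left out because no requirement is stated for it.

```python
from typing import Optional

# Device tiers ordered from oldest to newest, limited to those named in the summary.
# Newer devices would have to be appended to this list (hypothetical structure).
DEVICE_TIERS = ["iPhone 14", "iPhone 15", "iPhone 15 Pro"]

# Minimum tier per model size, as reported in the summary.
# The 9B model is omitted because the summary gives no requirement for it.
MIN_TIER = {
    "800M": "iPhone 14",
    "2B": "iPhone 15",
    "4B": "iPhone 15 Pro",
}

# Model sizes from smallest to largest.
MODEL_ORDER = ["800M", "2B", "4B"]


def largest_supported_model(device: str) -> Optional[str]:
    """Return the largest listed model recommended for the device, or None if unknown."""
    if device not in DEVICE_TIERS:
        return None
    rank = DEVICE_TIERS.index(device)
    supported = [
        model for model in MODEL_ORDER
        if DEVICE_TIERS.index(MIN_TIER[model]) <= rank
    ]
    return supported[-1] if supported else None


if __name__ == "__main__":
    for device in DEVICE_TIERS:
        print(device, "->", largest_supported_model(device))
```

Returning `None` for an unlisted device keeps the sketch from guessing about hardware the summary does not mention.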

The presenter runs several practical tests: a basic 'strawberry' letter-counting test (passed), a logic problem about whether to walk or drive to a car wash (failed, showing the model's reasoning limitations), a YouTube video brainstorming session (successful, generating 30+ ideas), a visual recognition test of a drink (successful), and a parenting advice prompt run entirely in airplane mode (successful). He also demonstrates a 'thinking mode' that enables chain-of-thought reasoning, though it causes the phone to warm up, and the app slows down as the conversation context grows longer.

The presenter concludes by putting the technology in context: while not comparable to the most advanced current models like GPT-5.3, Claude Opus, or Sonnet, these on-device models likely surpass what was state-of-the-art roughly 1.5 to 2 years ago. He emphasizes that no data is sent to any cloud service, making it fully private. The video is unsponsored, and the app is free to use on relatively recent iPhones.

Key Insights

The presenter argues that on-device models like Qwen 3.5 likely outperform the state-of-the-art cloud models of just 1.5 to 2 years ago, reframing local AI not as inferior but as matching what was recently the frontier.
The presenter demonstrates that the Locally AI app operates fully in airplane mode with zero internet connectivity, confirming that no data is transmitted to any cloud service and none of the major AI companies can train on user prompts.
The presenter notes that the Qwen 3.5 2B model slows down noticeably as conversation context grows longer, with even basic scrolling becoming choppy, which points to a practical limitation of handling long contexts on mobile hardware.
The presenter observes that enabling 'thinking mode' (chain-of-thought reasoning) on the 2B model causes the phone to physically warm up, signaling meaningful on-device GPU utilization during extended inference.
The presenter points out that the Qwen 3.5 2B model fails a basic logical reasoning test about whether to walk or drive to a car wash, concluding that these on-device models are better suited for brainstorming and everyday queries than complex logical problem-solving.

Topics

Running AI models offline on iPhone
Locally AI app overview and setup
Qwen 3.5 model family and capabilities
Privacy benefits of on-device AI
Model size trade-offs and device requirements

Transcript

[0:00] Okay, so I came across something really cool that I think everybody's going to want to know about and it's the ability to run AI models on your phone that are actually really good AI models without needing to be connected to the internet. So like you could use these models while you're on a plane and at your house when you don't want them sent to any sort of cloud service that you don't want to use OpenAI or Anthropic or Google or any of those companies, you could use some of the best available open weight models on your phone. So check this out. This should be a pretty quick video. So I came across this…
