How To Run Open-Source AI Models on Your Phone (For Free)
The video demonstrates how to run open-source AI models locally on an iPhone using a free app called Locally AI, without requiring an internet connection. The presenter showcases the newly released Qwen 3.5 model in various sizes and tests its capabilities for brainstorming, image recognition, and parenting advice. The key appeal is privacy and offline functionality, with performance comparable to state-of-the-art models from about two years ago.
Summary
The presenter introduces the idea of running AI models directly on a smartphone without internet connectivity, positioning it as a privacy-friendly alternative to cloud-based services from OpenAI, Anthropic, and Google. The idea came from a post by Adrian Grondin, who was running the new Qwen 3.5 model on his phone in airplane mode, which prompted the presenter to investigate further.
The presenter explains the Qwen 3.5 model family, which was released on March 2nd and comes in four sizes: 800 million, 2 billion, 4 billion, and 9 billion parameters. He notes that while benchmarks show it performing on par with or better than GPT-5 Nano in most tests, he doesn't place excessive weight on benchmarks. The practical use case is everyday tasks like brainstorming or getting parenting advice rather than complex logic or math problems.
The app is called Locally AI, created by Adrian Grondin, and is available on the App Store with a 4.8-star rating from 579 reviews. On first launch it offers several model options, including Apple's built-in Foundation model, Gemma 2, Qwen 3, and Llama 3.2; Qwen 3.5 becomes available after skipping the initial screen. Model recommendations are tied to device generation: the 4B model requires an iPhone 15 Pro or newer, the 2B model needs an iPhone 15, and the 800M model works on an iPhone 14 or newer.
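The device recommendations above amount to a small lookup table. The sketch below is purely illustrative and is not how Locally AI selects models: the names `DEVICE_TIERS`, `MIN_TIER_FOR_MODEL`, and `largest_supported_model` are hypothetical, the ordering of device tiers is an assumption, and only the three devices named in the summary are covered (the 9B model is left out because no device requirement is given for it).

```python
from typing import Optional

# Assumed ordering of the device tiers named in the summary, oldest first.
DEVICE_TIERS = ["iPhone 14", "iPhone 15", "iPhone 15 Pro"]

# Minimum device per Qwen 3.5 size, as stated in the summary.
# Keys are listed from smallest to largest model.
MIN_TIER_FOR_MODEL = {
    "800M": "iPhone 14",
    "2B": "iPhone 15",
    "4B": "iPhone 15 Pro",
}


def largest_supported_model(device: str) -> Optional[str]:
    """Return the largest model whose minimum tier is at or below `device`.

    Returns None for devices outside the three tiers this sketch knows about.
    """
    if device not in DEVICE_TIERS:
        return None
    rank = DEVICE_TIERS.index(device)
    supported = [
        model
        for model, min_tier in MIN_TIER_FOR_MODEL.items()
        if DEVICE_TIERS.index(min_tier) <= rank
    ]
    # Dicts preserve insertion order, so the last entry is the largest model.
    return supported[-1] if supported else None


if __name__ == "__main__":
    for device in DEVICE_TIERS:
        print(device, "->", largest_supported_model(device))
```

Devices newer than the iPhone 15 Pro would need to be appended to `DEVICE_TIERS` before the lookup could place them.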
The presenter runs several practical tests: a basic 'strawberry' letter-counting test (passed), a logic problem about whether to walk or drive to a car wash (failed, showing the model's reasoning limitations), a YouTube video brainstorming session (successful, generating 30+ ideas), a visual recognition test of a drink (successful), and a parenting advice prompt run entirely in airplane mode (successful). He also demonstrates a 'thinking mode' that enables chain-of-thought reasoning, though it causes the phone to warm up and slows down as conversation context grows longer.
The presenter concludes by putting the technology in context: while not comparable to the most advanced current models like GPT-5.3 or Claude Opus and Sonnet, these on-device models likely surpass what was state-of-the-art roughly 1.5 to 2 years ago. He emphasizes that no data is sent to any cloud service, making it fully private. The video is unsponsored, and the app is free to use on relatively recent iPhones.
Key Insights
- The presenter argues that on-device models like Qwen 3.5 are likely better than the state-of-the-art cloud models of just 1.5 to 2 years ago, framing local AI as capable in its own right rather than as an inferior substitute.
- The presenter demonstrates that the Locally AI app operates fully in airplane mode with zero internet connectivity, confirming that no data is transmitted to any cloud service and none of the major AI companies can train on user prompts.
- The presenter notes that the Qwen 3.5 2B model slows down noticeably as conversation context grows longer, with even basic scrolling becoming choppy, which points to a practical limit on handling long contexts on mobile hardware.
- The presenter observes that enabling 'thinking mode' (chain-of-thought reasoning) on the 2B model causes the phone to physically warm up, signaling meaningful on-device GPU utilization during extended inference.
- The presenter points out that the Qwen 3.5 2B model fails a basic logical reasoning test about whether to walk or drive to a car wash, concluding that these on-device models are better suited for brainstorming and everyday queries than complex logical problem-solving.