Luo Fuli: OpenClaw, Agent Frameworks — The AI Paradigm Has Already Changed Dramatically!
Luo Fuli, head of Xiaomi's large model division, describes how her firsthand experience with OpenClaw over Spring Festival 2026 convinced her that AI agent frameworks represent a paradigm shift, not just a product. She explains how OpenClaw's open-source, sophisticated context orchestration enabled her team to dramatically accelerate research and model training, and outlines how this Agent era demands a new approach to model architecture, post-training, and organizational design.
Summary
Luo Fuli, responsible for Xiaomi's large language model team, gives a detailed account of how the Agent paradigm has fundamentally shifted since her intensive use of OpenClaw during the 2026 Spring Festival period. She initially dismissed OpenClaw as a glorified UI wrapper on top of Claude Code, but three consecutive nights of deep engagement — first discovering its emotionally intelligent product design, then using it for team management strategy, and finally applying it to research tasks like building User Agents for post-training — transformed her view entirely. She now considers OpenClaw an 'epoch-defining agent framework' rather than just a product.
Luo explains that OpenClaw's key differentiator lies in its meticulous context orchestration: layered persistent memory, multi-model dispatch (automatically routing to better models for specific weaknesses like video understanding), and a fully open-source architecture that allows users to rewrite memory systems, multi-agent logic, and workflow designs. She contrasts this with Claude Code, which is optimized for software engineering and is a black box. She argues that a well-designed Agent framework can compensate for significant model capability gaps — even enabling a small 3B model to perform tasks she considered impossible for it.
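The multi-model dispatch Luo describes can be pictured as a small routing layer in front of the default model. The sketch below is purely illustrative and not OpenClaw's actual code; the model names, task categories, and keyword-based classifier are all hypothetical stand-ins for whatever classification the real framework performs.

```python
# Hypothetical sketch of multi-model dispatch: route a request to a model
# better suited to the task (e.g., video understanding) when the default
# model is known to be weak there. All names here are invented.
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    default_model: str = "general-llm"
    # Map a detected task type to a specialist model that covers the gap.
    overrides: dict = field(default_factory=lambda: {
        "video_understanding": "video-specialist-llm",
        "speech": "audio-specialist-llm",
    })

    def classify(self, request: str) -> str:
        # Toy keyword classifier; a real framework would likely use the
        # model itself (or request metadata) to detect the task type.
        text = request.lower()
        if "video" in text:
            return "video_understanding"
        if "audio" in text:
            return "speech"
        return "general"

    def route(self, request: str) -> str:
        task = self.classify(request)
        return self.overrides.get(task, self.default_model)


d = Dispatcher()
print(d.route("Summarize this video clip"))  # -> video-specialist-llm
print(d.route("Draft a team update"))        # -> general-llm
```

The point of the pattern is that the routing table, like the rest of an open-source framework, is user-editable: a weakness in one model becomes a one-line override rather than a hard capability ceiling.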
She describes how she forced her entire team to use OpenClaw after the holiday, with group chats generating collective intelligence that rapidly improved both the framework and the team's imagination of what the technology could accomplish. Within three to four weeks, they accomplished research milestones she estimates would have previously taken thirty to forty weeks. The key insight was that Agent frameworks enable parallel research workflows — ten ideas can run simultaneously across sub-agents rather than sequentially.
On the technical side, Luo goes deep on the architectural decisions behind MiMo V2 Flash and Pro. Both models use a Hybrid Attention structure designed primarily around long-context efficiency, incorporating Sliding Window Attention at a 7:1 ratio with Full Attention in Pro (up from 5:1 in Flash), plus Multi-Token Prediction (MTP) at inference time. She argues MLA (used in DeepSeek, GLM, Kimi) is poorly suited for the Agent era because it leaves no computational headroom for MTP acceleration and was designed under assumptions — short post-training cycles and fixed inference hardware — that no longer hold. She claims this Hybrid architecture achieves 80–150 tokens per second at competitive cost, making it naturally suited for long-context Agent workloads.
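The 7:1 interleaving of Sliding Window Attention with Full Attention can be sketched as below. This is a minimal illustration of the general technique, not MiMo V2's implementation: the window size, layer count, and mask construction are assumptions for demonstration only. The efficiency claim follows from the masks themselves: a sliding-window layer costs O(n·w) rather than O(n²), so at a 7:1 ratio only one layer in eight pays the full quadratic long-context cost.

```python
# Illustrative sketch of a Hybrid Attention layout: for every full-attention
# layer, `ratio` layers use causal sliding-window attention. Window size and
# layer count are made up for illustration, not MiMo V2's actual values.
import numpy as np


def causal_mask(n: int) -> np.ndarray:
    # Token i may attend to every earlier token j <= i: O(n^2) cost.
    return np.tril(np.ones((n, n), dtype=bool))


def sliding_window_mask(n: int, window: int) -> np.ndarray:
    # Token i attends only to the last `window` tokens (i - window < j <= i),
    # reducing attention cost to O(n * window).
    idx = np.arange(n)
    return causal_mask(n) & (idx[None, :] > idx[:, None] - window)


def layer_types(num_layers: int, ratio: int = 7) -> list:
    # Interleave `ratio` sliding-window layers, then one full-attention layer.
    return ["full" if (i + 1) % (ratio + 1) == 0 else "sliding"
            for i in range(num_layers)]


print(layer_types(16))  # 14 sliding-window layers, 2 full-attention layers
print(sliding_window_mask(6, window=3).astype(int))
```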
Luo discusses the V2 model family: Pro handles complex reasoning and Agent orchestration, Omni addresses multi-modal perception (including joint audio-video understanding), and TTS uses a novel discrete tokenization approach inspired by NLP to achieve strong generalization from limited style training data. She reveals the team is pursuing a unified LLM-style architecture for audio and experimenting with the same for images, motivated by architectural elegance and infrastructure unification, though she acknowledges this is a difficult research bet that Agent-assisted coding has somewhat reduced the urgency of.
On AGI timelines, Luo states she believed AGI was at least two years away just two months prior, but now estimates it within two years, putting current progress at approximately 20% and expecting to reach 60–70% by year's end. She identifies AI training AI — the model reaching the intelligence level of the top researchers who train it and then iterating on itself — as the pivotal milestone, which she believes is likely within one to two years.
She characterizes the competitive landscape as: pre-training gaps between top Chinese and US labs are essentially closed; the 1T+ parameter base model is the 'entry ticket' to compete at Claude Opus 4.6 levels; and the real race is now in Agent post-training RL scaling, which very few teams have actually executed at pre-training compute scales. She believes Chinese labs have structural architecture advantages but need to rapidly build out Agent RL infrastructure and post-training pipelines.
Organizationally, Luo runs a team of roughly 100 people (including interns) with no formal group structure, no hierarchy, and no fixed deadlines — operated as an internal startup within Xiaomi. She believes flat structures, cross-pollination between pre-training and post-training roles, and passion-driven management produce more creative and adaptive research than traditional team segmentation. She is increasingly recruiting sophomore and junior undergraduates for their cognitive flexibility and openness to new paradigms.
Key Insights
- Luo Fuli argues that OpenClaw's true breakthrough is not its UI design but its meticulous context orchestration — including layered persistent memory, automatic multi-model dispatch to compensate for individual model weaknesses, and full open-source modifiability — which together allow it to compensate for significant model capability gaps. She reports that even a 3B model performed tasks she considered impossible for it when embedded in OpenClaw's framework.
- Luo argues that MLA (Multi-head Latent Attention, used by DeepSeek, GLM, and Kimi) is poorly suited for the Agent era because it was designed under now-obsolete assumptions: short post-training cycles and fixed inference hardware. MLA leaves no computational headroom for MTP-based inference acceleration, making models slower and more expensive for long-context Agent workloads compared to Hybrid Attention architectures like MiMo V2.
- Luo claims that in the Agent paradigm, post-training compute should equal pre-training compute — a ratio of approximately 1:1 — and that research compute should exceed both at a ratio of roughly 3:1:1 (research : pre-train : post-train). She states this is a dramatic shift from the Chat era ratio she describes as roughly 1:5:1 in favor of pre-training.
- Luo describes how she forced her team to use OpenClaw after Spring Festival by declaring that anyone who had logged fewer than 100 conversation turns by the next day could quit — while privately having no intention of actually enforcing this. The real goal was forcing experiential exposure, because she believes experiencing a technology firsthand is the most effective way to ignite passion and imagination in a team, and that collective group experimentation multiplies individual imagination.
- Luo argues that previous Agent frameworks and benchmarks (SWE-bench, BrowseComp, τ-bench) were fundamentally too simple and task-specific to constitute real Agent capability — they were essentially Chat with a slightly more complex system prompt and minimal environmental feedback. She states her team abandoned all such benchmarks when training MiMo V2, relying instead on hands-on, experiential ('body-sense') evaluation within complex real Agent frameworks like Claude Code and OpenClaw as the true test of industrial-grade usability.
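The compute-allocation shift Luo proposes (3:1:1 for research : pre-training : post-training in the Agent era, versus roughly 1:5:1 in the Chat era) is simple proportional arithmetic, sketched below with an arbitrary total budget.

```python
# Worked example of the compute split Luo describes. The total budget is an
# arbitrary illustrative number; only the ratios come from her claim.
def split_budget(total_flops: float, ratio: tuple) -> dict:
    # Divide a total compute budget proportionally among the three phases.
    s = sum(ratio)
    names = ("research", "pre_train", "post_train")
    return {n: total_flops * r / s for n, r in zip(names, ratio)}


total = 100.0  # arbitrary units of compute

agent_era = split_budget(total, (3, 1, 1))  # research gets 3/5 = 60%
chat_era = split_budget(total, (1, 5, 1))   # pre-training gets 5/7 ~= 71%

print(agent_era)
print(chat_era)
```

Under the Agent-era ratio, post-training compute matches pre-training exactly (1:1), and research compute exceeds both combined — the inverse of the Chat-era allocation, where pre-training dominated.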
Full transcript available for MurmurCast members