Luo Fuli: OpenClaw, Agent Frameworks — The AI Paradigm Has Already Changed Dramatically!
Luo Fuli, head of Xiaomi's large model division, describes how her firsthand experience with OpenClaw over Spring Festival 2026 convinced her that AI agent frameworks represent a paradigm shift, not just a product. She explains how OpenClaw's open-source, sophisticated context orchestration enabled her team to dramatically accelerate research and model training, and outlines how this Agent era demands a new approach to model architecture, post-training, and organizational design.
Summary
Luo Fuli, responsible for Xiaomi's large language model team, gives a detailed account of how the Agent paradigm has fundamentally shifted since her intensive use of OpenClaw during the 2026 Spring Festival period. She initially dismissed OpenClaw as a glorified UI wrapper on top of Claude Code, but three consecutive nights of deep engagement — first discovering its emotionally intelligent product design, then using it for team management strategy, and finally applying it to research tasks like building User Agents for post-training — transformed her view entirely. She now considers OpenClaw an 'epoch-defining agent framework' rather than just a product.
Luo explains that OpenClaw's key differentiator lies in its meticulous context orchestration: layered persistent memory, multi-model dispatch (automatically routing to better models for specific weaknesses like video understanding), and a fully open-source architecture that allows users to rewrite memory systems, multi-agent logic, and workflow designs. She contrasts this with Claude Code, which is optimized for software engineering and is a black box. She argues that a well-designed Agent framework can compensate for significant model capability gaps — even enabling a small 3B model to perform tasks she considered impossible for it.
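The multi-model dispatch Luo describes can be pictured as a small routing layer in front of the default model. The sketch below is purely illustrative and not OpenClaw's actual code; the model names, task categories, and keyword-based classifier are all hypothetical stand-ins for whatever classification the real framework performs.

```python
# Hypothetical sketch of multi-model dispatch: route a request to a model
# better suited to the task (e.g., video understanding) when the default
# model is known to be weak there. All names here are invented.
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    default_model: str = "general-llm"
    # Map a detected task type to a specialist model that covers the gap.
    overrides: dict = field(default_factory=lambda: {
        "video_understanding": "video-specialist-llm",
        "speech": "audio-specialist-llm",
    })

    def classify(self, request: str) -> str:
        # Toy keyword classifier; a real framework would likely use the
        # model itself (or request metadata) to detect the task type.
        text = request.lower()
        if "video" in text:
            return "video_understanding"
        if "audio" in text:
            return "speech"
        return "general"

    def route(self, request: str) -> str:
        task = self.classify(request)
        return self.overrides.get(task, self.default_model)


d = Dispatcher()
print(d.route("Summarize this video clip"))  # -> video-specialist-llm
print(d.route("Draft a team update"))        # -> general-llm
```

The point of the pattern is that the routing table, like the rest of an open-source framework, is user-editable: a weakness in one model becomes a one-line override rather than a hard capability ceiling.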
She describes how she forced her entire team to use OpenClaw after the holiday, with group chats generating collective intelligence that rapidly improved both the framework and the team's imagination of what the technology could accomplish. Within three to four weeks, they accomplished research milestones she estimates would have previously taken thirty to forty weeks. The key insight was that Agent frameworks enable parallel research workflows — ten ideas can run simultaneously across sub-agents rather than sequentially.
On the technical side, Luo goes deep on the architectural decisions behind MiMo V2 Flash and Pro. Both models use a Hybrid Attention structure designed primarily around long-context efficiency, incorporating Sliding Window Attention at a 7:1 ratio with Full Attention in Pro (up from 5:1 in Flash), plus Multi-Token Prediction (MTP) at inference time. She argues MLA (used in DeepSeek, GLM, Kimi) is poorly suited for the Agent era because it leaves no computational headroom for MTP acceleration and was designed under assumptions — short post-training cycles and fixed inference hardware — that no longer hold. She claims this Hybrid architecture achieves 80–150 tokens per second at competitive cost, making it naturally suited for long-context Agent workloads.
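The 7:1 interleaving of Sliding Window Attention with Full Attention can be sketched as below. This is a minimal illustration of the general technique, not MiMo V2's implementation: the window size, layer count, and mask construction are assumptions for demonstration only. The efficiency claim follows from the masks themselves: a sliding-window layer costs O(n·w) rather than O(n²), so at a 7:1 ratio only one layer in eight pays the full quadratic long-context cost.

```python
# Illustrative sketch of a Hybrid Attention layout: for every full-attention
# layer, `ratio` layers use causal sliding-window attention. Window size and
# layer count are made up for illustration, not MiMo V2's actual values.
import numpy as np


def causal_mask(n: int) -> np.ndarray:
    # Token i may attend to every earlier token j <= i: O(n^2) cost.
    return np.tril(np.ones((n, n), dtype=bool))


def sliding_window_mask(n: int, window: int) -> np.ndarray:
    # Token i attends only to the last `window` tokens (i - window < j <= i),
    # reducing attention cost to O(n * window).
    idx = np.arange(n)
    return causal_mask(n) & (idx[None, :] > idx[:, None] - window)


def layer_types(num_layers: int, ratio: int = 7) -> list:
    # Interleave `ratio` sliding-window layers, then one full-attention layer.
    return ["full" if (i + 1) % (ratio + 1) == 0 else "sliding"
            for i in range(num_layers)]


print(layer_types(16))  # 14 sliding-window layers, 2 full-attention layers
print(sliding_window_mask(6, window=3).astype(int))
```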
Luo discusses the V2 model family: Pro handles complex reasoning and Agent orchestration, Omni addresses multi-modal perception (including joint audio-video understanding), and TTS uses a novel discrete tokenization approach inspired by NLP to achieve strong generalization from limited style training data. She reveals the team is pursuing a unified LLM-style architecture for audio and experimenting with the same for images, motivated by architectural elegance and infrastructure unification, though she acknowledges this is a difficult research bet that Agent-assisted coding has somewhat reduced the urgency of.
On AGI timelines, Luo states she believed AGI was at least two years away just two months prior, but now estimates it within two years, putting current progress at approximately 20% and expecting to reach 60–70% by year's end. She identifies AI training AI — the model reaching the intelligence level of the top researchers who train it and then iterating on itself — as the pivotal milestone, which she believes is likely within one to two years.
She characterizes the competitive landscape as: pre-training gaps between top Chinese and US labs are essentially closed; the 1T+ parameter base model is the 'entry ticket' to compete at Claude Opus 4.6 levels; and the real race is now in Agent post-training RL scaling, which very few teams have actually executed at pre-training compute scales. She believes Chinese labs have structural architecture advantages but need to rapidly build out Agent RL infrastructure and post-training pipelines.
Organizationally, Luo runs a team of roughly 100 people (including interns) with no formal group structure, no hierarchy, and no fixed deadlines — operated as an internal startup within Xiaomi. She believes flat structures, cross-pollination between pre-training and post-training roles, and passion-driven management produce more creative and adaptive research than traditional team segmentation. She is increasingly recruiting sophomore and junior undergraduates for their cognitive flexibility and openness to new paradigms.
Key Insights
- Luo Fuli argues that OpenClaw's true breakthrough is not its UI design but its meticulous context orchestration — including layered persistent memory, automatic multi-model dispatch to compensate for individual model weaknesses, and full open-source modifiability — which together allow it to compensate for significant model capability gaps. She reports that even a 3B model performed tasks she considered impossible for it when embedded in OpenClaw's framework.
- Luo argues that MLA (Multi-head Latent Attention, used by DeepSeek, GLM, and Kimi) is poorly suited for the Agent era because it was designed under now-obsolete assumptions: short post-training cycles and fixed inference hardware. MLA leaves no computational headroom for MTP-based inference acceleration, making models slower and more expensive for long-context Agent workloads compared to Hybrid Attention architectures like MiMo V2.
- Luo claims that in the Agent paradigm, post-training compute should equal pre-training compute — a ratio of approximately 1:1 — and that research compute should exceed both at a ratio of roughly 3:1:1 (research : pre-train : post-train). She states this is a dramatic shift from the Chat era ratio she describes as roughly 1:5:1 in favor of pre-training.
- Luo describes how she forced her team to use OpenClaw after Spring Festival by declaring that anyone who had logged fewer than 100 conversation turns by the next day could quit — while privately having no intention of actually enforcing this. The real goal was forcing experiential exposure, because she believes experiencing a technology firsthand is the most effective way to ignite passion and imagination in a team, and that collective group experimentation multiplies individual imagination.
- Luo argues that previous Agent frameworks and benchmarks (SWE-bench, BrowseComp, τ-bench) were fundamentally too simple and task-specific to constitute real Agent capability — they were essentially Chat with a slightly more complex system prompt and minimal environmental feedback. She states her team abandoned all such benchmarks when training MiMo V2, relying instead on hands-on, experiential ('body-sense') evaluation within complex real Agent frameworks like Claude Code and OpenClaw as the true test of industrial-grade usability.
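The compute-allocation shift Luo proposes (3:1:1 for research : pre-training : post-training in the Agent era, versus roughly 1:5:1 in the Chat era) is simple proportional arithmetic, sketched below with an arbitrary total budget.

```python
# Worked example of the compute split Luo describes. The total budget is an
# arbitrary illustrative number; only the ratios come from her claim.
def split_budget(total_flops: float, ratio: tuple) -> dict:
    # Divide a total compute budget proportionally among the three phases.
    s = sum(ratio)
    names = ("research", "pre_train", "post_train")
    return {n: total_flops * r / s for n, r in zip(names, ratio)}


total = 100.0  # arbitrary units of compute

agent_era = split_budget(total, (3, 1, 1))  # research gets 3/5 = 60%
chat_era = split_budget(total, (1, 5, 1))   # pre-training gets 5/7 ~= 71%

print(agent_era)
print(chat_era)
```

Under the Agent-era ratio, post-training compute matches pre-training exactly (1:1), and research compute exceeds both combined — the inverse of the Chat-era allocation, where pre-training dominated.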
Full transcript available for MurmurCast members