NAN126: Fine-Tuning Open Source LLMs for Network Engineering
Eduard Dulharu, founder and CTO of vExpertAI, discusses his career pivot from pursuing CCIE certification to building AI-powered NOC/SOC systems after recognizing the potential of the transformer architecture in 2022. He outlines the progression of AI techniques from prompting to RAG to fine-tuning to agentic systems, draws parallels with the evolution of networking protocols, and emphasizes the importance of domain-specific knowledge and fundamentals.
Summary
Eduard Dulharu shares his 25+ year career trajectory spanning military radar systems and enterprise networking, culminating in his recent decision to build an AI startup. After failing his second CCIE exam in 2022, he encountered ChatGPT and read the "Attention Is All You Need" paper, which fundamentally changed his perspective on technology's future. He explains that the transformer architecture's attention mechanism, which processes the words of a sentence in parallel and weights the ones that carry the most meaning, was in his view a breakthrough that would eventually surpass human expertise. Rather than pursue a third CCIE attempt, he chose to become deeply familiar with this emerging technology.
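The attention idea described above can be illustrated with a minimal sketch of scaled dot-product attention from the "Attention Is All You Need" paper. The token list and the two-dimensional vectors are hypothetical values chosen for illustration, not anything from the episode, and the explicit loop stands in for the matrix multiplication that real implementations run in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings: "is" carries little signal, so it gets a small vector.
tokens = ["interface", "is", "down"]
vectors = [[1.0, 0.0], [0.1, 0.1], [0.9, 0.3]]
outputs, weights = attention(vectors, vectors, vectors)

for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sentence. With these made-up vectors, the row for "interface" gives the filler word "is" the smallest weight.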
Dulharu describes a methodical learning journey through AI technologies, viewing each innovation as solving previous problems while introducing new ones, much like the evolution of networking protocols from Spanning Tree through vPC, FabricPath, and VXLAN. He traces his path through prompting (optimizing how to structure interactions with models), retrieval-augmented generation or RAG (grounding responses in up-to-date, domain-specific databases to address hallucinations), fine-tuning (shaping model behavior through training on proprietary data), and finally agents that use the Model Context Protocol (MCP) to take actions.
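The RAG step in that progression can be sketched in a few lines. This is a toy illustration under stated assumptions: the knowledge-base entries are invented, and keyword overlap stands in for the embedding search and vector database that a production RAG pipeline would normally use.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    for mark in ",.?:":
        text = text.replace(mark, " ")
    return set(text.lower().split())

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents):
    """Ground the model by placing the retrieved text ahead of the question."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical knowledge base that the operator owns and keeps current.
KNOWLEDGE_BASE = [
    "Change 1042: core-sw1 uplink moved to port Ethernet1/49.",
    "Site B uses OSPF area 20 for all access switches.",
    "NTP servers for the lab are 10.0.0.10 and 10.0.0.11.",
]

prompt = build_prompt("Which OSPF area does Site B use?", KNOWLEDGE_BASE)
print(prompt)
```

The model itself is unchanged here. Only the prompt changes, which is why the accuracy of the answer depends on the quality of the stored documents.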
When discussing model selection among Llama, Mistral, and Qwen, Dulharu emphasizes understanding what kind of data each model was pre-trained on. He distinguishes reasoning models from general language models, explaining that reasoning models are trained with reward functions that incentivize creative problem-solving. He contrasts the US approach of reinforcement learning from human feedback (RLHF) with recent Chinese research using reinforcement learning from verifiable outcomes, which allowed models to develop reasoning as their optimization strategy without human validation bottlenecks.
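To make the idea of a verifiable outcome concrete, here is a minimal sketch of a reward function that needs no human rater. The task (counting usable hosts in a subnet) and the function name `verifiable_reward` are assumptions made for illustration, not the setup used in the research mentioned in the episode, and the host count ignores the special handling of /31 and /32 prefixes.

```python
import ipaddress

def verifiable_reward(cidr, model_answer):
    """Return 1.0 if the model's usable-host count matches the computed truth, else 0.0."""
    network = ipaddress.ip_network(cidr, strict=False)
    # Simplification: subtract the network and broadcast addresses.
    expected = max(network.num_addresses - 2, 0)
    try:
        return 1.0 if int(model_answer.strip()) == expected else 0.0
    except ValueError:
        # Answers that are not a plain integer earn no reward.
        return 0.0

print(verifiable_reward("10.0.0.0/24", "254"))
print(verifiable_reward("10.0.0.0/24", "256"))
```

Because the check is computed rather than judged, it can score any number of model outputs without a person in the loop, which is the scalability point made above.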
Dulharu details the fine-tuning process: defining the problem, collecting high-quality verified data, identifying appropriate GPU resources (describing memory calculations for models with billions of parameters), selecting a fine-tuning algorithm with empirically determined hyperparameters, and testing against predefined benchmarks. He emphasizes that fine-tuning, rather than full model retraining, is the practical investment for those with limited budgets. He has nevertheless experienced building models end to end through an MIT-led collaborative course on building large language models from scratch.
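The summary does not give the memory figures discussed in the episode, so the following is only a back-of-the-envelope sketch of the usual starting point: parameter count multiplied by bytes per parameter. The 7-billion-parameter size is a hypothetical example, and the estimate covers the weights alone, excluding gradients, optimizer state, and activations, which add substantially to the total during training.

```python
# Bytes needed to store one parameter at each numeric precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(params_billions, precision):
    """Memory for the model weights alone, in GB (10^9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]

# Hypothetical 7-billion-parameter model.
for precision in BYTES_PER_PARAM:
    print(precision, weight_memory_gb(7, precision), "GB")
```

For the hypothetical 7B model this gives 28 GB at fp32, 14 GB at fp16, 7 GB at int8, and 3.5 GB at int4, which is a first filter for deciding whether a given GPU can hold the model at all.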
He advocates for learning networking and AI fundamentals deeply—understanding why protocols exist, not just how they work—drawing inspiration from Dr. Russ White's teaching philosophy. He positions his upcoming book 'Gen AI for Network Engineers' as encapsulating hard-learned lessons and failures, progressing from fundamentals through prompting, RAG, agents, and fine-tuning to help network engineers integrate AI tools into their skill sets.
About this episode
Eric welcomes Eduard Dulharu, a veteran network architect and the Founder and CTO of vExpertAI, to talk about how agentic AI, open-source LLMs, and digital twins are changing network operations. Eduard discusses the rapid evolution of generative AI, draws parallels between AI’s current limitations and early network protocols such as Spanning Tree, talks about why... Read more: https://packetpushers.net/podcasts/network-automation-nerds/nan126-fine-tuning-open-source-llms-for-network-engineering/
Key Insights
- Dulharu argues that the attention mechanism in transformers replicates how humans extract meaning from sentences by weighting individual words, enabling parallel processing that overcomes the sequential limitations of previous neural network architectures.
- He claims that recognizing the transformer's breakthrough potential in 2022 made pursuing additional professional certifications obsolete, as the technology would inevitably surpass human expertise within a reasonable timeframe.
- Dulharu posits that the evolution of AI technology mirrors the evolution of networking protocols, with each innovation solving previous limitations while introducing new problems that subsequent innovations address.
- He contends that RAG systems solve the hallucination problem not through model improvement but by grounding responses in current, proprietary databases under the user's control, shifting responsibility for accuracy from the model to the data quality.
- Dulharu distinguishes between US-based RLHF approaches, which require expensive human validation, and Chinese research using reinforcement learning from verifiable outcomes, which he argues is more scalable and produces models that develop reasoning as their optimization strategy.
- He asserts that fine-tuning with techniques like LoRA or QLoRA represents the best return on investment for organizations with limited budgets compared to full model retraining, which is cost-prohibitive for most practitioners.
- Dulharu emphasizes that understanding fundamental principles (why a protocol exists, not only how it works) enables engineers to apply knowledge across different implementations and technologies.
- He argues that high-quality, human-verified training data is the most tedious yet critical component of fine-tuning, as errors in this data manifest as model hallucinations rather than genuine model failures.
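The cost argument for LoRA in the insights above can be made concrete with a little arithmetic. LoRA freezes a weight matrix and trains two small low-rank factors in its place, and QLoRA additionally stores the frozen base weights in 4-bit precision. The 4096 by 4096 layer size and the rank of 8 below are hypothetical values chosen for illustration, not figures from the episode.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """LoRA trains two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

d = 4096   # hypothetical layer width
rank = 8   # hypothetical LoRA rank

full = full_params(d, d)
lora = lora_params(d, d, rank)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters")
print(f"fraction trained: {lora / full:.4%}")
```

For this one layer, LoRA trains 65,536 parameters instead of 16,777,216, about 0.39% of the original, which is where most of the savings in gradient and optimizer memory come from.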
Transcript
We're sponsored today by Curvium, an industry-leading system integrator that offers strategic IT consulting, professional engagements, automation, and AI. Curvium takes the time to understand your infrastructure needs and how to best support your business objectives. Curvium's skilled professionals put your needs first, from rapid designs to full-scale architectural planning, and from short-term project completion to multi-year support. Curvium provides the know-how and vendor connections you need to succeed. Find out more at curvium.com. While you're there, check out their secure campus network architectural blueprint. That's curvium.com.
Hello, welcome to the Network Automation Nerds podcast, where we explore the latest in network automation from a practitioner's perspective. I'm your host, Eric Chou, a network engineer who loves everything about network…
More from The Everything Feed - All Packet Pushers Pods
D2DO306: Platform Engineering in the Agentic Era (Sponsored)
Jad Elzane and Miles Gray from VMware by Broadcom discuss how platform engineering evolved from DevOps to address developer cognitive overload, and how Platform Engineering 2.0 must now accommodate AI agents as consumers alongside human developers, requiring new security guardrails and observability controls.
PP116: News Roundup—FortiBleed Reveals Password Cracking Is Alive and Kicking, Accenture Goes All-In on OT, and More
Jennifer Jabush and guest co-host Wolf Gerlich discuss major cybersecurity incidents including the SearchLeak Copilot vulnerability, the FortiBleed password-cracking infrastructure, North Korean NPM package compromises, and organizational acquisitions in the OT security space. They also cover concerns about age verification systems and a FIFA World Cup broadcast vulnerability involving weak client-side authentication.
HS137: Did AI Turn “Everybody Codes” into “Nobody Codes”?
Johna Till Johnson and John Burke discuss how AI coding tools have fundamentally changed the "everybody codes" strategy, arguing that while AI can generate code quickly, logical thinking and code comprehension remain essential skills. They contend that the focus should shift from teaching everyone to code to ensuring everyone can read code and think logically to catch AI-generated errors.
IPB202: How to Get Hands-On IPv6 Deployment Experience
Ed Horley interviews John, an experienced network engineer, about practical ways to gain hands-on IPv6 experience at home. They discuss consumer-grade IPv6 setups, multi-homing challenges, ULA addressing, NAT/masquerading trade-offs, and how working with multiple historical protocols informs modern IPv6 design thinking.
N4N057: The Art of Troubleshooting
Ethan Banks and Holly Podbilak discuss a structured methodology for network troubleshooting on the N Is For Networking podcast. They cover steps from gathering information and recreating problems to using tools like AI, logs, and packet captures, while emphasizing the human elements of staying calm, working as a team, and documenting root causes.