Technical Discussion

NAN126: Fine-Tuning Open Source LLMs for Network Engineering

Eduard Dulharu, founder of vExpertAI, discusses his career pivot from pursuing CCIE certification to building AI-powered NOC/SOC systems after recognizing the transformative potential of the transformer architecture in 2022. He outlines the progression of AI technologies from prompting to RAG to fine-tuning to agentic systems, drawing parallels with networking protocol evolution and emphasizing the importance of domain-specific knowledge and fundamentals.

Summary

Eduard Dulharu shares his 25+ year career trajectory spanning military radar systems and enterprise networking, culminating in his recent decision to build an AI startup. After failing his second CCIE exam in 2022, he encountered ChatGPT and read the 'Attention Is All You Need' paper, which fundamentally changed his perspective on technology's future. He explains that the transformer architecture's attention mechanism, which processes words in parallel and identifies the keywords that carry meaning, represented a breakthrough that would eventually surpass human expertise. Rather than pursue a third CCIE attempt, he chose to become deeply familiar with this emerging technology.
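
The attention idea described above can be made concrete with a small sketch. The following is a minimal, pure-Python version of scaled dot-product attention from the 'Attention Is All You Need' paper, not anything shown in the episode. The token list and the 2-dimensional embeddings are made-up values for illustration, and the learned query/key/value projection matrices that real transformers use are omitted for brevity.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    # Each query is handled independently of the others, which is why
    # real implementations can compute all positions in parallel.
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical tokens and embeddings (illustrative numbers only).
tokens = ["bgp", "session", "flapping"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: the same vectors act as queries, keys, and values.
outputs, weights = attention(embeddings, embeddings, embeddings)

for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each printed row shows how strongly one token weights every token in the sentence. In this toy setup a token attends most to the tokens whose embeddings point in a similar direction, which is the "weighting individual words" behavior the summary refers to.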

Dulharu describes a methodical learning journey through AI technologies, viewing each innovation as solving previous problems while introducing new ones, much like networking's evolution from Spanning Tree through vPC, FabricPath, and VXLAN. He traces his path through prompting (optimizing how to structure interactions with models), Retrieval-Augmented Generation or RAG (grounding responses in up-to-date, domain-specific databases to solve hallucination problems), fine-tuning (shaping model behavior through training on proprietary data), and finally agents that use the Model Context Protocol (MCP) to take actions.
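
As a rough illustration of the RAG step in that progression, the sketch below retrieves the most relevant entry from a small knowledge base and places it in the prompt before the question. Everything here is an assumption made for illustration: the knowledge-base entries, the function names, and especially the word-overlap scoring, which stands in for the embedding-based similarity search that production RAG systems typically use.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,:;()?").lower() for word in text.split()}

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents,
                    key=lambda doc: len(q_words & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents):
    """Ground the model by placing retrieved context ahead of the question."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical knowledge base; in practice this would be the operator's own,
# regularly updated network documentation.
KNOWLEDGE_BASE = [
    "Core switch sw-core-01 runs VXLAN with BGP EVPN as the control plane.",
    "Branch routers use OSPF area 0 toward the WAN edge.",
    "NTP servers for all sites are 10.0.0.10 and 10.0.0.11.",
]

question = "Which control plane does sw-core-01 use for VXLAN?"
prompt = build_prompt(question, KNOWLEDGE_BASE)
print(prompt)
```

The resulting prompt would then be sent to whichever model is in use. The point of the design is that the answer quality depends on what sits in the knowledge base, which the operator controls and can update without retraining the model.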

When discussing model selection between Llama, Mistral, and Qwen, Dulharu emphasizes understanding what kind of data each model was pre-trained on. He distinguishes reasoning models from general language models, explaining that reasoning models are trained with reward functions that incentivize creative problem-solving. He contrasts the US approach of reinforcement learning from human feedback (RLHF) with recent Chinese research using reinforcement learning from verifiable outcomes, which allowed models to develop reasoning as their optimization strategy without human validation bottlenecks.
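
To show what a "verifiable outcome" might look like in a networking context, here is a small hypothetical reward function. It scores a model's answer to a subnetting question by computing the correct result with Python's standard `ipaddress` module, so no human rater is needed. The task, the function name, and the sample answers are assumptions for illustration, not something described in the episode.

```python
import ipaddress

def verifiable_reward(prefix, model_answer):
    """Return 1.0 if the answer gives the right usable-host count, else 0.0.

    Assumes a classic IPv4 subnet of /30 or shorter, where the network and
    broadcast addresses are not usable by hosts.
    """
    network = ipaddress.ip_network(prefix)
    expected = network.num_addresses - 2
    try:
        return 1.0 if int(model_answer.strip()) == expected else 0.0
    except ValueError:
        return 0.0  # answers that are not a plain integer earn no reward

# Hypothetical model outputs for "How many usable hosts are in 192.168.10.0/24?"
samples = ["254", "256", "about 250"]
rewards = [verifiable_reward("192.168.10.0/24", s) for s in samples]
print(rewards)
```

Because the check is a program rather than a person, it can be run over very large numbers of attempts, which is the scalability argument made for this style of reinforcement learning.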

Dulharu details the fine-tuning process: defining the problem, collecting high-quality verified data, identifying appropriate GPU resources (including memory calculations for models with billions of parameters), selecting a fine-tuning algorithm with empirically determined hyperparameters, and testing against predefined benchmarks. He emphasizes that fine-tuning is the practical investment for those with limited budgets, as opposed to full model retraining. He has nevertheless experienced the latter through an MIT-led collaborative course on building large language models from scratch.
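
The memory calculation mentioned above can be approximated with simple arithmetic: parameter count multiplied by bytes per parameter. The sketch below covers the weights only and uses a 7-billion-parameter model as an assumed example size. Training needs additional memory for gradients, optimizer state, and activations, so these figures are a lower bound rather than a sizing guide.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params, precision):
    """Memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# Assumed example: a model with 7 billion parameters.
NUM_PARAMS = 7e9
for precision in BYTES_PER_PARAM:
    print(f"7B model, {precision}: {weight_memory_gb(NUM_PARAMS, precision):.1f} GB")
```

The gap between 16-bit and 4-bit storage is one reason quantized fine-tuning methods are attractive on a limited GPU budget.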

He advocates for learning networking and AI fundamentals deeply (understanding why protocols exist, not just how they work), drawing inspiration from Dr. Russ White's teaching philosophy. He positions his upcoming book 'Gen AI for Network Engineers' as a distillation of hard-learned lessons and failures, progressing from fundamentals through prompting, RAG, agents, and fine-tuning to help network engineers integrate AI tools into their skill sets.

About this episode

Eric welcomes Eduard Dulharu, a veteran network architect and the Founder and CTO of vExpertAI, to talk about how agentic AI, open-source LLMs, and digital twins are changing network operations. Eduard discusses the rapid evolution of generative AI, draws parallels between AI's current limitations and early network protocols such as Spanning Tree, talks about why... Read more: https://packetpushers.net/podcasts/network-automation-nerds/nan126-fine-tuning-open-source-llms-for-network-engineering/

Key Insights

  • Dulharu argues that the attention mechanism in transformers replicates how humans extract meaning from sentences by weighting individual words, enabling parallel processing that overcomes sequential limitations of previous neural network architectures.
  • He claims that recognizing the transformer's breakthrough potential in 2022 made pursuing additional professional certifications obsolete, as the technology would inevitably surpass human expertise within a reasonable timeframe.
  • Dulharu posits that AI technology evolution mirrors networking protocol evolution, with each innovation solving previous limitations while introducing new problems that subsequent innovations address.
  • He contends that RAG systems solve the hallucination problem not through model improvement but by grounding responses in current, proprietary databases under the user's control, shifting responsibility for accuracy from the model to the data quality.
  • Dulharu distinguishes between US-based RLHF approaches requiring expensive human validation and Chinese research using reinforcement learning from verifiable outcomes, which he argues is more scalable and produces models that develop reasoning as their optimization strategy.
  • He asserts that fine-tuning with techniques like LoRA or QLoRA represents the best return on investment for organizations with limited budgets compared to full model retraining, which is cost-prohibitive for most practitioners.
  • Dulharu emphasizes that understanding fundamental principles (why a protocol exists rather than its mechanics) enables engineers to apply knowledge across different implementations and technologies.
  • He argues that high-quality, human-verified training data is the most tedious yet critical component of fine-tuning, as errors in this data manifest as model hallucinations rather than genuine model failures.
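
The cost argument for LoRA in the insights above can be seen with a short calculation. LoRA freezes an existing weight matrix and trains two small low-rank matrices alongside it, so the number of trainable parameters for one layer is rank × (inputs + outputs) rather than inputs × outputs. The layer size and rank below are assumed values chosen for illustration, not figures from the episode.

```python
def full_params(d_in, d_out):
    """Parameters in one dense weight matrix of shape d_out x d_in."""
    return d_in * d_out

def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds for that matrix.

    LoRA trains two low-rank factors, one of shape d_out x rank and one of
    shape rank x d_in, while the original matrix stays frozen.
    """
    return rank * (d_in + d_out)

# Assumed example: a square 4096 x 4096 projection layer and LoRA rank 8.
D_MODEL = 4096
RANK = 8

full = full_params(D_MODEL, D_MODEL)
lora = lora_trainable_params(D_MODEL, D_MODEL, RANK)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {RANK}): {lora:,} trainable parameters")
print(f"fraction trained: {lora / full:.4%}")
```

Under these assumptions only a fraction of a percent of the layer's parameters are trained. QLoRA applies the same idea on top of a quantized base model, which further reduces the memory needed to hold the frozen weights.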

Topics

  • Transformer architecture and attention mechanisms
  • Career transition from networking to AI
  • AI technology progression: prompting → RAG → fine-tuning → agents
  • Fine-tuning open-source LLMs for network engineering
  • Reasoning models vs general language models
  • GPU requirements and computational considerations
  • Fundamentals-based learning approach
  • Building autonomous AI-powered NOC/SOC systems

Transcript

We're sponsored today by Curvium, an industry-leading system integrator that offers strategic IT consulting, professional engagements, automation, and AI. Curvium takes the time to understand your infrastructure needs and how to best support your business objectives. Curvium's skilled professionals put your needs first, from rapid designs to full-scale architectural planning, and from short-term project completion to multi-year support. Curvium provides the know-how and vendor connections you need to succeed. Find out more at curvium.com. While you're there, check out their secure campus network architectural blueprint. That's curvium.com.

Hello, welcome to the Network Automation Nerds podcast, where we explore the latest in network automation from a practitioner's perspective. I'm your host, Eric Chou, a network engineer who loves everything about network…
