TNO061: Networking Theory and Practice; Networking in the Classroom Today
Scott Robon interviews Andy Smith, a distinguished engineer at Arcus Networks and Penn lecturer, at NANOG 96 in San Francisco. They discuss Andy's career spanning cable, hyperscaler, and software-defined networking, his university course on network engineering fundamentals, and the broader industry shift toward disaggregated networking, automation, and AI cluster infrastructure. The conversation emphasizes the enduring importance of first-principles thinking in an era of rapidly evolving network architectures.
Summary
Recorded live at NANOG 96 in San Francisco, this episode of Total Network Operations features host Scott Robon in conversation with Andy Smith, a distinguished engineer at Arcus Networks and lecturer at the University of Pennsylvania's School of Engineering and Applied Science. The interview covers Andy's career trajectory, his teaching philosophy, and his views on the state of modern networking.
Andy traces his fascination with networking back to childhood visits to his father's office, where he first saw a router connected to a T1 line. He frames networking as humanity's novel ability to move data through space in near real-time — contrasting it with older technologies like hieroglyphics or the printing press that moved data through time. His career has spanned broadband/cable, hyperscaler cloud, and now software-oriented networking at Arcus, and he sees these as sharing the same base technical principles while differing significantly in monetization model, scale, and culture. In cable/ISP environments, the network itself is the product being sold; in hyperscaler environments, the network exists to facilitate compute, which is the actual revenue-generating service.
On teaching, Andy describes a full-credit course on network engineering and algorithms at Penn, cross-listed for graduate and undergraduate students, with around 140 enrolled. The course is deliberately academic rather than certification-focused: students work through Dijkstra's algorithm and CSMA/CD by hand on paper. Andy argues that understanding the mathematical and graph-theoretic foundations of networking is more durable than memorizing CLI commands, and that vendor-agnostic first-principles education produces engineers who can adapt across environments. He uses guest speakers from industry and exercises like traceroute analysis to connect academic content to real-world business dynamics, including who pays whom in BGP peering and transit relationships.
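For readers who want to see what the paper exercise looks like in code, here is a minimal Python sketch of Dijkstra's shortest-path algorithm. It is not taken from the course; the four-router topology, its link costs, and the names `dijkstra` and `topology` are assumptions made up for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Return the lowest total link cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]  # entries are (cost so far, node)
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale entry: a cheaper path to this node was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


# Hypothetical four-router topology with symmetric link costs.
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

costs = dijkstra(topology, "A")
print(costs)
```

In this toy topology the cheapest route from A to D goes through B and C (total cost 4) rather than over the direct B to D link, which is the kind of result students would confirm step by step on paper.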
The conversation dives into AI cluster networking as a case study in returning to first principles. Andy explains that AI backend networks are fundamentally different from internet or cloud front-end networks: traffic is fully predictable and scheduler-driven, compute and network are tightly coupled via smart NICs, and the yield of the entire cluster depends on synchronization across all nodes. Traditional internet networks, by contrast, carry unpredictable any-to-any traffic and degrade gracefully. He argues that engineering these networks correctly at scale required revisiting base networking science.
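The synchronization constraint can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes a fully synchronous step in which every GPU must finish before the next step starts, and it defines "yield" as busy GPU time divided by reserved GPU time, which is a simplification chosen here rather than a definition given in the episode. The function names and the timings are hypothetical.

```python
def step_time(per_gpu_seconds):
    """A synchronized step finishes only when the slowest GPU finishes."""
    return max(per_gpu_seconds)


def cluster_yield(per_gpu_seconds):
    """Fraction of reserved GPU time spent working rather than waiting."""
    busy = sum(per_gpu_seconds)
    reserved = step_time(per_gpu_seconds) * len(per_gpu_seconds)
    return busy / reserved


# Hypothetical per-GPU times for one step, in seconds.
healthy = [1.0, 1.0, 1.0, 1.0]
one_straggler = [1.0, 1.0, 1.0, 2.0]  # e.g. one node delayed by network congestion

print(cluster_yield(healthy))
print(cluster_yield(one_straggler))
```

Under these assumptions, a single GPU that takes twice as long drags the whole four-GPU step to two seconds, so three GPUs sit idle for half the step.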
On automation, Andy makes a pointed argument: a network that was not designed to be automated cannot be successfully automated after the fact. He draws an analogy between packet networks and capitalism — both function by having distributed agents make local decisions based on globally available information, and both fail under centralized control (as OpenFlow-style SDN demonstrated). Hyperscaler networks succeed at automation because they are built with regular, industrializable topologies like Clos fabrics from the ground up, with automation baked into the entire lifecycle rather than bolted on afterward.
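One way to see why a regular topology lends itself to automation: the full wiring plan of a two-tier Clos (leaf-spine) fabric can be generated from two numbers. The sketch below is a hypothetical illustration, not a description of any operator's tooling; the device naming scheme and the function `leaf_spine_links` are assumptions.

```python
def leaf_spine_links(num_spines, num_leaves):
    """List every link in a two-tier Clos fabric: each leaf connects to each spine."""
    return [
        (f"spine{s}", f"leaf{l}")
        for s in range(1, num_spines + 1)
        for l in range(1, num_leaves + 1)
    ]


# Hypothetical small fabric: 2 spines, 4 leaves.
links = leaf_spine_links(2, 4)
for spine, leaf in links:
    print(f"{spine} <-> {leaf}")
```

Because the pattern is uniform, growing the fabric means changing an input value and regenerating the plan, whereas an irregular hand-built network has no comparable rule to generate from.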
The discussion of disaggregated and open networking traces the hardware evolution from complex multi-chip line cards to simple two-chip (NPU + CPU) pizza-box systems enabled by merchant silicon, particularly Broadcom. Andy draws a historical parallel to the disruption of Digital Equipment Corporation (DEC) and its VAX ecosystem by open standards in the 1980s and 1990s, arguing that the current disaggregation trend represents a similar inflection point. He notes that operators value disaggregation less for its theoretical software portability and more as leverage — a credible threat to swap out incumbents without a forklift hardware replacement. However, he emphasizes that the real operational goal is reducing the cost and complexity of running the network at scale, not just having the option to switch software.
The episode closes with a return to the classroom: Andy describes student engagement with BGP economics (follow-the-money analysis of traceroutes) and annual industry guest speakers covering entrepreneurship and network architecture. He publishes monthly on LinkedIn and welcomes outreach from the networking community.
Key Insights
- Andy Smith argues that packet networks are fundamentally analogous to capitalism: both work by having distributed agents make local decisions based on globally available information, and both fail under centralized control — which is why OpenFlow-style SDN did not scale.
- Smith contends that automation retrofitted onto an existing network almost always fails; he has never seen it succeed. A network must be designed to be industrializable from the ground up, with automation built into the entire lifecycle, not added as a later goal.
- Smith argues that AI backend networks invert the assumptions of traditional networking: traffic is fully predictable and scheduler-controlled, compute and network are tightly coupled via smart NICs, and cluster yield depends on the last GPU returning a result — a synchronization constraint with no equivalent in internet networking.
- Smith draws a historical parallel between the current disaggregated networking trend and the disruption of Digital Equipment Corporation's vertically integrated VAX ecosystem by open standards in the 1980s and 1990s, arguing the same inflection point is now happening in networking.
- Smith argues that the primary value operators see in disaggregated networking is not software portability per se, but leverage — the credible threat of swapping out an incumbent software vendor without replacing hardware, which improves negotiating position.
- Smith's university course deliberately avoids CLI and vendor software, instead having students work through Dijkstra's algorithm and CSMA/CD by hand, on the grounds that mathematical and graph-theoretic foundations are more durable and transferable than vendor-specific configuration knowledge.
- Smith claims that Broadcom's dominance in merchant silicon is partly deserved because building a modern networking ASIC requires enormous upfront capital, years of development, and a bet on both market demand and physical feasibility — risks Broadcom has consistently navigated successfully.
- Smith observes that the number of entities actually building large-scale AI clusters is countable on two hands, meaning that unlike the internet boom — which created a broad ecosystem of talent and companies — the AI infrastructure wave may be economically massive but will employ comparatively few individual network engineers.