
D2DO304: Observability in the Age of AI

The Everything Feed - All Packet Pushers Pods | June 10, 2026 | 44m 30s

Kyler Middleton and Ned Bellavance interview Anuj Tyagi about AI observability, covering the unique challenges of monitoring AI stacks versus traditional applications, the importance of tracking token costs, implementing guardrails, and how tools like Agent Gateways and MCP servers add new layers of complexity to observability.

Summary

The episode explores how AI observability differs fundamentally from traditional application monitoring. Anuj Tyagi, drawing on his experience building MLOps pipelines and observability for AI products since 2021, explains that while traditional monitoring focuses on latency, CPU, memory, and database queries, AI stacks introduce entirely new concerns: token consumption costs, hallucination detection, model drift, prompt routing accuracy, and GPU performance for local models.
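
To make those extra dimensions concrete, here is a minimal Python sketch, assuming a hypothetical per-call record and made-up per-1K-token prices (`small-model`, `large-model`, `PRICE_PER_1K_TOKENS`, and `LLMCallRecord` are illustrative names, not any provider's API). It derives a dollar cost for each call and adds a simple z-score check on latency, in the spirit of Anuj's point that longer-than-expected response times can correlate with hallucination. The z-score test is one simple choice, and a flagged call is a hint to inspect, not a hallucination detector.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical USD prices per 1,000 tokens; real prices differ by provider and change often.
PRICE_PER_1K_TOKENS = {
    "small-model": {"input": 0.0005, "output": 0.0015},
    "large-model": {"input": 0.005, "output": 0.015},
}


@dataclass
class LLMCallRecord:
    """One model call, captured with AI-specific dimensions alongside latency."""

    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

    @property
    def cost_usd(self) -> float:
        price = PRICE_PER_1K_TOKENS[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1000


def is_latency_outlier(record: LLMCallRecord, baseline_ms: list[float],
                       z_threshold: float = 3.0) -> bool:
    """Flag calls far slower than the baseline: a weak hint worth a closer look."""
    if len(baseline_ms) < 2:
        return False  # not enough history to judge
    spread = pstdev(baseline_ms)
    if spread == 0:
        return record.latency_ms > baseline_ms[0]
    return (record.latency_ms - mean(baseline_ms)) / spread > z_threshold


def total_cost_by_model(records: list[LLMCallRecord]) -> dict[str, float]:
    """Aggregate spend per model, the kind of rollup that catches surprise bills early."""
    totals: dict[str, float] = {}
    for record in records:
        totals[record.model] = totals.get(record.model, 0.0) + record.cost_usd
    return totals
```

With these illustrative prices, a `large-model` call using 2,000 input and 500 output tokens costs about $0.0175; in practice such records would more likely be emitted as metrics or trace attributes than kept in Python lists.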

A significant portion of the discussion focuses on guardrails: the mechanisms used to prevent misuse of AI systems. Anuj describes how Agent Gateways act as proxies that intercept all inputs and outputs, making them ideal enforcement points for policies such as blocking PII, preventing prompt injection, and enforcing RBAC. He references Microsoft's Presidio library for PII detection and notes that MCP servers can also function as guardrail proxies within IDEs such as Kiro and Cursor. Kyler shares a real-world example of a guardrail backfiring: an overzealous guardrail kept blocking a legitimate developer workflow for bypassing MFA in dev environments.
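
A minimal sketch of the gateway-as-enforcement-point idea follows, assuming a hypothetical `AgentGateway` class that wraps any prompt-to-text backend. The regex detectors are simple stand-ins and are not Presidio's API; a real deployment might delegate PII detection to a dedicated engine such as Presidio. The gateway checks the caller's role, rewrites secrets into descriptive placeholders (following the rephrase-rather-than-remove approach Anuj describes), blocks prompts and responses that still look like PII, and records each decision where a production gateway might emit OpenTelemetry data.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Illustrative regex detectors only; a real gateway might call a dedicated
# PII engine such as Microsoft Presidio instead.
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Secrets are rephrased into a descriptive placeholder rather than deleted,
# so the model still knows what kind of value was in the prompt.
SECRET_REWRITES = [
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "<an AWS access key ID (redacted)>"),
    (re.compile(r"\bghp_[A-Za-z0-9]{36}\b"), "<a GitHub personal access token (redacted)>"),
    (re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/-]{16,}=*"), r"\1<a bearer token (redacted)>"),
]


class GuardrailViolation(Exception):
    """Raised when a request or response is blocked by policy."""


def find_pii(text: str) -> list[str]:
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


def rephrase_secrets(text: str) -> str:
    for pattern, replacement in SECRET_REWRITES:
        text = pattern.sub(replacement, text)
    return text


@dataclass
class AgentGateway:
    """Hypothetical proxy that every prompt and response passes through."""

    backend: Callable[[str], str]
    allowed_roles: set[str] = field(default_factory=lambda: {"developer"})
    audit_log: list[dict] = field(default_factory=list)

    def handle(self, role: str, prompt: str) -> str:
        # RBAC: only permitted roles may reach the model at all.
        if role not in self.allowed_roles:
            self._record(role, "input", "denied_role")
            raise GuardrailViolation(f"role {role!r} is not permitted")

        # Input guardrails: rewrite secrets, then block anything that still looks like PII.
        prompt = rephrase_secrets(prompt)
        hits = find_pii(prompt)
        if hits:
            self._record(role, "input", "blocked_pii:" + ",".join(hits))
            raise GuardrailViolation(f"prompt contains {hits}")

        response = self.backend(prompt)

        # Output guardrail: apply the same PII check to what the model returns.
        hits = find_pii(response)
        if hits:
            self._record(role, "output", "blocked_pii:" + ",".join(hits))
            raise GuardrailViolation(f"response contains {hits}")

        self._record(role, "output", "allowed")
        return response

    def _record(self, role: str, stage: str, decision: str) -> None:
        # A production gateway would export these as telemetry (for example
        # OpenTelemetry spans or logs); here they are kept in memory.
        self.audit_log.append({"role": role, "stage": stage, "decision": decision})
```

The secret patterns (AWS access key IDs, GitHub classic tokens, bearer tokens) are illustrative and far from exhaustive. Kyler's MFA story is a reminder that every pattern added here is also a potential false positive, which is why the audit log matters as much as the block itself.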

The conversation addresses the growing financial pressure around LLM token costs, which Kyler colorfully dubs the 'tokenpocalypse.' Anuj notes that even metadata fetching in MCP tool schemas consumes thousands of tokens, meaning costs scale non-linearly as AI features mature. He and Ned discuss model routing strategies — dynamically sending prompts to cheaper models when full capability isn't needed — as a cost management technique. Anuj also observes that organizations often discover expensive loops or runaway agent behavior only after receiving surprise bills, reinforcing the need for proactive monitoring.
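
Model routing can be sketched as a function that picks a cheaper model unless the prompt looks demanding. This minimal Python version assumes hypothetical model names and a deliberately crude keyword-and-length heuristic (`ModelRouter`, `COMPLEX_HINTS`, and the word limit are illustrative, not from the episode); a production router might score prompt complexity with a classifier or a small model instead. It also counts routing decisions, so the routing itself becomes something you can observe and later check for accuracy.

```python
from collections import Counter

# Hypothetical model names and keyword list, chosen only for illustration.
CHEAP_MODEL = "small-model"
CAPABLE_MODEL = "large-model"
COMPLEX_HINTS = ("refactor", "architecture", "debug", "multi-step", "root cause")


class ModelRouter:
    """Send prompts to a cheaper model unless they look like they need full capability."""

    def __init__(self, max_cheap_words: int = 200) -> None:
        self.max_cheap_words = max_cheap_words
        # Counting decisions makes routing behaviour observable over time.
        self.decisions: Counter = Counter()

    def route(self, prompt: str) -> str:
        text = prompt.lower()
        too_long = len(text.split()) > self.max_cheap_words
        looks_complex = any(hint in text for hint in COMPLEX_HINTS)
        model = CAPABLE_MODEL if (too_long or looks_complex) else CHEAP_MODEL
        self.decisions[model] += 1
        return model
```

A keyword heuristic like this will misroute some prompts in both directions; comparing the decision counts against user feedback or quality scores is one way to tell whether the savings are worth it.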

The episode draws a broader parallel between the evolution of AI stacks and the historical progression from bare metal servers to containers to Kubernetes to service meshes — each layer adding complexity and requiring dedicated operational discipline. The hosts conclude that as AI tooling matures and formalizes, AI observability responsibilities will increasingly fall to generalist DevOps engineers rather than niche AI specialists.

About this episode

As AI matures, it becomes increasingly important to know how it’s performing and what it actually costs. Ned and Kyler are joined by Anuj Tyagi, Senior Site Reliability Engineer for RingCentral, to discuss the critical shift toward AI observability. AI observability is not just about costs; Anuj breaks down why observability has to include agent…

Key Insights

Anuj argues that AI observability must track not just standard metrics like latency and errors, but also token consumption, hallucination rates, prompt routing accuracy, and GPU performance for local models — dimensions that don't exist in traditional application monitoring.
Anuj claims that Agent Gateways acting as proxies are the optimal enforcement point for guardrails because they intercept all inputs and outputs, enabling centralized policy enforcement, RBAC, and observability via OpenTelemetry.
Kyler notes that, unlike traditional APIs, which return 4xx/5xx errors on failure, LLMs return HTTP 200 responses even when hallucinating, so 'success' at the protocol level tells you nothing about response quality.
Anuj observes that fetching MCP tool schema metadata alone consumes thousands of tokens, so AI costs scale far more aggressively in production than prototype-stage testing suggests.
Anuj argues that longer-than-expected response times, surfaced through tracing, are one observable signal that correlates with hallucination, since uncertain or confused model states tend to produce slower, more erratic outputs.
Anuj describes building a library that rephrases prompts containing secrets or tokens rather than simply removing them, because outright removal can break context and cause incorrect LLM responses — a nuanced guardrail design tradeoff.
Kyler raises the concern that AI agents stuck in routing loops can burn through their entire token budget rapidly, making loop detection and retry limits a critical guardrail category distinct from content-based restrictions.
Anuj draws a parallel between AI stack maturation and the historical DevOps progression from monoliths to containers to Kubernetes to service meshes, arguing that the same pattern of layered complexity requiring dedicated operational discipline is now repeating with AI infrastructure.
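
The loop-detection and retry-limit guardrails from the insights above can be sketched as a per-run guard. This is a minimal Python illustration under stated assumptions: `AgentRunGuard`, `RunawayAgentError`, and all thresholds are hypothetical, and treating "the same tool called with identical arguments several times within a short window" as a loop is one simple heuristic among many.

```python
from collections import deque


class RunawayAgentError(Exception):
    """Raised when an agent run trips a loop, retry, or token-budget limit."""


class AgentRunGuard:
    """Hypothetical per-run guard; all limits are illustrative defaults."""

    def __init__(self, token_budget: int, max_retries: int = 3,
                 repeat_window: int = 5, max_repeats: int = 3) -> None:
        self.token_budget = token_budget
        self.tokens_used = 0
        self.max_retries = max_retries
        self.retries = 0
        self.max_repeats = max_repeats
        # Only the most recent tool calls are kept, so old history cannot trigger a loop.
        self.recent_calls: deque = deque(maxlen=repeat_window)

    def record_tokens(self, tokens: int) -> None:
        self.tokens_used += tokens
        if self.tokens_used > self.token_budget:
            raise RunawayAgentError(
                f"token budget exceeded: {self.tokens_used} > {self.token_budget}")

    def record_retry(self) -> None:
        self.retries += 1
        if self.retries > self.max_retries:
            raise RunawayAgentError(f"retry limit of {self.max_retries} exceeded")

    def record_tool_call(self, tool: str, arguments: str) -> None:
        call = (tool, arguments)
        self.recent_calls.append(call)
        # The same tool with the same arguments, repeated within the window, looks like a loop.
        if self.recent_calls.count(call) >= self.max_repeats:
            raise RunawayAgentError(
                f"possible loop: {tool} called {self.max_repeats} times with identical arguments")
```

Unlike the content guardrails, this guard never inspects what the agent says; it only watches how much and how repetitively the agent acts, which is why it catches runaway spend that content filters miss.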

Topics

AI observability vs. traditional application monitoring
Token cost tracking and the 'tokenpocalypse'
Guardrails for AI systems
Agent Gateways as observability and security proxies
MCP server monitoring and tool usage tracking
Model routing and cost optimization
Hallucination detection and non-deterministic system measurement
Maturation of AI stacks and DevOps parallels

Transcript

Welcome to Day 2 DevOps, where the dev oops is in the details. I'm Kyler Middleton and I'm joined by my convivial host, Ned Bellavance. Hey, Ned. Hey, Kyler. Today, we're discussing AI observability, and specifically, we're exploring how AI is maturing, and observability is a big part of that. We're also going to talk about how tracking AI costs is becoming a pressing concern, especially with the coming tokenpocalypse, which is a new term I just invented. And also that monitoring and observing AI is more than just your LLM consumption. There's also things like agent gateways, MCP servers, and local models. Guiding us through all of that is Anuj Tyagi. Let's get to it. Welcome, Anuj Tyagi,…

More from The Everything Feed - All Packet Pushers Pods

HW083: Inside the WLAN Pros Toolbox – A Free, Multipurpose App

NB582: Infoblox Adds Network Observability with Kentik Buy; Satellite Data Centers vs. the Environment

TCG079: Why Your State File is Actually a Distributed Systems Problem

NAN126: Fine-Tuning Open Source LLMs for Network Engineering

D2DO306: Platform Engineering in the Agentic Era (Sponsored)
