TechnicalDiscussion

HS130: Wait, AI Doesn’t Secure Itself? Developing an AI Security Strategy

John Attil Johnson and John Burke of Nemertes discuss why AI systems require dedicated security strategies beyond standard enterprise protections. They cover specific AI threat vectors including LLM jacking, data poisoning, sleeper agent attacks, and prompt injection, while arguing that zero trust principles and robust data stewardship are foundational to any AI security posture.

Summary

The episode opens with hosts John Attil Johnson and John Burke challenging the assumption that enterprise leaders — even CISOs and CTOs — have fully internalized the need to secure AI systems. Burke argues that while some practitioners are aware of threats like prompt injection, broader leadership tends to focus on strategic AI adoption rather than protection. Johnson acknowledges that in his own conversations with CTOs pushing aggressive AI adoption, the security question has rarely been raised organically.

The hosts reference two key external frameworks: HarmBench (harmBench.org), which catalogs ways AI can cause harm beyond cybersecurity — including reputational damage, legal liability, and harmful outputs — and OWASP's LLM Top 10, a security risk list distinct from their standard web application risks, because AI introduces unique wrinkles on classic vulnerabilities like insufficient input sanitization.

A central threat discussed is LLM jacking, described as the AI equivalent of a botnet. Automated crawlers on the internet scan for exposed LLM API endpoints and MCP endpoints, aggregate them, and resell access on dark web marketplaces — with 'Bizarre Bazaar' cited as a specific example. Bad actors can then use victims' API access (e.g., a company's ChatGPT subscription) to run queries at the victim's expense, such as generating phishing emails. Burke frames this as a straightforward failure to apply zero trust network access principles to AI infrastructure.

The conversation then moves through the AI lifecycle, starting with training-time threats. Data poisoning is discussed in two forms: untargeted poisoning (randomly corrupting training data to degrade model accuracy) and targeted poisoning (a manufacturer bribing a developer to insert false purchase records so an AI recommends their products more favorably). Burke extends this to inference-time external poisoning, citing a BBC reporter who successfully convinced multiple major LLMs — including ChatGPT and Gemini — that he was a world-record hot dog eater, by placing false information in a single blog post. This took roughly 20 minutes.

Prompt injection is discussed as another inference-phase attack, ranging from social media 'ignore all previous instructions' tricks to potentially triggering dangerous system commands on agentic AI systems with command-line access. Burke describes IBM security researchers who successfully 'hypnotized' LLMs eight layers deep, causing them to invert all answers and deny doing so even when directly confronted.

Sleeper agent attacks are presented as a more advanced training-phase threat: embedding a pre-programmed response to a specific trigger phrase, analogous to Cold War spy activation codes. In an agentic AI context with system-level access, such an attack could allow an adversary to exfiltrate conversation history or execute destructive commands.

For defenses, Burke consistently returns to zero trust principles, role-based access control with least-privilege for AI agents, input/output filtering gateways, context isolation between conversations, session time-to-live limits on external inputs, and treating all external data as data to be analyzed rather than commands to be obeyed. Johnson raises the operational challenge of enforcing role-based access at scale when AI agents can be spawned in large numbers, noting that traditional directory structures were built for humans and a finite set of roles — a gap he describes as currently unsolved.

The hosts argue that hallucinations complicate defense because it is functionally impossible to distinguish between a benign hallucination, a harmful hallucination, and a deliberate sleeper-agent-triggered output — and that hallucinations are intrinsic to LLMs and will never be fully eliminated without destroying the model's utility.

On the organizational side, the hosts identify the full C-suite — CTO, CSO, CISO, CRO — plus AI development teams, network teams, and cybersecurity teams as all needing to be engaged. They recommend framing risks in concrete, business-oriented language (e.g., 'our GPT bill increased tenfold by theft') and using real-world news stories to make abstract threats tangible. Johnson emphasizes that a robust information stewardship program — covering data provenance, change control, retirement, and quality — is a prerequisite for safe AI deployment, comparing the current moment to the data scrubbing efforts enterprises attempted 20 years ago but abandoned.

Key Insights

  • Burke argues that LLM jacking — where exposed AI API and MCP endpoints are harvested by automated crawlers and resold on dark web marketplaces like 'Bizarre Bazaar' — is essentially the AI equivalent of a botnet, and results directly from failure to apply zero trust network access to AI infrastructure.
  • Burke distinguishes between untargeted data poisoning (randomly corrupting training data) and targeted poisoning, such as a manufacturer paying to insert false purchase records so an AI recommends their products more favorably to an online retailer.
  • Burke cites a BBC reporter who convinced ChatGPT, Gemini, and other major LLMs in roughly 20 minutes that he held a world record for hot dog consumption, by inserting false information into a single blog post — demonstrating how easily inference-phase external poisoning can occur.
  • Burke describes IBM security researchers who successfully embedded a multi-layer 'hypnosis' attack eight levels deep into LLMs, causing models to invert all their answers and deny doing so even when directly and repeatedly told to stop.
  • Burke argues that hallucinations are intrinsic to LLMs and will never be eliminated — and that this makes it functionally impossible to reliably distinguish between a benign hallucination, a harmful hallucination, and a deliberate sleeper-agent-triggered output.
  • Johnson raises the concern that role-based access control for AI agents is an operationally unsolved problem, because traditional directory structures were built for humans and finite roles — not for agents being spawned at scale and at machine speed.
  • Burke argues that for agentic AI, role-based access control and least-privilege principles must apply to agents just as they do to humans, and that for extremely high-risk operations — like modifying core infrastructure — a human-in-the-loop requirement should remain mandatory.
  • Johnson argues that a robust information stewardship program — covering data provenance, change control, audit trails, and data retirement — is a prerequisite for safe AI deployment, and that most enterprises abandoned serious data governance efforts years ago and have not returned to it.

Topics

LLM jacking and dark web API resaleData poisoning attacks (training-time and inference-time)Prompt injection and sleeper agent attacksZero trust principles applied to AI infrastructureAgentic AI and MCP security risksInformation stewardship as a foundation for AI securityOrganizational responsibility for AI security strategyHallucinations as a complicating factor in threat detection

Full transcript available for MurmurCast members

Sign Up to Access

Get AI summaries like this delivered to your inbox daily

Get AI summaries delivered to your inbox

MurmurCast summarizes your YouTube channels, podcasts, and newsletters into one daily email digest.