
FFmpeg: The Incredible Technology Behind Video on the Internet | Lex Fridman Podcast #496

Lex Fridman

Lex Fridman interviews Jean-Baptiste Kempf (president of VideoLAN, creator of VLC) and Kieran Kunhya (FFmpeg contributor) about the open source multimedia ecosystem powering the internet. They cover the technical depth of video codecs, the volunteer-driven community behind FFmpeg and VLC, the ethics of refusing millions in ad revenue, and the future of multimedia including ultra-low latency streaming for robotics.

Summary

The conversation opens with a deep technical walkthrough of what happens when you press play on a video: URL resolution, demuxing, entropy decoding, intra prediction, inverse transforms, and finally pixel display. Jean-Baptiste and Kieran emphasize that each step in this pipeline represents lifetimes of engineering work, and that video compression achieves a 100x to 1000x size reduction by exploiting the limits of human perception rather than by preserving the signal with lossless mathematical precision. Codecs work in YUV color space rather than RGB because it separates luminance from color, matching how the human eye is more sensitive to brightness than to color detail.
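To make the YUV point concrete, here is a minimal C sketch that converts one RGB pixel to Y'CbCr and compares raw frame sizes with and without chroma subsampling. It assumes the full-range BT.601 coefficients used by JFIF; broadcast video often uses limited range or BT.709 instead, so treat the coefficients as an assumption. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a value to the nearest integer and clamp it to the 8-bit range. */
static uint8_t clamp_u8(double v) {
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* Assumed full-range BT.601 (JFIF-style) RGB -> Y'CbCr for one pixel.
   Y carries brightness; Cb and Cr carry color differences around 128. */
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr) {
    *y  = clamp_u8(0.299 * r + 0.587 * g + 0.114 * b);
    *cb = clamp_u8(128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b);
    *cr = clamp_u8(128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b);
}

/* Raw bytes for one 8-bit frame with full-resolution chroma (4:4:4). */
static size_t frame_bytes_444(size_t w, size_t h) {
    return 3 * w * h;
}

/* Raw bytes for one 8-bit frame in 4:2:0: full-resolution luma plus one
   Cb and one Cr sample per 2x2 block of pixels. */
static size_t frame_bytes_420(size_t w, size_t h) {
    assert(w % 2 == 0 && h % 2 == 0); /* even dimensions keep the sketch simple */
    return w * h + 2 * (w / 2) * (h / 2);
}
```

For a 1920x1080 8-bit frame, 4:4:4 needs 6,220,800 bytes while 4:2:0 needs 3,110,400, so discarding chroma resolution the eye barely notices halves the raw data before any prediction or transform runs.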

The discussion covers the history of VLC, which originated in the late 1990s as VideoLAN, a student project at École Centrale Paris designed to stream satellite TV over the campus network. Jean-Baptiste joined in 2003, saved the project from near-death in 2005 when only two developers remained, and built it into software downloaded over 6.5 billion times. He recounts repeatedly refusing tens of millions of dollars from shady ad companies and toolbar bundlers, explaining that he could not compromise the project's integrity or betray the volunteer community, even when the final offer framed the money as a way to fund future open source work.

Kieran explains the FFmpeg ecosystem: it's the foundational library for virtually all video processing on the internet, used by YouTube, Netflix, Chrome, Discord, OBS, and countless others. The project has had 2,000-3,000 contributors over its lifetime but is maintained by only 10-15 core developers. They discuss the meritocratic culture, where code quality is the only criterion regardless of the contributor's background, age, or employer — citing teenagers who have contributed thousands of lines of assembly code.

A major technical focus is the celebration of handwritten SIMD assembly code, particularly in the dav1d AV1 decoder, which contains 240,000 lines of handwritten assembly versus 30,000 lines of C, whereas all of FFmpeg's other codecs combined contain only 100,000 lines of assembly. This assembly runs 10x to 62x faster than the equivalent C, and dav1d deliberately departs from the standard operating system calling conventions for its internal functions to squeeze out additional performance. They argue this matters because FFmpeg likely runs on hundreds of millions to a billion CPUs simultaneously, so every instruction cycle is consequential.
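The gap between C and SIMD code is easier to picture with a small example. The sketch below is not dav1d or FFmpeg code, and it uses compiler intrinsics rather than the handwritten assembly the guests describe. It assumes an x86 target with SSE2 for the vector path and falls back to plain C elsewhere; the function names are hypothetical. The operation, a rounded average of two 8-bit pixel rows, is the kind of step used when averaging two motion-compensated predictions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: rounded average (a + b + 1) >> 1, one pixel at a time. */
static void avg_u8_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                          size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* SIMD version: with SSE2, _mm_avg_epu8 averages 16 bytes per instruction
   using the same rounding. The leftover tail, and builds without SSE2,
   fall back to the scalar loop. */
static void avg_u8_simd(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                        size_t n) {
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    avg_u8_scalar(dst + i, a + i, b + i, n - i);
}
```

Each `_mm_avg_epu8` call handles 16 pixels at once, which hints at where large speedups come from. Handwritten assembly goes further by controlling register allocation directly and, in dav1d's case, the calling convention between its internal functions.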

The conversation addresses significant community tensions: Google's AI-generated security bug reports flooding volunteer maintainers, Microsoft Teams engineers treating FFmpeg's public bug tracker like a vendor SLA, and the broader problem of trillion-dollar corporations depending on unpaid volunteers without proportionate contribution. Jean-Baptiste notes that spicy tweets have actually produced positive results — increased donations, patches from Google, and broader awareness of open source infrastructure fragility.

They discuss the technical history of reverse engineering proprietary codecs such as GoToMeeting, Windows Media, and RealMedia, with particular admiration for Kostya Shishkov, who reverse engineered 20-30 megabyte binary blobs (work that would normally take months per megabyte) following a 'binary specification' philosophy, in which the binary itself serves as the specification and no documentation is needed. Kieran shares his own experience reverse engineering CineForm, where he started from samples containing flat blocks to simplify the initial implementation.

The open source licensing discussion covers the spectrum from MIT to AGPL and explains why Jean-Baptiste re-licensed libVLC's core from GPL to LGPL, a process that required him to track down and contact over 350 contributors individually, including visiting a factory worker whose deceased son had written code. They emphasize that open source licenses function as a social contract, the only thing the diverse global community agrees on.

On the future, Jean-Baptiste's new startup Kyber targets ultra-low latency video streaming for robotics teleoperation, drones, and remote surgery, aiming for 4ms glass-to-glass latency and currently achieving 7ms. Both guests see multimedia expanding to point clouds, volumetric video, haptics, spatial audio, and eventually brain-computer interface codecs. They also highlight how the archiving community uses FFmpeg as a Rosetta Stone for preserving video for a thousand years, including the FFV1 lossless codec developed specifically for that purpose.
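FFV1's lossless coding starts from spatial prediction. As a minimal sketch, assuming only the median predictor that FFV1 shares with LOCO-I/JPEG-LS and ignoring its context modeling, slices, and entropy coding, the hypothetical C functions below turn an 8-bit plane into prediction residuals and rebuild it bit-exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Median of three integers: max(lo, min(hi, c)) after ordering a and b. */
static int median3(int a, int b, int c) {
    if (a > b) { int t = a; a = b; b = t; } /* now a <= b */
    if (b > c) b = c;                       /* b = min(b, c) */
    return a > b ? a : b;                   /* max(a, min(b, c)) */
}

/* Predict a sample from its left (L), top (T) and top-left (TL) neighbours
   as median(L, T, L + T - TL). Treating samples outside the image as 0 is
   a simplification made for this sketch. */
static int predict(const uint8_t *img, size_t w, size_t x, size_t y) {
    int L  = x > 0 ? img[y * w + x - 1] : 0;
    int T  = y > 0 ? img[(y - 1) * w + x] : 0;
    int TL = (x > 0 && y > 0) ? img[(y - 1) * w + x - 1] : 0;
    return median3(L, T, L + T - TL);
}

/* Encoder side: residual = actual sample - prediction (range -255..255). */
static void encode_residuals(const uint8_t *img, size_t w, size_t h,
                             int16_t *res) {
    assert(w > 0 && h > 0);
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            res[y * w + x] = (int16_t)(img[y * w + x] - predict(img, w, x, y));
}

/* Decoder side: rebuild samples in raster order, so every neighbour that
   predict() reads has already been reconstructed exactly. */
static void decode_residuals(const int16_t *res, size_t w, size_t h,
                             uint8_t *out) {
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            out[y * w + x] = (uint8_t)(predict(out, w, x, y) + res[y * w + x]);
}
```

Smooth regions yield residuals near zero, which an entropy coder can store in far fewer bits than raw samples; FFV1 itself follows this step with context modeling and Golomb-Rice or range coding, which the sketch omits.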

Key Insights

  • Jean-Baptiste argues that dav1d's 240,000 lines of handwritten assembly (compared to only 100,000 lines for all other FFmpeg codecs combined) were necessary because the Alliance for Open Media originally claimed AV1 was too complex for software decoding and required hardware. The dav1d decoder disproved this, enabling AV1 to run on 3 billion devices and decode 720p with only one or two CPU cores.
  • Kieran argues that Google's AI-generated security reports constituted an effective denial-of-service attack on volunteer maintainers, because the reports were extremely wordy, marked everything as high priority, targeted obscure 1990s game codecs, and were announced publicly to media before patches could be developed — creating massive burden without proportionate contribution of patches or funding.
  • Jean-Baptiste explains that x264's success in producing higher visual quality than industry encoders came not from better mathematics but from hobbyists ignoring the 'holy' PSNR metric and instead optimizing by eye on laptops, developing psychovisual rate-distortion and adaptive quantization techniques that industry rejected because they worsened mathematical scores despite looking dramatically better.
  • Jean-Baptiste recounts that to re-license libVLC's core from GPL to LGPL, he had to personally track down and obtain agreement from over 350 contributors — including traveling to a factory to meet a worker whose deceased son had written code — because in open source joint works, copyright is retained by every individual contributor even after their code is deleted or superseded.
  • Kieran claims that dav1d violates standard operating system calling conventions within its own library, using a custom convention for internal function calls that avoids saving registers to the stack (and therefore to L1/L2 cache). He states he has never heard of any other project using this technique at mass scale, and it is only possible because the assembly is handwritten rather than compiler-generated.
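The PSNR metric in the x264 insight is derived from mean squared error: for 8-bit samples, PSNR = 10 log10(255² / MSE) dB, so a lower MSE always means a higher PSNR. The hypothetical C helper below computes only the MSE term, since it ranks candidates the same way PSNR does. The sample values in the note after it are invented for illustration and are not taken from x264.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean squared error between a reference and a distorted 8-bit array.
   PSNR = 10 * log10(255^2 / MSE), so ranking by MSE matches ranking by PSNR. */
static double mse_u8(const uint8_t *ref, const uint8_t *dist, size_t n) {
    assert(n > 0);
    uint64_t sse = 0; /* sum of squared errors */
    for (size_t i = 0; i < n; i++) {
        int d = (int)ref[i] - (int)dist[i];
        sse += (uint64_t)(d * d);
    }
    return (double)sse / (double)n;
}
```

For a fine texture alternating 100/140, replacing it with a flat 120 gives an MSE of 400, while keeping the texture with its phase flipped gives an MSE of 1600. PSNR therefore ranks the blurred version about 6 dB higher, even though only the second keeps the texture's energy, which psychovisual techniques such as x264's psy-RD try to preserve.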

Topics

  • Video codec fundamentals and compression theory
  • FFmpeg and VLC open source ecosystem
  • Handwritten SIMD assembly code optimization
  • Open source licensing and community governance
  • Refusing commercialization to preserve open source integrity
  • Reverse engineering proprietary codecs
  • AV1 and dav1d decoder development
  • Maintainer burnout and AI-generated bug reports
  • Ultra-low latency streaming for robotics
  • Multimedia archival and preservation
