Video compression explained: H.264, AV1, ProRes | Lex Fridman Podcast
This podcast transcript covers video compression fundamentals, explaining codec and container combinations, IPB frame types, and the complexity of compression parameters. The speakers discuss how codecs like H.264, AV1, and ProRes serve different use cases, and highlight the remarkable engineering behind modern video streaming at scale.
Summary
The conversation begins by outlining the most common codec-container combinations in modern video: H.264 in MP4, AV1 in MP4/WebM, and ProRes in its own containers (in practice, typically QuickTime MOV). ProRes is explained as Apple's editing-focused codec, originally designed for Final Cut Pro and optimized for fast decoding and seeking rather than distribution efficiency. Its key characteristic is that it is an 'intra-only' codec with no temporal compression: every frame is self-contained and can be decoded without reference to any other frame.
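To make the seeking advantage concrete, here is a minimal sketch of how many frames a decoder has to process before it can show an arbitrary frame. It is a simplified model, not something taken from the podcast: it assumes a keyframe every `gop` frames, with every other frame depending on the one before it, and the function name `frames_to_decode` is made up for illustration.

```python
def frames_to_decode(target: int, gop: int) -> int:
    """Frames a decoder must process to display frame `target` (0-based).

    Simplified model (an assumption, not a real codec): a keyframe occurs
    every `gop` frames and every other frame depends on the frame before it.
    gop == 1 models an intra-only codec such as ProRes.
    """
    if gop < 1:
        raise ValueError("gop must be at least 1")
    if target < 0:
        raise ValueError("target must be non-negative")
    # Decode from the most recent keyframe up to and including the target.
    return target % gop + 1


# Seeking to frame 1249:
print(frames_to_decode(1249, gop=1))    # intra-only: 1 frame
print(frames_to_decode(1249, gop=250))  # long GOP, worst case: 250 frames
```

In this model an editor scrubbing through intra-only footage always pays for exactly one frame, while a long-GOP stream can cost up to a full GOP per seek.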
The speakers then dive into the I, P, and B frame types. I-frames (keyframes) are complete, self-contained images, similar to JPEGs. P-frames (predicted frames) reference previously decoded frames and store only the differences, so they cannot be decoded without the chain of frames leading back to an I-frame. B-frames (bi-directionally predicted frames) are the most complex: they can reference both past and future frames, which means the decoding order differs from the display order, a concept the speakers describe as 'mind-blowing.'
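The reordering can be sketched in a few lines. This is an illustration under a stated assumption, not how any particular decoder is written: it uses the classic pattern in which each B-frame references the nearest I- or P-frame on either side, so the later of the two has to be decoded first. The helper name `decode_order` is hypothetical.

```python
from typing import List


def decode_order(display_types: str) -> List[int]:
    """Return display indices in a plausible decode order.

    `display_types` lists frame types in display order, e.g. "IBBPBBP".
    Assumption: each B-frame references the nearest earlier and nearest
    later I/P frame, so that later frame must be decoded before the B-frame.
    """
    order: List[int] = []
    pending_b: List[int] = []  # B-frames waiting for their future reference
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)      # decode the I/P frame first
            order.extend(pending_b)  # then the B-frames that needed it
            pending_b.clear()
    order.extend(pending_b)  # trailing B-frames (simplification)
    return order


frames = "IBBPBBP"
print([f"{frames[i]}{i}" for i in decode_order(frames)])
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

Frame P3 is decoded second but displayed fourth, which is why a decoder has to buffer frames and reorder them before display.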
The concept of intra-refresh is introduced as an alternative to traditional I-frames, used in streaming platforms like Kyber, where an I-frame is gradually built up across the stream rather than sent as a discrete complete frame. The default GOP (Group of Pictures) size in FFmpeg for H.264 is noted as around 250 frames.
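A rough sketch of the intra-refresh idea follows. It is an assumption-laden toy model, not a description of Kyber's implementation or of any encoder: the picture is split into columns of blocks, each frame re-sends one band of columns as intra-coded data, and a viewer who joins mid-stream has a complete picture once every column has been refreshed. Real encoders also have to stop refreshed regions from predicting from stale ones, which this sketch ignores. The names `refresh_band` and `first_clean_frame` are made up.

```python
from typing import Optional, Tuple


def refresh_band(frame: int, width_in_blocks: int, period: int) -> Tuple[int, int]:
    """Columns [start, end) that are intra-coded in the given frame."""
    step = frame % period
    start = step * width_in_blocks // period
    end = (step + 1) * width_in_blocks // period
    return start, end


def first_clean_frame(width_in_blocks: int, period: int,
                      join_frame: int, total_frames: int) -> Optional[int]:
    """First frame at which a viewer joining at `join_frame` has seen every
    column refreshed, or None if the stream ends before that happens."""
    clean = [False] * width_in_blocks
    for frame in range(join_frame, total_frames):
        start, end = refresh_band(frame, width_in_blocks, period)
        for col in range(start, end):
            clean[col] = True
        if all(clean):
            return frame
    return None


# A viewer joins at frame 2 of a stream that refreshes 8 columns every 4 frames.
print(first_clean_frame(width_in_blocks=8, period=4, join_frame=2, total_frames=20))  # 5
```

The trade-off this illustrates is that no single frame carries the cost of a full keyframe, but a new viewer waits up to one refresh period for a complete picture.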
The discussion then turns to the enormous number of tunable parameters in video encoding — resolution, frame rate, codec choice, bitrate mode (constant bitrate vs. constant quality), QP values, GOP length, and B-frame counts. The speakers note that thousands of professionals at companies like YouTube, Netflix, and Meta are dedicated solely to optimizing these parameters for different content types and delivery scenarios.
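As a small illustration of how these parameters combine, the sketch below assembles an FFmpeg command line for the libx264 encoder from a settings object. The flags used (`-c:v`, `-vf scale`, `-r`, `-g`, `-bf`, `-crf`, `-b:v`, `-maxrate`, `-bufsize`) are standard FFmpeg options, but the `EncodeSettings` class, the `ffmpeg_args` helper, and all default values are hypothetical and are not recommendations from the speakers. The script only builds the argument list; it does not run FFmpeg.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EncodeSettings:
    # Illustrative values only, not tuned recommendations.
    width: int = 1280
    height: int = 720
    fps: int = 30
    codec: str = "libx264"
    crf: Optional[int] = 23             # constant-quality mode; None = use bitrate
    bitrate_kbps: Optional[int] = None  # bitrate-constrained mode
    gop: int = 250                      # maximum keyframe interval
    bframes: int = 2                    # maximum consecutive B-frames


def ffmpeg_args(src: str, dst: str, s: EncodeSettings) -> List[str]:
    """Build an FFmpeg argument list from the settings (does not run it)."""
    args = [
        "ffmpeg", "-i", src,
        "-c:v", s.codec,
        "-vf", f"scale={s.width}:{s.height}",
        "-r", str(s.fps),
        "-g", str(s.gop),
        "-bf", str(s.bframes),
    ]
    if s.crf is not None:
        args += ["-crf", str(s.crf)]
    elif s.bitrate_kbps is not None:
        rate = f"{s.bitrate_kbps}k"
        # Capping maxrate at the target approximates constant bitrate.
        args += ["-b:v", rate, "-maxrate", rate, "-bufsize", f"{2 * s.bitrate_kbps}k"]
    else:
        raise ValueError("set either crf or bitrate_kbps")
    args.append(dst)
    return args


print(" ".join(ffmpeg_args("input.mov", "output.mp4", EncodeSettings())))
```

Even this reduced set of eight knobs yields a large search space once each one is varied per content type, which is the tuning problem the speakers describe.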
Finally, YouTube is highlighted as a uniquely complex engineering challenge, requiring both upload encoding at massive scale and delivery across all resolutions with minimal latency — even for videos with very few viewers. The conversation closes with a historical anecdote about Google Video using VLC via ActiveX in Internet Explorer, contrasted with today's approach of compiling VLC and FFmpeg to WebAssembly to run inside JavaScript virtual machines in the browser.
Key Insights
- ProRes is an intra-only codec with no temporal compression, meaning every frame is a complete image, which makes it ideal for editing workflows where fast seeking and cutting are required — a fundamentally different use case than distribution codecs.
- B-frames can depend on frames that appear in the future in the stream, meaning the decoding order is not the same as the display order — the decoder must buffer and decode a future frame before it can decode the current B-frame.
- Intra-refresh, used in platforms like Kyber, eliminates discrete I-frames entirely by gradually building up a complete frame across the continuous stream, refreshing certain parts of the image over time.
- The speakers argue that thousands of employees at companies like YouTube, Netflix, and Meta are not writing codecs but are instead solely dedicated to finding the right encoding parameters for specific content types and delivery formats.
- Decoders built independently by different manufacturers around the world can decode the same stream and produce bit-for-bit identical output, which the speakers describe as 'quite remarkable,' especially as codecs grow increasingly complex.