@Asianometry & Dylan Patel — How the semiconductor industry actually works

Dwarkesh Patel, October 2, 2024

Dylan Patel and John from Asianometry discuss the complexity of the semiconductor supply chain, China's ability to scale AI, and the massive capital investment required to build competitive AI clusters. They analyze how geopolitical tensions, chip manufacturing constraints, and data center buildouts will shape the competitive landscape through 2028-2029.

Summary

The conversation covers several interconnected topics in semiconductors and AI infrastructure. Dylan and John begin by discussing how talent flows drive technological progress in semiconductors, using historical examples such as Richard Chang, who carried expertise from TSMC to SMIC, and Liang Mong-Song, who carried it from TSMC to Samsung. They explain that semiconductor manufacturing is deeply compartmentalized and relies on apprentice-master knowledge transfer, particularly in Taiwan through universities such as National Taiwan University (NTU) and National Tsing Hua University.

Dylan and John analyze China's path to competing in AI, noting that despite export controls on chips, China could theoretically centralize compute resources and build massive data centers. They calculate that with current sanctions allowing over a million H20 GPUs annually, plus domestic chips like Huawei's Ascend 910B, China could train a 1.3 trillion parameter model by 2026 if it chooses centralization. The key bottleneck is not chips but centralized decision-making, something the US avoids through distributed innovation across OpenAI, Anthropic, Google, and Meta.

On US AI infrastructure, they estimate that as of 2025, clusters of 300,000-500,000 GPUs are being built across multiple sites, equivalent to roughly 1 million H100s. By 2026-2027, gigawatt-scale data centers will emerge, with Microsoft, OpenAI, and others signing $100+ billion in fiber deals to connect regional data centers. They project that by 2028-2029, single-site or loosely-coupled multi-site clusters could reach millions of chips—potentially supporting training runs for 1.3 trillion parameter models.

The economics discussion reveals that GPU costs dominate total cost of ownership (80%), while power represents only 10%, contradicting the narrative that power availability limits data center buildout. The real constraint is the industrial capacity to deliver substations, transformers, and power generation infrastructure. China has massive advantages here due to continuous renewable energy and manufacturing buildouts.
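The cost-share argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the 80% GPU and 10% power shares come from the summary, while the remaining 10% "other" bucket, the function name, and the price-change scenarios are assumptions added here.

```python
# Illustrative TCO sensitivity check, not a model from the episode.
# Shares of total cost of ownership: GPU and power are from the summary;
# "other" is an assumed bucket for whatever remains.
shares = {"gpu": 0.80, "power": 0.10, "other": 0.10}


def relative_tco(shares, multipliers):
    """Return the new total cost relative to a baseline of 1.0.

    `multipliers` maps a cost component to a hypothetical price multiplier;
    components not listed keep their baseline price.
    """
    return sum(share * multipliers.get(name, 1.0) for name, share in shares.items())


# Hypothetical scenario 1: electricity prices double.
doubled_power = relative_tco(shares, {"power": 2.0})

# Hypothetical scenario 2: GPU prices rise by 10%.
gpu_up_10pct = relative_tco(shares, {"gpu": 1.10})

print(f"Power price x2  -> TCO x{doubled_power:.2f}")
print(f"GPU price +10%  -> TCO x{gpu_up_10pct:.2f}")
```

Under these assumed shares, doubling the price of power raises total cost by about 10%, and a 10% rise in GPU prices raises it by about 8%. That is why the summary treats the price of power as a minor cost line and points to the delivery of physical infrastructure as the harder constraint.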

On chip manufacturing bottlenecks, SMIC's Shanghai fab has roughly 25,000-35,000 wafers per month of leading-edge capacity despite possessing 45-50 immersion lithography tools, limited by yields and lack of EUV technology. Yet SMIC still produces millions of usable chips annually for Huawei phones and accelerators. Dylan and John argue that export controls are somewhat effective but face fundamental limits: restricting one piece of the jigsaw puzzle only accelerates China's development of alternatives.

Capital dynamics reveal that OpenAI is expected to raise $50-100 billion in 2025, enabling continued scaling. They argue this isn't purely speculative because the revenue lag between capex spending and returns is tolerable for companies like Microsoft (which views this as a strategic investment), and because GPT-4's actual training cost (~$500 million) generated billions in recurring revenue, validating the ROI logic for 5x larger models.

They also discuss architectural divergence: Chinese AI systems will optimize differently than US systems because of the H20's memory bandwidth and compute constraints, potentially leading to different model architectures (wider networks, different attention mechanisms) than US solutions optimized for Nvidia's H100/Blackwell. They compare the semiconductor industry's extreme siloing, where no individual knows the whole stack, to AI development's more open innovation model, and discuss how AI tools could revolutionize semiconductor design by tackling massive optimization search spaces.

Finally, Dylan and John share their origin stories. John started Asianometry as a travel vlog in 2017 and gradually pivoted to business history and semiconductors over three years with minimal viewership, eventually becoming a major educational channel. Dylan was a hobbyist who reverse-engineered Xbox hardware at age 8, spent a decade obsessively studying semiconductor supply chains online and at conferences, then began consulting in 2020, eventually building SemiAnalysis into a firm with customers including hyperscalers, semiconductor companies, and hedge funds.

Key Insights

China could build a 1.3 trillion parameter model by 2026 if it centralized compute: over a million H20 GPUs plus 600,000+ Ascend 910B chips annually exceed the 100,000-GPU cluster sizes currently operating in the US. In practice, that compute is spread across multiple Chinese companies, which prevents such centralization.
GPU costs represent 80% of total data center ownership costs, while power is only 10%, meaning power availability is not the real constraint—instead, the bottleneck is industrial capacity to build transformers, substations, and power generation infrastructure.
SMIC's Shanghai fab produces approximately 25,000-35,000 wafers per month of leading-edge capacity despite having 45-50 advanced lithography tools, limited by poor yields from lack of EUV access, yet still manufactures millions of usable chips annually for Huawei.
Semiconductor knowledge is siloed through apprentice-master systems where specialists like etch engineers know only their domain deeply, preventing any single person from understanding the entire manufacturing stack, unlike AI where research breakthroughs are publicly shared.
By 2028-2029, million-chip clusters supporting 1.3 trillion parameter models are plausible if US labs continue 10x year-over-year scaling, with Microsoft's planned 5-gigawatt multi-region cluster and OpenAI's expected $50-100 billion funding round enabling this expansion.

Topics

Semiconductor supply chain complexity and compartmentalization
China's AI scaling capabilities and centralization potential
US AI infrastructure buildout and cluster scaling timeline
Export controls effectiveness and semiconductor manufacturing constraints
Capital requirements and financial ROI models for AI scaling
Data center power and infrastructure limitations
TSMC, SMIC, and Samsung competitive dynamics
Architectural divergence between US and Chinese AI systems
Talent flows and knowledge transfer in semiconductor industry
AI's potential to revolutionize chip design and manufacturing

Transcript

[0:00] Song is a nut. He's like, "We will make Samsung into this monster." He does not care about people. He does not care about business. He wants to take it to the limit. The only thing there's no [ __ ] way you can pay for the scale of clusters that are being planned to be built next year for OpenAI unless they raise like 50 to $100 billion. Hold on. We've already lost John. We've already accepted GPT-5 would be good. But yeah, you got it. You know, you got it. Yeah. Like, bro, like life is so much more fun when you just like are delusionally like we're just ripping bongs, are we? We're not even close to…
