
Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]

Invest Like the Best with Patrick O'Shaughnessy · June 30, 2026 · 1h 27m

Etched founders Gavin Uberti and Rob Wachen discuss building the first AI inference chip by a post-ChatGPT startup, their architectural innovations in low-voltage inference and cluster-scale memory, and their philosophy of velocity, vertical integration, and aggressive risk-taking to capture what they believe will become the largest market in the world.

Summary

Etched is a semiconductor company founded in 2023 by Gavin Uberti and Rob Wachen. Rather than a chip alone, it has built a complete inference solution: chip, board, power delivery, interconnects, and manufacturing. The company has raised $800 million, has more than $1 billion in customer demand for its first product, and taped out a working chip on its first attempt, a feat many industry experts said was impossible for young founders.

The founders encountered significant skepticism early on, with established semiconductor veterans claiming that building competitive AI chips required 40-50 years of experience in the industry. However, they realized that much of the semiconductor and data center industry was built with general-purpose constraints that no longer applied to modern AI inference workloads. Their key technical insight was that previous architectures were designed before ChatGPT and could be fundamentally reimagined for new workloads.

Etched's architecture rests on two primary technical bets: (1) Low-voltage inference, where they run chips at under half the voltage of competing AI chips by solving the thermal problem that causes other chips to throttle, and (2) Cluster-scale memory, where they built a custom interconnect stack that reduces chip-to-chip latency from 4,000 nanoseconds (on NVIDIA Blackwell) to approximately 800 nanoseconds, allowing much more effective use of memory across clusters. These innovations stem from understanding that prefill (processing context) and decode (generating tokens) have different optimization requirements.
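To put rough numbers on the two bets above, the sketch below uses the textbook first-order approximation that a chip's dynamic power scales with the square of supply voltage times clock frequency. That model and the function names are assumptions added for illustration and are not from the episode. The model ignores leakage power and any change in achievable clock speed. The latency figures are the ones quoted in the summary.

```python
# First-order illustration only. Function names are hypothetical, and the
# P ~ C * V^2 * f model is a textbook approximation, not Etched's own analysis.


def dynamic_power_ratio(voltage_ratio: float, frequency_ratio: float = 1.0) -> float:
    """Relative dynamic power when voltage and frequency are scaled.

    voltage_ratio:   new supply voltage divided by baseline voltage
    frequency_ratio: new clock frequency divided by baseline frequency
    """
    return voltage_ratio ** 2 * frequency_ratio


def latency_speedup(baseline_ns: float, improved_ns: float) -> float:
    """How many times lower the improved latency is than the baseline."""
    return baseline_ns / improved_ns


if __name__ == "__main__":
    # "Under half the voltage" is modeled here as exactly half, same clock.
    power = dynamic_power_ratio(0.5)
    # 4,000 ns and roughly 800 ns are the figures quoted in the summary.
    speedup = latency_speedup(4000, 800)
    print(f"Dynamic power at half voltage: {power:.2f}x of baseline")
    print(f"Chip-to-chip latency improvement: {speedup:.1f}x")
```

Under these assumptions, halving the voltage at the same clock cuts dynamic power to about a quarter, and the quoted latencies amount to a 5x reduction.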

The company's operational philosophy emphasizes velocity, vertical integration, and parallelization. Rather than outsourcing major components, they built racks, cold plates, networking, production facilities, and even software stacks in parallel with chip development. They maintained 24/7 development cycles with day and night shifts, sent a dozen engineers to Bangalore for six months to unblock a vendor relationship, and ran coordinated 12-hour handoffs across time zones. They spent aggressively on pre-fetching work—building FPGAs to validate the full chip design, creating thermal mockups before chips arrived, and setting up production lines in advance—to achieve 40-day chip-to-inference time versus 10 months for competitors.

Early fundraising was extremely difficult. Every major Silicon Valley investor passed on their initial pitch for a $100 million Series A, citing skepticism about young founders, the unproven inference market, and the inherent risks. At one point the math didn't close, and the founders considered abandoning the company and returning to Harvard. Through persistent outreach, they ultimately assembled the funding from a combination of debt and rolling commitments from individual investors who believed in the market and the team, eventually closing a $103 million Series A.

Their team-building philosophy pairs industry legends (people who have shipped at massive scale, like NVIDIA's Brian Loyler, who built the HGX and DGX systems) with what they call "chips on shoulders": young, intensely driven people with a hunger to prove themselves, such as Sanford, who won robotics competitions as part of a two-person team. This bimodal approach provides both credibility and scrappy innovation.

The founders believe inference will become the largest market in the world as AI models serve billions of users simultaneously, running multi-month or multi-year agent tasks. They see economies of scale eventually reaching gigawatt-scale facilities and trillion-dollar data centers, with token production becoming a fundamental measure of national productivity and capacity. They emphasize that hardware designed pre-ChatGPT cannot efficiently serve modern workloads and that the next decade will see entirely new architectures emerge.

On models themselves, they argue that machines don't think like brains do, and that future architectures will exploit this difference by using vast amounts of compute, very large context windows (potentially billions of tokens), mixture-of-experts approaches, and dynamic computation allocation. They expect long-horizon agent tasks requiring billions of concurrent agents working 24/7, which will require hardware specifically designed for such workloads.

The founders emphasize that production is the product: their advantage comes not just from superior architecture but from the ability to manufacture at scale. They've invested heavily in supply chain partnerships, particularly with TSMC, and deliberately chose to build on different process nodes than competitors to avoid zero-sum competition for wafer capacity.

About this episode

My guests today are Gavin Uberti and Rob Wachen, the founders of Etched. A few years ago, when they set out to build a better AI chip than the largest companies in the world, almost everyone I called told me it could not be done. They have since done it, taping out a working chip on their first attempt and becoming the first hardware company founded after ChatGPT to do so. They already have more than a billion dollars of customer demand for their first product, and have raised eight hundred million dollars to build it. Etched builds chips and systems designed to run AI models faster and at lower cost. They started the company in 2023, and that product is a complete rack for inference, the chip along with the boards, the power delivery, the interconnects, and the manufacturing to produce it all. We talk about the technical bets behind their architecture, how they hired industry legends and paired them with elite 22 year-olds, and why they believe inference will become one of the largest markets in the world. I think you will find the story of what they have built hard to forget. Please enjoy my conversation with Gavin and Rob.

For the full show notes, transcript, and links to mentioned content, check out the episode page here.

Become a Colossus member to get our quarterly print magazine and private audio experience, including exclusive profiles and early access to select episodes. Subscribe at colossus.com/subscribe.

Ramp’s mission is to help companies manage their spend in a way that reduces expenses and frees up time for teams to work on more valuable projects. Go to ramp.com/invest to sign up for free and get a $250 welcome bonus.

Trusted by thousands of businesses, Vanta continuously monitors your security posture and streamlines audits so you can win enterprise deals and build customer trust without the traditional overhead. Invest Like the Best listeners get a special offer of $1,000 off Vanta when you go to vanta.com/invest.

WorkOS is the infrastructure B2B and AI-native companies use to sell to enterprise. It covers everything enterprise security requires: SSO, SCIM, RBAC, Audit Logs, AI governance, and more. Trusted by 2,000+ fast-growing companies, including OpenAI, Anthropic, Cursor, and Vercel.

Rogo is the AI platform for finance. They're building agents for Wall Street that are trained to understand how bankers and investors actually do work: from diligence and modeling, to turning analysis into deliverables. To learn more, visit rogo.ai/invest.

Ridgeline has built a complete, real-time, modern operating system for investment managers. It handles trading, portfolio management, compliance, customer reporting, and much more through an all-in-one real-time cloud platform. Visit ridgelineapps.com.

Editing and post-production work for this episode was provided by The Podcast Consultant.

Timestamps:
(00:00:00) Welcome to Invest Like The Best
(00:02:07) Gavin Uberti and Rob Wachen
(00:03:54) Two 21-Year-Olds Taking on NVIDIA
(00:07:52) The Two Technical Bets Behind Their Architecture
(00:14:15) Why Inference Becomes the Biggest Market
(00:20:23) Rob and Gavin's Origins Stories
(00:28:38) How They Recruit Industry Legends
(00:36:30) Moving a Dozen Engineers to Bangalore for Six Months
(00:38:01) Speed Wins
(00:43:58) Getting More Concurrency Out of Every Megawatt
(00:52:44) Vertical Integration
(00:57:43) Hardest Obstacles to Overcome
(01:01:09) Raising The Largest AI Chip Series A Ever
(01:06:29) TSMC
(01:13:20) Designing Gen 2 for Gigawatt-Scale Production
(01:16:42) Why Machines Don't Think Like People
(01:20:03) A Year of Compute Compressed Into a Month
(01:23:44) The Trillion-Dollar Data Center
(01:26:19) The Kindest Thing

Key Insights

The semiconductor industry's standard practices were built for general-purpose use cases across IoT, edge computing, and data centers, creating unnecessary constraints for specialized AI inference workloads that can be relaxed when designing for a specific use case.
The founders discovered that running AI chips at lower voltages than GPUs is physically possible (evidenced by Bitcoin miners operating at a quarter of GPU voltage), but GPU architectures have fundamental issues that prevent them from operating safely at low voltages, a problem Etched solved through architectural innovation.
Achieving high clock speeds on chips requires solving the thermal problem first through low-voltage design; adding more flops without solving thermal throttling provides no actual performance gains because the chip will self-regulate and reduce clock speed under heat.
Cluster-scale memory bandwidth is poorly utilized in current GPU setups because the latency to access memory across chips is extremely high (4,000 nanoseconds on Blackwell), making it impractical to treat a cluster's memory as a single unified pool despite having sufficient bandwidth.
The company spent aggressively on parallel development (FPGA validation, thermal mockups, production line setup, software stacks) before silicon arrived to compress 10-month chip-to-inference timelines down to 40 days, demonstrating that capital spent on parallelization has massive ROI.
Every major Silicon Valley investor passed on Etched's Series A pitch despite the founders' technical credentials and market opportunity, requiring them to assemble funding through debt plus rolling individual commitments from a few believers in the market.
TSMC's value comes not from technical superiority alone but from exceptional customer service and willingness to run experiments on their own dime to optimize customer yields, demonstrating how supplier relationships become critical competitive advantages in hardware.
The most difficult technical challenge during chip development was synchronizing two clock signals within 50 picoseconds (50 trillionths of a second) across 2 billion cycles per second to prevent incorrect results, a problem multiple team members initially believed was unsolvable until a creative solution emerged.
Inference will shift focus from raw speed (which multiple chips can now achieve) to concurrency—how many users can be served simultaneously at a given quality level—making memory bandwidth and chip-to-chip interconnect latency the primary performance metrics rather than peak flops.
The founders argue that machines don't think like human brains, and future AI systems will exploit this difference by using far more compute, massive context windows, mixture-of-experts architectures, and dynamic computation allocation rather than mimicking brain structure.
Most AI chips built by hyperscalers (Google TPUs, Meta MTIA, Microsoft Maia, OpenAI Jalapeno) have lower flop density than Blackwell because those companies' revenues come from elsewhere and they can afford less risky, me-too products, whereas Etched's existence is entirely dependent on chip superiority.
The founders believe inference will eventually become a larger market than training, with token production becoming a fundamental economic metric where inference capacity measured in agents per megawatt will determine a nation's effective workforce size and economic capability.
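Two of the insights above lend themselves to quick arithmetic: the 50-picosecond clock alignment at 2 billion cycles per second, and the cost of 4,000 ns versus roughly 800 ns cross-chip latency. The sketch below is an illustration under stated assumptions, not Etched's method. The function names are hypothetical, and the 1 TB/s link bandwidth is a made-up placeholder, since the summary gives no bandwidth figure. The in-flight calculation is the standard bandwidth-delay product.

```python
# Back-of-the-envelope checks. Function names are hypothetical, and the
# 1 TB/s bandwidth below is an assumed placeholder, not a figure from the episode.


def clock_period_ps(frequency_hz: float) -> float:
    """Length of one clock cycle in picoseconds."""
    return 1e12 / frequency_hz


def skew_fraction(skew_ps: float, frequency_hz: float) -> float:
    """Share of one clock cycle consumed by a given timing window."""
    return skew_ps / clock_period_ps(frequency_hz)


def bytes_in_flight(bandwidth_bytes_per_s: float, latency_ns: float) -> float:
    """Bandwidth-delay product: data that must be outstanding to keep a link busy."""
    return bandwidth_bytes_per_s * latency_ns / 1e9


if __name__ == "__main__":
    freq = 2e9  # 2 billion cycles per second, as quoted above
    print(f"Clock period: {clock_period_ps(freq):.0f} ps")
    print(f"50 ps window: {skew_fraction(50, freq):.0%} of a cycle")

    assumed_bandwidth = 1e12  # hypothetical 1 TB/s link, for illustration only
    for latency_ns in (4000, 800):
        outstanding = bytes_in_flight(assumed_bandwidth, latency_ns)
        print(f"{latency_ns} ns latency: {outstanding / 1e6:.1f} MB in flight")
```

At 2 GHz a cycle lasts 500 ps, so a 50 ps window is a tenth of a cycle. For any fixed bandwidth, the amount of data that must be in flight to hide latency scales linearly with that latency, which is one way to read why lower chip-to-chip latency makes a cluster's memory easier to treat as a single pool.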

Topics

AI chip architecture and design
Low-voltage inference technology
Cluster-scale memory and interconnects
Vertical integration in hardware
Startup fundraising and capital requirements
Team building and recruiting
Manufacturing and supply chain
Future of AI inference and agent computing
Operational velocity and parallelization
Risk-taking and problem-solving culture

Transcript

I know firsthand how complex the tech stack is for asset managers. And seemingly every new tool and data source makes the problem even worse, adding more complexity, more headcount, and more risk. Ridgeline offers a better way forward, one unified platform that automates away all that complexity across portfolio accounting, reconciliation, reporting, trading, compliance, and more. All at scale. Ridgeline is revolutionizing investment management, helping ambitious firms scale faster, operate smarter, and stay ahead of the curve. See what Ridgeline can unlock for your firm. Schedule a demo at ridgelineapps.com. OpenAI, Cursor, Anthropic, Perplexity, and Vercel all have something in common. They all use WorkOS. And here's why. To achieve enterprise adoption at scale, you have to deliver on…

Full transcript available for MurmurCast members
