Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]
Etched founders Gavin Uberti and Rob Wachen discuss building the first AI inference chip by a post-ChatGPT startup, their architectural innovations in low-voltage inference and cluster-scale memory, and their philosophy of velocity, vertical integration, and aggressive risk-taking to capture what they believe will become the largest market in the world.
Summary
Etched is a semiconductor company founded in 2023 by Gavin Uberti and Rob Wachen that has built a complete inference solution (chip, board, power delivery, interconnects, and manufacturing) rather than a chip alone. The company has raised $800 million and has more than $1 billion in customer demand. It taped out a working chip on its first attempt, a feat many industry experts said was impossible for young founders.
The founders encountered significant skepticism early on, with established semiconductor veterans claiming that building competitive AI chips required 40-50 years of experience in the industry. However, they realized that much of the semiconductor and data center industry was built with general-purpose constraints that no longer applied to modern AI inference workloads. Their key technical insight was that previous architectures were designed before ChatGPT and could be fundamentally reimagined for new workloads.
Etched's architecture rests on two primary technical bets: (1) Low-voltage inference, where they run chips at under half the voltage of competing AI chips by solving the thermal problem that causes other chips to throttle, and (2) Cluster-scale memory, where they built a custom interconnect stack that reduces chip-to-chip latency from 4,000 nanoseconds (on NVIDIA Blackwell) to approximately 800 nanoseconds, allowing much more effective use of memory across clusters. These innovations stem from understanding that prefill (processing context) and decode (generating tokens) have different optimization requirements.
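As a rough illustration of why these two bets matter, the sketch below works through the arithmetic. It assumes the textbook CMOS approximation that dynamic power scales with capacitance × voltage² × frequency, which is general background and not something stated in the episode, and it ignores leakage and any change in achievable clock speed. The function names are hypothetical, and "half the voltage" is used as a round stand-in for "under half".

```python
def dynamic_power_ratio(voltage_ratio: float, frequency_ratio: float = 1.0) -> float:
    """Relative dynamic power under the textbook P ~ C * V^2 * f approximation.

    Assumption: switched capacitance is unchanged and leakage is ignored.
    """
    return voltage_ratio ** 2 * frequency_ratio


def latency_improvement(baseline_ns: float, improved_ns: float) -> float:
    """How many times lower the improved latency is than the baseline."""
    return baseline_ns / improved_ns


# Running at half the voltage and the same clock frequency.
half_voltage_power = dynamic_power_ratio(0.5)

# Chip-to-chip latency figures quoted in the summary above.
interconnect_gain = latency_improvement(baseline_ns=4000, improved_ns=800)

print(f"Dynamic power at half voltage: {half_voltage_power:.0%} of baseline")
print(f"Chip-to-chip latency improvement: {interconnect_gain:.1f}x")
```

Under this simplified model, halving the voltage cuts dynamic power to about a quarter, which is consistent with the summary's point that the thermal problem has to be solved before extra flops are useful.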
The company's operational philosophy emphasizes velocity, vertical integration, and parallelization. Rather than outsourcing major components, they built racks, cold plates, networking, production facilities, and even software stacks in parallel with chip development. They maintained 24/7 development cycles with day and night shifts, sent a dozen engineers to Bangalore for six months to unblock a vendor relationship, and ran coordinated 12-hour handoffs across time zones. They spent aggressively on pre-fetching work—building FPGAs to validate the full chip design, creating thermal mockups before chips arrived, and setting up production lines in advance—to achieve 40-day chip-to-inference time versus 10 months for competitors.
Early fundraising was extremely difficult. Every major Silicon Valley investor passed on their initial pitch for a $100 million Series A, citing skepticism about young founders, the unproven inference market, and the inherent risks. The founders faced a moment where the math didn't close and they considered dropping out and returning to Harvard. They ultimately assembled the funding through a combination of debt and rolling commitments from individual investors who believed in the market and team, eventually closing a $103 million Series A through persistent outreach.
Their team-building philosophy pairs industry legends (people who have shipped at massive scale, like NVIDIA's Brian Loyler, who built the HGX and DGX systems) with what they call "chips on shoulders": young, intensely driven people with a hunger to prove themselves, like Sanford, who won robotics competitions as part of a two-person team. This bimodal approach provides both credibility and scrappy innovation.
The founders believe inference will become the largest market in the world as AI models serve billions of users simultaneously, running multi-month or multi-year agent tasks. They see economies of scale eventually reaching gigawatt-scale facilities and trillion-dollar data centers, with token production becoming a fundamental measure of national productivity and capacity. They emphasize that hardware designed pre-ChatGPT cannot efficiently serve modern workloads and that the next decade will see entirely new architectures emerge.
On models themselves, they argue that machines don't think like brains do, and that future architectures will exploit this difference by using vast amounts of compute, very large context windows (potentially billions of tokens), mixture-of-experts approaches, and dynamic computation allocation. They expect long-horizon agent tasks requiring billions of concurrent agents working 24/7, which will require hardware specifically designed for such workloads.
The founders emphasize that production is the product—their advantage comes not just from superior architecture but from the ability to manufacture at scale. They've invested heavily in supply chain partnerships, particularly with TSMC, and deliberately chose to build on different nanometer nodes than competitors to avoid zero-sum competition for wafer capacity.
About this episode
My guests today are Gavin Uberti and Rob Wachen, the founders of Etched. A few years ago, when they set out to build a better AI chip than the largest companies in the world, almost everyone I called told me it could not be done. They have since done it, taping out a working chip on their first attempt and becoming the first hardware company founded after ChatGPT to do so. They already have more than a billion dollars of customer demand for their first product, and have raised eight hundred million dollars to build it. Etched builds chips and systems designed to run AI models faster and at lower cost. They started the company in 2023, and that product is a complete rack for inference, the chip along with the boards, the power delivery, the interconnects, and the manufacturing to produce it all. We talk about the technical bets behind their architecture, how they hired industry legends and paired them with elite 22 year-olds, and why they believe inference will become one of the largest markets in the world. I think you will find the story of what they have built hard to forget. Please enjoy my conversation with Gavin and Rob.
Editing and post-production work for this episode was provided by The Podcast Consultant.
Timestamps:
(00:00:00) Welcome to Invest Like The Best
(00:02:07) Gavin Uberti and Rob Wachen
(00:03:54) Two 21-Year-Olds Taking on NVIDIA
(00:07:52) The Two Technical Bets Behind Their Architecture
(00:14:15) Why Inference Becomes the Biggest Market
(00:20:23) Rob and Gavin's Origins Stories
(00:28:38) How They Recruit Industry Legends
(00:36:30) Moving a Dozen Engineers to Bangalore for Six Months
(00:38:01) Speed Wins
(00:43:58) Getting More Concurrency Out of Every Megawatt
(00:52:44) Vertical Integration
(00:57:43) Hardest Obstacles to Overcome
(01:01:09) Raising The Largest AI Chip Series A Ever
(01:06:29) TSMC
(01:13:20) Designing Gen 2 for Gigawatt-Scale Production
(01:16:42) Why Machines Don't Think Like People
(01:20:03) A Year of Compute Compressed Into a Month
(01:23:44) The Trillion-Dollar Data Center
(01:26:19) The Kindest Thing
Key Insights
- The semiconductor industry's standard practices were built for general-purpose use cases across IoT, edge computing, and data centers, creating unnecessary constraints for specialized AI inference workloads that can be relaxed when designing for a specific use case.
- The founders discovered that running AI chips at lower voltages than GPUs is physically possible (evidenced by Bitcoin miners operating at a quarter of GPU voltage), but GPU architectures have fundamental issues that prevent them from operating safely at low voltages, a problem Etched solved through architectural innovation.
- Achieving high clock speeds on chips requires solving the thermal problem first through low-voltage design; adding more flops without solving thermal throttling provides no actual performance gains because the chip will self-regulate and reduce clock speed under heat.
- Cluster-scale memory bandwidth is poorly utilized in current GPU setups because the latency to access memory across chips is extremely high (4,000 nanoseconds on Blackwell), making it impractical to treat a cluster's memory as a single unified pool despite having sufficient bandwidth.
- The company spent aggressively on parallel development (FPGA validation, thermal mockups, production line setup, software stacks) before silicon arrived to compress 10-month chip-to-inference timelines down to 40 days, demonstrating that capital spent on parallelization has massive ROI.
- Every major Silicon Valley investor passed on Etched's Series A pitch despite the founders' technical credentials and market opportunity, requiring them to assemble funding through debt plus rolling individual commitments from a few believers in the market.
- TSMC's value comes not from technical superiority alone but from exceptional customer service and willingness to run experiments on their own dime to optimize customer yields, demonstrating how supplier relationships become critical competitive advantages in hardware.
- The most difficult technical challenge during chip development was synchronizing two clock signals within 50 picoseconds (50 trillionths of a second) across 2 billion cycles per second to prevent incorrect results, a problem multiple team members initially believed was unsolvable until a creative solution emerged.
- Inference will shift focus from raw speed (which multiple chips can now achieve) to concurrency—how many users can be served simultaneously at a given quality level—making memory bandwidth and chip-to-chip interconnect latency the primary performance metrics rather than peak flops.
- The founders argue that machines don't think like human brains, and future AI systems will exploit this difference by using far more compute, massive context windows, mixture-of-experts architectures, and dynamic computation allocation rather than mimicking brain structure.
- Most AI chips built by hyperscalers (Google TPUs, Meta MTIA, Microsoft Maia, OpenAI Jalapeno) have lower flop density than Blackwell because those companies' revenues come from elsewhere and they can afford less risky, me-too products, whereas Etched's existence is entirely dependent on chip superiority.
- The founders believe inference will eventually become a larger market than training, with token production becoming a fundamental economic metric where inference capacity measured in agents per megawatt will determine a nation's effective workforce size and economic capability.
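Two of the figures in the list above can be checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the round-trip bound assumes a chain of strictly dependent remote memory accesses with no overlap or pipelining, which is a simplification rather than a description of any real system.

```python
CLOCK_HZ = 2e9          # "2 billion cycles per second" from the clock-sync insight
SKEW_BUDGET_PS = 50.0   # the 50-picosecond synchronization window


def cycle_period_ps(clock_hz: float) -> float:
    """Length of one clock cycle in picoseconds."""
    return 1e12 / clock_hz


def skew_fraction_of_cycle(skew_ps: float, clock_hz: float) -> float:
    """Share of a single clock cycle taken up by the allowed skew."""
    return skew_ps / cycle_period_ps(clock_hz)


def max_serial_round_trips_per_second(latency_ns: float) -> float:
    """Upper bound on back-to-back dependent accesses (assumes no overlap)."""
    return 1e9 / latency_ns


period = cycle_period_ps(CLOCK_HZ)
fraction = skew_fraction_of_cycle(SKEW_BUDGET_PS, CLOCK_HZ)
print(f"Cycle period: {period:.0f} ps; skew budget is {fraction:.0%} of a cycle")

for label, latency_ns in (("4,000 ns", 4000), ("800 ns", 800)):
    rate = max_serial_round_trips_per_second(latency_ns)
    print(f"At {label} latency: at most {rate:,.0f} dependent round trips per second")
```

The point of the second calculation is that when accesses depend on one another, latency rather than bandwidth caps throughput, which is why the summary treats interconnect latency as a primary metric for cluster-scale memory.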
More from Invest Like the Best with Patrick O'Shaughnessy
Vlad Barbalat - Investing $120 Billion in Permanent Capital - [Invest Like the Best, EP.479]
Vlad Barbalat, CIO of Liberty Mutual Investments' $120 billion platform, discusses how the mutual insurance structure enables unique long-term capital deployment, the importance of entrepreneurial culture in investing, and his journey from Soviet Moldova to building one of finance's most distinctive investment platforms.
Kareem Amin - Re-Enchanting the World - [Invest Like the Best, EP.478]
Patrick O'Shaughnessy interviews Kareem Amin, co-founder and CEO of Clay, a $4B software company. They discuss Clay's origin and growth, but spend most of the conversation exploring Amin's personal philosophy around courage, truth, justice, wholeness, risk, and what it means to build a company with integrity and self-awareness.
Darren Farber on Iran, China, and the Rise of Neoprimes - [Invest Like the Best, EP.474]
Darren Farber, managing partner of Albion River defense investment firm, discusses the Iran contingency, defining 'winning' in modern warfare, the state of the US military and industrial base, China's strategic weaknesses, and the rise of neoprime defense companies. He analyzes martyrdom cultures, magazine depth, procurement reform needs, and how AI disinformation could corrupt military decision-making systems.
Gavin Baker - Watts and Wafers - [Invest Like the Best, EP.473]
In this episode, Gavin Baker discusses the significant impacts of AI on the economy, emphasizing the importance of energy (watts) and semiconductor capacity (wafers) in shaping the future of AI technology. He shares unique insights on the valuations of AI companies, the dynamics of competition, and geopolitical implications.
Krishna Rao - Anthropic's CFO on Compute, Scaling to $30B ARR, and the Returns to Frontier Intelligence - [Invest Like the Best, EP.472]
Krishna Rao, CFO of Anthropic, discusses the company's compute strategy, explosive revenue growth from $9B to $30B ARR in a single quarter, and the thesis that returns to frontier AI intelligence are exceptionally high in enterprise. He explains how Anthropic uses three chip platforms fungibly, navigates a 'cone of uncertainty' in forecasting, and why the company's culture of intellectual humility and collaboration has been a key competitive advantage.