
Eric Jang – Building AlphaGo from scratch

Dwarkesh Podcast · May 15, 2026 · 2h 37m

Eric Jang discusses the construction of AlphaGo from scratch, exploring its implications for AI research and development, particularly in game-playing AI and deep reinforcement learning. He emphasizes the significance of combining neural networks with Monte Carlo Tree Search (MCTS) to achieve superior performance in complex environments like Go.

Summary

Eric Jang shares his experience rebuilding and improving AlphaGo, originally a groundbreaking DeepMind project. He explains the complexity of Go, its heavy computational demands, and the deep learning breakthroughs that made effective game AI possible. Jang highlights how MCTS and neural networks work together: the network evaluates board positions and proposes promising moves, and the search uses those evaluations to pick stronger moves, producing improved targets that the network then learns from. He notes that recent advances have sharply cut the cost of training powerful Go bots like KataGo, suggesting that gains in algorithmic and compute efficiency could transform future AI research. The discussion also covers the prospects of automated AI research and its potential to accelerate discovery in many fields, drawing parallels with current LLM research and with how model architecture affects efficiency and effectiveness. Ultimately, he advocates exploring Go as a versatile testing ground for such AI innovations.
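The interplay between search and the network can be sketched in a few dozen lines. This is a minimal illustration, not Jang's code or DeepMind's implementation: it assumes a one-pile Nim game as a stand-in for Go, a hypothetical `evaluate` function in place of a trained policy/value network, and a PUCT-style selection rule of the kind described for AlphaGo Zero. Each simulation walks down the tree, asks the "network" for move priors and a value at the leaf (instead of playing out a random rollout), and backs that value up with alternating signs; the root visit counts then act as an improved move distribution.

```python
import math
from dataclasses import dataclass, field

# Assumption: a toy game stands in for Go. One-pile Nim: the player to move
# removes 1-3 stones, and whoever takes the last stone wins.


def legal_moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]


def evaluate(stones):
    """Hypothetical stand-in for the policy/value network.

    Returns (priors over legal moves, value for the player to move).
    This stub uses uniform priors and a neutral value except at terminal states.
    """
    moves = legal_moves(stones)
    if not moves:  # the opponent just took the last stone, so the player to move lost
        return {}, -1.0
    return {m: 1.0 / len(moves) for m in moves}, 0.0


@dataclass
class Node:
    prior: float
    visits: int = 0
    # Sum of backed-up values, from the perspective of the player who moved into this node.
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """PUCT: prefer high Q, but explore moves with a high prior and few visits."""
    sqrt_n = math.sqrt(node.visits)

    def score(item):
        _, child = item
        return child.q() + c_puct * child.prior * sqrt_n / (1 + child.visits)

    return max(node.children.items(), key=score)


def simulate(root, stones):
    node, path = root, [root]
    # 1. Selection: descend with PUCT until reaching a node with no children.
    while node.children:
        move, node = select_child(node)
        stones -= move
        path.append(node)
    # 2. Expansion + evaluation: the "network" scores the leaf instead of a rollout.
    priors, value = evaluate(stones)
    for move, p in priors.items():
        node.children[move] = Node(prior=p)
    # 3. Backup: flip the sign at every ply because the players alternate.
    for n in reversed(path):
        value = -value
        n.visits += 1
        n.value_sum += value


def run_mcts(stones, num_simulations=800):
    """Return the root policy implied by visit counts."""
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        simulate(root, stones)
    total = sum(child.visits for child in root.children.values())
    return {move: child.visits / total for move, child in root.children.items()}


if __name__ == "__main__":
    # From 5 stones, taking 1 leaves the opponent on a multiple of 4, a lost position.
    print(run_mcts(5))
```

With uniform priors the search has to discover the winning move from terminal outcomes alone; a trained network would typically concentrate the priors and sharpen the leaf values, so far fewer simulations are needed. The constant `c_puct=1.5` is an arbitrary illustrative choice.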

About this episode

<a href="https://evjang.com/" target="_blank">Eric Jang</a> walks through how to build AlphaGo from scratch, but with modern AI tools.

Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.

Once he had explained how AlphaGo works, we had the context to discuss how RL works in LLMs and how it could work better – naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo’s MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.

Eric also kickstarted an Autoresearch loop on his project. It was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). This is informative for all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.

Watch on <a href="https://youtu.be/X_ZVSPcZhtw" target="_blank">YouTube</a>. Read the <a href="https://www.dwarkesh.com/p/eric-jang" target="_blank">transcript</a>. And check out the <a href="https://flashcards.dwarkesh.com/eric-jang/" target="_blank">flashcards</a> I wrote to retain the insights.

Sponsors

* <a href="https://cursor.com/dwarkesh" target="_blank">Cursor</a>’s agent SDK let me build a pipeline to generate flashcards for this episode. For each card, I had an agent read the transcript, ingest blackboard screenshots, generate an SVG visual, and run everything through a critic. A durable agent is much better at this kind of work than a chain of LLM calls, and Cursor’s SDK made it easy. Check out the cards at <a href="https://flashcards.dwarkesh.com" target="_blank">flashcards.dwarkesh.com</a> and get started with the SDK at <a href="https://cursor.com/dwarkesh" target="_blank">cursor.com/dwarkesh</a>.
* <a href="https://janestreet.com/dwarkesh" target="_blank">Jane Street</a> gave me a real deep-dive tour of one of their datacenters. I got to ask a bunch of questions to Ron Minsky, who co-leads Jane Street’s tech group, and Dan Pontecorvo, who runs Jane Street’s physical engineering team. They were willing to literally pull up the floorboards and take out racks to explain how everything works. Check out the full tour at <a href="https://janestreet.com/dwarkesh" target="_blank">janestreet.com/dwarkesh</a>.

Timestamps

(00:00:00) – Basics of Go
(00:08:17) – Monte Carlo Tree Search
(00:32:04) – What the neural network does
(01:00:33) – Self-play
(01:25:38) – Alternative RL approaches
(01:45:47) – Why doesn't MCTS work for LLMs
(02:01:09) – Off-policy training
(02:12:02) – RL is even more information inefficient than you thought
(02:22:16) – Automated AI researchers

Get full access to Dwarkesh Podcast at <a href="https://www.dwarkesh.com/subscribe?utm_medium=podcast&utm_campaign=CTA_4">www.dwarkesh.com/subscribe</a>
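The credit-assignment contrast in the episode description can be made concrete with a toy gradient calculation. This is a minimal sketch under stated assumptions: a two-action softmax stands in for an LLM's vocabulary, `reinforce_logit_grads` and `search_target_logit_grads` are hypothetical helper names, and the per-step search targets are hand-written placeholders rather than real MCTS output. Under REINFORCE, every step's log-probability gradient is scaled by the same trajectory-level advantage, so the update by itself cannot tell which step earned the reward; with a search target, each step is pulled toward its own distribution.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_logit_grads(step_logits, actions, reward, baseline=0.0):
    """Per-step gradient of the REINFORCE loss -(R - b) * sum_t log p(a_t).

    d(-log p_a)/d(logit_i) = p_i - [i == a], scaled by the SAME advantage
    (R - b) at every step: one scalar reward for the whole trajectory.
    """
    advantage = reward - baseline
    grads = []
    for logits, action in zip(step_logits, actions):
        probs = softmax(logits)
        grads.append([advantage * (p - (1.0 if i == action else 0.0))
                      for i, p in enumerate(probs)])
    return grads


def search_target_logit_grads(step_logits, search_policies):
    """Per-step gradient of the cross-entropy -sum_i pi_i log p_i.

    Each step has its OWN target pi (e.g. normalized MCTS visit counts),
    so the gradient is simply p - pi at that step.
    """
    grads = []
    for logits, target in zip(step_logits, search_policies):
        probs = softmax(logits)
        grads.append([p - t for p, t in zip(probs, target)])
    return grads


if __name__ == "__main__":
    logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # three steps, two actions each
    # REINFORCE: a reward of 1 pushes up every sampled action by the same amount.
    print(reinforce_logit_grads(logits, actions=[0, 1, 0], reward=1.0))
    # Search targets: each step moves toward its own (hypothetical) improved policy.
    print(search_target_logit_grads(logits, [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]))
```

In the REINFORCE case all three rows have the same magnitude (±0.5) regardless of what each step contributed; in the search-target case the second step, whose target already matches the model, receives a zero gradient.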

Key Insights

Eric Jang emphasizes that AlphaGo represented a significant advancement in AI by using deep learning to handle the complex search space of the game Go.
He points out that many earlier game-playing programs depended on exhaustive search methods, which are computationally prohibitive for complex games like Go.
Jang notes that the implementation of MCTS combined with neural networks allowed AlphaGo to evaluate positions efficiently.
He highlights KataGo's 40x reduction in the compute required to train a Go bot, compared with earlier models like AlphaGo.
Jang discusses how the training process of Go AIs can be significantly accelerated by leveraging expert game data, enhancing the initial performance.
He argues that automated AI research can be improved through better identification of promising research paths based on historical success rates.
Jang describes how significant computational resources were needed in early models like AlphaGo, but modern tools allow for rapid experimentation with fewer resources.
He addresses the challenge of keeping reinforcement learning from getting stuck in local minima due to poor initialization.
MCTS's distinctive ability to evaluate positions locally and improve Go strategies iteratively differentiates it from the approaches used for LLMs.
Jang argues that MCTS-style search does not translate easily to LLMs because of their high-dimensional action spaces and complexity.
He mentions the importance of having a strong verification loop to assess the effectiveness of AI improvements.
Jang points to the role of scaling laws in training AI systems, suggesting these universal principles could apply to future optimizations.
He highlights a duality between MCTS and Q-learning, pointing to potential insights for future reinforcement learning applications.
Jang notes the importance of soft labels in improving the effectiveness of training models, particularly through distillation.
He emphasizes that knowing how to verify whether an experimental idea actually holds up will make future AI researchers more effective.
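The point about soft labels can be illustrated with the way AlphaGo Zero-style training turns search statistics into policy targets: the target is proportional to visit counts raised to the power 1/τ, so a temperature of 1 keeps a soft distribution and a temperature near 0 collapses it to a one-hot label. The sketch below assumes that form; the visit counts and the "student" prediction are hypothetical numbers, and the helper names are illustrative rather than taken from the episode.

```python
import math


def visit_counts_to_target(counts, temperature=1.0):
    """Turn MCTS visit counts into a training target, pi_a proportional to N_a^(1/temperature).

    temperature=1 keeps the full soft distribution; temperature=0 is treated as
    the limiting case, a one-hot (hard) label on the most-visited move.
    """
    if temperature == 0:
        best = max(range(len(counts)), key=lambda i: counts[i])
        return [1.0 if i == best else 0.0 for i in range(len(counts))]
    powered = [c ** (1.0 / temperature) for c in counts]
    total = sum(powered)
    return [p / total for p in powered]


def cross_entropy(target, probs):
    """-sum_i target_i * log(probs_i); moves with zero target weight contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


if __name__ == "__main__":
    counts = [60, 30, 10]  # hypothetical visit counts for three candidate moves
    soft = visit_counts_to_target(counts, temperature=1.0)  # [0.6, 0.3, 0.1]
    hard = visit_counts_to_target(counts, temperature=0)    # [1.0, 0.0, 0.0]
    student = [0.5, 0.4, 0.1]  # hypothetical model prediction
    print(soft, hard)
    print(cross_entropy(soft, student), cross_entropy(hard, student))
```

The soft target still records that the second move received 30% of the visits, information a hard label discards; distilling against it gives the model a graded signal about every move rather than a single right answer.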

Topics

AlphaGo · Monte Carlo Tree Search (MCTS) · Artificial Intelligence Research · Neural Networks · Reinforcement Learning

Transcript

Today I'm here with Eric Jang, who was most recently vice president of AI at 1X Technologies, and before that a senior research scientist at what is now Google DeepMind Robotics. And you've been on sabbatical for the last few months. One of the things you've been doing is rebuilding and improving and hacking on AlphaGo. And so today what we're going to do is you're going to explain building AlphaGo from scratch and what it tells us about the future of AI research and development. But before we get to that, why is AlphaGo interesting? Why is this the project you decided to do on sabbatical rather than just hanging out at the beach? Sure, yeah. I like making things and…

