Dwarkesh Podcast

What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs - Eric Jang

With Dwarkesh Patel, Eric Jang · Sunday, May 17, 2026

Eric Jang discusses his experience rebuilding AlphaGo from scratch, detailing the intricacies of Monte Carlo Tree Search (MCTS) and neural network architectures. He explores AlphaGo's unique self-play reinforcement learning approach, contrasting it with LLM training methods, and delves into the philosophical implications of AI solving NP-hard problems. The episode concludes with insights into the current capabilities and limitations of using large language models for automating AI research.


AlphaGo: Deep Learning Crushed Go's "Intractable" Search

Eric Jang explains how AlphaGo mastered Go, a game whose search space was long deemed intractable. Today, comparable results cost thousands of dollars, not millions.


AlphaGo's NP-Hard Challenge: How AI Cracks 'Impossible' Problems

Eric Jang reveals AlphaGo's impact on computational complexity. Learn how AI solves NP-hard problems by seeing macro patterns, not micro details. Apply it to your business.


AlphaGo Didn't Wait for Wins: It Learned Per-Move

AlphaGo’s MCTS provides continuous 'per-move' feedback, making it vastly more efficient than LLM RL’s sparse, end-of-trajectory rewards. Apply this to your product or team.


AlphaGo's MCTS: Turn Any Decision Into a Predictable Game

Eric Jang explains the four-step Monte Carlo Tree Search (MCTS) process behind AlphaGo. Apply this AI technique to optimize sales funnels or product strategy.


AlphaGo's Dual AI Brain: ResNets Beat Transformers Early

AlphaGo's dual network approach offers lessons for early AI. Eric Jang explains why ResNets outperform Transformers on smaller data, and how human data bootstraps AI.
