AlphaGo's Dual AI Brain: ResNets Beat Transformers Early

AlphaGo didn’t just win at Go; it rewrote the playbook for AI. But beyond the headlines, how did its "intuition" actually work? Eric Jang, who rebuilt AlphaGo from scratch, details a core insight: a two-network system that simulates human foresight and decision-making.

Key Takeaways

AlphaGo relies on two distinct neural networks: a value network that predicts the win/loss probability from a given board state, and a policy network that outputs a probability distribution over promising next moves.
For early-stage AI development with smaller datasets or limited compute, ResNets often outperform Transformers, offering more efficiency due to their inductive bias for local features.
Initializing these networks with supervised learning from expert human games was a critical bootstrap for early AlphaGo versions (like AlphaGo Lee); later versions such as AlphaGo Zero moved to pure self-play.
The policy network, when used directly, acts as a “fast Go player that doesn't think in terms of reasoning steps,” highlighting the need for Monte Carlo Tree Search (MCTS) to add strategic depth.
Later engines like KataGo found that aggregating global features throughout the network was vital for connecting value across the board, giving the AI a "global sense."

The Method

Imagine an AI that doesn’t just crunch numbers, but feels the game. That’s what AlphaGo's dual network architecture aimed to achieve, according to Eric Jang. It splits the problem of playing Go into two distinct, yet complementary, neural networks. “There are two networks,” Jang explains. “There is the value network, which takes in a state and predicts, am I going to win or lose? Then we have a policy network, which induces a distribution over good actions to take.”
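To make the two outputs concrete, here is a minimal sketch of what each network head produces, assuming the networks emit raw scores (logits). The function names, the legal-move mask, and the sigmoid squashing are illustrative assumptions, not details from Jang's implementation.

```python
import math
from typing import List, Sequence


def policy_distribution(logits: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Turn raw policy scores into a distribution over legal moves only.

    Hypothetical helper: illegal moves get probability 0, legal moves
    share probability mass via a numerically stable softmax.
    """
    legal_logits = [z for z, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("no legal moves")
    top = max(legal_logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - top) if ok else 0.0 for z, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


def win_probability(value_logit: float) -> float:
    """Squash a raw value score into (0, 1) with a sigmoid.

    Written via tanh so that large negative inputs do not overflow.
    """
    return 0.5 * (1.0 + math.tanh(value_logit / 2.0))


# Three candidate moves, the middle one illegal.
probs = policy_distribution([1.0, 2.0, 3.0], [True, False, True])
```

The policy head answers "which moves are worth considering?", while the value head answers "how good is this position?" with a single number.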

The value network, in essence, provides an intuitive estimate of the probability of winning from any board position, like a seasoned human player who can instantly size up a game. The policy network, on the other hand, proposes the most promising next moves, acting as a fast reflex. Combine the two with Monte Carlo Tree Search (MCTS), in which the AI explores thousands of possible continuations guided by these networks, and AlphaGo gains the ability to reason several moves ahead. Jang notes that if you just take the policy network's raw suggestion, “it'll be a very fast Go player that doesn't think in terms of reasoning steps.”
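One common way the search combines the two networks is a PUCT-style selection rule: each candidate move is scored by its average backed-up value plus an exploration bonus weighted by the policy prior. The sketch below is an assumption about how this could look, not Jang's code. The `Node` class, the `c_puct` constant, and the `+ 1` under the square root (so priors matter before any visits) are choices made for this example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    prior: float              # policy-network probability of the move leading here
    visits: int = 0           # how many simulations passed through this node
    value_sum: float = 0.0    # sum of value estimates backed up through this node
    children: Dict[int, "Node"] = field(default_factory=dict)

    def mean_value(self) -> float:
        # Average value, from the perspective of the player choosing at the parent.
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(parent: Node, c_puct: float = 1.5) -> int:
    """Return the move whose child maximises Q + U."""
    total_visits = sum(child.visits for child in parent.children.values())

    def score(child: Node) -> float:
        # Exploration bonus: large for high-prior, rarely visited moves.
        bonus = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visits)
        return child.mean_value() + bonus

    return max(parent.children, key=lambda move: score(parent.children[move]))


root = Node(prior=1.0)
root.children = {0: Node(prior=0.7), 1: Node(prior=0.3)}
first_choice = select_child(root)  # with no visits yet, the prior decides
```

With no visits, the search follows the policy network. As simulations accumulate, a move whose value estimates turn out poor loses its lead and the search shifts to alternatives, which is the "reasoning" the raw policy lacks.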

A critical early decision was how to train these networks. The early AlphaGo versions, including AlphaGo Lee, didn't start from scratch. “The original AlphaGo paper... initialized this network with a supervised learning dataset of expert human play,” Jang says. This gave the AI a strong foundational understanding of good Go strategy, much like teaching a prodigy with a library of master games. Only later versions, starting with AlphaGo Zero, moved to "tabula rasa" learning, where the model taught itself entirely through self-play.
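The supervised bootstrap amounts to training the policy to predict the expert's move with a cross-entropy loss. Here is a toy sketch of one gradient step, assuming a linear policy over a hand-made feature vector. Real systems use deep networks and an autodiff library; `sgd_step`, the feature vector, and the learning rate are all hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights: List[List[float]], features: List[float],
             expert_move: int, lr: float = 0.1) -> float:
    """One cross-entropy step on a linear policy; returns the loss before the update.

    weights[m] holds the weight row that scores move m.
    """
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[expert_move])
    for move, row in enumerate(weights):
        # Gradient of cross-entropy w.r.t. each logit: prob minus one-hot target.
        grad_logit = probs[move] - (1.0 if move == expert_move else 0.0)
        for j, x in enumerate(features):
            row[j] -= lr * grad_logit * x
    return loss


# Three possible moves, two input features, expert played move 2.
weights = [[0.0, 0.0] for _ in range(3)]
loss_before = sgd_step(weights, [1.0, 0.5], expert_move=2)
loss_after = sgd_step(weights, [1.0, 0.5], expert_move=2)
```

Repeating this over a dataset of expert positions pushes the policy toward human-like moves, which gives self-play a far better starting point than random weights.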

Crucially, Jang observed that for models operating in "small data regimes" or on tighter computational budgets, architectures like ResNets frequently deliver more value than Transformers. "My experience is that ResNets still outperform Transformers and give you more bang for the buck at lower budgets," he states. The reason is inductive bias: the convolutions in a ResNet are built to process local spatial features, which is a natural fit for board games like Go. Later work, such as KataGo, further enhanced this by "pooling together and [aggregating] global features throughout the network, to give the network a global sense of how to connect value from one side of the board to the other."
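The global-pooling idea can be sketched in a few lines: summarize some channels over the whole board, then feed that summary back as a per-channel bias at every position. This is a simplified illustration under stated assumptions, not KataGo's actual layer. The choice of mean and max as summaries, the `proj` weight matrix, and both function names are hypothetical.

```python
from typing import List

Plane = List[List[float]]  # one channel of board-shaped features


def global_pool(planes: List[Plane]) -> List[float]:
    """Summarize each channel over the whole board: all means, then all maxes."""
    means, maxes = [], []
    for plane in planes:
        cells = [v for row in plane for v in row]
        means.append(sum(cells) / len(cells))
        maxes.append(max(cells))
    return means + maxes


def add_global_bias(planes: List[Plane], pooled_planes: List[Plane],
                    proj: List[List[float]]) -> List[Plane]:
    """Add a board-wide bias to every position of each channel in `planes`.

    proj[c] holds one weight per pooled feature and produces the bias for channel c.
    """
    pooled = global_pool(pooled_planes)
    biases = [sum(w * p for w, p in zip(row, pooled)) for row in proj]
    return [[[v + b for v in row] for row in plane]
            for plane, b in zip(planes, biases)]


# One 2x2 local channel, one 2x2 channel to pool, one projection row.
local = [[[0.0, 0.0], [0.0, 0.0]]]
to_pool = [[[1.0, 3.0], [0.0, 0.0]]]
out = add_global_bias(local, to_pool, proj=[[1.0, 1.0]])
```

Because the bias is the same at every board position, a feature detected in one corner can influence the output in the opposite corner within a single layer, something a stack of small convolutions only achieves after many layers.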

Where This Breaks Down

While this dual-network, human-initialized approach worked wonders for AlphaGo, it's not a silver bullet. The advantage of ResNets at lower budgets or on smaller datasets tends to diminish as data and compute scale. Transformers, with their attention mechanisms, tend to shine in massive data regimes where they can learn complex, long-range dependencies more effectively. Relying on ResNets for too long might cap your model's performance ceiling. Furthermore, supervised learning initialization, while powerful, demands a high-quality dataset of expert play. If your problem domain lacks such data, a purely tabula rasa self-play approach might be your only option, but it will require significant computational resources and time to reach a strong baseline.

What to Do With This

If you're building an early-stage AI product where data is scarce or compute is a bottleneck, don't immediately reach for the latest, largest Transformer model. Instead, follow Eric Jang's lesson: start by experimenting with simpler architectures like ResNets, especially if your problem involves processing local features (as in image recognition or grid-based data). Separately, if expert human data is available, even in limited quantities, use supervised learning to bootstrap your model's initial performance before attempting full self-play or reinforcement learning. Together, these two choices can get you to a functional, performant AI faster and stretch your early resources further.