Crypto's Feistel Networks Slash AI Training Memory Costs
Reiner Pope reveals how cryptographic Feistel networks led to RevNets, cutting neural network memory use during training. Founders, see how architectural cross-pollination saves real costs.
40 hours of podcasts, in 5 minutes.
Reiner Pope, CEO of MatX, breaks down the intricate details of how large language models like GPT-5, Claude, and Gemini are trained and served in cluster environments. He explains the critical role of batch size, mixture of experts, and parallelism in managing latency and cost, linking these technical elements to real-world AI API pricing structures. The discussion also ventures into the physical constraints of GPU rack design and the surprising architectural parallels between cryptographic protocols and neural networks.
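To ground the Feistel-to-RevNet connection from the headline, here is a minimal sketch of a reversible residual block: each half of the activations updates the other, Feistel-style, so the backward pass can reconstruct inputs exactly instead of caching them. The functions below are toy stand-ins for illustration, not anything from the episode or from MatX.

```python
import numpy as np

def f(x):
    # Toy stand-in for an arbitrary sub-network (e.g., a small residual branch)
    return np.tanh(x)

def g(x):
    return np.tanh(x)

def revnet_forward(x1, x2):
    # Feistel-style coupling: each half is updated using only the other half
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def revnet_inverse(y1, y2):
    # Exact inversion: activations are recomputed from the outputs,
    # so they need not be stored during training -- the memory saving
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = np.random.randn(4), np.random.randn(4)
y1, y2 = revnet_forward(x1, x2)
r1, r2 = revnet_inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```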
MatX CEO Reiner Pope reveals how inter-rack networks, roughly 8x slower than in-rack links, bottleneck LLM serving. Understand how GPU rack design limits MoE scale-up and what to do about it.
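For intuition on why that gap bites, a back-of-the-envelope timing sketch; every number below is an assumed placeholder, not a figure from the episode:

```python
# Back-of-the-envelope all-to-all timing; all numbers are assumptions.
bytes_to_move = 2 * 1024**3        # 2 GiB of activations per exchange (assumed)
in_rack_bw    = 400e9              # 400 GB/s scale-up bandwidth per GPU (assumed)
cross_rack_bw = in_rack_bw / 8     # the ~8x slower inter-rack network

t_in_rack    = bytes_to_move / in_rack_bw
t_cross_rack = bytes_to_move / cross_rack_bw

print(f"in-rack:    {t_in_rack*1e3:6.1f} ms")
print(f"cross-rack: {t_cross_rack*1e3:6.1f} ms  ({t_cross_rack/t_in_rack:.0f}x slower)")
```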
Reiner Pope decodes LLM API pricing, showing how context length and input/output costs expose underlying hardware limits and memory tiers.
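A toy cost calculator makes that pricing structure concrete. The rates below are invented for illustration; one common explanation for the gap is that prefill (input tokens) is compute-dense and batches well, while decode (output tokens) is memory-bandwidth-bound:

```python
# Hypothetical per-million-token prices; real providers vary.
PRICE_INPUT  = 3.00   # $/M input tokens  (prefill: compute-bound, batches well)
PRICE_OUTPUT = 15.00  # $/M output tokens (decode: memory-bandwidth-bound)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * PRICE_INPUT + output_tokens * PRICE_OUTPUT) / 1e6

# A long-context request: input dominates even at the cheaper per-token rate.
print(f"${request_cost(input_tokens=100_000, output_tokens=1_000):.4f}")
```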
MatX CEO Reiner Pope reveals how LLM batch size can slash inference costs by up to 1000x, or spike latency if pushed too far. Learn the formula for balancing speed and spend.
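A roofline-style sketch of that trade-off, with all hardware numbers assumed for illustration: each decode step must stream the full weights from memory regardless of batch size, so per-token cost falls with batch until compute saturates, while per-step latency only grows.

```python
# Roofline-style decode model; model and hardware numbers are assumptions.
PARAMS = 70e9          # model parameters
BYTES  = 2 * PARAMS    # fp16 weights streamed from HBM each decode step
HBM_BW = 3.0e12        # 3 TB/s memory bandwidth (assumed)
FLOPS  = 1.0e15        # 1 PFLOP/s usable compute (assumed)

def decode_step(batch):
    t_mem     = BYTES / HBM_BW               # weight streaming, batch-independent
    t_compute = batch * 2 * PARAMS / FLOPS   # ~2 FLOPs per param per token
    t_step    = max(t_mem, t_compute)        # roofline: the slower side wins
    return t_step, t_step / batch            # step latency, time per token

for batch in (1, 8, 64, 512, 4096):
    latency, per_token = decode_step(batch)
    print(f"batch {batch:5d}: step {latency*1e3:7.2f} ms, "
          f"per-token {per_token*1e6:8.1f} us")
```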
MatX CEO Reiner Pope unpacks pipeline parallelism for LLMs. It slashes per-GPU weight memory but runs into KV cache limits, making it a 'no-brainer' for inference but a hard trade-off for training.
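A quick memory tally shows the asymmetry; the model and hardware sizes below are assumed for illustration. Splitting layers across pipeline stages divides the weights, but the KV cache grows with batch size and context length and soon dominates.

```python
# Per-GPU memory under pipeline parallelism; all sizes are assumptions.
GiB          = 1024**3
weights      = 140e9            # ~70B params in fp16
stages       = 8                # pipeline depth
layers       = 80
kv_per_token = 2 * 8 * 128 * 2  # K+V x 8 KV heads x head_dim 128 x fp16, per layer

def per_gpu_memory(batch, context):
    w  = weights / stages                                   # weights shrink with stages
    kv = batch * context * kv_per_token * (layers / stages) # KV grows with batch x context
    return w / GiB, kv / GiB

for batch in (8, 64, 256):
    w, kv = per_gpu_memory(batch, context=32_768)
    print(f"batch {batch:3d}: weights {w:5.1f} GiB, KV cache {kv:6.1f} GiB")
```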
Reiner Pope of MatX reveals how LLMs' Mixture of Experts (MoE) layers scale smoothly within a GPU rack but bottleneck once expert traffic crosses racks.
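A toy top-k router illustrates where the traffic comes from; the shapes and rack layout here are assumptions. Every token must be dispatched to the devices holding its chosen experts, an all-to-all exchange that stays cheap inside a rack but hits the slower network once experts span racks.

```python
import numpy as np

tokens, d_model   = 16, 32
num_experts, topk = 8, 2
experts_per_rack  = 4   # experts 0-3 on rack 0, 4-7 on rack 1 (assumed layout)

rng    = np.random.default_rng(0)
x      = rng.standard_normal((tokens, d_model))
router = rng.standard_normal((d_model, num_experts))

# Top-k gating: each token picks its k highest-scoring experts.
scores = x @ router
chosen = np.argsort(scores, axis=1)[:, -topk:]

# Count how many token->expert dispatches must leave the rack.
token_rack  = np.zeros(tokens, dtype=int)              # all tokens start on rack 0
expert_rack = np.arange(num_experts) // experts_per_rack
cross_rack  = sum(expert_rack[e] != token_rack[t]
                  for t in range(tokens) for e in chosen[t])
print(f"{cross_rack} of {tokens * topk} dispatches cross racks")
```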