Crypto's Feistel Networks Slash AI Training Memory Costs
Reiner Pope reveals how cryptographic Feistel networks led to RevNets, cutting neural network memory use during training. Founders, see how architectural cross-pollination saves real costs.
40 hours of podcasts, in 5 minutes.
Reiner Pope, CEO of MatX, breaks down the intricate details of how large language models like GPT-5, Claude, and Gemini are trained and served in cluster environments. He explains the critical role of batch size, mixture of experts, and parallelism in managing latency and cost, linking these technical elements to real-world AI API pricing structures. The discussion also ventures into the physical constraints of GPU rack design and the surprising architectural parallels between cryptographic protocols and neural networks.
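To ground the Feistel-to-RevNet connection from the headline, here is a minimal sketch of a reversible residual block: each half of the activations updates the other, Feistel-style, so the backward pass can reconstruct inputs exactly instead of caching them. The functions below are toy stand-ins for illustration, not anything from the episode or from MatX.

```python
import numpy as np

def f(x):
    # Toy stand-in for an arbitrary sub-network (e.g., a small residual branch)
    return np.tanh(x)

def g(x):
    return np.tanh(x)

def revnet_forward(x1, x2):
    # Feistel-style coupling: each half is updated using only the other half
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def revnet_inverse(y1, y2):
    # Exact inversion: activations are recomputed from the outputs,
    # so they need not be stored during training -- the memory saving
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = np.random.randn(4), np.random.randn(4)
y1, y2 = revnet_forward(x1, x2)
r1, r2 = revnet_inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```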
MatX CEO Reiner Pope reveals how inter-rack networks, roughly 8x slower than in-rack links, bottleneck LLM serving. Understand how GPU rack design limits MoE scale-up and what to do about it.
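For intuition on why that gap bites, a back-of-the-envelope timing sketch; every number below is an assumed placeholder, not a figure from the episode:

```python
# Back-of-the-envelope all-to-all timing; all numbers are assumptions.
bytes_to_move = 2 * 1024**3        # 2 GiB of activations per exchange (assumed)
in_rack_bw    = 400e9              # 400 GB/s scale-up bandwidth per GPU (assumed)
cross_rack_bw = in_rack_bw / 8     # the ~8x slower inter-rack network

t_in_rack    = bytes_to_move / in_rack_bw
t_cross_rack = bytes_to_move / cross_rack_bw

print(f"in-rack:    {t_in_rack*1e3:6.1f} ms")
print(f"cross-rack: {t_cross_rack*1e3:6.1f} ms  ({t_cross_rack/t_in_rack:.0f}x slower)")
```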
Reiner Pope decodes LLM API pricing, showing how context length and input/output costs expose underlying hardware limits and memory tiers.
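A toy cost calculator makes that pricing structure concrete. The rates below are invented for illustration; one common explanation for the gap is that prefill (input tokens) is compute-dense and batches well, while decode (output tokens) is memory-bandwidth-bound:

```python
# Hypothetical per-million-token prices; real providers vary.
PRICE_INPUT  = 3.00   # $/M input tokens  (prefill: compute-bound, batches well)
PRICE_OUTPUT = 15.00  # $/M output tokens (decode: memory-bandwidth-bound)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * PRICE_INPUT + output_tokens * PRICE_OUTPUT) / 1e6

# A long-context request: input dominates even at the cheaper per-token rate.
print(f"${request_cost(input_tokens=100_000, output_tokens=1_000):.4f}")
```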
MatX CEO Reiner Pope reveals how LLM batch size can slash inference costs by up to 1000x, or spike latency if pushed too far. Learn the formula for balancing speed and spend.
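A roofline-style sketch of that trade-off, with all hardware numbers assumed for illustration: each decode step must stream the full weights from memory regardless of batch size, so per-token cost falls with batch until compute saturates, while per-step latency only grows.

```python
# Roofline-style decode model; model and hardware numbers are assumptions.
PARAMS = 70e9          # model parameters
BYTES  = 2 * PARAMS    # fp16 weights streamed from HBM each decode step
HBM_BW = 3.0e12        # 3 TB/s memory bandwidth (assumed)
FLOPS  = 1.0e15        # 1 PFLOP/s usable compute (assumed)

def decode_step(batch):
    t_mem     = BYTES / HBM_BW               # weight streaming, batch-independent
    t_compute = batch * 2 * PARAMS / FLOPS   # ~2 FLOPs per param per token
    t_step    = max(t_mem, t_compute)        # roofline: the slower side wins
    return t_step, t_step / batch            # step latency, time per token

for batch in (1, 8, 64, 512, 4096):
    latency, per_token = decode_step(batch)
    print(f"batch {batch:5d}: step {latency*1e3:7.2f} ms, "
          f"per-token {per_token*1e6:8.1f} us")
```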
MatX CEO Reiner Pope unpacks pipeline parallelism for LLMs. It slashes per-GPU weight memory but runs into KV cache limits, making it a 'no-brainer' for inference but a hard trade-off for training.
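A quick memory tally shows the asymmetry; the model and hardware sizes below are assumed for illustration. Splitting layers across pipeline stages divides the weights, but the KV cache grows with batch size and context length and soon dominates.

```python
# Per-GPU memory under pipeline parallelism; all sizes are assumptions.
GiB          = 1024**3
weights      = 140e9            # ~70B params in fp16
stages       = 8                # pipeline depth
layers       = 80
kv_per_token = 2 * 8 * 128 * 2  # K+V x 8 KV heads x head_dim 128 x fp16, per layer

def per_gpu_memory(batch, context):
    w  = weights / stages                                   # weights shrink with stages
    kv = batch * context * kv_per_token * (layers / stages) # KV grows with batch x context
    return w / GiB, kv / GiB

for batch in (8, 64, 256):
    w, kv = per_gpu_memory(batch, context=32_768)
    print(f"batch {batch:3d}: weights {w:5.1f} GiB, KV cache {kv:6.1f} GiB")
```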
Reiner Pope of MatX reveals how LLMs' Mixture of Experts (MoE) layers scale smoothly within a GPU rack but bottleneck once expert traffic crosses racks.
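A toy top-k router illustrates where the traffic comes from; the shapes and rack layout here are assumptions. Every token must be dispatched to the devices holding its chosen experts, an all-to-all exchange that stays cheap inside a rack but hits the slower network once experts span racks.

```python
import numpy as np

tokens, d_model   = 16, 32
num_experts, topk = 8, 2
experts_per_rack  = 4   # experts 0-3 on rack 0, 4-7 on rack 1 (assumed layout)

rng    = np.random.default_rng(0)
x      = rng.standard_normal((tokens, d_model))
router = rng.standard_normal((d_model, num_experts))

# Top-k gating: each token picks its k highest-scoring experts.
scores = x @ router
chosen = np.argsort(scores, axis=1)[:, -topk:]

# Count how many token->expert dispatches must leave the rack.
token_rack  = np.zeros(tokens, dtype=int)              # all tokens start on rack 0
expert_rack = np.arange(num_experts) // experts_per_rack
cross_rack  = sum(expert_rack[e] != token_rack[t]
                  for t in range(tokens) for e in chosen[t])
print(f"{cross_rack} of {tokens * topk} dispatches cross racks")
```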