AI's Next Edge: OPSD and 'Dreaming' for Continual Learning
Dwarkesh Patel discusses advanced AI techniques such as On-Policy Self-Distillation (OPSD) and 'dreaming' that let models learn on the job and scale beyond fixed datasets.
40 hours of podcasts, in 5 minutes.
Dwarkesh Patel examines the current AI training paradigm, focusing on the "big research bet" of scaling reinforcement learning (RL) in verifiable environments. He argues that this approach generalizes poorly to real-world, non-grindable tasks and that current inference is inefficient. Instead, he advocates continual learning techniques such as On-Policy Self-Distillation and "dreaming," which would let AIs learn on the job and improve through broad deployment.
Dwarkesh Patel exposes a huge waste in AI: models forget in-context lessons. Is your product caught in the same 'ephemeral' learning trap?