OpenAI's 'Evals Crisis': Why Your AI Benchmarks Are Broken
OpenAI's Mark Chen on the 'evals crisis' for superhuman AI. Learn why traditional benchmarks fail and how OpenAI separates eval creation from model optimization to avoid self-deception.
OpenAI's Chief Research Officer, Mark Chen, joins Alessio Fanelli for a cooking session while discussing OpenAI's core research strategies. Topics include the ongoing relevance of scaling laws, the 'evals crisis' in AI, the future trajectory of AGI development, and how aspiring researchers can develop crucial 'research taste' within the field.
OpenAI's Mark Chen says AGI is near, pushing human researchers into 'vibe' roles, orchestrating models with 'good taste' across unified architectures.
Forget the PhD. OpenAI's Mark Chen says the best way to develop AI 'research taste' is to replicate published papers.