Key Takeaways
- Satya Nadella, Microsoft's CEO, argues that private evaluation methods (known as 'private evals') are rapidly becoming a company's most valuable intellectual property in the AI era.
- He proposes an 'acid test' for AI control: the ability to switch between different frontier models (e.g., Model A to Model B) and still see performance improvements using your unique evaluation sets.
- Companies should build an 'open platform' that allows various AI models to plug in, using their proprietary data, context, and tools to continually improve (or 'hill climb') their custom intelligence.
- This strategy keeps enterprises in control of their unique intelligence, prevents dependency on any single model vendor, and drives sustained value creation.
Private Evals: Your Deepest Moat
Forget obsessing over which large language model to pick. Microsoft CEO Satya Nadella has a blunt message for founders: your most important intellectual property in the AI age won't be the model itself, but your 'private evals.' These are the proprietary evaluation methods and datasets you use to test, refine, and improve AI performance for your specific business. “Every company having private evals maybe the biggest IP,” Nadella explained. “I think about it.”
He isn't talking about generic benchmarks. This is about building highly specific, internal systems that understand your unique data, your customers' context, and your internal tools. Imagine having a suite of tests so precise, so tailored to your operations, that they alone can tell you if an AI is truly adding value or just spitting out plausible text. These evals encode what your company knows about its own work, which is what makes them distinct and defensible. Nadella added, “What's that private eval that you can then use even a frontier model to hill climb on and not leak the traces maybe one of the biggest drivers of IP.”
The “Acid Test” for AI Control
How do you know if you actually control your AI, or if you're just renting intelligence from a large model provider? Nadella offers a straightforward 'acid test.' He asks if you have an evaluation method that is so private and powerful that you can swap out the underlying AI model and still see gains. “You have an eval that's private. You're using a Model A. Can you switch it to Model B and you know, climb up? If you can, then you're in control. If you can't, you're not in control,” he said.
This isn't just theory. It's a pragmatic approach to avoid vendor lock-in. If your improvements depend entirely on one model, you're tethered to that provider's roadmap, pricing, and capabilities. An open platform approach, where you can bring in different models and score each one against the same private evals, gives you leverage: you can move to whichever model performs best on your own tests. “Having an open harness, letting all models come in, having your evals, your contexts, your tools help you hill climb, I think is the skills that an AI native startup needs, a SaaS company needs, or every enterprise needs,” Nadella stated. The goal is for your company's intelligence to keep growing regardless of external model shifts.
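One way to make the acid test concrete is to track eval scores per model as you iterate on your own context and tools, then check that every model climbed. The sketch below is illustrative only: the function names, the `min_gain` threshold, and the score values are assumptions for this example, not anything Nadella specified.

```python
from typing import Mapping, Sequence


def climbs(scores: Sequence[float], min_gain: float = 0.0) -> bool:
    """True if the latest score beats the first one by more than min_gain."""
    if len(scores) < 2:
        return False  # a single measurement cannot show a climb
    return scores[-1] - scores[0] > min_gain


def acid_test(history: Mapping[str, Sequence[float]], min_gain: float = 0.0) -> bool:
    """True only if at least two models were tried and each one climbed.

    `history` maps a model name to its private-eval scores, ordered by
    iteration of your own harness (context, tools, prompts).
    """
    if len(history) < 2:
        return False  # the test is about switching, so one model is not enough
    return all(climbs(scores, min_gain) for scores in history.values())


# Made-up scores for illustration: both models improve as the harness improves.
in_control = acid_test({"model_a": [0.60, 0.70], "model_b": [0.55, 0.72]})

# Here the gains do not carry over to model_b, so the test fails.
locked_in = acid_test({"model_a": [0.60, 0.70], "model_b": [0.70, 0.65]})
```

Comparing only the first and last score is a deliberate simplification; a real check would likely account for noise across repeated runs.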
What to Do With This
Stop chasing the latest model announcement. Instead, identify one core business process where AI could drive a specific, measurable gain this quarter. Build a private evaluation set for that process: a small, proprietary dataset of inputs, expected outputs, and a quantitative metric to score AI performance. Then run two different frontier models against that same eval set and compare their scores. This week, start mapping out those first few data points and your scoring criteria.
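A first private eval set can be very small. The following is a minimal sketch, assuming a hypothetical support-ticket classification task: the cases, the exact-match metric, and the two stand-in model functions are all invented for illustration. In practice, `model_a` and `model_b` would wrap calls to two real providers, and the cases would come from your own data.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    prompt: str    # input sent to the model
    expected: str  # output you consider correct


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output matches the expected label, ignoring case and padding."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    model: Callable[[str], str],
    cases: Sequence[EvalCase],
    metric: Callable[[str, str], float] = exact_match,
) -> float:
    """Return the mean metric score of `model` over all cases."""
    if not cases:
        raise ValueError("eval set is empty")
    return sum(metric(model(c.prompt), c.expected) for c in cases) / len(cases)


# Hypothetical eval cases; replace with examples from your own process.
CASES = [
    EvalCase("Classify ticket: 'I was charged twice this month.'", "billing"),
    EvalCase("Classify ticket: 'The app crashes when I upload a file.'", "bug"),
    EvalCase("Classify ticket: 'How do I add a teammate?'", "how-to"),
    EvalCase("Classify ticket: 'Please cancel my subscription.'", "cancellation"),
]


def model_a(prompt: str) -> str:
    """Stand-in for a call to one provider; knows only two labels."""
    return "billing" if "charged" in prompt else "bug"


def model_b(prompt: str) -> str:
    """Stand-in for a call to a second provider; handles all four labels."""
    text = prompt.lower()
    if "charged" in text:
        return "billing"
    if "crashes" in text:
        return "bug"
    if "cancel" in text:
        return "cancellation"
    return "how-to"


scores = {name: run_eval(fn, CASES) for name, fn in
          {"model_a": model_a, "model_b": model_b}.items()}
print(scores)  # {'model_a': 0.5, 'model_b': 1.0}
```

Because the models are passed in as plain callables, swapping providers does not require touching the cases or the metric, which is the property the acid test depends on.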