AI Models Aren't the Product, Orchestration Is: Perplexity CEO Aravind Srinivas

Key Takeaways

Forget building the best AI model. Perplexity CEO Aravind Srinivas believes the underlying model is quickly becoming a utility. The real value for founders now sits in building sophisticated orchestration systems that make models useful, not just powerful.
Greg Brockman's observation that “the model is no longer the product” rings true because complex AI applications like Codex or Perplexity's own system are fundamentally orchestration layers. They don't just use a model; they run it within a larger context.
An agent harness acts as the rulebook for an AI agent's loop. It defines the skills, sub-agents, connectors, and tools an agent can access, turning raw model intelligence into specific, valuable actions. This is where proprietary advantage is built.
Perplexity differentiates by orchestrating not just tools and data, but other AI models — including competitors'. Their goal is to maximize “token value per watt per user,” treating compute and output as a precise economic problem.
This entire approach is captured in Aravind Srinivas's Orchestration Problem: Maximizing Token Value Per Watt Per User, a framework for balancing intelligence, privacy, and cost.

Aravind Srinivas's Orchestration Problem: Maximizing Token Value Per Watt Per User

Objective 1: Accuracy & Intelligence: Max out on intelligence and accuracy by building giant data centers and spending a lot of power to run them. (This may sacrifice privacy and cost.)

Objective 2: Privacy & Cost: Run everything locally, which is good for privacy and cost. (This may fall short of frontier intelligence and accuracy.)

Solution: Sweet Spot Orchestration: Find the sweet spot between the two objectives. Use local models when necessary and server-side models when necessary, and orchestrate across both, grounded in valuable personal context. Build a world-class harness that can make even an okayish model appear great and that uses the right model for the right task and the right part of the task (sub-agents). Use the compute we all have in our own devices for work that doesn't always need to run on a server. All of this is coordinated by a router: an awesome router, a master orchestrator router.
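To make the routing idea concrete, here is a minimal sketch of how a router might decide where each sub-task runs. It is an illustration under stated assumptions, not Perplexity's implementation: the `SubTask` fields, the 0-to-1 difficulty score, and the `local_capability` threshold are all hypothetical, and the privacy-first rule is just one possible policy.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"    # model running on the user's device
    SERVER = "server"  # model running in a data center


@dataclass
class SubTask:
    description: str
    contains_private_data: bool  # hypothetical flag, e.g. personal files or messages
    difficulty: float            # hypothetical score: 0.0 (trivial) to 1.0 (frontier-level)


def route(task: SubTask, local_capability: float = 0.5) -> Target:
    """Pick where one sub-task runs. The threshold is an illustrative assumption."""
    # Assumed policy: private data never leaves the device.
    if task.contains_private_data:
        return Target.LOCAL
    # The local model is considered good enough for easy sub-tasks.
    if task.difficulty <= local_capability:
        return Target.LOCAL
    # Everything else pays for server-side compute.
    return Target.SERVER


def plan(tasks: list[SubTask]) -> dict[str, Target]:
    """Route every part of a larger task independently."""
    return {t.description: route(t) for t in tasks}


if __name__ == "__main__":
    parts = [
        SubTask("summarize my inbox", contains_private_data=True, difficulty=0.4),
        SubTask("rename downloaded files", contains_private_data=False, difficulty=0.1),
        SubTask("multi-step market research", contains_private_data=False, difficulty=0.9),
    ]
    for description, target in plan(parts).items():
        print(f"{description} -> {target.value}")
```

Routing each part separately is what lets one request use both kinds of compute. A real router would also have to decide what happens when a private sub-task is too hard for the local model, which this sketch simply keeps on the device.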

When This Works (and When It Doesn't)

Srinivas argues this framework works when the vision is a 24/7 AI without the prohibitive costs of always-on server-side compute. It shines in applications where intelligently balancing local and server-side processing is key to maximizing the value of each output token relative to the power consumed. It suits products that need both privacy (local processing) and frontier intelligence (cloud models), or that require continuous background operation without breaking the bank.
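One way to reason about "value relative to power consumed" is to score each option by estimated value divided by power draw, subject to a minimum quality bar. The article does not define how token value is measured, so the scoring function, the field names, and every number below are invented placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    value_per_token: float  # hypothetical usefulness score per output token
    watts: float            # hypothetical average power draw while generating


def value_per_watt(option: Option) -> float:
    """Illustrative ratio only; the source gives no formal definition."""
    return option.value_per_token / option.watts


def best_option(options: list[Option], min_value: float = 0.0) -> Option:
    """Pick the most power-efficient option that still clears the quality floor."""
    eligible = [o for o in options if o.value_per_token >= min_value]
    if not eligible:
        raise ValueError("no option meets the quality floor")
    return max(eligible, key=value_per_watt)


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the trade-off.
    local = Option("local", value_per_token=0.6, watts=10.0)
    server = Option("server", value_per_token=0.9, watts=300.0)
    print(best_option([local, server]).name)                 # easy task, no floor
    print(best_option([local, server], min_value=0.8).name)  # task needing high quality
```

Without the quality floor, a pure ratio would nearly always favor the smaller model, so the floor stands in for the "frontier intelligence" side of the trade-off.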

However, this approach introduces significant complexity. Building a "master orchestrator router" that can intelligently switch between models, manage local compute, and integrate diverse tools is a monumental engineering challenge. For simple, single-purpose AI tools, or those where compute cost is not a primary concern (e.g., highly specialized B2B solutions with low user volume), the overhead of this orchestration might outweigh its benefits. It's for founders aiming for broad consumer adoption or highly efficient enterprise use cases where cost and context are critical.

What to Do With This

Next week, evaluate your AI product strategy through the lens of Srinivas's framework. Instead of asking, "Which LLM should we use?" ask, "How do we build an orchestration layer that makes our chosen model(s) 10x more valuable?" Specifically, identify a recurring workflow your users perform. Can you design an agent harness with specific rules for how an AI agent should complete this workflow? Consider how you could incorporate local processing on the user's device for sensitive data or simple tasks (Objective 2), while routing complex, intelligence-heavy requests to powerful server-side models (Objective 1). Sketch out the "master orchestrator router" that would manage this handoff, ensuring it grounds the models in unique user context to maximize "token value per watt per user." Your goal isn't just to produce an answer, but the most valuable answer for the least power spent.
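As a starting point for that exercise, the following is a minimal sketch of an agent harness: a loop with explicit rules about which tools the agent may call and how many steps it may take. Everything here is hypothetical, including the `Harness` class, the "CALL tool argument" reply convention, and the stub model that stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumption: a "model" is any function from the running context to a reply string.
Model = Callable[[str], str]
Tool = Callable[[str], str]


@dataclass
class Harness:
    model: Model
    tools: dict[str, Tool] = field(default_factory=dict)  # the only tools the agent may use
    max_steps: int = 5                                    # rule: the loop is bounded

    def run(self, goal: str) -> str:
        context = goal
        for _ in range(self.max_steps):
            reply = self.model(context)
            # Hypothetical convention: "CALL <tool> <argument>" requests a tool.
            if reply.startswith("CALL "):
                parts = reply.split(" ", 2)
                name = parts[1] if len(parts) > 1 else ""
                arg = parts[2] if len(parts) > 2 else ""
                if name not in self.tools:
                    context += f"\n[error] unknown tool: {name}"
                    continue
                # Feed the tool result back so the next model call can use it.
                context += f"\n[{name}] {self.tools[name](arg)}"
            else:
                return reply  # any other reply is treated as the final answer
        return "stopped: step limit reached"


def stub_model(context: str) -> str:
    """Stand-in for a real model: request one lookup, then answer."""
    if "[lookup]" not in context:
        return "CALL lookup order-123"
    return "Order order-123 has shipped."


if __name__ == "__main__":
    harness = Harness(model=stub_model, tools={"lookup": lambda arg: f"{arg}: shipped"})
    print(harness.run("Where is my order?"))
```

The model is swappable while the rules stay put, which is the sense in which the harness, rather than the model, carries the product-specific logic. Sub-agents and connectors could be registered the same way as tools in a fuller version.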