Key Takeaways
- The next frontier for AI isn't just co-pilots; it's AI that automates scientific research itself, starting with the study of machine learning models.
- Gray Swan co-founder Zico Kolter believes AI agents will transform mechanistic interpretability (mechan) from an "ad hoc" process into a true science by automating hypothesis testing.
- The historical barrier to deeply secure software wasn't technical impossibility. It was the limited human patience and manpower available for writing in obscure, formally verified languages.
- Coding agents can overcome these human constraints, enabling the routine generation of highly secure code that would be too complex or tedious for human engineers.
- This shift allows founders to tackle problems previously deemed too complex or labor-intensive for human teams, opening new avenues for product innovation and security.
Your Next Research Assistant is an AI Agent
For years, the dream of making machine learning models truly understandable—a field called mechanistic interpretability, or "mechan"—has felt more like art than science. It's been a slow, manual grind of hypothesis, test, and often, frustration. But Zico Kolter, co-founder of AI security company Gray Swan, sees a coming revolution. He believes coding agents will pull "mechan" out of its ad hoc state and finally make it a true science.
“I am newly optimistic, or I should say more optimistic, about mechan, in that I think actually, as with many things, coding agents have a chance to make this into a science,” Kolter states. His point is simple: the obstacle was never a lack of tools or theoretical understanding. As Kolter notes, “It wasn't that mechan was just impossible. We have all the tools we need. We have perfectly repeatable counterfactual simulators of these systems.” The real bottleneck? “The problem was we didn't have enough patience or manpower to actually run all these things together.”
Think about that. The ceiling for progress in understanding complex AI wasn't intelligence; it was endurance. Kolter goes further, suggesting this isn't just about interpretability, but a broader shift: “Maybe the first science we should automate is the science of interpretability, the science of analyzing machine learning itself... That's AI for science. Let's use AI to automate that kind of science.” Agents can perform “experimentation in an automated meta fashion,” rapidly iterating on ideas that would exhaust human researchers. This capability changes how we approach fundamental scientific inquiry.
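To make "perfectly repeatable counterfactual simulators" concrete, here is a minimal sketch of the kind of experiment an agent could run in bulk. It is an illustration only, not Kolter's or Gray Swan's method: the tiny network, its weights, the probe inputs, and the names `forward` and `ablation_sweep` are all hypothetical. The loop knocks out one hidden unit at a time, reruns the model, and records how much the output moves.

```python
from typing import Dict, List, Optional

# Hypothetical toy model: 3 inputs -> 4 hidden ReLU units -> 1 output.
# All weights are made up for illustration.
W1 = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
W2 = [2.0, 0.0, -1.0, 0.25]

# Hypothetical probe inputs used for every experiment.
PROBE_INPUTS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]


def forward(x: List[float], ablate: Optional[int] = None) -> float:
    """Run the toy model, optionally zeroing one hidden unit (the counterfactual)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if ablate is not None:
        hidden[ablate] = 0.0
    return sum(w * h for w, h in zip(W2, hidden))


def ablation_sweep(inputs: List[List[float]], threshold: float = 0.25) -> List[Dict]:
    """Test the hypothesis 'this unit matters' for every hidden unit."""
    results = []
    for unit in range(len(W1)):
        # Mean absolute change in the output when this unit is knocked out.
        effect = sum(
            abs(forward(x) - forward(x, ablate=unit)) for x in inputs
        ) / len(inputs)
        results.append(
            {"unit": unit, "effect": effect, "matters": effect > threshold}
        )
    return results


if __name__ == "__main__":
    for row in ablation_sweep(PROBE_INPUTS):
        print(row)
```

Because the model is deterministic, every run of the sweep gives the same answer. What is scarce is the patience to run thousands of such interventions and combine the results, and that is the part Kolter expects agents to take over.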
Unbreakable Code, No Human Required
This same logic applies to writing ironclad, secure code. For decades, computer science has offered "formally verified languages": obscure, mathematically rigorous tools that can produce software with machine-checked proofs of specific security properties. The catch? They’re incredibly difficult and time-consuming for humans to write in. As a result, most critical software isn't built this way, which leaves room for security holes that a proof would have ruled out.
But what if AI agents took over? Gray Swan's other co-founder, Matt Fredrikson, zeroes in on this idea. He says, “If agents are... if Claude and Codex are writing our code for us... if they turn out to be good at writing this kind of code, then that isn't a concern. Why not just write it in one of these obscure languages as long as the agent is smart enough to do it?”
This isn't just about writing code faster. It’s about achieving a level of security and correctness that humans simply can’t sustain at scale. An AI agent doesn't get bored, and it doesn't lose patience with a formal proof system. It can still make mistakes, but the proof checker rejects any proof that doesn't go through, so those mistakes surface before the code ships. By outsourcing the grunt work of highly precise, formally verified code generation to agents, we unlock a new tier of software reliability and defense against increasingly sophisticated attacks.
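For a sense of what a provable guarantee looks like in practice, here is a small sketch in Lean 4, one example of the kind of language described above (the source does not name a specific one). The helper `wrapIndex` and the theorem name are hypothetical. The point is that the bounds property is checked by the compiler, not by a test suite.

```lean
-- Hypothetical helper: wrap any index into the range [0, len).
def wrapIndex (len i : Nat) : Nat := i % len

-- Machine-checked guarantee: for a nonempty buffer, the wrapped index
-- is always in bounds. If this proof were wrong, the file would not compile.
theorem wrapIndex_lt (len i : Nat) (h : 0 < len) : wrapIndex len i < len := by
  unfold wrapIndex
  exact Nat.mod_lt i h
```

Even a two-line function needs an explicit statement and proof of its property. Multiply that across a real codebase and the manpower problem is plain, which is the gap Fredrikson suggests agents could fill.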
What to Do With This
Stop thinking of AI as merely a productivity tool for human tasks. Instead, identify a "science" within your own business that’s currently ad hoc or bottlenecked by human patience and scale. Can AI agents automate complex hypothesis testing, rigorous analysis, or the generation of precise, formally verified components for your core product? Start experimenting with agents to tackle the deeply complex, often tedious, tasks that currently limit your team’s ability to achieve true scientific rigor or bulletproof security in your software.