Key Takeaways
- AI, particularly formal systems like those based on Lean, struggles with true mathematical discovery—the creative process of formulating conjectures and constructing examples before a formal proof begins.
- Axiom Math actively invests in “mathematical discovery” tools to help humans explore and find creative constructions (like sequences or graphs), treating this pre-proof phase as a crucial, human-driven step that today's proof-focused systems struggle to replicate.
- Autoformalization, the process of translating informal problem statements into precise, verifiable code, remains a highly challenging problem due to the inherent ambiguity and lack of clear grounding in human language.
- Humans are inherently poor at fully specifying their intent for complex systems, leaving critical gaps that prevent formal verification and lead to unforeseen issues.
- The future of coding will involve interactive AI acting as a “specification proposal” co-pilot, generating test cases and edge scenarios to force humans to clarify their fuzzy requirements into provable statements.
AI's Blind Spot: True Discovery vs. Formal Proof
Carina Hong, CEO of Axiom Math, sees a clear division in what AI can do well and where it falls short. While systems like Axiom Math's can achieve "superhuman performance" in proving established mathematical theorems, they hit a wall when it comes to the very human act of creation. “A Lean-based system will struggle in those very creative places,” Hong explains. This isn't about solving known problems faster; it's about finding the problems themselves.
Hong calls this missing piece "mathematical discovery." It's the stage where mathematicians don't yet know what they want to prove. Instead, they explore and construct examples, such as writing out the first few terms of a sequence or drawing various graphs, to understand properties and formulate conjectures. This is the messy, intuitive, imaginative work that precedes the rigorous process of formal proof, and it's where Axiom Math is actively building tools to augment human intelligence, aiming to scale "brilliance" rather than just correct errors.
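To make "writing out the first few terms of a sequence" concrete, here is a minimal Python sketch of that exploratory step. The sequence (partial sums of the odd numbers) and the function names are illustrative choices, not something from Hong's remarks or Axiom Math's tooling. Passing a finite check like this is only evidence for a conjecture, not a proof.

```python
from itertools import accumulate, count, islice


def first_partial_sums(n_terms):
    """Return the first n_terms partial sums of the odd numbers 1, 3, 5, ..."""
    odds = count(start=1, step=2)  # lazy, infinite stream of odd numbers
    return list(islice(accumulate(odds), n_terms))


def conjecture_holds(n_terms):
    """Test the guess 'the n-th partial sum equals n squared' on the first n_terms cases."""
    sums = first_partial_sums(n_terms)
    return all(s == (i + 1) ** 2 for i, s in enumerate(sums))


# Step 1: look at a few terms and spot a pattern.
print(first_partial_sums(6))  # [1, 4, 9, 16, 25, 36]

# Step 2: check the guessed pattern on many more cases before trying to prove it.
print(conjecture_holds(1000))
```

Only once a pattern like this survives exploration does it become a statement worth handing to a formal prover.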
The Hard Problem of Autoformalization and Human Specification Failure
Beyond the creative void, AI also stumbles hard at autoformalization: converting a human's informal problem statement into a precise, unambiguous formal specification that a computer can verify. This isn't just a technical hurdle; it's a reflection of a deeper human limitation. As Hong candidly puts it, “Humans are bad at specifying everything that we want.” We leave gaps and make unstated assumptions, and a property that is never explicitly specified cannot be proven.
This challenge becomes acute when building complex software. Imagine specifying a distributed system; the sheer number of implicit assumptions and desired behaviors makes full formalization nearly impossible for humans alone. Formal verification does have theoretical limits (Rice's theorem, for example, implies that no algorithm can decide a non-trivial semantic property for every possible program), but the practical challenge often lies in our inability to articulate our own intentions clearly enough for any system to verify.
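A small, hedged illustration of such a gap, using sorting as a stand-in example (it does not appear in the interview): the informal intent "the output is sorted" leaves out the assumption that the output must contain the same elements as the input. All names below are hypothetical.

```python
from collections import Counter


def is_sorted(xs):
    """True if xs is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def weak_spec(inp, out):
    """The intent as first written down: 'the output is sorted'."""
    return is_sorted(out)


def strong_spec(inp, out):
    """Adds the unstated assumption: the output has exactly the input's elements."""
    return is_sorted(out) and Counter(inp) == Counter(out)


def bogus_sort(xs):
    """Deliberately wrong implementation that throws its input away."""
    return []


data = [3, 1, 2]
print(weak_spec(data, bogus_sort(data)))    # True: the gap lets a useless program pass
print(strong_spec(data, bogus_sort(data)))  # False: the completed spec rejects it
print(strong_spec(data, sorted(data)))      # True
```

A verifier can only check what `weak_spec` says, so the useless implementation is "correct" until a human notices and states the missing requirement.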
An Interactive Future: AI as Your Specification Co-Pilot
Hong envisions a future where AI steps in not as a replacement for human creativity or a perfect autoformalizer, but as an interactive partner in specification. She calls this the "future of coding." Instead of AI just generating code, it will play a crucial role in clarifying our intent.
Picture this: you sketch out a high-level plan for a distributed system. An AI system could then break it down into components, and for certain parts, it could call Axiom Math to generate formally verified code. But critically, the AI would also generate "specification proposals." This could involve proposing test cases, edge scenarios, or even adversarial inputs that challenge your initial, informal definition. “It is basically giving you the specification proposal,” Hong says. This iterative feedback loop helps humans fill in the gaps in their own understanding and intent, making the informal formal, one test case at a time.
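One way to picture a "specification proposal" is as a set of inputs on which two plausible readings of an informal requirement disagree. The sketch below is an assumption-laden toy, not Axiom Math's system: the requirement ("remove duplicates from a list"), both readings, and the hand-written candidate inputs (standing in for AI-generated ones) are all hypothetical.

```python
def dedupe_keep_order(xs):
    """Reading A: drop repeats, keep first-seen order."""
    return list(dict.fromkeys(xs))


def dedupe_sorted(xs):
    """Reading B: drop repeats, return values in ascending order."""
    return sorted(set(xs))


# Hand-picked edge cases; in the workflow described above, an AI would propose these.
CANDIDATE_INPUTS = [[], [7], [1, 1, 1], [1, 2, 3], [3, 1, 3, 2], [2, 2, 1]]


def propose_clarifying_cases(reading_a, reading_b, inputs):
    """Return the inputs where the two readings disagree, with both outputs."""
    proposals = []
    for xs in inputs:
        a, b = reading_a(list(xs)), reading_b(list(xs))
        if a != b:
            proposals.append({"input": xs, "reading_a": a, "reading_b": b})
    return proposals


for case in propose_clarifying_cases(dedupe_keep_order, dedupe_sorted, CANDIDATE_INPUTS):
    print(case)
```

Each printed case is a question for the human: for the input `[3, 1, 3, 2]`, did you mean `[3, 1, 2]` or `[1, 2, 3]`? Answering it turns one fuzzy requirement into a checkable one.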
What to Do With This
This week, before your next product spec or engineering sprint, ask a team member (or prompt a model such as GPT-4) to generate 5 to 10 "malicious" or edge-case test cases against your initial informal specification. This forces you to formalize intent where you inevitably left gaps, mimicking how future AI could help bridge the chasm between human ideas and provable systems.