Key Takeaways

  • AI, including systems built on the Lean proof assistant, struggles with highly creative mathematical domains like combinatorics because the necessary steps are often intuitive and "quite creative," according to Carina Hong, CEO of Axiom Math.
  • The core barrier for AI in formal verification isn't a lack of processing power but the "specification problem": humans consistently fail to define complex tasks precisely, and anything left unspecified goes unproven.
  • Axiom Math invests in "mathematical discovery" tools to assist mathematicians in the "pre-conjecturing step," helping them generate examples and constructions to build intuition before formal proof.
  • These discovery tools aim to auto-formalize ideas and propose better specifications, grounding them in concrete test cases (input-output pairs) that bridge human intuition and rigorous proof systems.

The Method: Engineering Human-AI Collaboration for Proof

Carina Hong, CEO of Axiom Math, tackles a subtle but existential challenge for AI: it's not just about proving things; it's about knowing what to prove. She frames it less as a technical hurdle and more as a human one, describing how Axiom Math builds systems around our natural limitations.

First, Hong acknowledges the inherent limits of even advanced AI in creative tasks. AI can be superhuman at certain kinds of proofs, but on open-ended, creative math like combinatorics, current systems, even those built on Lean, hit a wall. “[In] Olympiad math, people are seeing combinatorics being a little bit more tricky. Seems like the steps are quite creative,” Hong explains. The AI isn't struggling with computation; it's struggling with the creative leap.

This leads to the heart of the issue: the "specification problem." Humans are messy. We want things, but we often can't define those wants with the surgical precision a formal proof system demands. Hong puts it plainly: “Humans are bad at specifying everything that we want. There's always some sort of [thing] that we [have] not specified, and if it's not specified, it's not proven.” This isn't just an inconvenience; it's a fundamental blocker for automated verification. A formal proof guarantees exactly what the specification states and nothing more, so any requirement you fail to write down is a requirement the AI never proves.
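
To make the specification gap concrete, here is a minimal Python sketch. It is a standard textbook-style illustration, not Axiom Math's tooling, and the names `weak_spec`, `full_spec`, `cheating_sort`, and `real_sort` are hypothetical. A specification that only requires a sort's output to be ordered is satisfied by a function that throws the input away; only the stronger specification, which also requires the output to be a permutation of the input, rules that out.

```python
from collections import Counter


def is_sorted(xs: list[int]) -> bool:
    """True if xs is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def weak_spec(inp: list[int], out: list[int]) -> bool:
    # Underspecified: only says the output must be ordered (inp is ignored).
    return is_sorted(out)


def full_spec(inp: list[int], out: list[int]) -> bool:
    # Also requires the output to be a permutation of the input
    # (same elements with the same multiplicities).
    return is_sorted(out) and Counter(out) == Counter(inp)


def cheating_sort(xs: list[int]) -> list[int]:
    # Discards the input; an empty list is trivially ordered.
    return []


def real_sort(xs: list[int]) -> list[int]:
    return sorted(xs)


sample = [3, 1, 2, 1]
for impl in (cheating_sort, real_sort):
    out = impl(sample)
    print(impl.__name__, "weak:", weak_spec(sample, out), "full:", full_spec(sample, out))
```

Both implementations meet the weak specification, so "proving" it would certify the broken one too. The missing permutation clause is exactly the kind of unstated requirement Hong describes.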

Axiom Math's response isn't to force humans to be better specifiers, but to build tools that help. They invest in "mathematical discovery" systems. Imagine a partner for the "pre-conjecturing step": the fuzzy, intuitive phase where mathematicians brainstorm ideas and explore examples. Hong describes it: “The goal is for if you're a mathematician or you're a theoretical physicist and you have a problem that you would like to solve... it's a tool for mathematicians to make mathematical discoveries.”

These discovery tools generate examples, construct new possibilities, and help a human's intuition solidify into a precise conjecture. The conjecture then becomes the target for formal proof. Hong sees this as an interactive loop, where “the conjecture is going to help with the specification and then the prover does the proof.” The process is often “grounded by test cases,” input-output pairs that make the abstract specification concrete. Axiom also open-sources codebases to broaden this shared understanding, aiming to make both discovery and proof more collaborative and less dependent on isolated flashes of human brilliance.
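
The pre-conjecturing loop can be sketched in a few lines of Python, assuming a toy combinatorics question chosen purely for illustration (it is not from the interview, and this is not Axiom's system): how many subsets of {1, ..., n} contain no two consecutive integers? Brute force produces (n, count) input-output pairs, and those pairs act as test cases that filter candidate conjectures before anyone attempts a formal proof. All function names here are hypothetical.

```python
from itertools import combinations


def count_no_adjacent_subsets(n: int) -> int:
    """Brute force: count subsets of {1, ..., n} with no two consecutive integers."""
    total = 0
    for k in range(n + 1):
        for subset in combinations(range(1, n + 1), k):
            # combinations() yields sorted tuples, so checking neighbours suffices.
            if all(b - a > 1 for a, b in zip(subset, subset[1:])):
                total += 1
    return total


def fib(m: int) -> int:
    """Fibonacci numbers with fib(0) = 0 and fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a


# Step 1: generate examples. Each (n, count) pair is an input-output test case.
test_cases = [(n, count_no_adjacent_subsets(n)) for n in range(12)]

# Step 2: candidate conjectures a mathematician might propose after eyeballing the data.
candidates = {
    "2**n": lambda n: 2 ** n,
    "fib(n + 2)": lambda n: fib(n + 2),
}

# Step 3: keep only conjectures consistent with every test case.
for name, formula in candidates.items():
    failures = [n for n, count in test_cases if formula(n) != count]
    if failures:
        print(f"{name}: refuted at n={failures[0]}")
    else:
        print(f"{name}: survives all {len(test_cases)} test cases")
```

Running it reports that `2**n` is refuted at n=2 while `fib(n + 2)` survives all 12 cases. Surviving the test cases is evidence, not proof: in the loop Hong describes, the surviving conjecture would next be stated precisely and handed to a prover (for example, one built on Lean).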

Where This Breaks Down

The "mathematical discovery" approach, while promising, isn't a silver bullet. The AI's role in the "pre-conjecturing step" is still limited by how well it can generate truly novel, unexpected ideas rather than variations on a theme. Real mathematical breakthroughs often come from seeing connections that no prior data suggested, something current AI systems have yet to demonstrate reliably.

Critically, the system still needs a “human to eyeball it.” The AI can propose specifications and generate examples, but a human must validate them. The bottleneck therefore shifts from raw proof generation to judging whether the AI's suggestions are correct, or even interesting. If a human approves a flawed specification, the formal proof is still valid, but it proves the wrong thing. The system augments human intuition rather than replacing it, and it is only as good as that intuition.

What to Do With This

If you're building a product that relies on users defining complex requirements, stop expecting them to nail it on the first try. Instead, build discovery tools directly into your product's onboarding or configuration flow. Can your app suggest specific parameters, generate examples based on early inputs, or present structured options that help a user articulate their desired outcome more clearly, before your system attempts to deliver it? Treat user input as a starting point for discovery, not a final specification.
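
As a minimal sketch of that idea, assume a hypothetical product where a user asks to "flag large transactions." Instead of accepting the request as a final specification, the system drafts a concrete rule, generates boundary examples, and compares the draft against the user's yes/no answers; any disagreement exposes a requirement the user never stated. `Rule`, `boundary_examples`, and `disagreements` are illustrative names, and `user_labels` stands in for answers that would be collected in the UI.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Draft rule for the hypothetical request "flag large transactions"."""
    threshold: float
    inclusive: bool  # Is an amount exactly at the threshold "large"?

    def flags(self, amount: float) -> bool:
        return amount >= self.threshold if self.inclusive else amount > self.threshold


def boundary_examples(rule: Rule) -> list[float]:
    """Concrete inputs around the threshold, where user intent is most often ambiguous."""
    t = rule.threshold
    return [t - 1, t, t + 1]


def disagreements(rule: Rule, labels: dict[float, bool]) -> list[float]:
    """Examples where the draft rule's output differs from what the user expected."""
    return [x for x, expected in labels.items() if rule.flags(x) != expected]


# The system's first guess at what the user meant.
draft = Rule(threshold=1000, inclusive=False)

# Simulated user answers to "Should this amount be flagged?" for each generated example.
user_labels = {999: False, 1000: True, 1001: True}

print(boundary_examples(draft))           # [999, 1000, 1001]
print(disagreements(draft, user_labels))  # [1000]: the draft misses the boundary case

# Refine the draft using the revealed requirement, then re-check.
refined = Rule(threshold=1000, inclusive=True)
print(disagreements(refined, user_labels))  # []
```

Probing the boundary is a deliberate choice: it is where an informal request like "large" is most likely to hide an unstated decision, so a handful of well-placed examples surfaces more of the missing specification than many random ones would.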