Key Takeaways

  • Notion’s internal “Boxy” system integrates an AI agent called Codeex to automate code generation and bug fixes directly from Notion tasks.
  • Engineers describe a task with natural language and a screenshot, @mention Codeex, and the AI produces a complete pull request with UI verification and a preview URL.
  • Ryan Nystrom recounts a “tab block” request that went from prompt to a full pull request in about 20 minutes; the agent even uploaded screenshots of its own UI verification.
  • The workflow offloads routine implementation and context switching to the agent, leaving engineers to review the result and focus on higher-level problem-solving.

The Method: Notion's Boxy-Powered Code Engine

Imagine describing a UI component or a bug fix in a few sentences, hitting enter, and receiving a complete pull request with working code and visual verification about twenty minutes later. That's the workflow Ryan Nystrom and the team at Notion built with their internal “Boxy” system, powered by an AI agent they call Codeex.

The process starts simply enough. An engineer (or even a manager who wants to stay hands-on without getting bogged down in boilerplate) writes out a task in a Notion page. Say, building a new “tab block” component. Ryan Nystrom recounts this exact scenario: “A friend of mine, who's a Notion fan, text me. He's like, 'Hey, I like the tab block that you built.'” Nystrom's thought? “Oh yeah, that sounds really easy.”

He opened a Notion task, jotted down some notes, dropped in a screenshot for visual context, and @mentioned Codeex. From there the agent takes over: Codeex, integrated with virtual machines, takes the natural-language specification, generates the necessary code, runs its own UI verification, and creates a preview URL.
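The hand-off described above can be pictured as a small pipeline. The following is only a sketch under stated assumptions: the source does not describe Boxy's internals, so every name here (`Task`, `AgentResult`, `handle_mention`, and the four stage callables) is hypothetical, and the stage order is a guess based on the description.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """A task page: free-form notes plus optional screenshots (hypothetical shape)."""
    title: str
    notes: str
    screenshots: List[str] = field(default_factory=list)


@dataclass
class AgentResult:
    """What comes back to the engineer: a PR link, a preview URL, and proof screenshots."""
    pull_request_url: str
    preview_url: str
    verification_screenshots: List[str]


def handle_mention(
    task: Task,
    generate_patch: Callable[[Task], str],
    verify_ui: Callable[[str], List[str]],
    open_pull_request: Callable[[str], str],
    deploy_preview: Callable[[str], str],
) -> AgentResult:
    """Run the four stages in order; each callable is a stand-in, not a real API."""
    patch = generate_patch(task)          # notes + screenshots -> code change
    proof = verify_ui(patch)              # agent exercises the UI and captures screenshots
    pr_url = open_pull_request(patch)     # change is proposed for human review
    preview_url = deploy_preview(patch)   # reviewers get a live build to click through
    return AgentResult(pr_url, preview_url, proof)


# Usage with stub stages (placeholder URLs and file names, for illustration only).
result = handle_mention(
    Task("Tab block", "Build a tab block like the screenshot.", ["tabs.png"]),
    generate_patch=lambda t: f"patch for {t.title}",
    verify_ui=lambda patch: ["verify-1.png"],
    open_pull_request=lambda patch: "https://example.invalid/pr",
    deploy_preview=lambda patch: "https://example.invalid/preview",
)
print(result.preview_url)
```

Passing the stages in as callables keeps the sketch honest about what is unknown: the interesting work happens inside each stage, and the only thing the source confirms is what comes out the other end.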

Nystrom walks through the timeline: “10:40 10:51 started the implementation and then another 10 minutes later it replies with a pull request link and a preview URL because we like we do the um like preview environment stuff and it like built the entire thing.” The most striking part is that the agent doesn't just build the feature; it documents its own check that the feature works. “This was actually the coolest part to me is like it actually uploaded screenshots of it doing its own like UI verification.”
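For a rough sense of the turnaround, the quoted timestamps can be added up. This is a back-of-the-envelope sketch that assumes 10:40 is when the request was posted, 10:51 is when implementation started, and the reply arrived exactly ten minutes after that; the quote itself is looser than this.

```python
from datetime import datetime, timedelta

# Timestamps as read from the quote (interpretation is an assumption).
requested = datetime.strptime("10:40", "%H:%M")
implementation_started = datetime.strptime("10:51", "%H:%M")
replied = implementation_started + timedelta(minutes=10)

total_minutes = (replied - requested).total_seconds() / 60
print(total_minutes)  # 21.0
```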

This isn't just about speed; it's about shifting cognitive load. Claire Vo points out a key benefit: “Your AI, your agent is never going to complain when you ask it to do this five minutes before the meeting starts.” It frees humans from the drudgery and context switching of basic coding tasks, letting them review, refine, and tackle the truly complex problems.

Where This Breaks Down

While Notion's Boxy system sounds like a developer's dream, it's not a silver bullet. This approach thrives on well-defined, somewhat constrained problems like UI components or isolated bug fixes. The “tab block” example is perfect because it's a common pattern with clear visual expectations.

This method would likely struggle with highly abstract architectural decisions, complex system integrations, or novel algorithms where the “solution” isn't easily inferred from natural language and a screenshot. The agent also depends on what surrounds it: an existing codebase with clear patterns to follow and a robust CI/CD environment to build and check its work. Without strong testing infrastructure and preview environments, the AI's output becomes harder to verify, shifting the burden back to human engineers.

Ultimately, the AI excels at execution once the problem is well-specified. It doesn't replace the initial human insight, creativity, or the deep understanding required to architect truly new systems. It takes care of the grunt work, but the strategic direction remains firmly in human hands.

What to Do With This

Stop thinking of AI as just a coding assistant; start thinking of it as an autonomous developer for discrete tasks. Your goal this week is to identify one small, repetitive coding task (a new button style, a minor text change, a common bug fix) that currently takes 30+ minutes of context switching and manual coding. Draft a natural language spec for it, including a screenshot if applicable.
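One way to make the spec habit concrete is to give it a fixed shape. The helper below is a minimal sketch, not a format used by Notion or required by any tool; the function name `build_task_spec` and the section headings are assumptions chosen for illustration.

```python
from typing import List, Optional


def build_task_spec(
    title: str,
    context: str,
    acceptance_criteria: List[str],
    screenshot: Optional[str] = None,
) -> str:
    """Render a short, reviewable spec that can be pasted into an AI tool's chat."""
    lines = [f"Task: {title}", "", "Context:", context, "", "Acceptance criteria:"]
    lines += [f"- {item}" for item in acceptance_criteria]
    if screenshot:
        # Only a reference to the image; attach the file itself in the tool you use.
        lines += ["", f"Screenshot: {screenshot}"]
    lines += ["", "Deliverable: a pull request with tests or UI verification steps."]
    return "\n".join(lines)


spec = build_task_spec(
    title="Secondary button style",
    context="Add a low-emphasis variant of the existing button component.",
    acceptance_criteria=[
        "Matches the attached screenshot",
        "Existing button tests still pass",
    ],
    screenshot="secondary-button.png",
)
print(spec)
```

Writing the acceptance criteria as a list is the useful part: it forces the task to be small and checkable before any code is generated.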

Then, instead of immediately coding it, use an existing AI tool (like GitHub Copilot or Cursor, which both offer increasingly robust chat interfaces) to generate the change and open the pull request. Don't just paste code snippets; try to push the AI to generate the full solution, including tests or UI verification steps if possible. The point isn't perfection, but to train yourself and your team to think in “specs for AI” rather than “lines of code.” This practice prepares you for a future in which your engineers write more prompts than code.