You run a command and watch the agent draft a clean-looking plan. You greenlight the execution, step away for coffee, and return twenty unsupervised minutes later to find a 4,000-line pull request spanning 40 different files.
Your job just shifted from creative builder to exhausted code reviewer who can’t feasibly review everything line by line.
We've been building toward a practical answer: two skills called visual-plan and visual-recap (plus the open-source framework to support them) that bring structure and a verifiable contract back to the runtime.
You can ask your agent to drop them into your workflow today, and you don't need to know anything else.
But if you're interested in the structural reasons why we built them, read on.
In this shift from writing code to auditing it, we've crossed an architectural line without naming it: we've started treating the plan as source code and the agent as the compiler. Just as we stopped reading assembly once we trusted C compilers, we've stopped reading the generated code.
This makes your taste and your judgment the ultimate bottleneck. But it also introduces a massive risk.
The risk comes from a broken analogy: C compilers are deterministic. Compile the same C twice with the same compiler and flags, and you get the same binary. Hand the same plan to an LLM twice, and you get two different codebases, with different patterns, different dependencies, and different bugs.
Our new compiler is probabilistic. It can put an auth guard on the wrong side of a boundary, add an unnecessary database column, or quietly degrade a core query, all with absolute confidence. When you skim a massive markdown file and hit approve, you aren't blessing a plan. You're signing off on a binary you haven't inspected and couldn't reproduce. We're shipping the largest volume of unreviewed, non-deterministic assembly in history, and we're calling it velocity.
To stop shipping blind binaries, we have to change how we interact with these loops.
These tools exist to address a massive asymmetry in how we build agentic systems. We pour endless effort into optimizing the machine's context: MCP servers, vector pipelines, and tightly pruned token windows. But look at what we send back to the human: three screens of unformatted terminal logs.
| Machine Context (Obsessively Optimized) | Human Context (An Afterthought) |
| --- | --- |
| Structured tools and schemas | Three screens of raw markdown |
| Pruned, relevant token windows | Ephemeral terminal logs |
| An indexed, retrievable repo | Cognitive skimming |
Your judgment is the most expensive resource in an autonomous loop, and starving it of context is what creates the bottleneck. Human eyes aren't built to spot an architectural flaw in a wall of sequential terminal output. A missing loading state, a leaked database relation, or a component rebuilt from scratch when one already existed: you'd catch any of these in a second if you could actually see them.
But a wall of text is hostile. Your eyes glaze over, you hit enter to approve, and you spend the next hour debugging what you could have caught in three seconds if the format had been better.
This context gap changes how we have to approach the planning phase. If the agent writes all the code and you rarely edit it directly, you have to verify the agent's intent before it starts. That means the plan has to act as an official contract. You aren't reviewing code; you're acting as the runtime validator for a chaotic compiler.
This is what visual-plan actually is: a semantic interface for verifying a probabilistic compile. A visual plan (the actual wireframe, the real shape of the API, how an empty list state renders) lets you catch misalignment before the agent writes a single line of code. It shifts your role from babysitting a terminal to steering with intent.
But for a contract to hold, it can't be written in slippery, freeform prose. That's why these plans use MDX instead of standard markdown. It's not for decoration; it's for schema enforcement.
Freeform text has infinite entropy. Left to itself, an agent will wander, omit critical details, or hallucinate schemas. By forcing the agent to output typed UI components, we build a playground with strict walls. A <DataModel> block must declare keys and relations. An <Endpoint> block must explicitly define its auth strategy. The agent can no longer hide a missing guard behind confident prose; it either fills out the schema or leaves it conspicuously blank. This is type-safety for natural language.
It's generative UI in the developer loop. The agent assembles real, design-system-compliant primitives instead of improvising raw markdown. Plans from different models read as if they were written by the same engineer. And because these plans are actual MDX files checked into your Git history, they don't have to be ephemeral. They're versioned, collaborative contracts you can comment on, redline, and edit directly.
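To make the schema-enforcement idea concrete, here is a minimal TypeScript sketch of the kind of check a `<DataModel>` or `<Endpoint>` block makes possible. It assumes the MDX has already been parsed into a component name plus a props object. The field names (`keys`, `relations`, `auth`), the allowed auth strategies, and `findGaps` itself are hypothetical illustrations, not the visual-plan API.

```typescript
// Hypothetical schema for two plan components. The real visual-plan
// components may use different props; this only illustrates the idea.
type Rule = (value: unknown) => boolean;

// Assumed set of auth strategies an <Endpoint> may declare.
const AUTH_STRATEGIES: string[] = ["session", "api-key", "public"];

const isNonEmptyString: Rule = (v) => typeof v === "string" && v.trim() !== "";

const REQUIRED: Record<string, Record<string, Rule>> = {
  DataModel: {
    name: isNonEmptyString,
    // Every model must declare at least one key.
    keys: (v) => Array.isArray(v) && v.length > 0,
    // Relations must be declared; an empty array explicitly means "none".
    relations: (v) => Array.isArray(v),
  },
  Endpoint: {
    method: isNonEmptyString,
    path: isNonEmptyString,
    // No default: an endpoint without a recognized auth strategy is a gap.
    auth: (v) => typeof v === "string" && AUTH_STRATEGIES.indexOf(v) !== -1,
  },
};

// Returns one message per missing or invalid field, so a blank schema
// slot surfaces as a visible gap instead of disappearing into prose.
function findGaps(component: string, props: Record<string, unknown>): string[] {
  const rules = REQUIRED[component];
  if (!rules) return [`unknown component <${component}>`];
  return Object.keys(rules)
    .filter((field) => !rules[field](props[field]))
    .map((field) => `<${component}> has no valid "${field}"`);
}

// Usage: an endpoint the agent described without saying how it is guarded.
const gaps = findGaps("Endpoint", { method: "POST", path: "/invites" });
// returns ['<Endpoint> has no valid "auth"']
```

The design choice worth noting is that nothing gets a default: a missing auth strategy is reported rather than filled in, which is exactly the "conspicuously blank" behavior the contract depends on.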
Securing the upstream contract is only half the battle. You still have to verify the delivery at the other end of the loop. That's where our traditional tools fall short.
For twenty years, the Git diff and the pull request have been our primary governance tools. But when an agent drops 4,000 lines of code into a branch, line-by-line review stops scaling. We have to audit at the contract level instead: did the agent build what we agreed to in the plan, and only that?
visual-recap solves this. Instead of a vague markdown summary or an unreadable 4,000-line diff, it lifts the changes back into the components defined in the plan, exposing the real schema changes, the modified endpoints, and the newly introduced UI states.
Now, governance is simple: do the plan and the recap align? If they drift, the build is broken, even if the test suite passes. Because they share a structured schema, we can run an agent to flag the drift automatically, letting you focus your attention on the only question that requires a human: was this drift a smart pivot or a hallucination?
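Because plan and recap share a structure, the core of a drift check can be an ordinary set comparison. The TypeScript sketch below assumes both documents have already been reduced to a list of endpoint contracts. `EndpointContract`, `diffContracts`, and the three drift buckets are hypothetical names for illustration, and a real check would also cover data models and UI states.

```typescript
// Hypothetical shape that both the plan and the recap are reduced to.
interface EndpointContract {
  method: string;
  path: string;
  auth: string;
}

interface Drift {
  missing: string[];   // promised in the plan, absent from the recap
  unplanned: string[]; // present in the recap, never agreed in the plan
  changed: string[];   // same endpoint, different auth strategy
}

const keyOf = (e: EndpointContract): string => `${e.method.toUpperCase()} ${e.path}`;

// Index contracts by "METHOD /path"; a prototype-free object avoids
// collisions with built-in property names.
function indexByKey(list: EndpointContract[]): Record<string, EndpointContract> {
  const index: Record<string, EndpointContract> = Object.create(null);
  for (const e of list) index[keyOf(e)] = e;
  return index;
}

function diffContracts(plan: EndpointContract[], recap: EndpointContract[]): Drift {
  const planned = indexByKey(plan);
  const delivered = indexByKey(recap);
  const drift: Drift = { missing: [], unplanned: [], changed: [] };

  for (const p of plan) {
    const d = delivered[keyOf(p)];
    if (!d) drift.missing.push(keyOf(p));
    else if (d.auth !== p.auth) drift.changed.push(`${keyOf(p)}: auth ${p.auth} -> ${d.auth}`);
  }
  for (const d of recap) {
    if (!planned[keyOf(d)]) drift.unplanned.push(keyOf(d));
  }
  return drift;
}

// Any non-empty bucket means plan and recap disagree, even if tests pass.
const hasDrift = (d: Drift): boolean =>
  d.missing.length + d.unplanned.length + d.changed.length > 0;

// Usage: the agent quietly made a guarded endpoint public.
const report = diffContracts(
  [{ method: "POST", path: "/invites", auth: "session" }],
  [{ method: "POST", path: "/invites", auth: "public" }],
);
// report.changed is ["POST /invites: auth session -> public"]
```

The machine only sorts drift into buckets; deciding whether each entry is a smart pivot or a hallucination stays with the human.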
A recap is still a generated summary, so it earns trust the way a summary has to: it's generated from the actual PR or branch diff, not the agent's say-so, and it points straight at the ground truth—the changed lines and the deployed preview. Read the recap to know where to look; click through to confirm.
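One way to hold a recap to that standard is to check every claim against the files the diff actually touched. The sketch below is a hypothetical TypeScript check, not part of visual-recap. It assumes each recap entry lists the files it is based on, and that the changed-file list comes from something like `git diff --name-only main...HEAD` (the branch name `main` is an assumption).

```typescript
// Hypothetical recap entry: a claim plus the files it says back it up.
interface RecapEntry {
  claim: string;
  files: string[];
}

interface GroundingReport {
  unsupported: string[]; // claims citing no file, or a file outside the diff
  unexplained: string[]; // changed files that no claim accounts for
}

// Parse `git diff --name-only` output: one path per line.
function parseNameOnly(output: string): string[] {
  return output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
}

function checkGrounding(entries: RecapEntry[], changedFiles: string[]): GroundingReport {
  const inDiff: Record<string, boolean> = Object.create(null);
  for (const f of changedFiles) inDiff[f] = true;
  const cited: Record<string, boolean> = Object.create(null);
  const report: GroundingReport = { unsupported: [], unexplained: [] };

  for (const e of entries) {
    for (const f of e.files) cited[f] = true;
    // A claim that can't be traced to the diff is only the agent's say-so.
    if (e.files.length === 0 || e.files.some((f) => !inDiff[f])) {
      report.unsupported.push(e.claim);
    }
  }
  for (const f of changedFiles) {
    if (!cited[f]) report.unexplained.push(f);
  }
  return report;
}
```

Anything in `unsupported` or `unexplained` is where to click through to the changed lines first.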
When you step back and look at this entire lifecycle, from structured plan to verified recap, you realize the whole shape of our job has shifted.
We've spent two years treating prompt engineering like a master craft: writing longer instructions, expanding context windows, and refining system prompts. But you don't steer an autonomous agent by prompting it louder. You steer it by engineering the environment: schemas, types, AST-level linter rules, and visual contracts. Code generation is largely solved; constraint engineering is the new bottleneck.
| Writing Code | Designing Systems (Constraint Engineering) |
| --- | --- |
| Hand-writing syntax | Engineering deterministic constraints |
| Auditing line-by-line diffs | Auditing high-level visual contracts |
| Babysitting terminal state | Commanding sandboxed execution loops |
The right-hand column is what we can't automate. Someone still has to decide if the system being built is the right system, shaped the right way. That is taste. It's our scarcest resource, and you can only apply it as well as your interfaces allow. If we keep giving humans three screens of markdown, we're starving the only part of the loop that doesn't scale.
This shift from prompt tuning to environment design isn't just a footnote about better tooling. It's the next logical step in how we build with agents. First, we isolated execution so a runaway loop couldn't brick your machine (worktrees, containers, sandboxes). Now, we are isolating human attention so the loop's output doesn't drown us (structured, component-driven plans and recaps instead of walls of text).
Agent Experience (AX) is becoming a discipline within systems engineering. The interface between human and agent is now core infrastructure, just as critical as your type system or test suite.
This model isn't for every task. When you're actively pairing with an agent, iterating on UI, or experimenting with code, a heavy planning phase just slows you down. But for autonomous, long-running execution loops, a structured plan is your only lever for control.
Agents will keep getting better at writing code; that's a safe bet. The bottleneck is moving entirely to you—your judgment, and whether you have the visibility to exercise it. Stop squinting at terminal scrollback and give yourself an interface built for the job.