Builder.io

AI

I Didn't Become a Developer to Review AI Slop

May 21, 2026

Written By Alice Moore

But lately, that's exactly what the job feels like.

My PR queue fills with work that, yes, technically compiles. The summary sounds plausible. It might even have some tests. Then when I open the diff, the real work starts.

What was the change supposed to do? Did anyone actually run the flow? Why is this helper duplicated six times? Is this actually fixing a bug, or did the AI just run around in circles and call it done?

AI made it effortless for anyone on my team (and yours) to create code, but it didn't make that code trustworthy.

Stack Overflow's 2025 Developer Survey found the most common frustration with AI tools is output that's "almost right, but not quite." Sonar's 2026 State of Code report found that 96% of developers don't fully trust AI-generated code, and 38% say reviewing it takes more effort than reviewing human-written code.

That's because AI code looks fine on the surface, so you have to really dig in to see what it's actually doing. Straight-up bad code is much easier to reject.

I'm annoyed. Maybe you are, too. Let's dig into this and solve it together.

AI agents can spin up branches from Jira tickets, patches from Slack threads, or even full PRs from a bug report before anyone even agrees that the bug is real. It's honestly a pretty awesome world.

But the thing is, developers aren't the only ones using these tools. PMs will prototype the feature they've been trying to explain for three sprints, mostly with vague, unhelpful hand waves. Designers will tweak UX flows and fix layouts that keep getting deprioritized. Marketers will update landing pages and forms. (Constantly.) Support will patch the customer pain points they know best.

And all that is a win. Small fixes shouldn't sit in backlog hell waiting for an engineer who happens to know that part of the code. Product knowledge should be turning into working software faster.

But the easier it gets to open a PR, the more PRs developers are obligated to review. And PRs aren't valuable just because they exist. They're only valuable when they can be trusted.

AI is really good at writing code. For a recent hackathon, I had GPT 5.5 spin up 10,000 lines of working code in about 45 minutes. The app mostly worked. Sure, the UI was a nightmare, but the core functionality was there.

But writing code and writing trustworthy, scalable code are two different things. A model can generate a diff, explain it, and even run some happy-path tests. But someone's still accountable for the stuff that actually matters:

  • Did this code actually fix the stated problem?
  • Did the author really understand the system, or is this creating tech debt for later?
  • Is the diff bigger than it needs to be? (Almost definitely.)
  • Does this fix silently break some other flow in a way that would be obvious if a single user just tried it out?
  • Does the UI actually work for real users in real browsers?
  • Will this fix survive past a demo?
  • Is this actually a fix to the root problem, or just a bandaid?
  • Is this security tradeoff acceptable?

These aren't syntax questions. They're trust questions. And right now, they all land on you and me, the developers. @richiemcilroy put it well in a viral video tweet the other day.

The numbers tell the same story. LinearB's 2026 benchmarks found AI PRs sit waiting 4.6x longer for review and get rejected far more often than human-written ones. METR's study of experienced open-source developers found early-2025 AI tools actually made devs 19% slower, partly because real work includes style, tests, docs, and review, not just typing.

That's not saying AI is useless. And the tools really do keep getting better every day. But the real work of software was never just typing code into files. It's knowing what should change, what shouldn't, and when a surface-level patch that technically fixes the problem is actually going to haunt your team for the next six months.

That's where your attention needs to go. You should be weighing the stuff that needs taste and context, not manually rediscovering the basics after the PR is already in your queue.

Even though AI tools are making everyone more productive, being the bottleneck feels terrible. Everyone else gets to accomplish more than they've ever done before, because suddenly code is open to them.

You as a developer just experience the hype as incoming review debt. You aren't building. You're reviewing. You aren't designing the system; you're policing its edges. You aren't solving the hard problem directly; you're reverse-engineering what an agent or teammate was trying to do, then betting your afternoon on whether the diff is safe to keep.

The AI gets to do the fun part. You get to be a robot.

That doesn't mean you're useless. If anything, your judgment matters way more now.

A bell curve chart representing the Dunning-Kruger effect. A person at the low IQ end and a person at the high IQ end both say, "I hate reviewing code." A person in the middle, at the peak of the bell curve, cries and says, "AI IS GONNA TAKE ALL OUR JOBS!"

But the workflow is spending your judgment terribly. It's taking the scarcest resource in the system—experienced engineering attention—and aiming it at mystery diffs, bloated patches, missing context, and generated code that only looks correct.

So yeah, it's boring. Yeah, it's frustrating. When someone says "now everyone can ship code," what you and I hear is "now everyone can create work for us."

Thus, the burnout.

So, what do we do? Well, the obvious reaction would be to lock up the repo. Devs only.

And I get that. You're the one who gets paged at 2am when prod goes down. Being protective of the code isn't elitism. You just have a memory.

But limiting access solves the wrong problem.

Cross-functional PRs aren't automatically bad. In fact, in many ways, they're exactly what we've wanted for years: product knowledge turning into small fixes without waiting on an engineer's calendar.

But the problem is that, even though everyone can now open PRs, PR intake itself hasn't evolved. Teams still treat a PR like a dev-to-dev handoff: here's the diff, here's the description, good luck. That worked great when the author was another engineer with the same local context, the same testing habits, and the same gut sense of what reviewers needed.

But that assumption falls apart now. Not because non-devs are careless. In fact, designers, PMs, marketers, support teams—they all have the best user context since they're closer to the problem. But they probably don't know what you need as a dev to evaluate risk. And when AI generated the actual implementation, even the person opening the PR might not know the full scope of what changed.

Mystery diffs aren't a reasonable way to collaborate. So, how do you change the way you work with PRs?

No dev should open a generated or cross-functional PR and have to reverse-engineer it from scratch. Every PR needs to show up with receipts:

  • Clear intent.
  • A small, scoped diff.
  • A summary of meaningful changes.
  • Relevant tests and results.
  • Browser-based QA on the affected flow.
  • Screenshots, replay, or other behavioral proof.
  • Console and network logs when something is failing.
  • Known risks, skipped cases, and open questions.
  • A path to fix issues on the same branch.

But that's the problem. We say we want PMs, designers, marketers, and support to directly contribute, but then we expect them to act like senior engineers before we'll even review their work.

A PM shouldn't need to know how to scope a tight diff. A designer shouldn't read network traces. A marketer shouldn't be QA. Support shouldn't write a perfect test plan just to propose a fix.

The entry bar needs to stay low. The review bar needs to go up.

Those aren't in conflict if there's an interpretation layer to bridge the gap. We already have amazing AI, so why aren't we using it, per PR, to review the quality and interpret intent before engineers waste their time?

The contributor can bring the product context: what hurts, why it matters, what good looks like. And they can be the ones who work with an AI agent to send the PR in the first place. Then, a review toolchain should translate that implementation into something a dev can trust.

The toolchain should keep diffs scoped, summarize real changes, run checks, open the product in a browser, click the flow, capture screenshots, surface console errors, and flag what it didn't test. It should let the contributor fix issues on the same branch without turning them into a release engineer.

And it should spare the developer from being the first person to discover the button doesn't work.
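To make the intake idea concrete, here's a minimal sketch of a gate that checks whether a PR showed up with its receipts before a human ever opens it. Everything in it is an assumption for illustration: the receipt names, the `PullRequestIntake` shape, and the 400-line limit are hypothetical, not part of any real tool.

```python
from dataclasses import dataclass, field

# Hypothetical receipt names, loosely mirroring the checklist earlier in the post.
REQUIRED_RECEIPTS = ("intent", "change_summary", "test_results", "browser_qa", "known_risks")

# Assumed threshold for a "small, scoped diff"; tune it per repo.
MAX_CHANGED_LINES = 400


@dataclass
class PullRequestIntake:
    """Evidence attached to a PR before review (illustrative shape only)."""
    title: str
    changed_lines: int
    receipts: dict[str, str] = field(default_factory=dict)


def missing_receipts(pr: PullRequestIntake) -> list[str]:
    """Return the required receipts that are absent or blank."""
    return [name for name in REQUIRED_RECEIPTS if not pr.receipts.get(name, "").strip()]


def intake_findings(pr: PullRequestIntake) -> list[str]:
    """Collect everything the contributor should fix before a dev is pinged."""
    findings = [f"missing receipt: {name}" for name in missing_receipts(pr)]
    if pr.changed_lines > MAX_CHANGED_LINES:
        findings.append(
            f"diff too large: {pr.changed_lines} changed lines (limit {MAX_CHANGED_LINES})"
        )
    return findings


# Example: a PR that explains its intent but skipped most of the evidence.
pr = PullRequestIntake(
    title="Fix checkout button",
    changed_lines=520,
    receipts={"intent": "Checkout button does nothing on mobile", "test_results": "  "},
)
for finding in intake_findings(pr):
    print(finding)
```

The point of a gate like this is that the findings go back to the contributor and their agent, not to the reviewer. The entry bar stays low because nobody has to know the checklist by heart; the tooling asks for what's missing.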

Everyone's starting to wake up to this problem. And PR review automation seems to be the best answer. That said, I've found that a lot of the existing PR review tools are pretty surface-level in what they do, mostly just acting as another AI agent to see if the code makes sense in context.

What you actually want is an agent that runs the code in the browser and tests real edge cases to spot failure modes. You can definitely piece it together yourself with enough CI glue. Or, you can get it off the shelf.

That's the point of our (Builder's) Quality Review Agent. It opens your app in real browsers, walks the affected flow, and returns evidence of what it clicked, what happened, and what failed, complete with replay links, console errors, network traces, and specific findings tied back to the change.

So now, instead of reviewing hundreds of PRs that start as mystery diffs, you get a product-specific review packet:

  • The affected flow, replay, and screenshots.
  • The console and network signals from the run.
  • The specific failures tied back to the change.
  • The risks, skipped cases, and remaining judgment calls.
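If you're piecing this together yourself, a packet like that can be as simple as a structured record per flow that gets rendered into a PR comment. The sketch below is a rough illustration, not the Quality Review Agent's actual output format; the `FlowRun` fields, the `render_packet` helper, and the example.com replay URL are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRun:
    """One browser walk-through of an affected flow (hypothetical shape)."""
    flow: str
    replay_url: str
    console_errors: list[str] = field(default_factory=list)
    failed_requests: list[str] = field(default_factory=list)
    skipped_cases: list[str] = field(default_factory=list)


def render_packet(runs: list[FlowRun]) -> str:
    """Render the runs as plain text a reviewer can scan in a PR comment."""
    lines: list[str] = []
    for run in runs:
        # A run fails if the browser reported any console or network problem.
        status = "FAIL" if run.console_errors or run.failed_requests else "PASS"
        lines.append(f"[{status}] {run.flow} (replay: {run.replay_url})")
        lines.extend(f"  console: {error}" for error in run.console_errors)
        lines.extend(f"  network: {request}" for request in run.failed_requests)
        # Skipped cases are listed explicitly so the gaps stay visible.
        lines.extend(f"  not tested: {case}" for case in run.skipped_cases)
    return "\n".join(lines)


# Example with made-up data.
print(render_packet([
    FlowRun(
        flow="checkout",
        replay_url="https://example.com/replay/1",
        console_errors=["TypeError: x is undefined"],
        skipped_cases=["Safari"],
    )
]))
```

Listing the skipped cases next to the failures is the design choice that matters here: the reviewer sees what was not exercised instead of assuming a clean run covered everything.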

After all, the goal isn't to remove developers from review. We still need to be there to raise the quality of the code in ways only we know how. But the goal is to stop burning developer attention on prep work that machines can handle without complaint.

Look. AI made it dead simple for anyone to ship code. What it didn't do was magically make that code trustworthy. And that means devs are feeling the burden the most right now, having to review all that slop.

Locking everyone out of the repo isn't the answer. We just need every PR to show up with enough context and proof that we can actually use our brains for judgment instead of wasting afternoons playing detective.

With today's agentic tools, that's a trust layer you can either assemble yourself or get off the shelf. Our take on it is Builder's Quality Review Agent.

Either way, it might be best to prioritize fixing that pain before you turn into your company's human merge queue.
