Practical · By Chelsea Hulin · June 13, 2026 · 7 min read

How to Tell If an AI Output Is Good Enough to Ship

The three-pass review I run on every AI output, and the rule for when good enough beats perfect-but-I-did-it-myself.

Most operators I talk to are not stuck because the AI output is bad. They’re stuck because they can’t tell whether it’s good.

A draft lands in front of them. It reads fine. It might be wrong. They don’t have a fast way to know, so they do the safe thing: they rewrite it themselves, badly, slowly, at 11 p.m. And then they tell me AI didn’t save them any time.

The bottleneck was never the model. It was the absence of a review pass they could run in two minutes without second-guessing.

I spent fifteen years as a nurse. I never wrote a med order from scratch. I checked one against the five rights before it reached a patient. That check was the skill. Same job here. The model writes; you verify. So let me give you the verification.

The real skill is quality control, not prompting

Everyone is optimizing the wrong half. There are a thousand prompt guides and almost nothing on how to judge what comes back.

But prompting has a ceiling. You can write the perfect prompt and still get a confident, fluent, completely wrong paragraph. That’s the nature of these tools. They are designed to sound right, which means “sounds right” is worthless as a quality signal.

The model’s job is to sound right. Your job is to find out whether it is.

So the operators who actually get leverage from AI are not the ones with the best prompts. They’re the ones who can look at an output and decide, fast, whether to ship it, fix it, or kill it. That’s a learnable, repeatable skill. Here’s the version I run.

The three-pass review I actually use

Every AI output I ship goes through three passes, in order. Factual, then voice, then liability. The order matters: there’s no point polishing the voice of something that’s wrong.

Pass 1: factual. Is anything in here false, invented, or unverifiable? Names, numbers, dates, claims, links. I assume every specific is wrong until I’ve confirmed it. Most outputs survive this with one or two corrections. Some die here, which is the pass doing its job.

Pass 2: voice. Does this sound like me, or like a model wearing my name? I’m hunting for the tells: “in today’s fast-paced world,” “leverage,” “it’s important to note,” the throat-clearing intro, the bullet list that should have been a sentence. If I’d be embarrassed to have written it, it fails.

Pass 3: liability. What happens if this is wrong and it goes out anyway? A typo in a Slack message and a typo in a client contract are not the same risk, and they don’t get the same scrutiny. This pass sets how hard the first two have to be, which is why I name the stakes before I start reading.

The whole thing takes two to four minutes for a normal output. That’s the number that matters. If your review takes longer than writing it yourself would have, you’ve built a worse process, not a better one.
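If you want a machine to do the first sweep of Pass 1 and Pass 2, here is a minimal sketch. It is my illustration, not a finished tool: the function names, the regex patterns, and the short tell list are all assumptions, and the patterns are deliberately rough. It only builds the list of things to check. The checking is still yours.

```python
import re

# Tells named in this article; extend with your own.
TELLS = [
    "in today's fast-paced world",
    "leverage",
    "it's important to note",
]

# Rough, assumed patterns for specifics worth confirming. Order matters:
# links and dates are masked out before loose numbers are collected.
SPECIFIC_PATTERNS = [
    ("link", r"https?://\S+"),
    ("date", r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2}(?:, \d{4})?"),
    ("number", r"\$?\d[\d,]*(?:\.\d+)?%?"),
]


def specifics_to_verify(text):
    """Pass 1 helper: list (kind, value) pairs a human should confirm."""
    found = []
    remaining = text
    for kind, pattern in SPECIFIC_PATTERNS:
        for match in re.finditer(pattern, remaining):
            # Drop trailing sentence punctuation picked up by the pattern.
            found.append((kind, match.group().rstrip(".,;:)")))
        # Mask what was matched so later patterns do not count it twice.
        remaining = re.sub(pattern, " ", remaining)
    return found


def voice_tells(text):
    """Pass 2 helper: return the known tells that appear in the text."""
    normalized = text.lower().replace("\u2019", "'")
    return [tell for tell in TELLS if tell in normalized]
```

Paste a draft in, get back the specifics and the tells. Names and claims will not show up in a regex, so the list is a floor for your review, not a ceiling.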

When “good enough” beats “perfect but I did it myself”

Here’s the rule that unfreezes people. Most operators hold AI output to a standard they never held their own work to.

The honest comparison is never “AI output vs. perfect.” It’s “AI output vs. what you’d actually produce, at the speed you’d actually produce it.” A first draft you’d grind out tired on a Friday is not perfect either. If the reviewed AI version is as good as your realistic output and took a fraction of the time, ship it.
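Written as a check, the comparison looks something like this. A minimal sketch, assuming you can put a rough 1 to 10 score on quality; the function name and both scores are hypothetical, and the number that matters is your honest one, not a precise one.

```python
def worth_shipping(ai_quality, my_realistic_quality, minutes_with_ai, my_writing_minutes):
    """Compare the reviewed AI draft to what you would realistically produce.

    Quality scores are subjective ratings on the same scale (assumed 1 to 10).
    minutes_with_ai is prompting plus review time.
    """
    as_good = ai_quality >= my_realistic_quality
    faster = minutes_with_ai < my_writing_minutes
    return as_good and faster
```

Note what is missing: there is no "perfect" argument. The draft is only ever compared to your realistic output.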

The trap is the inverse: the rare output where the cost of being wrong is high. There, “good enough” is a liability. The skill is knowing which one you’re holding. The grid below is the whole decision.

Vertical axis: cost if it’s wrong. Horizontal axis: how fast you need it.

Top-left: Verify hard, then ship. Client deliverables. Anything with your name on it. Pass 1 and 3 are non-negotiable.

Top-right: Slow down, do it yourself. Contracts, medical, legal, financial. AI drafts, you own every word.

Bottom-left: Ship it now. Internal notes, first drafts. Light Pass 1, skip the rest.

Bottom-right: Why are you using AI here. Low stakes, no time pressure. Either trivial or not worth automating.

Most of your work lives in the top-left and bottom-left. The bottom-left is where you stop fighting the tool and ship the reviewed draft. The top-left is where the three passes earn their keep. The bottom-right quadrant is the tell: if the output is low-stakes and you’re not in a hurry, you’re using AI out of habit, not leverage.
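The four quadrants reduce to two yes-or-no questions. Here is a minimal sketch of that lookup; the function name and the two boolean inputs are my own framing, and I am assuming the left column means you need it fast, since the bottom-right quadrant is the one with no time pressure.

```python
def review_plan(high_cost_if_wrong: bool, needed_fast: bool) -> str:
    """Map the two axes of the grid to the quadrant's instruction."""
    if high_cost_if_wrong and needed_fast:
        return "Verify hard, then ship"       # top-left
    if high_cost_if_wrong:
        return "Slow down, do it yourself"    # top-right
    if needed_fast:
        return "Ship it now"                  # bottom-left
    return "Why are you using AI here"        # bottom-right
```

For example, `review_plan(high_cost_if_wrong=False, needed_fast=True)` lands on "Ship it now".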

Pro tip

Before you review anything, say the stakes out loud: “if this is wrong, the worst case is ___.” That one sentence tells you which quadrant you’re in and how hard to look.

What this looks like in practice

A client of mine, a three-person insurance agency, was using Claude to draft policy-renewal emails and hating it. Not because the drafts were bad. Because she reread every word three times, terrified of a wrong number reaching a customer.

We split it. The renewal date and premium amount get a hard Pass 1, every time, checked against the system of record. The rest of the email, the greeting, the framing, the call to book a review, gets a quick voice skim and goes. Her review time dropped from roughly twelve minutes an email to under three. She stopped rewriting and started shipping. The tool didn’t change. The review did.
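If the system of record can export the two fields, the hard Pass 1 can be a direct comparison. This is a sketch of the idea, not what the agency runs: the field names, the dictionary shapes, and the function name are all hypothetical.

```python
from datetime import date
from decimal import Decimal

# Only the fields that get the hard check; everything else gets the voice skim.
HARD_CHECK_FIELDS = ("renewal_date", "premium_amount")


def hard_pass_one(draft, record):
    """Return the names of hard-check fields where the draft disagrees
    with the system of record. An empty list means the draft matches."""
    mismatches = []
    for field in HARD_CHECK_FIELDS:
        if draft.get(field) != record.get(field):
            mismatches.append(field)
    return mismatches


record = {"renewal_date": date(2026, 9, 1), "premium_amount": Decimal("1240.50")}
draft = {"renewal_date": date(2026, 9, 1), "premium_amount": Decimal("1204.50")}
print(hard_pass_one(draft, record))  # ['premium_amount']
```

The transposed digits in the premium are exactly the kind of error a reread at 11 p.m. misses and a field comparison does not.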

That’s the pattern every time. The operators winning with AI aren’t writing better prompts. They’ve just stopped treating every output as either flawless or worthless, and built a fast way to tell the difference.

Do this with your next AI output

You don’t need a system. You need three passes and an honest comparison. Try it on the next thing the model hands you.

Your three-pass review, this week

Before reading, name the worst case if it’s wrong
Pass 1: check every name, number, date, and claim
Pass 2: hunt the tells; cut anything that sounds like a model
Pass 3: match the scrutiny to the stakes, not your anxiety
Compare to your real output, not to perfect
If it’s as good and faster, ship it. Stop rewriting.

I’ll be honest about where this can break. The three-pass review assumes you can actually tell a true claim from a false one in your domain. In a field you don’t know, you can’t, and AI will out-confidence you every time. There the only safe pass is a human expert, not a faster checklist. But for the work you already understand, the thing standing between you and real leverage isn’t the model’s quality. It’s that nobody taught you how to judge it. Now you have a way.
