
Two Months to Production (and the Bits AI Couldn't Rush)

Everything worked perfectly. Then we had to talk to the real world.

I love building with AI. The daily dopamine hits are genuinely addictive. Think, plan, prompt, and there it is. Triple cherries. When it works, it's electric. When the machine spits out three random vegetables instead, it's equally deflating. But you pull the lever again, because the next one might land.

We were a team of three: a dev (me), a visionary, and an e-commerce veteran who knows the catalogue inside out. Together we recently built a multi-channel e-commerce system: multiple storefronts, shared inventory, centralised order management. Complex, but well-understood territory. Think Shopify without the SaaS tax on every transaction, which, in an era where AI will help you build pretty much anything, is starting to feel like a choice rather than a necessity. V1 was in production in under two months.

Two months. Headline candy for sure, clickbait maybe. But it's the truth. It's also six-day weeks and twelve-plus-hour days, more of them than I can count. I lost count of the days I hit the five-hour Claude usage limit three times. Nothing comes for free. Hard work is still very much required. AI doesn't remove the graft. It changes what you're grafting on.

For most of the build, AI was phenomenal. I had agent automation handling code reviews, catching the classic AI-generated code smells: type-checking gaps, duplicated logic, anti-patterns that look clean but scale badly. Separate agents for security reviews, SQL query optimisation, connection pooling checks. Tests, and an agent reviewing the test coverage itself. Layers of automated quality control, each one shaped by my own preferences for what "good" looks like.

It worked. Really well. The core platform came together faster than any comparable project I've built, and the code quality was genuinely high. Not "high for AI-generated code." Just high.

Then we got to supplier and warehouse integrations.

Where the Wheels Came Off

If you've ever integrated with third-party logistics or supplier systems, you already know what's coming.

Undocumented APIs. APIs documented in 2016 that bear only a passing resemblance to what the endpoint actually returns today. Integrations that rely on FTP and very specific text file formats. APIs that return integers as strings, or booleans as "Y" and "N", or errors as 200 responses with a message buried three levels deep in the payload.
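To make that concrete, here's the kind of defensive normalisation this forces on you. The field names, the "Y"/"N" convention, and the payload shape are illustrative inventions, not taken from any real supplier API:

```python
def parse_flag(value):
    """Booleans arrive as 'Y'/'N', 'y'/'n', or occasionally real booleans."""
    if isinstance(value, bool):
        return value
    return str(value).strip().upper() == "Y"

def parse_stock_level(value):
    """Integers arrive as ints, strings ('42'), or even '42.0'."""
    return int(float(value))

def extract_error(payload):
    """A 200 response can still carry a failure, buried three levels deep.
    Returns the error string if present, else None. (Hypothetical shape.)"""
    return (payload.get("response", {})
                   .get("status", {})
                   .get("errorMessage"))
```

None of this is clever. It's just the tax you pay for every field you can't trust.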

This is where AI stopped being the fastest developer in the room and started being the most confidently wrong one.

Not because it couldn't write the code. It could. It wrote integration code quickly, cleanly, and incorrectly. It made assumptions about response formats that seemed reasonable but weren't. It handled the happy path beautifully and missed the seventeen ways a warehouse API can fail without technically returning an error.
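As an illustration of what "failing without technically returning an error" looks like in practice, the status code alone tells you almost nothing. The response shapes here are invented for the sketch, not lifted from a real warehouse API:

```python
def classify_warehouse_response(status_code, payload):
    """Decide whether a warehouse API call actually succeeded.
    A 200 can still be a failure; treat anything unverified as suspect."""
    if status_code != 200:
        return "transport_error"
    if not isinstance(payload, dict):
        return "malformed"          # e.g. an HTML maintenance page served with a 200
    if str(payload.get("status", "")).upper() in {"ERROR", "FAILED"}:
        return "soft_error"         # failure reported inside a 200 body
    if "orders" not in payload:
        return "unexpected_shape"   # schema drifted since the docs were written
    return "ok"
```

The happy path is one branch out of five, and the AI reliably wrote that one.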

Every time I pasted an error message in or it read one from the terminal, it would fix it. Immediately. Confidently. And then hit the next one. And fix that. And hit another. What it never did was stop.

The Coffee Problem

Here's what I've come to think of as the coffee problem. At some point during a complex integration, a human developer puts the laptop down, makes a coffee, does a lap of the garden, and thinks. Not about the current error. About the whole picture. About how all these moving pieces come together to form a single, reliable process. About the downstream gotchas that haven't surfaced yet but will.

AI doesn't do this. It's endlessly reactive. It'll respond to every symptom you show it, but it won't step back and ask whether it's treating the disease. It won't wonder whether the approach itself is wrong.

These messy, undocumented, real-world problems reward patience, lateral thinking, and the kind of understanding that only comes from spending time with a system before writing a line of code. So that's what I did. Slowly, deliberately, with a lot of coffee.

Was It Perfect at Launch?

No. Of course not. Nothing ever is.

We had the usual teething issues you get with any system this size in its first weeks of real traffic. A misconfigured environment variable. A cron expression an hour out. A webhook pointed at staging that should have been pointed at production. The classics.

Here's the thing though: every single one was human error. Not one of them came from the AI-generated code. That part had been reviewed by more agents, with more rigour, than any codebase I've shipped before. The bugs were in the glue, the config, the bits a human typed in a hurry at 1am with blurry vision.

Which is, honestly, quite funny. The robots did their job. The humans fumbled the handover.

What I Actually Learned

The e-commerce platform shipped. Thousands of orders processed successfully in the first few days. The supplier feeds run reliably, the warehouse sync holds up, orders flow through without manual intervention. It's solid.

But the timeline wasn't "AI fast" across the board. The core platform was. The integrations took time. Less than the usual time, since they were still AI-assisted, but certainly not triple-cherries fast. Understanding messy systems and building reliable processes around them isn't a speed problem. It's a thinking problem.

AI got me to the starting line faster on every part of this project. For the clean, well-defined work, it carried me most of the way to the finish line too. For the messy, real-world, patience-required work, it got me to the starting line and then I needed to actually be an engineer.

The best code I wrote on this project was the code I wrote slowly. And the best thing AI did was handle everything else so I had the time to do that.

I'm Steve. I help businesses build and modernise software, with AI baked into every part of the process. I write here about what that actually involves.
