I’ve been back at work after surgery for just over two months (though it feels much longer), and I noticed something that made me pause.
I don’t think I’ve written a single line of code myself.
It’s Claude via kiro-cli now. No token limits. AWS runs the meter, I run the prompts.
Sometimes it’s great. When you already know what needs to be done and what “good” looks like, you can get to a decent implementation in a couple iterations.
But lately I’ve had this nagging feeling that I’m not building software anymore.
I’m operating it.
Prompting is easy. Ownership isn’t.
On my team, the workflow is basically: describe the change, let the model produce the diff, iterate until tests pass (and ideally you add a couple), ship.
That works surprisingly well for straightforward stuff.
It works less well for the things that matter long-term: naming, edge cases, the “what happens at 2am when this breaks” paths, the parts where ownership lives.
I still do what I’ve always done with my own PRs. I read them before I send them out. I try to break my own change. I ask myself if future-me would hate current-me.
A lot of people skip that.
And you can often tell. The PR reads like first-draft model output with a human signature at the bottom.
“You care too much”
Recently I left a few comments on a teammate’s PR.
Not “this is wrong,” more “this works, but we can make it clearer, safer, less weird.” The sort of feedback that keeps a codebase from turning into a junk drawer.
His response was basically: “You care too much.”
Then he suggested I should just use AI to review the AI-generated code.
That part got under my skin.
If one person writes with AI and another reviews with AI, do we even need reviews? Or are we just comparing prompts?
I did end up asking Claude to review it and forwarded my teammate the double-digit list of bullet points it produced. He promised to fix them. He did.
So it works, in the narrow sense.
What I don’t understand is what it does to ownership.
The PR becomes a blob
If nobody can explain why the code is shaped the way it is, then the PR becomes a blob that passed CI.
Reviews turn into “clean up the model output,” not “two engineers aligning on intent.”
And then there is the mentoring problem.
Some of my teammates are new to software development, so part of the job is teaching them what to look for: what a good abstraction is, why naming matters, how to spot edge cases, how to make code readable for the next person.
Except now so much of the first draft comes from a model that it’s easy to skip the learning part entirely. You can ship fast and still build zero taste.
I don’t have a clean answer for this. I’m still trying to figure out what the right “human in the loop” ritual looks like.
A rule I’m tempted to introduce
I’m tempted to introduce a simple rule: before a PR goes out, the author should be able to explain the change in their own words.
Not polished English. Just a short explanation of what changed, why it changed, and what they checked.
Will people use AI to write that too? Sure.
But there’s still a big gap between a tidy summary and actual understanding. You can usually tell within a minute.
If someone can answer a few basic questions without going back to the model, I trust the PR more, even if AI wrote most of the lines:
- what breaks if this fails
- what happens on bad input
- what you didn’t solve on purpose
- how you’d debug it if it paged
If they can’t, then it doesn’t matter how “clean” the diff looks.
Someone will probably argue that understanding matters less now because the AI can always explain the code later. Maybe. But from my experience getting paged in the middle of the night a couple weeks ago, the opposite is true. If you don’t understand the system well enough to know where to look, the AI just wastes more of your time. You end up prompting in circles because you can’t tell which details matter.
Talking to people through models
This shows up in communication too.
I work on a team with different nationalities, and some people don’t speak English well. More than once I’ve had to say, “I don’t know what that means.”
I think they got the memo, because some of them now use AI to communicate.
It still feels wrong though when I’m effectively talking to a model through another person. If I only ever interact with the AI output, it’s hard not to wonder what the human contribution is supposed to be.
Maybe this is why so many people say the easy part is writing code now. The hard part is knowing what to build, and how to get it adopted.
The shift
I work on internal tooling at Amazon. Distribution is simpler than it is for anyone selling a product. We can enable things for every builder instead of marketing to strangers.
The “what should we build” part is still hard.
So maybe that’s the shift. Every engineer becomes more engaged, more vocal, more idea-driven. People who can spot leverage and pull it.
I remember people using the term code-monkey before. I don’t think that role exists in this new world. If you still see yourself as “just” a code monkey, you’re already behind.
And yes, a model can help generate ideas too.
And this is what makes it hard to complain about: AI is genuinely useful.
The mutation testing moment
I was reading this paper from Meta on mutation testing using LLMs.
Instead of asking if we should try it, I spent a couple hours using AI to add it as an MCP tool locally. Then I showed it during one of our demo meetings.
People were excited. My manager scheduled a meeting for me to show it to some PEs and other managers. They were excited too.
And I wasn’t. At least not after I used it for real.
Because once I started poking at it, I found bugs in my integration. Not theoretical issues. Real broken behavior.
For example, the model would think it killed mutants just because the test it wrote failed on mutated code, without verifying that the same test actually passes on the original code.
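The fix is a two-sided check: a mutant only counts as killed if the test passes on the original code and fails on the mutant. A test that fails on both proves nothing. Here’s a minimal sketch of that check, with toy functions standing in for the real code, mutant, and test suite (all names are illustrative, not from my actual integration):

```python
from typing import Callable

def mutant_is_killed(test: Callable[[Callable], bool],
                     original: Callable,
                     mutant: Callable) -> bool:
    """A mutant is killed only if the test passes on the original
    AND fails on the mutant. A test that also fails on the original
    tells you nothing -- it may just be a broken test."""
    return test(original) and not test(mutant)

# Toy example: an abs()-like function and a mutant that drops the negation.
def original(x): return x if x >= 0 else -x
def mutant(x):   return x if x >= 0 else x   # mutation: removed the "-"

good_test = lambda f: f(-3) == 3 and f(0) == 0   # distinguishes the two
broken_test = lambda f: f(2) == 5                # fails on both versions

assert mutant_is_killed(good_test, original, mutant) is True
assert mutant_is_killed(broken_test, original, mutant) is False
```

The second assertion is exactly the bug I hit: without running the test against the original first, a failure on the mutant looks like a kill even when the test itself is wrong.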
Claude helped me patch these issues with a lot of back and forth, but the reason it got fixed is simple: a human cared enough to test it, break it, and keep going until it was solid.
That’s the moment the “operator” feeling came back.
Back to QA
I started in a call center. Then tech ops. Then QA. Then developer.
And lately I’ve felt like I’m back doing QA again, except the thing I’m testing is the model’s output.
That loop is messing with my motivation more than I expected.
Before, I could at least get a sense of satisfaction after spending three days fixing a bug. It felt like winning. Like I finally beat the boss in the game. It felt like building.
Now I often feel like a supervisor. Approving. Verifying. Nudging.
Useful, yes.
Fun, not really.
So what’s the point?
Reviews don’t go away
I don’t think reviews go away. If anything, they become more important, because the code shows up faster than the understanding does.
What changes is what a review is for.
A review can’t just be “does this compile and pass tests.” It has to be “can someone on this team actually understand this change well enough to own it.”
Because when the code is cheap, the expensive part becomes intent, taste, and accountability.
AI makes it easy to ship a first draft. It also makes it easy to avoid learning the lessons that used to come bundled with writing the first draft yourself.
I don’t want a team full of prompt operators.
I don’t want to be one either.
If your team is using AI heavily, what’s your rule for PR ownership? What do you require from the human every time?
P.S. I’m not against using AI. I’m against “I pasted it and now it’s your problem.”
P.P.S. Written by me. Edited by AI. Reviewed by me. That’s the part that matters.
Cheers!
Evgeny Urubkov (@codevev)