I have actually been surprised at how few subtle bugs like this come up when using tools like Claude Code. Usually the bugs it introduces are glaringly obvious, and they stem from a misunderstanding of the prompt, not from poorly thought-out code.
This has been a surprise to me, as I expected code review of AI-generated code to be much more difficult than it has been in practice. Maybe that's because I only really use LLMs to write code that is easy to explain, and therefore probably not that complicated. If the code is more complicated, I write it myself.
That's what the code review is for :-) But to echo the sibling comments, I've not caught a subtle edge case or bug in the generated code in more than a year and a half. There are mistakes and failure modes for sure, but they are very glaring, to the extent that I simply throw that code away and try again.
That said, I've adopted a specific way of working with AI that is very effective for my situation (mentioned in my comment history, but it echoes a lot of what TFA advises).
Getting pretty tired of this narrative. It's a very 2023, GPT-4-era take that LLMs are just introducing untold amounts of bugs. People need to seriously learn their AI tooling and stop repeating this nonsense.
At my most generous, I'll allow that it's an illusion: producing 20-50x the amount of code per hour introduces a higher raw count of bugs than you are used to seeing in that timeframe - but this notion of AI coders being more bug-prone than humans is utter nonsense. The only way that's not true is on very niche systems, or when the human has conceptualized and planned out their code extensively beforehand - in which case AI would still be the superior next step.
> the only way that's not true is on very niche systems
Are these very niche? Yes, there is a category of coding where what you are doing is essentially translating from English to some high-level language with high-level APIs. That is not significantly different from translating to Spanish; of course LLMs will be successful there.
But there are endless other domains with complex reasoning where LLMs absolutely suck. Like, please tell me how an LLM will reason about concurrent access. And prompting it so that it replies with "Oh, you are right, here is Atomic blahblah" is not reasoning; it's statistical nonsense.
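To make the "Atomic blahblah" point concrete, here is a minimal hypothetical sketch in Go (the bank-balance scenario and names are my own illustration, not something from the thread): sprinkling sync/atomic over the individual reads and writes is the kind of fix an LLM will happily offer, but it doesn't protect the invariant that actually matters, because the race is in the gap between the check and the update.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var balance int64 = 100
	var wg sync.WaitGroup

	withdraw := func(amount int64) {
		defer wg.Done()
		// Each load and each add is atomic on its own, but the
		// check-then-act pair is not: two goroutines can both observe
		// balance >= amount and both withdraw, driving the balance
		// negative. The invariant lives across the gap between the
		// load and the update, and atomics on the individual
		// operations don't close that gap.
		if atomic.LoadInt64(&balance) >= amount {
			atomic.AddInt64(&balance, -amount)
		}
	}

	wg.Add(2)
	go withdraw(80)
	go withdraw(80)
	wg.Wait()

	fmt.Println("balance:", balance) // can legitimately print -60
}
```

Fixing this properly means reasoning about which invariant must hold and guarding the whole critical section (a mutex around the check and the withdrawal, or a compare-and-swap loop), which is exactly the kind of reasoning the comment above is talking about.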
Don't get me wrong, I do think LLMs are a very useful tool, but they are as much overhyped by some as they are underhyped by others.
And it creates new ones you wouldn't even have considered before, leading to just as much, if not more, future debugging :D