Works well for us nonetheless, also on more complex things. It's no worse than most humans (seniors included) I've worked with over the past 40 years, but it is faster and cheaper. HN sometimes forgets that the vast majority of programmers do not like programming; they need the money. If you see the code they produce, you want to puke; yet it runs billion-dollar businesses and works surprisingly well considering the bad code quality.
It's quite literally incapable of solving many mid-level problems, no matter how much you help it. It's not a reasoning machine, it's basically a different way to search for existing answers.
Even in small steps, it fails. I have two cases I test with, nothing special, just some TS generics in one instance and a schema-to-schema mapping tool in another. Both are things that junior devs could do given a couple of days, even though they'd need to study and figure out various pieces.
o1 can't get either, no matter how much I break it down, no matter how much prodding. In fact, the more you try, the worse it gets. And yes, I do try starting new conversations and splitting the problem up. It simply does not help, at all.
That's not to say it isn't really helpful for simple things, or even for complex things that are directly in the training set. But the second you go outside that, it's terrible.
For a bash script or the first steps of something simple it’s great.
For anything complex at all it’s worse than nothing.