Yet when you look beyond boilerplate code generation, it's not all that LLMs increase experienced developers productivity (even when they believe that it does): https://arxiv.org/abs/2507.09089
Edit: Hello downvoters, I'd love to know whether you found a flaw in the argument. Is it just that this study/comment contradicts the common narrative on HN, or something else entirely?
Is there literally anything other than this single 16-participant study that validates the idea that leveraging AI as an assistant reduces completion time in general?
Unless those participants were just complete idiots, I simply cannot square this with my last few weeks absolutely barnstorming on a project using Claude Code.
I wish we had done a more formal study, but at $previous_job we rolled out AI tools (in that case, GitHub Copilot), and we found that for 6-8 months productivity largely stayed the same or dropped slightly, but after that it increased sharply. The rollout covered hundreds of developers, with training, guidance, support, etc. It was done in what I would consider the right way.
Was that project fairly early-days? The current impression seems to be that AI is useful for accelerating the development of smaller and simpler projects, but slows things down in large complex codebases.
The sample size isn’t the individual participants, it’s the hundreds of tasks performed as part of the study. There’s no indication the study was conducted incorrectly.
Except that the participants were thrown into tasks cold, seemingly without even the most basic prep one would (or should) do before turning an AI loose on a legacy codebase (sometimes called "LLM grounding" or "LLM context bootstrapping"). If the participants started without something like that, the study was either conducted incorrectly or designed to support a predetermined conclusion.
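For what it's worth, the "context bootstrapping" prep being described doesn't have to be elaborate. A minimal sketch, assuming nothing beyond the standard library (the function name, output filename, and the choice of what to collect are all hypothetical, not from any particular tool): walk the repo, record its layout, and pull any README files into a single primer document that the assistant can be pointed at before its first task.

```python
from pathlib import Path


def bootstrap_context(repo_root: str, out_file: str = "LLM_CONTEXT.md") -> str:
    """Collect a lightweight repo primer (directory layout plus any README
    files) into one markdown file an assistant can read before its first task.

    This is an illustrative sketch; real setups usually also include build
    commands, conventions, and architecture notes written by hand.
    """
    root = Path(repo_root)
    lines = ["# Repository primer", "", "## Layout"]

    # Indented bullet list of the tree, skipping hidden dirs like .git
    for path in sorted(root.rglob("*")):
        rel_parts = path.relative_to(root).parts
        if any(part.startswith(".") for part in rel_parts):
            continue
        depth = len(rel_parts) - 1
        lines.append("  " * depth + "- " + path.name)

    # Inline every README so project-level docs travel with the primer
    lines += ["", "## Project docs"]
    for readme in sorted(root.rglob("README*")):
        lines += ["", f"### {readme.relative_to(root)}", "", readme.read_text()]

    primer = "\n".join(lines)
    (root / out_file).write_text(primer)
    return primer
```

The point isn't this particular script; it's that giving the model a map of the codebase before assigning tasks is cheap, and the study apparently skipped even that step.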
By the time all of that is written, I'm familiar enough with the code to fly through it (hello, Emacs and Vim). But by then your tasks are small, targeted fixes, because any new feature requires so much planning and stakeholder discussion that you can't just go and work on it.
Not sure why you quoted that part; it just says the results aren't assumed to extrapolate to every codebase or every developer, which sets the boundaries of the study's objectives.
> it's not all that LLMs increase experienced developers productivity (even when they believe that it does):
Even now, I'm struggling to parse this. When I made my original comment, I understood you to be saying that LLMs do not increase productivity. Putting what you're saying now together with that, if I had read
> it's not at all clear that LLMs increase
then I would have understood you correctly. That's my bad!
> The only extrapolations I've seen in this thread are people shrugging it off as using 6-month-old LLMs, so this whole paper must be invalid today.
I feel for both sides on this one. For me personally, the models they used weren't good, but the ones that exist now are, so I do think there's something to that objection. However, I don't think it makes the study invalid; if anything, it's a great way to test the hypothesis: rerun the same experiment with newer models, and see whether the results change. So I also think saying the study is completely irrelevant misses the point.
One of the problems with this study is that the field is moving so very fast.
Six months is an eternity in models. Anthropic has released better models since this study was done. Gemini keeps getting better. Grok / xAI isn't a joke anymore. And that's to say nothing of the massive open-source advances released in just the last couple of weeks.
This is all moving so fast that one already-out-of-date report isn't definitive. It's certainly an interesting snapshot in time, but it has to be understood in context.
Hacker News needs to get better on this. The head-in-the-sand vibe here won't be tenable much longer.
Since you asked, I downvoted you for asking about why you're being downvoted. Don't waste brain cells on fake internet points - it's bad for your health.