
Yet when you look beyond boilerplate code generation, it's not all that LLMs increase experienced developers productivity (even when they believe that it does): https://arxiv.org/abs/2507.09089

Edit: Hello downvoters, I would love to know: did you find a flawed argument, is this just because this study/comment contradicts the common narrative on HN, or is it something else entirely?



Is there literally anything other than this single, 16-participant study that invalidates the idea that leveraging AI as an assistant reduces completion time in general?

Unless those participants were just complete idiots, I simply cannot square this with my last few weeks absolutely barnstorming on a project using Claude Code.


I wish we had done a more formal study, but at $previous_job we rolled out AI tools (in that case GitHub Copilot) and found that for 6-8 months productivity largely stayed the same or dipped slightly, but after that it increased sharply. This was rolled out to hundreds of developers with training, guidance, support, etc. It was done in what I would consider the right way.


Was that project fairly early-days? The current impression seems to be that AI is useful for accelerating the development of smaller and simpler projects, but slows things down in large complex codebases.


The sample size isn’t the individual participants, it’s the hundreds of tasks performed as part of the study. There’s no indication the study was conducted incorrectly.


Except that the participants were thrown into tasks cold, seemingly without even the most basic prep one would/should do before throwing AI at a legacy codebase (sometimes called "LLM grounding" or "LLM context bootstrapping"). If the participants started without something like this, the study was either conducted incorrectly or was designed to support a certain conclusion.

  LLMs.md
  ├── data_model.md
  ├── architecture.md
  ├── infrastructure.md
  ├── business_logic.md
  ├── known_issues.md
  └── conventions.md
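
To make that concrete, here is a minimal, purely illustrative sketch of what the top-level LLMs.md index might contain; the one-line descriptions are my own assumptions about what each file would cover, not something taken from the study or from any particular tool:

  # LLMs.md - entry point for AI assistants working in this repo
  Read these before touching code:
  - data_model.md: core entities and how they relate
  - architecture.md: services, module boundaries, and data flow
  - infrastructure.md: build, deploy, and environment quirks
  - business_logic.md: domain rules that aren't obvious from the code
  - known_issues.md: landmines, flaky tests, do-not-touch areas
  - conventions.md: naming, error handling, and review expectations

Pointing the model at an index like this at the start of a session (and keeping it updated) is roughly what I mean by grounding; the exact file layout matters less than having something for the model to read first.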


By the time all of this is written, I'm familiar enough with the code to fly over it (hello, Emacs and Vim). But by then your tasks are small, targeted fixes, because any new feature requires so much planning and stakeholder discussion that you can't just go and work on it.


> I simply cannot square this with my last few weeks absolutely barnstorming on a project using Claude Code.

I don't know, but the interesting data point in the study is that the participants all said the same thing you are saying, yet their actual time was 19% slower.

And yes, right now it's the only study I've seen with a sound methodology that has any data at all, positive or negative.


This study was about “246 tasks in mature projects”. I would expect AI to fare much better in a study about new projects or brainstorming.


From the paper:

> We do not provide evidence that:

> AI systems do not currently speed up many or most software developers

> We do not claim that our developers or repositories represent a majority or plurality of software development work


Not sure why you quoted that part; it just says that the results shouldn't be extrapolated to every codebase or every developer, setting the boundaries of the study's objectives.


You have made the claim that it does extrapolate, which they themselves do not make.


How is saying that it's not at all clear that LLMs increase experienced developers' productivity an extrapolation?

The only extrapolations I've seen in this thread are people shrugging it off as using 6-month-old LLMs, so this whole paper must be invalid today.


Okay, so, I re-read your original post:

> it's not all that LLMs increase experienced developers productivity (even when they believe that it does):

Even now, I am struggling to parse this. When I made my original comment, I understood you to be saying that LLMs do not increase productivity. Synthesizing what you're saying now with that, if I had read

> it's not at all clear that LLMs increase

then I would have understood you correctly. That's my bad!

> The only extrapolations I've seen in this thread are people shrugging it off as using 6-month-old LLMs, so this whole paper must be invalid today.

I feel for both sides on this one. I do think that, for me personally, the models they used weren't good, but the ones that exist now are, so I do think there's some issue there. However, I don't think that makes the study invalid; if anything, it's a great way to test this hypothesis: if they do the same thing again with newer models, that would lend some weight to that idea. So I also think saying that this is completely irrelevant is missing the point.


That study is going to go down as the red herring it is. It shows little more than that people with minimal experience using LLMs for development do it wrong.


One of the problems with this study is that the field is moving so very fast.

Six months in model terms is an eternity. Anthropic has released better models since this study was done. Gemini keeps getting better. Grok / xAI isn't a joke anymore. To say nothing of the massive open-source advancements released in just the last couple of weeks alone.

This is all moving so fast that one already-outdated report isn't definitive. It's certainly an interesting snapshot in time, but it has to be understood in context.

Hacker News needs to get better on this. The head-in-the-sand vibe here won't be tenable for much longer.


> Hello downvoters... is this just because...

Since you asked, I downvoted you for asking about why you're being downvoted. Don't waste brain cells on fake internet points - it's bad for your health.


Hear, hear.



