Well, there are two possible interpretations here of 75% of participants (all of...

atiedebee · 2025-07-10T20:30:00 1752179400

Let me bring you a third (not necessarily true) interpretation:

The developer who has experience using cursor saw a productivity increase not because he became better at using cursor, but because he became worse at not using it.

card_zero · 2025-07-10T20:41:19 1752180079

Or, one person in 16 has a particular personality, inclined to LLM dependence.

cutemonster · 2025-07-10T22:15:27 1752185727

Didn't they rather mean:

Developers' own skills might atrophy, when they don't write that much code themselves, relying on AI instead.

And now when comparing with/without AI they're faster with. But a year ago they might have been that fast or faster without an AI.

I'm not saying that that's how things are. Just pointing out another way to interpret what GP said

runarberg · 2025-07-10T21:19:16 1752182356

Invoking personality is to the behavioral science as invoking God is to the natural sciences. One can explain anything by appealing to personality, and as such it explains nothing. Psychologists have been trying to make sense of personality for over a century without much success (the best efforts so far have been a five factor model [Big 5] which has ultimately pretty minor predictive value), which is why most behavioral scientists have learned to simply leave personality to the philosophers and concentrate on much simpler theoretical framework.

A much simpler explanation is what your parent offered. And to many behavioralists it is actually the same explanation, as to a true scotsm... [cough] behavioralist personality is simply learned habits, so—by Occam’s razor—you should omit personality from your model.

suddenlybananas · 2025-07-11T05:57:06 1752213426

Behaviorism is a relic of the 1950s

runarberg · 2025-07-11T18:50:05 1752259805

Not really a relic. Reinforcement learning is one of the best model for learned behavior we have. In the 1950s however cognitive science didn’t exist, and behavioralists thought they could explain much more with their model than they could, so they oversold the idea, by a lot.

Cognitive science was able to explain stuff like biases, pattern recognition, language, etc. which behavioral science thought they could explain, but couldn’t. In the 1950s it was really the only game in town (except for psychometrics which failed in a way much more complete—albeit less spectacular—way then behaviorism), so understandably scientists (and philosophers) went a little overboard with it (kind of like evolutionary biology did in the 1920s).

I think a more fair viewpoint is to claim that behaviorism’s heyday in the 1950s has passed, but it still provides an excellent theoretical framework for some of human behavior, and along with cognitive science, is able to explain most of what we know about human behavior.

card_zero · 2025-07-10T21:33:28 1752183208

Fair comment, but I'm not down with behavioralism, and people have personalities, regrettably.

runarberg · 2025-07-10T21:59:12 1752184752

This is still ultimately a research within the field of the behavior sciences, and as such the laws of human behavior apply, where behaviorism offers a far more successful theoretical framework than personality psychology.

Nobody is denying that people have personalities btw. Not even true behavioralists do that, they simply argue from reductionism that personality can be explained with learning contingencies and the reinforcement history. Very few people are true behavioralists these days though, but within the behavior sciences, scientists are much more likely to borrow missing factors (i.e. things that learning contingencies fail to explain) from fields such as cognitive science (or even further to neuroscience) and (less often) social science.

What I am arguing here, however, is that the appeal to personality is unnecessary when explaining behavior.

As for figuring out what personality is, that is still within the realm of philosophy. Maybe cognitive science will do a better job at explaining it than psychometricians have done for the past century. I certainly hope so, it would be nice to have a better model of human behavior. But I think even if we could explain personality, it still wouldn’t help us here. At best we would be in a similar situation as physics, where one model can explain things traveling at the speed of light, while another model can explain things at the sub-atomic scale, but the two models cannot be applied together.

literalAardvark · 2025-07-11T10:06:16 1752228376

Became worse is possible

Became worse in 50 hours? Super unlikely

burnte · 2025-07-10T20:27:08 1752179228

> Current LLMs just are not as good as they are sold to be as a programming assistant and people consistently predict and self-report in the wrong direction on how useful they are.

I would argue you don't need the "as a programming assistant" phrase as right now from my experience over the past 2 years, literally every single AI tool is massively oversold as to its utility. I've literally not seen a single one that delivers on what it's billed as capable of.

They're useful, but right now they need a lot of handholding and I don't have time for that. Too much fact checking. If I want a tool I always have to double check, I was born with a memory so I'm already good there. I don't want to have to fact check my fact checker.

LLMs are great at small tasks. The larger the single task is, or the more tasks you try to cram into one session, the worse they fall apart.

steveklabnik · 2025-07-10T19:21:50 1752175310

> Current LLMs

One thing that happened here is that they aren't using current LLMs:

> Most issues were completed in February and March 2025, before models like Claude 4 Opus or Gemini 2.5 Pro were released.

That doesn't mean this study is bad! In fact, I'd be very curious to see it done again, but with newer models, to see if that has an impact.

blibble · 2025-07-10T19:47:23 1752176843

> One thing that happened here is that they aren't using current LLMs

I've been hearing this for 2 years now

the previous model retroactively becomes total dogshit the moment a new one is released

convenient, isn't it?

nalllar · 2025-07-10T20:14:43 1752178483

If you interact with internet comments and discussions as an amorphous blob of people you'll see a constant trickle of the view that models now are useful, and before were useless.

If you pay attention to who says it, you'll find that people have different personal thresholds for finding llms useful, not that any given person like steveklabnik above keeps flip-flopping on their view.

This is a variant on the goomba fallacy: https://englishinprogress.net/gen-z-slang/goomba-fallacy-exp...

steveklabnik · 2025-07-10T20:02:14 1752177734

Sorry, that’s not my take. I didn’t think these tools were useful until the latest set of models, that is, they crossed the threshold of usefulness to me.

Even then though, “technology gets better over time” shouldn’t be surprising, as it’s pretty common.

mattmanser · 2025-07-10T20:15:14 1752178514

Do you really see a massive jump?

For context, I've been using AI, a mix of OpenAi + Claude, mainly for bashing out quick React stuff. For over a year now. Anything else it's generally rubbish and slower than working without. Though I still use it to rubber duck, so I'm still seeing the level of quality for backend.

I'd say they're only marginally better today than they were even 2 years ago.

Every time a new model comes out you get a bunch of people raving how great the new one is and I honestly can't really tell the difference. The only real difference is reasoning models actually slowed everything down, but now I see its reasoning. It's only useful because I often spot it leaving out important stuff from the final answer.

simonw · 2025-07-10T21:34:35 1752183275

The massive jump in the last six months is that the new set of "reasoning" models got really good at reasoning about when to call tools, and were accompanied is by a flurry of tools-in-loop coding agents - Claude Code, OpenAI Codex, Cursor in Agent mode etc.

An LLM that can test the code it is writing and then iterate to fix the bugs turns out to be a huge step forward from LLMs that just write code without trying to then exercise it.

vidarh · 2025-07-10T22:52:44 1752187964

I've gone from asking the tools how to do things, and cut and pasting the bits (often small) that'd be helpful, via using assistants that I'd review every decision of and often having to start over, to now often starting an assistant with broad permissions and just reviewing the diff later, after they've made the changes pass the test suite, run a linter and fixed all the issues it brought up, and written a draft commit message.

The jump has been massive.

otabdeveloper4 · 2025-07-11T13:37:55 1752241075

> but now I see its reasoning

It's not showing its reasoning. "Reasoning" models are trained to output more tokens in the hope that more tokens means less hallucinations.

It's just a marketing trick and there is no evidence this sort of fake ""reasoning"" actually gives any benefit.

steveklabnik · 2025-07-10T20:27:45 1752179265

Yes. In January I would have told you AI tools are bullshit. Today I’m on the $200/month Claude Max plan.

As with anything, your miles may vary: I’m not here to tell anyone that thinks they still suck that their experience is invalid, but to me it’s been a pretty big swing.

Uehreka · 2025-07-10T20:46:43 1752180403

> In January I would have told you AI tools are bullshit. Today I’m on the $200/month Claude Max plan.

Same. For me the turning point was VS Code’s Copilot Agent mode in April. That changed everything about how I work, though it had a lot of drawbacks due to its glitches (many of these were fixed within 6 or so weeks).

When Claude Sonnet 4 came out in May, I could immediately tell it was a step-function increase in capability. It was the first time an AI, faced with ambiguous and complicated situations, would be willing to answer a question with a definitive and confident “No”.

After a few weeks, it became clear that VS Code’s interface and usage limits were becoming the bottleneck. I went to my boss, bullet points in hand, and easily got approval for the Claude Max $200 plan. Boom, another step-function increase.

We’re living in an incredibly exciting time to be a skilled developer. I understand the need to stay skeptical and measure the real benefits, but I feel like a lot of people are getting caught up in the culture war aspect and are missing out on something truly wonderful.

mattmanser · 2025-07-10T21:54:57 1752184497

Ok, I'll have to try it out then. I've got a side project I've 3/4 finished and will let it loose on it.

So are you using Claude Code via the max plan, Cursor, or what?

I think I'd definitely hit AI news exhaustion and was viewing people raving about this agentic stuff as yet more AI fanbois. I'd just continued using the AI separate as setting up a new IDE seemed like too much work for the fractional gains I'd been seeing.

steveklabnik · 2025-07-10T22:02:42 1752184962

I had a bad time with Cursor. I use Claude Code inside of VS: Code. You don't necessarily need Max, but you can spend a lot of money very quickly on API tokens, so I'd recommend to anyone trying, start with the $20/month one, no need to spend a ton of money just to try something out.

There is a skill gap, like, I think of it like vim: at first it slows you down, but then as you learn it, you end up speeding up. So you may also find that it doesn't really vibe with the way you work, even if I am having a good time with it. I know people who are great engineers who still don't like this stuff, just like I know ones that do too.

mh- · 2025-07-11T01:02:58 1752195778

Worth noting for the folks asking: there's an official Claude Code extension for VS Code now [0]. I haven't tried it personally, but that's mostly because I mainly use the terminal and vim.

[0]: https://marketplace.visualstudio.com/items?itemName=anthropi...

steveklabnik · 2025-07-11T02:49:27 1752202167

Yes, it’s not necessary but it is convenient for viewing diffs in Code’s diff view. The terminal is a fine way to interact with it though.

jpc0 · 2025-07-11T21:13:10 1752268390

Takes this with a massive grain of salt but my experience with Google Code CLI recently, we pay for google products but not others internally, I can’t change that decision.

I asked it two implement two bicubic filters, a high pass filter and a high shelf filter. Some context, using the gemini webapp it would split out the exact code I need with the interfaces I require one shot because this is truly trivial C++ code to write.

15 million tokens and an hour and a half later I now had a project that could not build, the filters were not implemented and my trust in AI agentic workflows broken.

It cost me nothing, I just reset the repo and I was watching youtube videos for that hour and a half.

Your mileage may vary and I’m very sure if this was golang or typescript it might have done significantly better, but even compared to the exact same model in a chat interface my experience was horrible.

I’m sticking to the slightly “worse” experience of using the chat interface which does give me significant improvements in productivity vs letting the agent burn money and time and not produce working code.

8note · 2025-07-11T06:10:46 1752214246

id say thats not gonna be the best use for it, unless what you really want is to first document in detail everything about it.

im using claude + vscode's cline extension for the most part, but where it tends to excel is helping you write documentation, and then using that documentation to write reasonable code.

if you're 3/4 of the way done, a lot of the docs of what it wants to work well are gonna be missing, and so a lot of your intentions about why you did or didnt make certain choices will be missing. if you've got good docs, make sure to feed those in as context.

the agentic tool on its own is still kinda meh, if you only try to write code directly from it. definitely better than the non-agentic stuff, but if you start with trying to get it to document stuff, and ask you questions about what it should know in order to make the change its pretty good.

even if you dont get perfect code, or it spins in a feedback loop where its lost the plot, those questions it asks can be super handy in terms of code patterns that you havent thought about that apply to your code, and things that would usually be undefined behaviour.

my raving is that i get to leave behind useful docs in my code packages, and my team members get access to and use those docs, without the usual discoverability problems, and i get those docs for... somewhat slower than i could have written the code myself, but much much faster than if i also had to write those docs

hombre_fatal · 2025-07-10T20:19:43 1752178783

I see a massive jump every time.

Just two years ago, this failed.

> Me: What language is this: "esto está escrito en inglés"

> LLM: English

Gemini and Opus have solved questions that took me weeks to solve myself. And I'll feed some complex code into each new iteration and it will catch a race condition I missed even with testing and line by line scrutiny.

Consider how many more years of experience you need as a software engineer to catch hard race conditions just from reading code than someone who couldn't do it after trying 100 times. We take it for granted already since we see it as "it caught it or it didn't", but these are massive jumps in capability.

ipaddr · 2025-07-10T20:18:24 1752178704

Wait until the next set. You will find you the previous ones weren't useful after all.

steveklabnik · 2025-07-10T20:29:55 1752179395

This makes no sense to me. I’m well aware that I’m getting value today, that’s not going to change in the future: it’s already happened.

Sure they may get even more useful in the future but that doesn’t change my present.

bix6 · 2025-07-10T22:07:04 1752185224

Everything actually got better. Look at the image generation improvements as an easily visible benchmark.

I do not program for my day job and I vibe coded two different web projects. One in twenty mins as a test with cloudflare deployment having never used cloudflare and one in a week over vacation (and then fixed a deep safari bug two weeks later by hammering the LLM). These tools massively raise the capabilities for sub-average people like me and decrease the time / brain requirements significantly.

I had to make a little update to reset the KV store on cloudflare and the LLM did it in 20s after failing the syntax twice. I would’ve spent at least a few minutes looking it up otherwise.

mwigdahl · 2025-07-10T23:06:29 1752188789

I've been a proponent for a long time, so I certainly fit this at least partially. However, the combination of Claude Code and the Claude 4 models has pushed the response to my demos of AI coding at my org from "hey, that's kind of cool" to "Wow, can you get me an API key please?"

It's been a very noticeable uptick in power, and although there have been some nice increases with past model releases, this has been both the largest and the one that has unlocked the most real value since I've been following the tech.

achierius · 2025-07-10T23:12:34 1752189154

Is that really the case vs. 3.7? For me that was the threshold, and since then the improvements have been nice but not as significant.

mwigdahl · 2025-07-10T23:49:05 1752191345

I would agree with you that the jump from Sonnet 3.7 to Sonnet 4 feels notable but not shocking. Opus 4 is considerably better, and Opus 4 combined with the Claude Code harness is what really unlocks the value for me.

cfst · 2025-07-10T20:14:14 1752178454

The current batch of models, specifically Claude Sonnet and Opus 4, are the first I've used that have actually been more helpful than annoying on the large mixed-language codebases I work in. I suspect that dividing line differs greatly between developers and applications.

Aeolun · 2025-07-10T22:46:10 1752187570

It’s true though? Previous models could do well in specifically created settings. You can throw practically everything at Opus, and it’ll work mostly fine.

simonw · 2025-07-10T19:49:51 1752176991

The previous model retroactively becomes not as good as the best available models. I don't think that's a huge surprise.

cwillu · 2025-07-10T19:58:50 1752177530

The surprise is the implication that the crossover between net-negative and net-positive impact happened to be in the last 4 months, in light of the initial release 2 years ago and sufficient public attention for a study to be funded and completed.

Yes, it might make a difference, but it is a little tiresome that there's always a “this is based on a model that is x months old!” comment, because it will always be true: an academic study does not get funded, executed, written up, and published in less time.

Ntrails · 2025-07-10T20:17:29 1752178649

Some of it is just that (probably different) people said the same damn things 6 months ago.

"No, the 2.8 release is the first good one. It massively improves workflows"

Then, 6 months later, the study comes out.

"Ah man, 2.8 was useless, 3.0 really crossed the threshold on value add"

At some point, you roll your eyes and assume it is just snake oil sales

steveklabnik · 2025-07-10T20:40:41 1752180041

There’s a lot of confounding factors here. For example, you could point to any of these things in the last ~8 months as being significant changes:

* the release of agentic workflow tools

* the release of MCPs

* the release of new models, Claude 4 and Gemini 2.5 in particular

* subagents

* asynchronous agents

All or any of these could have made for a big or small impact. For example, I’m big on agentic tools, skeptical of MCPs, and don’t think we yet understand subagents. That’s different from those who, for example, think MCPs are the future.

> At some point, you roll your eyes and assume it is just snake oil sales

No, you have to realize you’re talking to a population of people, and not necessarily the same person. Opinions are going to vary, they’re not literally the same person each time.

There are surely snake oil salesman, but you can’t buy anything from me.

Ntrails · 2025-07-11T16:29:23 1752251363

> you have to realize you’re talking to a population of people, and not necessarily the same person. Opinions are going to vary, they’re not literally the same person each time.

I pointed this out in my post for a reason. I get it. But even given a different person is saying the same thing every time a new release comes out - the effect on my prior is the same.

Filligree · 2025-07-10T20:40:00 1752180000

Or you accept that different people have different skill levels, workflows and goals, and therefore the AIs reach usability at different times.

rsynnott · 2025-07-11T10:25:22 1752229522

The complication is that, as noted in the above paper, _people are bad at self-reporting on whether the magic robot works for them_. Just because someone _believes_ they are more effective using LLMs is not particularly strong evidence that they actually are.

foobarqux · 2025-07-10T20:24:25 1752179065

That's not the argument being made though, which is that it does "work" now and implying that actually it didn't quite work before; except that that is the same thing the same people say for every model release, including at the time or release of the previous one, which is now acknowledged to be seriously flawed; and including the future one, at which time the current models will similarly be acknowledged to be, not only less performant that the future models, but inherently flawed.

Of course it's possible that at some point you get to a model that really works, irrespective of the history of false claims from the zealots, but it does mean you should take their comments with a grain of salt.

steveklabnik · 2025-07-10T20:43:54 1752180234

> That's not the argument being made though, which is that it does "work" now and implying that actually it didn't quite work before

Right.

> except that that is the same thing the same people say for every model release,

I did not say that, no.

I am sure you can find someone who is in a Groundhog Day about this, but it’s just simpler than that: as tools improve, more people find them useful than before. You’re not talking to the same people, you are talking to new people each time who now have had their threshold crossed.

blibble · 2025-07-10T21:04:11 1752181451

> You’re not talking to the same people, you are talking to new people each time who now have had their threshold crossed.

no, it's the same names, again and again

simonw · 2025-07-10T21:36:37 1752183397

Got receipts?

That sounds like a claim you could back up with a little bit of time spent using Hacker News search or similar.

(I might try to get a tool like o3 to run those searches for me.)

blibble · 2025-07-10T21:52:12 1752184332

try asking it what sealioning is

maxbond · 2025-07-11T01:38:01 1752197881

You've no obligation to answer, no one is entitled to your time, but it's a reasonable request. It's not sealioning to respectfully ask for directly relevant evidence that takes about 10-15m to get.

pdabbadabba · 2025-07-10T20:02:08 1752177728

Maybe it's convenient. But isn't it also just a fact that some of the models available today are better than the ones available five months ago?

bryanrasmussen · 2025-07-10T20:11:43 1752178303

sure, but after having spent some time trying to get anything useful - programmatically - out of previous models and not getting anything once a new one is announced how much time should one spend.

Sure you may end up missing out on a good thing and then having to come late to the party, but coming early to the party too many times and the beer is watered down and the food has grubs is apt to make you cynical the next time a party announcement comes your way.

Terr_ · 2025-07-10T20:39:35 1752179975

Plus it's not even possible to miss the metaphorical party: If it gets going, it will be quite obvious long before it peaks.

(Unless one believes the most grandiose prophecies of a technological-singularity apocalypse, that is.)

Terr_ · 2025-07-10T20:18:01 1752178681

That's not the issue. Their complaint is that proponents keep revising what ought to be fixed goalposts... Well, fixed unless you believe unassisted human developers are also getting dramatically better at their jobs every year.

Like the boy who cried wolf, it'll eventually be true with enough time... But we should stop giving them the benefit of the doubt.

_____

Jan 2025: "Ignore last month's models, they aren't good enough to show a marked increase in human productivity, test with this month's models and the benefits are obvious."

Feb 2025: "Ignore last month's models, they aren't good enough to show a marked increase in human productivity, test with this month's models and the benefits are obvious."

Mar 2025: "Ignore last month's models, they aren't good enough to show a marked increase in human productivity, test with this month's models and the benefits are obvious."

Apr 2025: [Ad nauseam, you get the idea]

pdabbadabba · 2025-07-10T21:00:41 1752181241

Fair enough. For what it's worth, I've always thought that the more reasonable claim is that AI tools make poor-average developers more productive, not necessarily expert developers.

bluefirebrand · 2025-07-10T23:01:31 1752188491

Personally I don't want poor-average developers to be more productive, I want them to be more expert

pdabbadabba · 2025-07-11T01:48:27 1752198507

Sure. But what would you suppose the ratio is between expert, average, and mediocre coders in the average organization? I think a small minority would be in the first category, and I don’t see a technology on the horizon that will change that except for LLMs, which seem like they could make mediocre coders both more productive and produce higher quality output.

bluefirebrand · 2025-07-11T05:14:47 1752210887

They definitely aren't producing higher quality output imo, but definitely producing low quality output faster

That's not a tradeoff that I like

pdabbadabba · 2025-07-11T14:54:38 1752245678

That's the study I'm really interested in: does AI use improve the output of lower-skill developers (not experts). My intuitions point me in the opposite direction. I think AI would improve their work. But I'm not aware of any hard data that would help answer this question.

Terr_ · 2025-07-11T01:36:02 1752197762

"Compared to last quarter, we've shipped 40% more spaghetti-code!"

itsoktocry · 2025-07-11T18:07:45 1752257265

>the previous model retroactively becomes total dogshit the moment a new one is released

Keep writing your code manually, nobody cares.

player1234 · 2025-07-12T19:56:04 1752350164

And nobody will notice.

jstummbillig · 2025-07-10T20:10:27 1752178227

Convenient for whom and what...? There is nothing tangible to gain from you believing or not believing that someone else does (or does not) get a productivity boost from AI. This is not a religion and it's not crypto. The AI users' net worth is not tied to another ones use of or stance on AI (if anything, it's the opposite).

More generally, the phenomenon this is quite simply explained and nothing surprising: New things improve, quickly. That does not mean that something is good or valuable but it's how new tech gets introduced every single time, and readily explains changing sentiment.

leshow · 2025-07-10T21:30:50 1752183050

I think you're missing the broader context. There is a lot of people very invested in the maximalist outcome which does create pressure for people to be boosters. You don't need a digital token for that to happen. There's a social media aspect as well that creates a feedback loop about claims.

We're in a hype cycle, and it means we should be extra critical when evaluating the tech so we don't get taken in by exaggerated claims.

jstummbillig · 2025-07-10T22:20:44 1752186044

I mostly don't agree. Yes, there is always social pressure with these things, and we are in a hype cycle, but the people "buying in" are simply not doing much at all. They are mostly consumers, waiting for the next model, which they have no control over or stake in creating (by and large).

The people not buying into the hype, on the other hands, are actually the ones that have a very good reason to be invested, because if they turn out to be wrong they might face some very uncomfortable adjustments in the job landscape and a lot of the skills that they worked so hard to gain and believed to be valuable.

As always, be weary of any claims, but the tension here is very much the reverse of crypto and I don't think that's very appreciated.

card_zero · 2025-07-10T20:26:20 1752179180

I saw that edit. Indeed you can't predict that rejecting a new thing is part of a routine of being wrong. It's true that "it's strange and new, therefore I hate it" is a very human (and adorable) instinct, but sometimes it's reasonable.

saturneria · 2025-07-11T11:16:21 1752232581

It is an even more human reaction when the new strange thing directly threatens to upend and massively change the industry that puts food on your table.

The steam-powered loom was not good for the luddites either. Good for society at large in the long term but all the negative points that a 40 year old knitter in 1810 could make against the steam-powered loom would have been perfectly reasonable and accurate judged on that individual's perspective.

jstummbillig · 2025-07-10T21:00:48 1752181248

"I saw that edit" lol

card_zero · 2025-07-10T21:05:54 1752181554

Sorry, just happened to. Slightly rude of me.

jstummbillig · 2025-07-10T21:16:58 1752182218

Ah, you do you. It's just a fairly kindergarten thing to point out and not something I was actively trying to hide. Whatever it was.

Generally, I do a couple of edits for clarity after posting and reading again. Sometimes that involves removing something that I feel could have been said better. If it does not work, I will just delete the comment. Whatever it was must not have been a super huge deal (to me).

maxbond · 2025-07-11T02:38:36 1752201516

FYI there's a "delay" setting in your profile that allows you to make your comment invisible for up to ten minutes.

grey-area · 2025-07-10T20:28:24 1752179304

Honestly the hype cycle feels very like crypto, and just like crypto prominent vcs have a lot of money riding on the outcome.

jstummbillig · 2025-07-10T21:08:03 1752181683

Of course, lot's of hype, but my point is that the reason why is very different and it matters: As an early bc adopter making your believe in bc is super important to my net worth (and you not believing in bc makes me look like an idiot and lose a lot of money).

In contrast, what do I care if you believe in code generation AI? If you do, you are probably driving up pricing. I mean, I am sure that there are people that care very much, but there is little inherent value for me in you doing so, as long as the people who are building the AI are making enough profit to keep it running.

With regards to the VCs, well, how many VCs are there in the world? How many of the people who have something good to say about AI are likely VCs? I might be off by an order of magnitude, but even then it would really not be driving the discussion.

leshow · 2025-07-10T21:34:16 1752183256

I don't find that a compelling argument, lots of people get taken in by hype cycles even when they don't profit directly from it.

steveklabnik · 2025-07-10T20:32:02 1752179522

I agree with you, and I think that’s coloring a lot of people’s perceptions. I am not a crypto fan but am an LLM fan.

Every hype cycle feels like this, and some of them are nonsense and some of them are real. We’ll see.

giantg2 · 2025-07-10T23:51:09 1752191469

The third option is that the person who used Cursor before had some sort of skill atrophy that led to lower unassisted speed.

I think an easy measure to help identify why a slow down is happening would be to measure how much refactoring happened on the AI generated code. Often times it seems to be missing stuff like error handling, or adds in unnecessary stuff. Of course this assumes it even had a working solution in the first place.

Terr_ · 2025-07-10T19:18:45 1752175125

> people consistently predict and self-report in the wrong direction

I recall an adage about work-estimation: As chunks get too big, people unconsciously substitute "how possible does the final outcome feel" with "how long will the work take to do."

People asked "how long did it take" could be substituting something else, such as "how alone did I feel while working on it."

sandinmyjoints · 2025-07-10T19:31:11 1752175871

That’s an interesting adage. Any ideas of its source?

Dilettante_ · 2025-07-10T19:45:06 1752176706

It might have been in Kahneman's "Thinking, Fast and Slow"

Terr_ · 2025-07-10T19:54:22 1752177262

I'm not sure, but something involving Kahneman et al. seems very plausible: The relevant term is probably "Attribute Substitution."

https://en.wikipedia.org/wiki/Attribute_substitution

robwwilliams · 2025-07-10T20:38:08 1752179888

Or a sampling artifact. 4 vs 12 does seem significant within a study, but consider a set of N such studies.

I assume that many large companies have tested efficiency gains and losses of there programmers much more extensively than the authors of this tiny study.

A survey of companies and their evaluation and conclusions would carry more weight—-excluding companies selling AI products, of course.

rs186 · 2025-07-10T22:25:37 1752186337

If you use binomial test, P(X<=4) is about 0.105 which means p = 0.21.