So you spend 10 minutes writing a free-text description of the test you want, telling it exactly how you want the test written; then 4-5 minutes trying to understand whether it did the right thing; then you restart because it did something crazy; then a few minutes manually fixing the diff it generated?

MMmm.

I mean, don't get me wrong, this is impressive stuff; but it needs to be an order of magnitude less 'screwing around trying to fix the random crap' for this to be 'wow, amazing!' rather than a technical demonstration.

You could have done this more quickly without using AI.

I have no doubt this is transformative technology, but people using it are choosing to use it; it's not actually better than not using it at this point, as far as I can tell.

It's slower and more error prone.



Stoked you watched, thanks. (Sorry the example isn't the greatest/lacks context. The first video was better, but the mic gain was too high.)

You summed up the workflow accurately. Except I read your first paragraph in a positive light, while I imagine you meant it to be negative.

Note the feedback loop you described is the same as when I delegate requirements to someone else (i.e. s/LLM/jr eng) and then read/edit their PR. Except here the loop is, obviously, much tighter.

I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?

But even if all things were roughly equal, I like being in the navigator seat vs the driver seat. Editor vs writer. It helps me keep the big picture in mind, focused on requirements and architecture, not line-wise implementation details.


> I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?

It seems to me that the first few are almost complete copy-paste of older tests. In the update case, simple copy-paste would have got you code closer to the final test than what the model provided.

The real value is only in the filtered test that chooses randomly (btw, I have no idea why that's beneficial here), and the one which checks that both consumers got the same info. Those can be done in a few minutes with the help of the already-written insert test and the original version of the filtered test.
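For what it's worth, the two tests described above (the random-filter case and the both-consumers-got-the-same-info case) are small enough to sketch by hand. The original codebase isn't shown in the thread, so every name here (the `Bus` class, `subscribe`, `publish`) is invented purely to illustrate the shape of the tests:

```python
import random

# Hypothetical in-memory pub/sub bus standing in for whatever the video tests.
class Bus:
    def __init__(self):
        self.consumers = []

    def subscribe(self, filt=lambda e: True):
        """Register a consumer with an optional filter; returns its inbox."""
        inbox = []
        self.consumers.append((filt, inbox))
        return inbox

    def publish(self, event):
        for filt, inbox in self.consumers:
            if filt(event):
                inbox.append(event)

def test_filtered_consumer_gets_random_subset():
    # Filter on a randomly chosen set of keys, then verify only those arrive.
    bus = Bus()
    wanted = set(random.sample(range(10), 3))
    inbox = bus.subscribe(lambda e: e["key"] in wanted)
    for k in range(10):
        bus.publish({"key": k})
    assert {e["key"] for e in inbox} == wanted

def test_two_consumers_see_same_events():
    # Two unfiltered consumers should receive identical event streams.
    bus = Bus()
    a, b = bus.subscribe(), bus.subscribe()
    for k in range(5):
        bus.publish({"key": k})
    assert a == b and len(a) == 5

test_filtered_consumer_gets_random_subset()
test_two_consumers_see_same_events()
```

Written against an existing insert test as a template, this is a few minutes of work either way; the question is just who types it.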

I’m happy that more people can code with this, and it’s great that it makes your coding faster. It makes coding more accessible. However, there are a lot of people who can do this faster without AI, so it’s definitely not for everybody yet.


> I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?

I guess my point is I'm skeptical.

I don't believe what you had at the end would have taken you that long to do by hand. I don't believe it would have taken an hour. It certainly would not have taken me or anyone on my team that long.

I feel like you're projecting that, if you scale this process to, say, 5 LLMs running in parallel, you'd spend maybe 20% more time reviewing 5x PRs instead of 1x, but get 5x as much done in the end.

Which may be true.

...but, and this is really my point: It's not true, in this example. It's not true in any examples I've seen.

It feels like it might be true in the near-to-moderate future, but that rests on a lot of underlying assumptions:

- LLMs get faster (probably)

- LLMs get more accurate and less prone to errors (???)

- LLMs get more context size without going crazy (???)

- The marginal cost of doing N x code reviews is < the cost of just writing code N times (???)

These are assumptions that... well, who knows? Maybe? ...but right now? Like, today?

The problem is: if it were actually making people more productive, we would see evidence of it. Like, actual concrete examples of people having 10 LLMs building systems for them.

...but what we do see is people doing things like this, which seems (to me at least) either worse than or on par with just doing the same work by hand.

A different workflow, certainly; but not obviously better.

LLMs appear to have an immediate, right-now disruptive impact on particular domains, like learning, where it's extremely clear that having a wise coding assistant to help you gain simple cross-domain knowledge is highly impactful (look at Stack Overflow). But despite all the hand-waving and all the people talking about it, the actual concrete evidence of a 'Devin' that actually builds software, or even meaningfully improves programmer productivity (not 'is a tool that gives some marginal benefit over existing autocomplete', but actually improves productivity), is ...

...simply absent.

I find that problematic, and it makes me skeptical of grand claims.

Grand claims require concrete tangible evidence.

I've no doubt that you've got a workflow that works for you, and thanks for sharing it. :) ...I just don't think it's really compelling, currently, for most people to work that way; I don't think you can reasonably argue it's more productive, or more effective, based on what I've actually seen.


I used it for some troubleshooting in my job (Linux sysadmin). I like how I can just ask it a question, explain the situation, and it goes through everything with me, like a person.


How worried are you that it's giving you bad advice?

There are plenty of "just disable certificate checking" type answers on Stack Overflow, but there are also a lot of comments calling them out. How do you fact-check the AI? Is it just a shortcut to finding better documentation?


In my opinion it’s better at filtering down my convoluted explanation into some troubleshooting steps I can take to investigate. It’s kind of like an evolved Google algorithm, boiling down the internet’s knowledge. And I’ve had it give me step-by-step instructions on “esoteric” things like dwm config file examples, plugins for displaying pictures in the terminal, what files to edit and where… it’s kind of efficient, I think. Better than browsing ads. Lol.


> Certainly, yes. To initialize the IPv6 subsystem, enter 'init 6' in an elevated system terminal.


Whoops.



