
With your analogy I would be the one saying that I'm still not convinced that skis are faster than snowshoes.

I still use ChatGPT/Claude/Llama daily for both code generation and other things. And while it sometimes does exactly what I want and I feel more productive, it still wastes my time almost an equal amount of the time, and I have to give up on it and rewrite things manually or do a Google search / read the actual documentation. It's good to bounce things off, it's a good starting point for learning new stuff, and it gives you great direction to explore new things and test things out quickly. My guess is that on a "happy path" it gives me a 1.3x speed-up, which is great when that happens, but the caveat is that you are not on a "happy path" most of the time, and if you listen to the evangelists it seems like it should be a 2x-5x speed-up (skis). So where's the disconnect?

I'm not here to disprove your experience, but with 2 years of almost daily usage of skis, how come I feel like I'm still barely breaking even compared with snowshoes? Am I that bad with my prompting skills?



I use -

Rust and aider.chat, and

I thoughtfully limit the context of what I'm coding (say, 2 of 15 files).

I /ask a few times to get the context set up. I let it speculate on the path ahead but rein it in with more conservative goals.

I then say "let's carefully and conservatively implement this" (this is really important with Sonnet, as it's way too eager).

I get it to compile by running /test a few times; there is sometimes a doom loop though, so -

I reset the context with a better footing if things are going off track or I just think "it's time".

I do not commit until I have a plausible, building set of functions (it can probably handle touching 2-3 functions or configs, or one complete function, but don't get much more elaborate without care and experience).

I either reset or use the remaining context to create some tests and validate.
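For concreteness, the loop above might look something like the session below (the slash commands are aider's; the file names, model choice, and prompts are hypothetical examples, not from the original comment):

```
$ aider --model sonnet src/config.rs src/loader.rs   # limit context: 2 of 15 files

/ask How would you add validation to the config loader?
/ask Let's scope that down: validate only the required fields for now.

Let's carefully and conservatively implement this.

/test cargo test    # on a non-zero exit, the output is fed back into the chat
/clear              # drop the chat history (but keep the files) to reset footing
```

A /clear keeps the added files but discards the conversation, which matches the "reset the context with a better footing" step above.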

I think saying 1.3x more productive is fair with only this loop BUT you have to keep a few things in perspective.

I wrote specs for everything I did; in other words, I wrote out in English my goals and expectations of the code. That was highly valuable and something I probably wouldn't have done otherwise.

Automatic literate programming!

Yak shaving is crazy fast with an LLM. Those tasks that would take you off into the weeds do feel 5x faster (with caveats).

I think the 2x-5x faster is true within certain bounds -

What are the things you were psychologically avoiding/dragging on, or just skipping, because they were too tedious to even think of?

Some people don't have that problem or maybe don't notice it; to me it's a real crazy benefit I love!

That's where the real speedups happen, and it's amazing.


Do you mind sharing how much experience you have with the tech stack the code was generated for? What I've found is that the perception of LLM-generated code differs depending on your own experience, and I would like to know whether that is only my experience.

I have more than 20 years of backend development and only limited experience with frontend tech stacks. I initially tried using an LLM for the frontend of my personal project. I found the code generated by the LLM to be so good. It produced code that worked immediately from my vague prompts. It happily fixed any issue I found, quickly and correctly. I also have enough knowledge to tweak anything I need, so at the end of the day I can see that my project works as expected. I feel really productive with it.

Then I slowly started using LLMs for my backend projects at work. And I was so surprised that the experience was the complete opposite. Both ChatGPT and Claude generated code that either followed bad practices or had flaws, or just ignored instructions in my prompt and came back to bad solutions after just a few questions. They also failed to apply common practices from an architecture perspective. So the effort to make it work was much more than if I did all the coding myself.

At that point, I thought there were probably more frontend projects than backend projects used to train those models, and therefore the quality of generated frontend code is much better. But when using an LLM with another language I did not have much experience in, for another backend project, I found out why my experience was so different, as I could now observe more clearly what is bad and good in the generated code.

In my previous backend project, since I have much more knowledge of the languages/frameworks/practices, my criteria were also higher. It is not enough that the code can run; it must be extensible, in the right structure and with good architecture, use the correct idioms... Whereas my frontend experience is more limited, so the generated code worked as I expected but possibly also violated all these NFRs that I don't know about. It explains why, using it with a new programming language (something I don't have much experience with) in a backend project (my well-known domain), I had a mixed experience: it seemed to provide working code but failed at following good practices.

My hypothesis is that LLMs can generate code at an intermediate level, so if your experience is limited you see it as pure gold. But if your level is much higher, that generated code is just garbage. I really want to hear more from other people to validate my hypothesis, as it seems people also have opposite experiences with this.


> Am I that bad with my prompting skills?

Or you're using skis on gravel. I'm a firm believer that the utility varies greatly depending on the tech stack and what you're trying to do, ranging from negative value to way more than 5x.

I also think "prompting" is a misrepresentation of where the actual skill and experience matter. It's about being efficient with the tooling. Prompting, waiting for a response, and then manually copy-pasting line by line into multiple places is something else entirely from having two LLMs work in tandem, with one figuring out the solution and the other applying the diff.
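This two-model split is, as far as I can tell, what aider calls architect mode; a sketch of the invocation (the model names are just examples, check aider's docs for current flags):

```
# a reasoning model proposes the change; a separate editor model applies the diff
$ aider --architect --model o1 --editor-model sonnet
```

The appeal is that the planning model never has to produce well-formed edits itself; the editor model turns its proposal into an applied diff.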

Good tooling also means that there's no overhead trying out multiple solutions. It should be so frictionless that you sometimes redo a working solution just because you want to see a different approach.

Finally, you must be really active and can't just passively wait for the LLM to finish before you start analyzing the output. Terminate early, reprompt, and retry. The first 5 seconds after submitting are crucial, and being able to make a decision from seeing just a few lines of code is a completely new skill for me.



