I did exactly that and found it lackluster for the domain I asked it about.
And most of the uses I've seen for it are realistically covered by a good LSP.
Or to put it another way: it's no good at writing algorithms or data structures (or at least no better than my first draft would be, but writing the first draft puts me ahead of the LLM in understanding the actual problem at hand; handing it off to an LLM doesn't get me to the final solution faster).
So that leaves writing boilerplate, but considering my experience with it writing more complex stuff, I would need to read over the boilerplate code to ensure it's correct, in which case I may as well have written it myself.
> found it lackluster for the domain I asked it for
Fair, that is possible depending on your domain.
> It's no good at writing algorithms or data structures
In my experience, this is untrue. I’ve gotten it to write algorithms with various constraints I had. You can even tell it to use specific function signatures instead of any stdlib, and make changes to tweak behavior.
> And most use I've seen on it realistically a good LSP covers.
Again, I really don’t understand this comparison. LSPs and LLMs go hand in hand.
I think it’s more of a workflow clash. One really needs to change how they operate to effectively use LLMs for programming. If you’re just typing nonstop, maybe it would feel like Copilot is just an LSP. But, if you try harder, LLMs are game changers when:
- rubber ducking a problem
- learning a new concept and implementing it
- gluing things together
- starting new projects or features
- filling in boilerplate based on existing context.
I mean, I could write the algorithm by hand pretty quickly in C++, would follow the exact same thought pattern, and would also deal with the edge cases. Factoring in the loss of productivity from the context switch, that's a net negative. This algorithm is also not generic over enough cases, but that's just down to the prompt.
If I can't trust it to write `strip_whitespace` correctly, which is like 5 lines of code, can I trust it to do more without a thorough review of the code and writing a ton of unit tests? Well, I was going to do that anyway.
The argument that I just need to learn better prompt engineering to make the LLM do what I want doesn't sit right with me when I could instead spend that time writing the code. As I said, your last point is absolutely the place I can see LLMs being actually useful, but then I need to spend a significant amount of time in code review for generated code from an "employee" who is known to make up interfaces, or entire libraries, that don't exist.
I'm a Python-slinging data scientist, so C++ isn't my jam (to say the least), but I changed the prompt to the following and put it to GPT-4:
> Write me an algorithm in C++ which finds the begin and end iterator of a sequence where leading and trailing whitespace is stripped. Please write secure code that handles any possible edge cases.
I'm not sure what other edge cases there might be, however. This only covers one of them.
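For readers who want a concrete reference point, here is a minimal sketch of what such a trim function typically looks like (my own sketch of the common approach, not GPT-4's verbatim output; the name `trim_range` is made up). The `unsigned char` cast matters: passing a negative `char` to `std::isspace` is undefined behavior.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Returns begin/end iterators into `s` with leading and trailing
// whitespace stripped. An empty or all-whitespace input yields an
// empty range (begin == end), which covers those edge cases.
std::pair<std::string::const_iterator, std::string::const_iterator>
trim_range(const std::string& s) {
    auto first = s.begin();
    auto last = s.end();
    // Cast to unsigned char: std::isspace on a negative char is UB.
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
    while (last != first && std::isspace(static_cast<unsigned char>(*(last - 1))))
        --last;
    return {first, last};
}
```

Note this only handles the ASCII notion of whitespace; full Unicode whitespace is a different (and much larger) problem.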
In general, I've found LLMs to be marginally helpful. Like, I can't ever remember how to get matplotlib to give me the plot I want, and 9 times out of 10 GPT-4 easily gives me the code I want. Anything even slightly off the beaten path, though, and it quickly becomes absolutely useless.
Sure, GPT-4 is better at that, but that wasn't the argument being made.
The example you gave absolutely is the code I would have written on a first draft, since it does cover the edge cases (assuming we aren't dealing with the full Unicode character set and everything that could be considered a space there).
However, this is code that is trivial to write in any language, so the "Is it that hard to input a prompt into the free version of ChatGPT and see how it helps with programming?" argument doesn't hold up. Am I to believe it will implement something more complex correctly? This is also code that would absolutely be in hundreds of codebases, so GPT has tons of context for it.
I think you have the mistaken impression that I was arguing with you (certainly my comment makes it clear that I don't feel that LLMs are a panacea). I merely thought that you might be curious how GPT-4 would respond.
> My guess is that this was generated using GPT4?
This is a good guess, since I stated outright that I used GPT-4, and then mentioned GPT-4 later on in the comment.
Yeah honestly, I think you have a completely different expectation and style of usage than what is optimal with LLMs. I don’t have the energy to convince you further, but maybe one day it’ll click for you? No worries either way.
Like sibling commenter mentioned, simonw’s blog is a great resource.
Regarding your point around being able to whip up the code yourself - the point is to have a decent starting point to save time and energy. Like you said, you know the edge cases so you could skip the boring parts using GPT and focus purely on fixing those. Though, with more prompting (especially providing examples), GPT can also handle that for you.
I have nearly 2 decades of experience as a developer and it took me a while to reorient my flow around LLMs. But now that I have, it’s truly gamechanging.
And since you asked, here’s my system prompt:
You are an experienced developer who follows industry standards and best practices. Write lean code and explain briefly using bullet points or numbered lists. Elaborate only when explaining concepts or making choices. Always mention which file and where to store provided code.
Tech Stack: < insert all the languages, frameworks, etc you’d like to use >
If I provide code, highlight and explain problematic code. Also show and explain the corrected code.
Take a deep breath and think step by step.
Also, always use GPT4 and customize the above to your style and liking.
I will definitely try this out when I have time later in the day.
There is some code I would really prefer not to write that is a decent test case for this and won't expose company code to GPT. Will give feedback when I am done. Maybe you are correct.
If you really want to experiment, give Cursor a try. It’s free up to a limit, so maybe it’ll be enough for your example use case.
It handles even more complex use cases and will automatically include/patch code for you via the inbuilt LLM framework. This helps with iteration and modifications as you massage it to what you need. Plus, it’ll scan your code and find the languages/frameworks automatically.
Finally, keep in mind that the goal should not be perfect production code - that’s just Twitter AI hype. It’s about saving time and energy for you (the human) to achieve more than possible before.
I tried your prompt and the above approach, and it took me about 45 minutes of futzing around to get a result I am happy to begin iterating on.
Effectively: I have an 80-bit byte array representing a timestamp struct, consisting of a 48-bit unsigned integer for seconds and a 32-bit unsigned integer for nanoseconds. The byte array is big endian and the host system is little endian.
I gave it full signatures for all functions and relevant structs, plus instructions on how I wanted the parsing done regarding algorithmic complexity, and yet it still took multiple iterations to get anything useful.
At this point it is converting to little endian during the decode, then checking whether the host system is big endian and converting back to big endian if so.
There are likely some easy optimisations to be done there, and I would definitely have gotten to this point quicker had I just written the 10 lines of code this needed, and then done the optimisations; I'm pretty sure the entire operation can happen in a few instructions.
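For illustration, a hand-written decode along those lines might look like this (a sketch based on the layout described above; the struct and function names are mine). Reading byte-by-byte with shift-and-or is endian-agnostic, so no host-endianness check or double conversion is needed at all:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct matching the description: 48-bit seconds
// (held in the low bits of a uint64_t) plus 32-bit nanoseconds.
struct Timestamp {
    uint64_t seconds;
    uint32_t nanos;
};

// Decode a 10-byte (80-bit) big-endian buffer: 6 bytes of seconds
// followed by 4 bytes of nanoseconds. The array-reference parameter
// makes passing a wrong-sized buffer a compile error.
Timestamp decode_timestamp(const uint8_t (&buf)[10]) {
    Timestamp ts{0, 0};
    for (size_t i = 0; i < 6; ++i)
        ts.seconds = (ts.seconds << 8) | buf[i];
    for (size_t i = 6; i < 10; ++i)
        ts.nanos = (ts.nanos << 8) | buf[i];
    return ts;
}
```

Because the shifts express the value arithmetically rather than via memory layout, this works identically on little- and big-endian hosts, and a modern compiler will typically reduce it to a couple of loads and byte-swap instructions.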
I'm not sure there isn't a buffer overflow in the vector_decode code he showed there. Likewise, I don't see any error checks in the code, and I'm not familiar enough with the SQLite API to know whether errors can be propagated upwards, or what the error conditions would mean in that code.
This code is probably fine for a quick side project but doesn't pass my smell test for anything close to production ready code.
I would definitely want to see a lot of unit tests around the decode and encode functions, with fuzzing, and honestly that would be the bulk of the work here. That, and documentation for this code. Even though the encode function looks correct at first glance.
I also don't see an easy way to actually unit test this code as it stands without running it through SQLite, which puts a lot of dependencies on the unit tests.
I would either need to spend a lot more time massaging GPT to get this to a point where I'd be fine shipping the code, or, you know, just write it myself.
Is it that hard to input a prompt into the free version of ChatGPT and see how it helps with programming?