To be clear, I didn't ask it to write something complex. The prompt was "how do I do X with library Y?", with a bit more detail. The library is fairly popular and in a mainstream language.
I had a suspicion that what I was trying to do was simply not possible with that library, but since LLMs are incapable of saying "that's not possible" or "I don't know", they will rephrase your prompt and hallucinate whatever might plausibly make sense. They have no way to gauge whether what they're outputting is actually correct.
So I can imagine that you sometimes might get something useful from this, but if you want a specific answer about something, you will always have to double-check their work. In the specific case of programming, this could be improved with a simple engineering task: integrate the output with a real programming environment, and evaluate the result of actually running the code. I think there are coding assistant services that do this already, but frankly, I was expecting more from simple chat services.
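Roughly what I have in mind, as a sketch; `ask_llm` here is just a stand-in for whatever chat-completion call you'd actually be using, not a real library function:

```python
import subprocess
import sys
import tempfile

def run_snippet(code: str, timeout: int = 10):
    """Write the generated code to a temp file, run it, and return
    (exit_code, combined stdout/stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return -1, "timed out"

def ask_with_feedback(ask_llm, prompt: str, max_rounds: int = 3) -> str:
    """Re-prompt with the real error output until the snippet runs
    cleanly or we give up. `ask_llm` is a placeholder for whatever
    chat-completion call you actually use."""
    code = ask_llm(prompt)
    for _ in range(max_rounds):
        status, output = run_snippet(code)
        if status == 0:
            return code
        code = ask_llm(
            f"{prompt}\n\nYour previous attempt failed when run:\n"
            f"{output}\nPlease fix the code."
        )
    return code  # still failing, but at least we know it doesn't run
```

Even that only proves the code executes, not that it does the right thing, but it would at least catch the "this method doesn't exist in library Y" class of hallucination.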
Specific is the specific thing that statistical models are not good at :(
> how do I do X with library Y?
Recent research and anecdotal experience have shown that LLMs perform quite poorly with short prompts. Attention just has more data to work with when there are more tokens. Try extending that question to something like “I am using this programming language and am trying to do this task with this library. How do I do this thing with this other library?”
I realize prompt engineering like this is fuzzy and “magic,” but short prompts consistently produce worse results.
> In the specific case of programming, this could be improved with a simple engineering task: integrate the output with a real programming environment, and evaluate the result of actually running the code.
Not as simple as you’d think. You’re letting something run arbitrary code.
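You can reduce the blast radius somewhat, but it's more plumbing than it sounds. A rough sketch of the kind of precautions I mean (hard timeout, scratch directory, stripped-down environment, Python's isolated mode); this is not a real sandbox, so anything you genuinely distrust belongs in a container or VM:

```python
import os
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: int = 5):
    """Run generated code with a few basic precautions: a scratch
    working directory, a minimal environment, Python's isolated mode
    (-I), and a hard timeout. This is NOT a sandbox -- the snippet can
    still read files and open sockets -- so use a container or VM for
    anything you actually don't trust."""
    workdir = tempfile.mkdtemp(prefix="llm_run_")
    script = os.path.join(workdir, "snippet.py")
    with open(script, "w") as f:
        f.write(code)
    try:
        proc = subprocess.run(
            [sys.executable, "-I", script],
            cwd=workdir,
            env={"PATH": os.environ.get("PATH", "")},
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.returncode, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return -1, "timed out"
```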
Tho you should give aider.chat a try if you want to test out that workflow. I found it very very slow.
> Recent research and anecdotal experience has shown that LLMs perform quite poorly with short prompts.
I'm aware of that. The actual prompt was more elaborate. I was just mentioning the gist of it here.
Besides, you would think that after 30 minutes of prompting and corrections it would arrive at the correct answer. I'm aware that subsequent output is based on the session history, but I would expect that to matter less when the human responses are negative. It just seems like sloppy engineering otherwise.
> Specific is the specific thing that statistical models are not good at
Some models are good at needle-in-a-haystack problems. If the information exists, they're able to find it. What I don't need is for them to hallucinate wrong answers when the information doesn't exist.
This is a core problem of this tech, but I also expected it to improve over time.
> Tho you should give aider.chat a try
Thanks, I'll do that eventually. If it's slow, it can get faster. I'd rather the tool be slow but give correct answers than have it slow me down by wasting my time correcting its errors.
Thankfully, these approaches can work for programming tasks. There isn't much that can be done to verify the output for most other subjects.