Or, hey, maybe we've just had different experiences, and are using these tools d...

fragmede · 2024-11-18T01:18:00 1731892680

So link chats where you've run into the very real limitations these things have. What language you're using, what framework you're in, what library it hallucinated. I'm not interested in either of us shouting past each other; I genuinely want to understand how your experience, which is not at all lesser than mine, is so different. Am I ignoring flaws that you otherwise can't overlook? Are you expecting too much from it with too little input? Without details, all we can do is describe feelings at each other and get frustrated when the other person's experience is different. Might as well ask your star sign while we're at it.

imiric · 2024-11-19T11:53:24 1732017204

I use OpenRouter, which saves chats in local storage, and my browser is configured to delete all history and data on exit. So, unfortunately, I can't link you to an exact session.

I described one instance of this behavior with Claude 3.5 Sonnet in more detail a few weeks ago here[1]. I was asking it to implement a specific feature using a popular Go CLI library. I could probably reproduce it, but honestly can't be bothered, nor do I wish to use more of my API credits for this.

Besides, why should I have to prove anything in this discussion? We're arguing in good faith, and just as I assume your experience is based on positive interactions, so should you assume mine is based on negative ones.

But I'll give you one last argument based on principles alone.

LLMs are trained on mountains of data from various online sources (web sites, blogs, documentation, GitHub, SO, etc.). This training takes many months and has a cutoff point sometime in the past. When you ask them to generate some code using a specific library, how can you be sure that the code is using the specific version of the library you're currently using? How can you be sure that the library is even in the training set and that the LLM won't just hallucinate it entirely?

Some LLMs allow you to add sufficient context to your prompts (with RAG, etc.) to increase the likelihood of generating working code, which can help, but still isn't foolproof, and not all services/tools allow this.

But more crucially, when you ask it to do something that the library doesn't support, the LLM will never tell you "this isn't possible" or "I don't know". It will instead proceed to hallucinate a solution because that's what it was trained to do.

And how are these state-of-the-art coding LLMs that pass all these coding challenges capable of producing errors like referencing an undefined variable? Surely these trivial bugs shouldn't be possible, no?

All of these issues were what caused me to waste more than an hour fighting with both Claude 3.5 Sonnet and GPT-4o. And keep in mind that this was a fairly small problem. This is why I can't imagine how building an entire app with these tools, using a framework and dozens of libraries, could possibly be more productive than building it without them. But clearly this isn't an opinion shared by most people here, so let's agree to disagree.

[1]: https://news.ycombinator.com/item?id=41987474

senorrib · 2024-11-18T00:35:05 1731890105

I wasn’t targeting this specifically at you or your individual experience. However, I have heard the same arguments you make ad nauseam, and they usually come from people who are either just too skeptical or don’t put in the effort required to use the tool.