
would you rather the LLM make up something that sounds right when it doesn't know, or would you like it to claim "i don't know" for tasks it actually can figure out? because presumably both happen at some rate, and if it hallucinates an answer i can at least check what that answer is or accept it with a grain of salt.

nobody freaks out when humans make mistakes, but we assume our nascent AIs, being machines, should always function correctly all the time



> would you rather the LLM make up something that sounds right when it doesn't know, or would you like it to claim "i don't know" for tasks it actually can figure out?

The latter option every single time


> but we assume our nascent AIs, being machines, should always function correctly all the time

A tool that does not function is a defective tool. When I issue a command, it had better do it correctly or it will be replaced.


And that's part of the problem - you're thinking of it like a hammer when it's not a hammer. It's more like asking someone at a bar a question. You'll often get an answer - but even if they respond confidently, that doesn't make it correct. The problem is people treating things as fact because "someone at a bar told them." That's not much better than "it must be true, I saw it on TV".

It's a different type of tool - a person has to treat it that way.


Asking a question is very contextual. I don't ask a lawyer about house engineering problems, nor my doctor how to bake a cake. That means if I'm asking someone at a bar, I'm already prepared to deal with the fact that the person is maybe drunk, probably won't know,... And more often than not, I won't even ask the question unless in dire need, because it's the most inefficient way to get an informed answer.

I wouldn't bat an eye if people were taking code suggestions, then reviewing and editing them to make them correct. But from what I see, it's pretty much a direct push to production if they got it to compile, which is not the same as correct.


Sounds like a trillion dollar industry.


It would be nice to have some kind of "confidence level" annotation.
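Something close to this is already exposed by some APIs: you can ask for per-token log probabilities and surface them as a rough annotation. A minimal sketch, assuming the OpenAI Python client's logprobs option; the model name and the 0.5 threshold are just illustrative, and token probability measures how likely the wording was, not whether the claim is true, so it's only a crude proxy for confidence:

    # Annotate a completion with per-token "confidence" from logprobs.
    # Assumes the OpenAI Python client (>=1.0) and OPENAI_API_KEY set.
    import math
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": "Who wrote 'The Trial'?"}],
        logprobs=True,  # request per-token log probabilities
    )

    for tok in resp.choices[0].logprobs.content:
        p = math.exp(tok.logprob)  # convert logprob to probability
        flag = "low-confidence" if p < 0.5 else "ok"  # arbitrary cutoff
        print(f"{tok.token!r:>12}  p={p:.2f}  {flag}")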



