Unfortunately they are trained first and foremost as plausibility engines. The central dogma is that plausibility will (with continuing progress & scale) converge towards correctness, or "faithfulness" as it's sometimes called in the literature.
This remains very far from proven.
The null hypothesis that would be necessary to reject, therefore, is a most unfortunate one, viz. that by training for plausibility we are creating the world's most convincing bullshit machines.
> plausibility [would] converge towards correctness
That is a most horribly dangerous idea. We demand that the agent not guess - even, and especially, when the agent is a champion at guessing - we demand that the agent check.
If G guesses answers from the multiplication table with remarkable success, we demand all the more strongly that G compute its output exactly instead.
Oracles whose extraordinary average accuracy makes people forget they are not computers are dangerous.
One man's "plausibility" is another person's "barely reasoned bullshit". I think you're being generous, because LLMs explicitly don't deal in facts; they deal in making stuff up that is vaguely reminiscent of fact. Only a few companies are even trying to make reasoning (as in axioms-cum-deductions, i.e., logic per se) a core part of the models, and they're really struggling to hand-engineer the topology and methodology necessary for that to work even roughly as a facsimile of technical reasoning.
I’m not really being generous. I merely think if I’m gonna condemn something as high-profile snake oil for the tragically gullible, it’s helpful to have a solid basis for doing so. And it’s also important to allow oneself to be wrong about something, however remote the possibility may currently seem, and preferably without having to revise one’s principles to recognise it.
As a sort of related anecdote... if you remember the days before Google, people sitting around your dinner table arguing about stuff used to spew all sorts of bullshit, then drop that they have a degree from XYZ university, and they won the argument... When Google/Wikipedia came around, it turned out that those people were in fact just spewing bullshit. I'm sure there was some damage, but it feels like a similar thing. Our "bullshit-radar" seems to be able to adapt to these sorts of things.
Well, conspiracy theories are thriving in this day and age, even with access to technology and information at one's fingertips. Add to that a US administration now effectively spewing bullshit every few minutes.
The best example of this was an argument I had a little while ago about self-driving. I mentioned that I have a hard time trusting any system relying only on cameras, to which I was told that I didn't understand how machine learning works, that obviously they were correct and I was wrong, and that every car would be self-driving within 5 years. All of these things could easily be verified independently.
Suffice to say that I am not sure that the "bullshit-radar" is that adaptive...
Mind you, this is not limited to the particular issue at hand, but I think those situations need to be highlighted, because we get fooled easily by authoritative delivery...
Language models are closing the gaps that remain at an amazing rate. There are still a few, but consider what has happened just in the last year, and extrapolate 2-3 years out....
I think you are discounting the fact that you can weed out people who make a habit of that, but you can't do that with LLMs if they are all doing that.
Some people trust Alex Jones, while the vast majority realize that he just fabricates untruths constantly. Far fewer people realize that LLMs do the same.
People know that computers are deterministic, but most don't realize that determinism and accuracy are orthogonal. Most non-IT people give computers authoritative deference they do not deserve. This has been a huge issue with things like Shot Spotter, facial recognition, etc.
One thing I see a lot on X is people asking Grok what movie or show a scene is from.
LLMs must be really, really bad at this because not only is it never right, it actually just makes something up that doesn't exist. Every, single, time.
I really wish it would just say "I'm not good at this, so I do not know."
When your model of the world is built on the relative probabilities of the next opaque, apparently-arbitrary number in the context of prior opaque, apparently-arbitrary numbers, it must be nearly impossible to tell the difference between “there are several plausible ways to proceed, many of which the user will find useful or informative, and I should pick one” and “I don’t know”. Attempting to adjust to allow for the latter probably tends to make these things output “I don’t know” all the time, even when the output they’d otherwise have produced would have been good.
I thought about this of course, and I think a reasonable 'hack' for now is to more or less hardcode things that your LLM sucks at, and override it to say it doesn't know. Because continually failing at basic tasks is bad for confidence in said product.
I mean, it basically does the same thing if you ask it to do anything racist or offensive, so that override ability is obviously there.
So if it identifies the request as identifying a movie scene, just say 'I don't know', for example.
Hardcode by whom? Who do we trust with this task to do it correctly? Another LLM that suffers from the same fundamental flaw or by a low paid digital worker in a developing country? Because that's the current solution. And who's gonna pay for all that once the dumb investment money runs out, who's gonna stick around after the hype?
By the LLM team (the Grok team, in this case). I don't mean for the LLM to be sentient enough to know it doesn't know the answer; I mean for the LLM to identify what is being asked of it and check whether that's something on the 'blacklist of actions I cannot do yet', said list maintained by humans, before replying.
No different than when asking ChatGPT to generate images or videos or whatever before it could, it would just tell you it was unable to.
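A minimal sketch of what such a gate might look like, purely illustrative: the task categories, the keyword classifier, and generate_reply() are all hypothetical stand-ins, not anything Grok or ChatGPT actually exposes.

```python
# Hypothetical sketch of a human-maintained "known weakness" gate in front of an LLM.
# The category names, the keyword classifier, and generate_reply() are invented for
# illustration; a real system would use a trained router, not string matching.

BLACKLISTED_TASKS = {
    "identify_movie_scene",        # e.g. the Grok failure mode described above
    "identify_song_from_lyrics",
}

def classify_task(user_message: str) -> str:
    """Crude intent detection; in practice a small model or rules engine."""
    msg = user_message.lower()
    if "what movie" in msg or "what show" in msg or "which film" in msg:
        return "identify_movie_scene"
    return "general"

def generate_reply(user_message: str) -> str:
    """Stand-in for the real model call."""
    return f"(model answer to: {user_message})"

def answer(user_message: str) -> str:
    if classify_task(user_message) in BLACKLISTED_TASKS:
        # Human-curated override: admit the limitation instead of confabulating.
        return "I'm not reliable at identifying scenes, so I don't know."
    return generate_reply(user_message)

print(answer("What movie is this scene from?"))
```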
> It's impossible to predict with certainty who will be the U.S. President in 2046. The political landscape can change significantly over time, and many factors, including elections, candidates, and events, will influence the outcome. The next U.S. presidential election will take place in 2028, so it would be difficult to know for sure who will hold office nearly two decades from now.
It can do this because that is in fact the most likely way to continue, word by word.
But the most likely way to continue a paper is not to say "I don't know" at the end. It is actually to provide sources, which it then proceeds to do wrongly.
>> We need an AI technology that can output "don't know" when appropriate. How's that coming along?
Heh. Easiest answer in the world. To be able to say "don't know", one first has to be able to "know". And we ain't there yet, not by a long shot. Not even within a million miles of it.
Needs meta-annotation of certainty on all nodes and tokens that accumulates while reasoning. That would also give the ability to train in beliefs, as in overriding any uncertainty. Right now we are in the pure-belief phase. AI is its own god right now: pure blissful belief without the sin of doubt.
Sure we have. We don't have a perfect solution but it's miles better than what we have for LLMs.
If a lawyer consistently makes stuff up on legal filings, in the worst cases they can lose their license (though they'll most likely end up getting fines).
If a doctor really sucks, they become uninsurable and ultimately could lose their medical license.
Devs who don't double-check their work will cause havoc with the product, and not only will they earn low opinions from their colleagues, they could face termination.
How many companies train on data that contains 'I don't know' responses? Have you ever talked with a toddler / young child? You need to explicitly teach children not to bullshit. At least I needed to teach mine.
Never mind toddlers, have you ever hired people? A far smaller proportion of professional adults will say “I don’t know” than a lot of people here seem to believe.
No, I call judgement a logical process of assessment.
You have a body of material that speaks of the endeavours in some sport of some "Michael Jordan"; the logic in the system decides that if a "Michael Jordan" in context can be construed to be "that" "Michael Jordan", then there is a sound probability he is a sportsman. You have very little material about a "John R. Brickabracker"; the logic in the system decides that the material is insufficient to take a good guess.
Then I expect your personal fortunes are tied up in hyping the "generative AI are just like people!" meme. Your comment is wholly detached from the reality of using LLMs. I do not expect we'll be able to meet eye-to-eye on the topic.
This exists: each next token has a probability assigned to it. High probability means "it knows"; if there are two or more tokens of similar probability, or the probability of the first token is low in general, then you are less confident about that datum.
Of course there are areas where there's more than one possible answer, but both possibilities are very consistent. I feel LLMs (ChatGPT) do this fine.
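For the curious, here is a minimal sketch of where that per-token probability comes from, using a small open model through the Hugging Face transformers library. The model choice, prompt, and 0.5 threshold are arbitrary; this only shows the mechanics, not that the number is well calibrated.

```python
# Sketch: read off the probability the model assigned to each token it generated.
# "gpt2" and the 0.5 threshold are arbitrary choices for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of Australia is", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=5,
    do_sample=False,               # greedy: always pick the top token
    output_scores=True,            # keep the per-step logits
    return_dict_in_generate=True,
)

new_tokens = out.sequences[0, inputs.input_ids.shape[1]:]
for tok_id, step_logits in zip(new_tokens, out.scores):
    probs = torch.softmax(step_logits[0], dim=-1)
    p = probs[tok_id].item()
    verdict = "shaky" if p < 0.5 else "confident"
    print(f"{tok.decode(tok_id)!r}: p={p:.2f} ({verdict})")
```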
Also, can we stop pretending with the generic name for ChatGPT? It's like calling Viagra sildenafil instead of Viagra; cut it out, there's the real deal and there are imitations.
> low in general, then you are less confident about that datum
It’s very rarely clear or explicit enough when that’s the case. Which makes sense, considering that the LLMs themselves do not know the actual probabilities.
Maybe this wasn't clear, but the probabilities are a low-level variable that may not be exposed in the UI; it IS exposed through API as logprobs in the ChatGPT api. And of course, if you have binary access, like with a Llama LLM, you may have even deeper access to this p variable.
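Roughly what that looks like with the openai Python client, as far as I remember the v1 interface; the model name is arbitrary and the "gap to runner-up" heuristic is mine, not something the API defines.

```python
# Sketch: pull per-token logprobs back from the chat completions endpoint and
# turn them into a crude confidence signal. Field names follow the openai v1
# Python client as I recall it; treat exact details as an assumption.
import math
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # arbitrary choice
    messages=[{"role": "user", "content": "In which year was the transistor invented?"}],
    logprobs=True,
    top_logprobs=3,
)

for item in resp.choices[0].logprobs.content:
    p = math.exp(item.logprob)                       # probability of the chosen token
    runner_up = math.exp(item.top_logprobs[1].logprob)
    print(f"{item.token!r}: p={p:.2f}, gap to runner-up={p - runner_up:.2f}")
```

Whether those numbers actually mean "knows" vs "doesn't know" is, of course, exactly what the replies below are disputing.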
> it IS exposed through API as logprobs in the ChatGPT api
Sure, but they're often not easily interpretable or reliable.
You can use it to compare a model’s confidence of several different answers to the same question but anything else gets complicated and not necessarily that useful.
This is very subjective, but I feel they are all imitators of ChatGPT. I also contend that the ChatGPT API (and UI) will become, or has become, a de facto standard, in the same manner that Intel's 8086 instruction set evolved into x86.
would you rather the LLM make up something that sounds right when it doesn't know, or would you like it to claim "i don't know" for tasks it actually can figure out? because presumably both happen at some rate, and if it hallucinates an answer i can at least check what that answer is or accept it with a grain of salt.
nobody freaks out when humans make mistakes, but we assume our nascent AIs, being machines, should always function correctly all the time
> would you rather the LLM make up something that sounds right when it doesn't know, or would you like it to claim "i don't know" for tasks it actually can figure out?
And that's part of the problem - you're thinking of it like a hammer when it's not a hammer. It's asking someone at a bar a question. You'll often get an answer - but even if they respond confidently that doesn't make it correct. The problem is people assuming things are fact because "someone at a bar told them." That's not much better than, "it must be true I saw it on TV".
It's a different type of tool - a person has to treat it that way.
Asking a question is very contextual. I don't ask a lawyer house-engineering problems, nor my doctor how to bake a cake. That means if I'm asking someone at a bar, I'm already prepared to deal with the fact that the person is maybe drunk, probably won't know,... And more often than not, I won't even ask the question unless in dire need, because it's the most inefficient way to get an informed answer.
I wouldn't bat an eye if people were taking code suggestions, then reviewing and editing them to make them correct. But from what I see, it's pretty much a direct push to production if they got it to compile, which is different from correct.
Still the elephant in the room. We need an AI technology that can output "don't know" when appropriate. How's that coming along?