This is a SaaS problem, not an LLM problem. If you have a local LLM that nobody is upgrading behind your back, it will calculate the same thing on the same inputs.
Unless there is a bug somewhere, like using uninitialized memory, the floating-point calculations and the token embedding and all the rest do the same thing each time.
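As a minimal sketch of that point, assuming the Hugging Face transformers package and the small gpt2 checkpoint purely as stand-ins for "a model you run yourself": greedy decoding injects no randomness, so repeated runs on the same machine produce the same tokens.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("What's 2+2?", return_tensors="pt")

# do_sample=False means argmax at every step: no randomness is injected,
# so the same inputs on the same hardware/software give the same tokens.
out1 = model.generate(**inputs, do_sample=False, max_new_tokens=20)
out2 = model.generate(**inputs, do_sample=False, max_new_tokens=20)

assert out1.tolist() == out2.tolist()
print(tok.decode(out1[0]))
```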
So couldn't SaaS or cloud/API LLMs offer this as an option? A guarantee that the same prompt will always produce the same result.
Also, I usually interpret "non-deterministic" a bit more broadly.
Say I have slightly different prompts: "what's 2+2?" vs. "can you please tell me what's 2 plus 2", or even "2+2=?" or "2+2". For most applications it would be useful if they all produced the same result.
The form of the question determines the form of the outcome, even if the answer is the same. Asking the same question in a different way should result in an answer that adheres to the form of the question:
2+2 is 4
2 plus 2 is 4
4=2+2
4
Having the LLM pass the input to a tool (python) will result in deterministic output.
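A toy sketch of that idea (the calc helper and its dispatch are invented for illustration; real tool-calling APIs differ by provider): however the model phrases its request, the arithmetic itself is done by ordinary Python and is deterministic.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr: str) -> float:
    """Safely evaluate a small arithmetic expression passed by the model."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

# "what's 2+2?", "2 plus 2", "2+2=?" can all be normalized by the model
# into the same tool call, and the tool's answer never varies.
assert calc("2+2") == calc("2 + 2") == 4
```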
Well, I do, because not a day has passed since 2021 when the general popular discourse on AI has not referenced its functionality as fundamentally novel.
There are two additional aspects that are even more critical than the implementation details here:
- Typical LLM usage involves the accretion of context tokens from previous conversation turns. The likelihood that you will type prompt A twice but all of your previous context will be the same is low. You could reset the context, but accretion of context is often considered a feature of LLM interaction.
- Maybe more importantly, because the LLM abstraction is statistical, getting the correct output for e.g. "3 + 5 = ?" does not guarantee you will get the correct output for any other pair of numbers, even if all of the outputs are invariant and deterministic. So even if the individual prompt + output relationship is deterministic, the usefulness of the model output may "feel" nondeterministic between inputs, or have many of the same bad effects as nondeterminism. For the article's list of characteristics of deterministic systems, per-input determinism only solves "caching", and leaves "testing", "compliance", and "debuggability" largely unsolved.
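A small sketch of the caching point (the cached_llm helper and fake_llm stand-in are hypothetical): an exact-match cache only helps when the prompt and all accreted context are byte-for-byte identical; any paraphrase or extra conversation turn is a miss.

```python
cache = {}  # (history..., prompt) -> response

def cached_llm(history, prompt, llm):
    key = tuple(history) + (prompt,)
    if key not in cache:
        cache[key] = llm(key)   # hypothetical deterministic LLM call
    return cache[key]

fake_llm = lambda key: "4"      # stand-in for a real model

cached_llm((), "what's 2+2?", fake_llm)                 # computed
cached_llm((), "what's 2+2?", fake_llm)                 # cache hit
cached_llm((), "2 plus 2?", fake_llm)                   # miss: paraphrase
cached_llm(("hi", "hello!"), "what's 2+2?", fake_llm)   # miss: new context
```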
There may be something I do not understand about LLMs. But it seems more correct to say LLMs are chaotic - in the mathematical sense of sensitive dependence on initial conditions.
The only actual nondeterminism is deliberately injected. E.g. the temperature parameter. Without that, it is deterministic but chaotic. This is the case both in training LLMs, and in using the trained models.
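A minimal numpy sketch of that distinction, with made-up logits: the only randomness lives in the sampling step, and it vanishes at temperature 0 (or becomes repeatable once the RNG seed is pinned).

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    if temperature == 0.0 or rng is None:
        return int(np.argmax(logits))                 # greedy: deterministic
    probs = np.exp(np.asarray(logits) / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))      # deliberately injected

logits = [1.0, 2.0, 0.5]

# Greedy decoding gives the same token for the same logits every time...
assert sample_token(logits, 0.0) == sample_token(logits, 0.0)

# ...while sampled decoding varies run to run unless the seed is fixed.
rng = np.random.default_rng(0)
print([sample_token(logits, 1.0, rng) for _ in range(5)])
```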
If I missed something, someone point it out please.
If that is the case, then you didn't read or comprehend what was actually said, and no one can tailor a response to people who can't read and comprehend.
There are important distinctions; it's beyond my scope to try and guess where that failure of comprehension might be for an individual such as yourself.
Basic reading comprehension would note:
Properties are not individual inputs; they apply to the whole system as a relationship between input and output, and individual inputs cannot define properties.
"Chaos" has a very rigorous definition (small changes in inputs lead to large changes in outputs).
"Injection of non-determinism" is only correct if it includes a reference to the fact that determinism is built into all computation, which is not a common understanding. Without that reference, the context improperly includes an indeterminable indirection, resulting in fallacy.
The two are unrelated and independent of the context of the conversation or of determinism, and so defining such an understanding in those terms would result in fallacy (by improper isolation), delusion, or hallucination.
These are fundamental errors in reasoning and by extension understanding.
The correct understanding, on firm foundations, was provided. It is on the individual without knowledge to come into a conversation with the bare minimum requirements for comprehension based in rational thought and practice.
Edit: No amount of down-voting will change the truth of this, though I understand why someone would want useful knowledge to be hidden.
You should honestly re-evaluate and re-calibrate your measure of tone in moderation and relation to everything else.
Terse is not harsh or rude; it's condensed, which carries a fine distinction.
Most business people and professionals speak this way, especially when it comes down to the objective facts which are not in question.
The facts, and the effort toward minimizing cost for all parties in a communication, convey an overall respect; it's extra effort I didn't have to provide, which works toward a specific goal for everyone involved in the communication's benefit.
If there is a mistake made on either party's part, it's not harsh or rude to point out the mistake in such an unambiguous format, or, where that's not possible due to a deficit, to point out why generally (such as a dependency not met).
Elaborating in great detail, repeatedly or otherwise, would be condescending; on the opposite side, personal haranguing would be a coercive imposition of cost. Lying by omission or commission would be the worst.
You'll note I did neither of those things, which is the socially acceptable way to handle it, and does not merit the actions that were taken. I pointed out the errors in comprehension in the most minimal, unambiguous way possible.
The only generally understood acceptable middle ground between those two extremes is terse and to the point, and when you eliminate both sides and the middle ground, you classify all communication as harsh and rude, which is an absurdity.
People cannot read other people's minds, and the point of communication is to convey meaning in a way the parties involved can use it towards their own ends beneficially if they choose, without unnecessary third-party interference.
The reflected appraisal is beneficial to all people involved.
> Terse is not harsh or rude, its condensed, which carries a fine distinction.
Ironically, I suspect that terseness would improve your communication here. The significant repetitiveness and length of your posts is both contributing to others’ confusion and giving you more opportunities to be rude.
I’m commenting as opposed to downvoting and moving on because I do think there’s some interesting substance in what you wrote, but it needs an edit pass—for politeness/assumption of positive regard as well as brevity—before it’s in any way useful communication.
> but it needs an edit pass - for politeness/assumption of positive regard.
We will have to disagree.
I appreciate that, though the post isn't repetitive: each point has a fine and different nuance it's meant to convey, with a single tie-in back to the overarching point/theme at the end signaling the completion of the idea; this is a very common writing style/structure when conveying information.
The problem with doing as you mention is that any further reduction would introduce ambiguity and fragmentary thought through improper generalization/isolation. These would then be latched onto for subtle harassment attacks, like nullification, which have become all too popular on all platforms today.
You'll notice the people I responded to don't provide reflected appraisal, as someone earnest would, and that is needed to write or tailor responses toward a common audience level, or to bridge the comprehension gap. This is a failing on their part, not mine.
They are in all likelihood either bots seeking to remove useful information, or are doing so purposefully following a critical-theory mindset, which is a particularly destructive engrammed set of trauma loops, though a rare few manage to crawl and pull themselves out of it toward actual reasoning when the mistakes in reasoning are brought to light.
Politeness and positive regard aren't what you seem to think. Washington wrote about politeness, and aside from the dated material, it still holds up today.
You will find much in his 110 Rules of Civility that constitutes politeness and is present in my writings.
The 'disarmed politeness' you seem to want is based in an impossibility once you strip all the indirection and contradiction away. The brevity you seem to want ignores the fine nuance and comprehensiveness needed to be polite, and the natural outcome of doing either is "lies to children" and the imposition of harassment for volunteering useful information, something I won't do while disarmed. An impasse.
Charity is provided and given solely on the terms of the giver, not at the barrel of a gun, blackmail, or the inherent threat thereof; volunteers stop giving when it costs them more than they were willing to give, and it's an individual decision.
I'll have to remember Matthew 7:6 the next time I consider providing such charity, though I'm glad you let me know you found the substance useful. So few people do this, and it is appreciated.
Probabilistic processes are not the most appropriate way to produce deterministic results. And definitely not if the system is designed to update, grow or "learn" from inputs.
The author read the docs but never experimented, so they don't seem to have intuition behind the theory. For example, Gemini Flash actually seems to have deterministic outputs at temp 0, despite the disclaimer in the docs. Clearly Google has no trouble making it possible. Why don't they guarantee it, then? For starters, it's inconvenient due to batching; you can see that in Gemini Pro, which is "almost" deterministic, but the same results come grouped together. It's a SaaS problem: if you run a model locally it's much easier to make it deterministic than the article presents, and definitely not nearly impossible. It's going to cost you more, though.
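To make the batching point concrete, a tiny self-contained illustration (not tied to any particular serving stack): floating-point addition is not associative, so a different reduction order, which is exactly what a different batch composition can cause inside the kernels, can nudge a logit by its last bits and flip a near-tie between tokens.

```python
# The same three numbers summed in two different orders:
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                   # 0.6000000000000001
print(a + (b + c))                   # 0.6
print((a + b) + c == a + (b + c))    # False: order changes the result
```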
But largely, you don't really want determinism. Imagine you have equal logprobs for "yes" and "no", which one should go into the output? With temperature 0 and greedy sampling it's going to be the same each time, depending on unrelated factors (e.g. vocabulary order), and your outputs are going to be terribly skewed from what the model actually tries to tell you in the output distribution. What you're trying to solve with LLMs is inherently non-deterministic. It's either the same with humans and organizations (but you can't reset the state to measure it), or at least it depends on a myriad of little factors impossible to account for.
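A sketch of that skew with made-up numbers (numpy only; the "yes"/"no" probabilities are invented): greedy decoding collapses a genuine 50/50 onto whichever token happens to sort first, while sampling reproduces the distribution the model actually expressed.

```python
import numpy as np

tokens = ["yes", "no"]
probs = np.array([0.5, 0.5])          # the model is genuinely torn

# Greedy: argmax breaks the tie by index, so "yes" wins every single time.
greedy = [tokens[int(np.argmax(probs))] for _ in range(1000)]
print(greedy.count("yes"), greedy.count("no"))     # 1000, 0

# Sampling: roughly 50/50, matching the model's actual output distribution.
rng = np.random.default_rng(0)
sampled = [tokens[rng.choice(2, p=probs)] for _ in range(1000)]
print(sampled.count("yes"), sampled.count("no"))   # ~500 each
```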
Besides, all current models have issues at temperature 0. Gemini in particular exhibits micro-repetitions and hallucinations (non-existent at higher temps) which it then tries to correct. Other models have other issues. This is a training-time problem, probably unsolvable at this point.
What you want is correctness, which is pretty different, because the model works with concepts, not tokens. Try asking it what 2x2 is. It might formulate the answer differently each time, but good luck making it reply with anything other than 4 at a non-schizophrenic temperature. A bit of randomness won't prevent it from being consistently correct (or consistently incorrect).