None of the siblings are right. The models themselves are idempotent: given the same context, you get the same activations. However, the output distribution is then sampled in a pseudorandom way by these chat tools. You can seed all the PRNGs in the system to get reproducible output even when sampling, or go beyond that and just work with the raw probability distribution by hand.
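To make that concrete, here's a minimal sketch of the sampling step in Python, assuming you already have the model's raw logits for the next token (the toy array below stands in for a real model's output):

    import numpy as np

    def sample_next_token(logits, temperature, rng):
        # Temperature 0 is pure argmax: no randomness at all.
        if temperature == 0.0:
            return int(np.argmax(logits))
        # Temperature rescales the logits before the softmax.
        scaled = logits / temperature
        scaled -= scaled.max()              # for numerical stability
        probs = np.exp(scaled)
        probs /= probs.sum()
        # The only nondeterminism is the RNG; seed it and the output is fixed.
        return int(rng.choice(len(probs), p=probs))

    logits = np.array([2.0, 1.0, 0.5, -1.0])   # toy next-token logits
    rng = np.random.default_rng(seed=42)       # same seed -> same samples, every run
    print(sample_next_token(logits, temperature=0.8, rng=rng))

The model is a pure function of its input; everything nondeterministic lives in that last rng.choice call.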
Right. They are idempotent (making an API call doesn't cause a state change in the model[0] per se), but not necessarily deterministic (and less so as you raise the temp).
It is possible to architect things to be fully deterministic with an explicit seed for the pseudorandom aspects (which is mostly how Stable Diffusion works), but I haven't yet seen a Chatbot UI implementation that works that way.
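As a sketch of what that architecture looks like, using GPT-2 via Hugging Face transformers as a stand-in (nobody outside OpenAI can run ChatGPT's weights; Stable Diffusion pipelines take an explicit seed the same way):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tokenizer("The weather today is", return_tensors="pt")

    set_seed(1234)  # seeds Python's, NumPy's, and torch's PRNGs in one call
    out_a = model.generate(**inputs, do_sample=True, temperature=0.9, max_new_tokens=20)

    set_seed(1234)  # reseed -> identical sampling decisions -> identical text
    out_b = model.generate(**inputs, do_sample=True, temperature=0.9, max_new_tokens=20)

    assert torch.equal(out_a, out_b)
    print(tokenizer.decode(out_a[0]))

(On a GPU you may additionally need deterministic kernels for bit-identical results; on CPU this is reproducible as-is.)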
[0] Except on a longer timeframe where the request may be incorporated into future training data.
That's the feature of chat - it remembers what has been said, and that changes the context in which it says new things. If you use the API, it starts fresh each time, and if you turn down the 'temperature' it produces very similar, often identical, answers.
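Both points in one sketch, using the OpenAI Python client (the model name is illustrative, and it assumes OPENAI_API_KEY is set in the environment):

    from openai import OpenAI

    client = OpenAI()

    # The API is stateless: "memory" is just the caller resending the transcript.
    history = [{"role": "user", "content": "Pick a color."}]
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",   # illustrative
        messages=history,        # the full context goes in every call
        temperature=0,           # near-greedy: repeat calls give very similar, usually identical answers
    )
    print(reply.choices[0].message.content)

    # To "continue the chat", append both sides and resend everything:
    history.append({"role": "assistant", "content": reply.choices[0].message.content})
    history.append({"role": "user", "content": "Now pick a different one."})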
Pseudorandom numbers are injected into the model's output via temperature-based sampling, but OpenAI could seed that sampler to get the same answers for the same input. I'm going out on a limb here with pure speculation, but given the model, a temperature, and a known text prompt, OpenAI could probably reverse-engineer a seed and prove that the weights are the same.
Since fine-tuning is often done by freezing all but the top layers, I wonder if it would still be possible to take a set of inputs and outputs and mathematically demonstrate that a model is derivative of ChatGPT. There may well be too much entropy to unpack, but I'm sure there will be researchers exploring this, if only to identify AI-generated material.
Of course, since the model is so large and general purpose already, I can’t assume the same fine-tuning techniques are used as for vastly smaller models, so maybe layers aren’t frozen at all.
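For what it's worth, here's what the "freeze all but the top layers" recipe looks like in PyTorch, again with GPT-2 as a stand-in (the choice of two layers, and whether ChatGPT is tuned this way at all, are assumptions):

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Freeze everything...
    for param in model.parameters():
        param.requires_grad = False

    # ...then unfreeze only the last two transformer blocks and the final norm.
    for block in model.transformer.h[-2:]:
        for param in block.parameters():
            param.requires_grad = True
    for param in model.transformer.ln_f.parameters():
        param.requires_grad = True

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} of {total:,} parameters")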