The "most likely response" to text you wrote is: more text you wrote. Anytime the model provides an output you yourself wouldn't write, it isn't "the most likely response".
I believe ChatGPT works by inserting some ANSWER_TOKEN. That is, a prompt like "Tell me about cats" on its own would probably be continued as "Tell me about cats because I like them a lot", but the interface wraps your prompt as something like "QUESTION_TOKEN: Tell me about cats ANSWER_TOKEN:"
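Roughly like this sketch; the token names and the wrapping function are just my guess at what the interface might do, not OpenAI's actual format:

```python
# Rough guess at how a chat interface might wrap a raw prompt before
# handing it to a plain completion model. Token names are invented.
QUESTION_TOKEN = "QUESTION:"
ANSWER_TOKEN = "ANSWER:"

def wrap_prompt(user_text: str) -> str:
    # The model then continues the text after ANSWER_TOKEN, which reads
    # as a reply rather than as a continuation of the question itself.
    return f"{QUESTION_TOKEN} {user_text}\n{ANSWER_TOKEN}"

print(wrap_prompt("Tell me about cats"))
# QUESTION: Tell me about cats
# ANSWER:
```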
text-davinci-003 has no trouble working as a chat bot: https://i.imgur.com/lCUcdm9.png (note that the poem lines it gave me should've been green, I don't know why they lost their highlight color)
Yeah, that's an interesting question I didn't consider actually. Why doesn't it just keep going? Why doesn't it generate an 'INPUT:' line?
It's certainly not that those tokens are hard-coded. I tried a completely different format, with no prior instruction, and it works: https://i.imgur.com/ZIDb4vM.png (again, highlighting is broken; the LLM generated all the text after 'Alice:' on every line except the first one.)
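For reference, here's a minimal sketch of the same experiment against the legacy (pre-1.0) openai Python package; the screenshot was done in the Playground, so the exact parameters here are my assumptions:

```python
# Sketch: use a plain completion model as a chat bot by giving it a
# transcript-shaped prompt. Uses the legacy (pre-1.0) openai package.
import openai

openai.api_key = "sk-..."  # your key

prompt = (
    "Bob: Hi Alice, how was your weekend?\n"
    "Alice:"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=150,
    temperature=0.7,
    stop=["\nBob:"],  # cut generation if the model starts writing Bob's next line
)

print(response["choices"][0]["text"])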
Then I guess that it is learned behavior. It recognizes the shape of a conversation and it knows where it is supposed to stop.
It would be interesting to stretch this, like asking the model to continue a conversation between 4-5 people where the speaking order isn't regular and the user plays two of the people while the model plays the other three.
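Something like this sketch could test it; `complete()` is a stand-in for whatever completion API you use, and all the names are made up:

```python
# Hypothetical sketch of the 5-person experiment: the human supplies lines
# for two speakers, the model continues for the other three in no fixed
# order. `complete()` is a placeholder for any text-completion call.
def complete(prompt: str, stop: list[str]) -> str:
    raise NotImplementedError("plug in a completion API call here")

USER_SPEAKERS = ["Alice", "Bob"]           # the human plays these two
MODEL_SPEAKERS = ["Carol", "Dave", "Eve"]  # the model plays these three

transcript = "Alice: Has anyone seen the meeting notes?\n"

while True:
    line = input("Your line (e.g. 'Bob: ...'), or blank to let the model speak: ")
    if line:
        transcript += line + "\n"
    else:
        # Don't prefix a speaker name: see whether the model picks one of
        # its three characters on its own and stops before writing a line
        # for Alice or Bob.
        reply = complete(transcript, stop=[f"\n{s}:" for s in USER_SPEAKERS])
        transcript += reply.strip() + "\n"
        print(reply.strip())
```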
That's just a supervised fine-tuning method to skew outputs favorably. I'm working with it on biologics modeling using laboratory feedback, actually. The underlying inference structure is not changed.
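To illustrate what I mean by "the underlying inference structure is not changed": supervised fine-tuning just adds more next-token-prediction training on curated prompt/completion pairs, e.g. in the JSONL format the legacy OpenAI fine-tuning endpoint took. The examples below are made-up placeholders, not my actual data:

```python
# Illustration of what supervised fine-tuning consumes: curated
# prompt/completion pairs (legacy OpenAI JSONL format shown here).
# Placeholder examples only.
import json

examples = [
    {"prompt": "Tell me about cats ->",
     "completion": " Cats are small domesticated carnivores commonly kept as pets.\n"},
    {"prompt": "Tell me about dogs ->",
     "completion": " Dogs are domesticated descendants of wolves.\n"},
]

with open("sft_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Training on these pairs shifts which continuations the model prefers, but
# at inference time it still samples one token at a time from the same network.
```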
What changed?