> It also isn't generating "the most likely response" - that's what original GPT-3 did, GPT-3.5 and up don't work that way.

What changed?



It answers questions in a voice that isn't yours.

The "most likely response" to text you wrote is: more text you wrote. Anytime the model provides an output you yourself wouldn't write, it isn't "the most likely response".


I believe that ChatGPT works by inserting some ANSWER_TOKEN. That is, on its own a prompt like "Tell me about cats" would probably produce "Tell me about cats because I like them a lot", but the interface wraps your prompt like "QUESTION_TOKEN: Tell me about cats ANSWER_TOKEN:".
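
Something like this, purely as illustration; the actual token names and formatting ChatGPT uses aren't public, so everything here is invented:

    # Hypothetical sketch of how a chat interface might wrap a prompt
    # before handing it to a completion model. The token names are made
    # up; OpenAI's real special tokens and formatting are not public.
    QUESTION_TOKEN = "QUESTION:"
    ANSWER_TOKEN = "ANSWER:"

    def wrap_prompt(user_text: str) -> str:
        # Frame the user's text as a question awaiting an answer, so the
        # model continues with an answer rather than continuing the question.
        return f"{QUESTION_TOKEN} {user_text}\n{ANSWER_TOKEN}"

    print(wrap_prompt("Tell me about cats"))
    # QUESTION: Tell me about cats
    # ANSWER: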


It might, but I've used text-davinci-003 before this (https://platform.openai.com/playground) and it really just works with whatever you give it.


text-davinci-003 has no trouble working as a chat bot: https://i.imgur.com/lCUcdm9.png (note that the poem lines it gave me should've been green, I don't know why they lost their highlight color)


It is interesting that the model seems unable to output the INPUT and OUTPUT tokens; I wonder if it's learned behavior or an architectural constraint.


Yeah, that's an interesting question I didn't consider actually. Why doesn't it just keep going? Why doesn't it generate an 'INPUT:' line?

It's certainly not that those tokens are hard coded. I tried a completely different format, with no prior instruction, and it works: https://i.imgur.com/ZIDb4vM.png (again, highlighting is broken. The LLM generated all the text after 'Alice:' for all lines except for the first one.)
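
Roughly what that looks like against the API (a minimal sketch assuming the legacy pre-1.0 openai Python library; the dialogue itself is made up):

    # Minimal sketch: chatting with a plain completion model.
    # Assumes the legacy (pre-1.0) openai Python library.
    import openai

    openai.api_key = "sk-..."  # your API key

    prompt = (
        "Bob: Write me a short poem about the sea.\n"
        "Alice:"
    )

    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=200,
        temperature=0.7,
        # stop=["Bob:"],  # not needed in my test; the model stopped on its own
    )

    print(resp["choices"][0]["text"])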


Then I guess that it is learned behavior. It recognizes the shape of a conversation and it knows where it is supposed to stop.

It would be interesting to stretch this model, like asking it to continue a conversation between 4-5 people where the speaking order is not regular, the user plays 2 of the people, and the model plays the other 3.
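
Something like this as a starting prompt, for example (names and lines invented); the user keeps supplying Alice and Bob, and the test is whether the model speaks only as Carol, Dave, and Eve and stops when it's Alice's or Bob's turn:

    # Hypothetical multi-speaker prompt, fed through the same
    # Completion call as above. The user plays Alice and Bob; the model
    # should continue as Carol, Dave, and Eve in no fixed order, and
    # stop before speaking for Alice or Bob.
    prompt = (
        "Alice: Did anyone finish the crossword?\n"
        "Bob: I got stuck on 12 across.\n"
        "Carol:"
    )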


Meaning that it tends to continue your question?


Reinforcement learning w/ human feedback. What you guys are describing is the alignment problem.


That’s just a supervised fine-tuning method to skew outputs favorably. I’m working with it on biologics modeling using laboratory feedback, actually. The underlying inference structure is not changed.



