> It also isn't generating "the most likely response" - that's what original GPT-3 did, GPT-3.5 and up don't work that way.

What changed?



It answers questions in a voice that isn't yours.

The "most likely response" to text you wrote is: more text you wrote. Anytime the model provides an output you yourself wouldn't write, it isn't "the most likely response".


I believe that ChatGPT works by inserting some ANSWER_TOKEN. That is, on its own a prompt like "Tell me about cats" would probably produce "Tell me about cats because I like them a lot", but the interface wraps your prompt like "QUESTION_TOKEN: Tell me about cats ANSWER_TOKEN:".
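
Something like this, purely as illustration; the actual token names and formatting ChatGPT uses aren't public, so everything here is invented:

    # Hypothetical sketch of how a chat interface might wrap a prompt
    # before handing it to a completion model. The token names are made
    # up; OpenAI's real special tokens and formatting are not public.
    QUESTION_TOKEN = "QUESTION:"
    ANSWER_TOKEN = "ANSWER:"

    def wrap_prompt(user_text: str) -> str:
        # Frame the user's text as a question awaiting an answer, so the
        # model continues with an answer rather than continuing the question.
        return f"{QUESTION_TOKEN} {user_text}\n{ANSWER_TOKEN}"

    print(wrap_prompt("Tell me about cats"))
    # QUESTION: Tell me about cats
    # ANSWER: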


It might, but I've used text-davinci-003 before this (https://platform.openai.com/playground) and it really just works with whatever you give it.


text-davinci-003 has no trouble working as a chat bot: https://i.imgur.com/lCUcdm9.png (note that the poem lines it gave me should've been green, I don't know why they lost their highlight color)


It is interesting that the model seems unable to output the INPUT and OUTPUT tokens; I wonder if it's learned behavior or an architectural constraint.


Yeah, that's an interesting question I didn't consider actually. Why doesn't it just keep going? Why doesn't it generate an 'INPUT:' line?

It's certainly not that those tokens are hard coded. I tried a completely different format, with no prior instruction, and it works: https://i.imgur.com/ZIDb4vM.png (again, highlighting is broken. The LLM generated all the text after 'Alice:' for all lines except for the first one.)
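
Roughly what that looks like against the API (a minimal sketch assuming the legacy pre-1.0 openai Python library; the dialogue itself is made up):

    # Minimal sketch: chatting with a plain completion model.
    # Assumes the legacy (pre-1.0) openai Python library.
    import openai

    openai.api_key = "sk-..."  # your API key

    prompt = (
        "Bob: Write me a short poem about the sea.\n"
        "Alice:"
    )

    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=200,
        temperature=0.7,
        # stop=["Bob:"],  # not needed in my test; the model stopped on its own
    )

    print(resp["choices"][0]["text"])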


Then I guess that it is learned behavior. It recognizes the shape of a conversation and it knows where it is supposed to stop.

It would be interesting to stretch this model, like asking it to continue a conversation between 4-5 people where the speaking order is not regular, the user plays 2 of the people, and the model plays the other 3.
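
Something like this as a starting prompt, for example (names and lines invented); the user keeps supplying Alice and Bob, and the test is whether the model speaks only as Carol, Dave, and Eve and stops when it's Alice's or Bob's turn:

    # Hypothetical multi-speaker prompt, fed through the same
    # Completion call as above. The user plays Alice and Bob; the model
    # should continue as Carol, Dave, and Eve in no fixed order, and
    # stop before speaking for Alice or Bob.
    prompt = (
        "Alice: Did anyone finish the crossword?\n"
        "Bob: I got stuck on 12 across.\n"
        "Carol:"
    )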


Meaning that it tends to continue your question?


Reinforcement learning w/ human feedback. What you guys are describing is the alignment problem.


That’s just a supervised fine-tuning method to skew outputs favorably. I’m working with it on biologics modeling using laboratory feedback, actually. The underlying inference structure is not changed.



