
At GPT-4 level, saying "All it does is pick the most likely next token, given the ones that have come before" is like saying that all people do is take actions that maximize their wealth, based on what they've seen work best before. Technically close enough, but also missing the point.

Yes, GPT-4 is predicting the next tokens best associated with the ones that came before. But it does so based not on one score, but on a hundred thousand of them. That many dimensions in the latent space are enough to capture pretty much any semantic or structural association you can come up with. I'm not convinced this isn't sufficient, in principle, to cover most of what we'd call reasoning.
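
To make "predicting the next token" concrete, here's a rough sketch using GPT-2 through the Hugging Face transformers library (GPT-4's weights aren't public, so GPT-2 stands in, and the prompt is illustrative): at each step the model emits a score for every token in its vocabulary, conditioned on everything so far, and decoding picks from those scores.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # GPT-2 as an open-weights stand-in for "a causal language model"
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

    with torch.no_grad():
        logits = model(input_ids).logits   # shape: (1, seq_len, vocab_size)

    # One score per vocabulary token; greedy decoding just takes the
    # argmax -- the "most likely next token".
    next_id = int(torch.argmax(logits[0, -1]))
    print(tokenizer.decode([next_id]))     # likely " Paris"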



No matter how many parameters you give it, it's still just predicting which next token is most likely, given the training data. The number of parameters gives it the ability to pattern-match a wide variety of inputs and generate acceptable outputs, but it doesn't give it any reasoning ability. It can indeed capture textual associations, like the fact that the sequence of tokens "mingle the bits of two 16-bit numbers" is commonly associated with INTERCAL. But having no reasoning ability, it cannot recognize that it has incorrectly associated that operation with the % operator. It just didn't have anything with a higher probability.
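
For reference, mingle is a real INTERCAL operator (written $ in C-INTERCAL, ¢ in the original INTERCAL-72, not %): it interleaves the bits of two 16-bit numbers into one 32-bit result. Here's a rough Python sketch of the operation itself, with the function name mine for illustration:

    def mingle(a: int, b: int) -> int:
        """Interleave the bits of two 16-bit values into a 32-bit value.
        Bit i of `a` lands at position 2*i+1 and bit i of `b` at 2*i,
        so `a` supplies the more significant bit of each pair."""
        result = 0
        for i in range(16):
            result |= ((a >> i) & 1) << (2 * i + 1)
            result |= ((b >> i) & 1) << (2 * i)
        return result

    # The classic manual example: #65535 mingled with #0 gives binary
    # 1010...10, i.e. 2863311530 decimal.
    assert mingle(65535, 0) == 2863311530
    assert mingle(0, 65535) == 1431655765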



