
At GPT-4 level, saying "All it does is pick the most likely next token, given the ones that have come before" is like saying that all people do is take actions that maximize their wealth, based on what they've seen work best before. Technically close enough, but also missing the point.

Yes, GPT-4 is predicting the next tokens best associated with the ones that came before. But it does so based not on one score, but on a hundred thousand of them. That many dimensions in the latent space are enough to capture pretty much any semantic or structural association you can come up with. I'm not convinced this isn't sufficient, in principle, to cover most of what we'd call reasoning.
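
To make "predicting the next token" concrete, here's a rough sketch using GPT-2 through the Hugging Face transformers library (GPT-4's weights aren't public, so GPT-2 stands in, and the prompt is illustrative): at each step the model emits a score for every token in its vocabulary, conditioned on everything so far, and decoding picks from those scores.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # GPT-2 as an open-weights stand-in for "a causal language model"
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

    with torch.no_grad():
        logits = model(input_ids).logits   # shape: (1, seq_len, vocab_size)

    # One score per vocabulary token; greedy decoding just takes the
    # argmax -- the "most likely next token".
    next_id = int(torch.argmax(logits[0, -1]))
    print(tokenizer.decode([next_id]))     # likely " Paris"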



No matter how many parameters you give it, it's still just predicting which next token is most likely, given the training data. The number of parameters gives it the ability to pattern-match a wide variety of inputs and generate acceptable outputs, but it doesn't give it any reasoning ability. It can indeed capture textual associations, like the fact that the sequence of tokens "mingle the bits of two 16-bit numbers" is commonly associated with INTERCAL. But having no reasoning ability, it cannot recognize that it has incorrectly associated that operation with the % operator. It just didn't have anything with a higher probability.
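
For reference, mingle is a real INTERCAL operator (written $ in C-INTERCAL, ¢ in the original INTERCAL-72, not %): it interleaves the bits of two 16-bit numbers into one 32-bit result. Here's a rough Python sketch of the operation itself, with the function name mine for illustration:

    def mingle(a: int, b: int) -> int:
        """Interleave the bits of two 16-bit values into a 32-bit value.
        Bit i of `a` lands at position 2*i+1 and bit i of `b` at 2*i,
        so `a` supplies the more significant bit of each pair."""
        result = 0
        for i in range(16):
            result |= ((a >> i) & 1) << (2 * i + 1)
            result |= ((b >> i) & 1) << (2 * i)
        return result

    # The classic manual example: #65535 mingled with #0 gives binary
    # 1010...10, i.e. 2863311530 decimal.
    assert mingle(65535, 0) == 2863311530
    assert mingle(0, 65535) == 1431655765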



