No, of course not - they also learn whatever is necessary, and possible, in order to replicate those surface statistics (e.g. an understanding of fairy tales, etc., as I noted).
However, you seem to be engaged in magical thinking and believe these models are learning things beyond their architectural limits. You appear to be starstruck by what these models can do, and blind to what one can deduce - and SEE - that they are unable to do.
You've said a lot of things about LLM chess performance that are not true and can easily be shown to be not true. The evidence is literally right there showing the model learning the board state, the rules, player skill levels, etc.
And then you've tried to paper over being shown that with a conveniently vague and nonsensical "says more about bla bla bla". No, you were wrong. Your model of this is wrong. It's that simple.
You start from your conclusions and work your way down from there. "Pattern matching technique"? Please. By all means, explain to all of us what this actually entails in a way we can test for it. Not just vague words.
An LLM will learn what it CAN (and needs to, to reduce the loss), but not what it CAN'T. How difficult is that to understand?!
Tracking probable board state given a sequence of moves (which don't even need to go all the way back to the start of the game!) is relatively simple to do, and doesn't require hundreds of sequential steps that are beyond the architecture of the model. It's just a matter of incrementally updating the current board state "hypothesis" with each new move (essentially: "a knight just moved to square X, so it must have moved away from some square a knight's move away from X that we believe currently contains a knight").
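To make that concrete, here's a toy sketch of that kind of incremental update - obviously not how the model literally implements it internally; the move notation and the handful of starting pieces are made up purely for illustration:

```python
# Toy sketch of incremental board-state tracking (not an engine, and not how
# an LLM literally represents this internally). Moves are assumed to be in a
# long-algebraic form like "Ng1f3"; the pieces shown are just for illustration.

START = {
    "e1": "K", "d1": "Q", "g1": "N", "b1": "N",
    "e8": "k", "d8": "q", "g8": "n", "b8": "n",
}

def apply_move(board: dict, move: str) -> dict:
    """One constant-time local update per move: the piece that just arrived
    on the destination square must have left its source square."""
    move = move.lstrip("KQRBN")        # drop the piece letter if present
    src, dst = move[:2], move[2:4]
    board = dict(board)
    board[dst] = board.pop(src)        # vacate the old square, occupy the new one
    return board

board = dict(START)
for mv in ["Ng1f3", "Ng8f6", "Nb1c3"]:
    board = apply_move(board, mv)      # no lookahead, no replay of the whole game
print(board["f3"], board["f6"], board["c3"])   # -> N n N
```

Each new move needs only a small, bounded amount of work on top of the previous hypothesis - no long chain of sequential computation required.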
Ditto for estimating player Elo rating in order to predict appropriately good or bad moves. It's basically just a matter of how often the player makes the same move as other players of a given Elo rating in the training data. No need for hundreds of steps of sequential computation that are beyond the architecture of the model.
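Again, purely as an illustration - the rating buckets and move frequencies below are made up, and in an actual LLM all of this would be implicit in the learned weights rather than an explicit table:

```python
import math

# Made-up table of P(move | rating bucket) for one position, purely for
# illustration; in an LLM this would be implicit in the learned weights.
MOVE_FREQ = {
    1200: {"e2e4": 0.50, "g1f3": 0.10, "f2f3": 0.15},
    1800: {"e2e4": 0.45, "g1f3": 0.25, "f2f3": 0.02},
    2400: {"e2e4": 0.40, "g1f3": 0.35, "f2f3": 0.01},
}

def estimate_rating(observed_moves, freq_table=MOVE_FREQ):
    """Accumulate a log-likelihood per rating bucket - one cheap update per
    observed move - then pick the best-matching bucket."""
    scores = {elo: 0.0 for elo in freq_table}
    for mv in observed_moves:
        for elo, freqs in freq_table.items():
            scores[elo] += math.log(freqs.get(mv, 1e-3))   # smooth unseen moves
    return max(scores, key=scores.get)

print(estimate_rating(["e2e4", "g1f3", "g1f3"]))   # -> 2400
```

Again: a running tally that gets a constant-sized nudge per move, not hundreds of sequential steps.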
Doing an N-ply lookahead to reason about potential moves is a different story, but you want to ignore that and instead throw out a straw man "counter argument" about maintaining board state, as if that somehow proves that the LLM can magically apply more sequential reasoning steps than it has layers in order to derive moves. Sorry, but this is precisely magical faith-based thinking - "it can do X, so it can do Y" - without any analysis of what it takes to do X and Y, and why one is possible and the other is not.
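For contrast, here's the simplest possible version of the N-ply lookahead I'm talking about, sketched on an abstract game tree rather than real chess. The thing to notice is that the recursion depth - the number of strictly sequential evaluation steps - grows with N, which is exactly what a fixed number of layers cannot simply absorb:

```python
# Toy N-ply minimax on an abstract game tree; a node is (children, value).
# The recursion depth - the chain of strictly sequential evaluations -
# grows with the ply count N, unlike the constant-work-per-move updates above.

def minimax(node, depth, maximizing):
    children, value = node
    if depth == 0 or not children:
        return value                          # static evaluation at the horizon
    vals = [minimax(child, depth - 1, not maximizing) for child in children]
    return max(vals) if maximizing else min(vals)

leaf = lambda v: ([], v)                      # leaves carry a static evaluation
tree = ([([leaf(3), leaf(5)], 0),             # our move A: opponent can reach 3 or 5
         ([leaf(-2), leaf(9)], 0)], 0)        # our move B: opponent can reach -2 or 9
print(minimax(tree, 2, True))                 # -> 3: best move assuming best reply
```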
>An LLM will learn what it CAN (and needs to, to reduce the loss), but not what it CAN'T. How difficult is that to understand?!
Right, and the point is that you don't know what it CAN'T learn. You clearly don't quite understand this, because you say stuff like this:
>Chess is a good example, since it's easy to understand. The generative process for world class chess (whether human, or for an engine) involves way more DEPTH (cf layers) of computation than the transformer has available to model it
and it's just baffling because:
1. Humans don't play chess anything like chess engines. They literally can't because the brain has iterative computation limits well below that of a computer. Most Grandmasters are only evaluating 5 to 6 moves deep on average.
You keep trying to make the point that because a Transformer architecturally has a fixed depth limit for any given trained model, it cannot reach human level.
But this is nonsensical for various reasons.
- Nobody is stopping you from just increasing N such that every GI problem we care about is covered.
- You have shown literally no evidence that even the N that state-of-the-art models possess today is insufficient to match human iterative compute ability.
GPT-4o zero-shots arbitrary arithmetic more accurately than any human brain, and that's supposedly something it's bad at.
You can clearly see it can reach world-class chess play.
If you have some iterative-computation benchmark that shows transformers zero-shotting worse than an unaided human, then feel free to show me.
I did not claim the state of the art was better at all forms of reasoning than all humans. I claimed the architecture isn't going to stop it from being so in the future, but I guess constructing a strawman is always easier, right?
There are benchmarks that rightfully show the SOTA behind average human performance in other aspects of reasoning, so why are you fumbling so much to demonstrate this with unaided iterative computation? It's your biggest argument, so I just thought you'd have something more substantial than "It's limited bro!"
You cannot even demonstrate this today, never mind some hypothetical scaled-up model.
> so why are you fumbling so much to demonstrate this with unaided iterative computation
Well, you see, I've been a professional developer for the last 45 years, and often, gasp, think for long periods of time before coding, or even writing things down. "Look ma, no hands!".
I know this will come across as an excuse, but the thing is I assumed you were also vaguely familiar with things like software development, or other cases where humans think before acting, so I evidently did a poor job of convincing you of this.
I also assumed (my bad!) that you would at least know some people who were semi-intelligent and wouldn't be hopelessly confused about farmers and chickens, but now I realize that was a mistake.
Really, it's all on me.
I know that "just add more rules", "make it bigger" didn't work for CYC, but maybe as you suggest "increase N" is all that's needed in the case of LLMs, because they are special. Really - that's genius! I should have thought of it myself.
I'm sure Sam is OK, but he'd still appreciate you letting him know he can forget about Q* and Strawberries and all that nonsense, and just "increase N"! So much simpler and cheaper than hiring thousands of developers to try to figure this out!
Maybe drop Yann LeCun a note too - tell him that the Turing Award committee are asshats, and that he is too, and that LLMs will get us all the way to AGI.
>Well, you see, I've been a professional developer for the last 45 years, and often, gasp, think for long periods of time before coding, or even writing things down. "Look ma, no hands!".
>I know this will come across as an excuse, but the thing is I assumed you were also vaguely familiar with things like software development, or other cases where humans think before acting, so I evidently did a poor job of convincing you of this.
Really, you have the same train of thought for hours on end?
When you finish even your supposed hours-long spiel, do you just proceed to write every line of code that solves your problem just like that? Or do you write and think some more?
More importantly, are LLMs unable to produce the kind of code humans spend a train of thought on?
>Maybe drop Yann LeCun a note too - tell him that the Turing Award committee are asshats, and that he is too, and that LLMs will get us all the way to AGI.
You know, the appeal to authority fallacy is shaky at the best of times, but it's straight-up nonsensical when there is no consensus among said authorities on the thing you're appealing to.
Like, great, you mentioned LeCun. And I can just as easily bring in Hinton, Norvig, and Ilya. Now what?