I share this idea but from the different perspective. It doesn’t develop these mechanisms, but casts a high-dimensional-enough shadow of their effect on itself. This vaguely explains why the more deep Gell-Mann-wise you are the less sharp that shadow is, because specificity cuts off “reasoning” hyperplanes.
It’s hard to explain emerging mechanisms because of the nature of generation, which is one-pass sequential matrix reduction. I say this while waving my hands, but listen. Reasoning is similar to Turing complete algorithms, and what LLMs can become through training is similar to limited pushdown automata at best. I think this is a good conceptual handle for it.
“Line of thought” is an interesting way to loop the process back, but it doesn’t show that much improvement, afaiu, and still is finite.
Otoh, a chess player takes as much time and “loops” as they need to get the result (ignoring competitive time limits).
It’s hard to explain emerging mechanisms because of the nature of generation, which is one-pass sequential matrix reduction. I say this while waving my hands, but listen. Reasoning is similar to Turing complete algorithms, and what LLMs can become through training is similar to limited pushdown automata at best. I think this is a good conceptual handle for it.
“Line of thought” is an interesting way to loop the process back, but it doesn’t show that much improvement, afaiu, and still is finite.
Otoh, a chess player takes as much time and “loops” as they need to get the result (ignoring competitive time limits).