A good literary production. I would have been proud of it had I thought of it, but it's hard not to observe a strong "whataboutery" element: if we use "stochastic parrot" as shorthand and you dislike the term, now you understand why we dislike the constant use of "infer", "reason" and "hallucinate".
Parrots are self-aware, complex reasoning brains which can solve problems in geometry, tell lies, and act socially or asocially. They also have complex vocal cords and can perform mimicry. Very few aspects of a parrot's behaviour are stochastic, but even that underplays how complex stochastic systems can be in what they produce. If we label LLM products as Stochastic Parrots, it does not mean they like cuttlefish bones or are demonstrably modelled by Markov chains like Mark V Shaney.
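For anyone who hasn't met Mark V Shaney, here is a minimal sketch (in Python, names my own) of the kind of word-level Markov chain text generator it was: it can only emit transitions it has literally seen, which is the narrow sense of "stochastic parrot" that LLMs plainly exceed.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=12):
    """Emit words by sampling only transitions observed in the training text."""
    state = random.choice(list(chain))
    out = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:
            break
        out.append(random.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)

corpus = "the parrot repeats what the parrot hears and the parrot repeats it again"
print(generate(build_chain(corpus)))
```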
Well, parrots can make more parrots and LLMs can't make their own GPUs, so parrots win on that count. But LLMs can interpolate and even extrapolate a little. Have you ever heard a parrot do translation, hearing you say something in English and rendering it in Spanish? So no, LLMs are not parrots. Beyond their debatable abilities, they also work with a human in the loop, which means humans push them outside their original distribution. That's not a parroting act; it's being able to do more than pattern matching and reproduction.
I don't like wading into this debate when semantics are so personal and subjective. But to me it seems almost a sleight of hand to add the "stochastic" part when the weight actually falls on the "parrot" part. Parrots are much more concrete, whereas the term LLM could refer to the general architecture.
The question to me seems: If we expand on this architecture (in some direction, compute, size etc.), will we get something much more powerful? Whereas if you give nature more time to iterate on the parrot, you'd probably still end up with a parrot.
There's a giant impedance mismatch here (the difference in time scales being one part of it). Unless people want to think of parrots as a subset of all animals, in which case "stochastic animal" is what they really mean. But then it's really the difference between "stochastic human" and "human", and I don't think people really want to face that particular distinction.
"Expand the architecture" .. "get something much more powerful" .. "more dilithium crystals, captain"
Like I said elsewhere in this overall thread, we've been here before. Yes, you do see improvements from larger datasets and models weighted over more inputs. But I suggest, or I guess I believe (to be more honest), that no amount of "bigger" here will magically produce AGI simply through the scale effect.

There is no theory behind "more", which means there is no constructed sense of why it should work, and the continued absence of abstract inductive reasoning says to me that this stuff isn't making a qualitative leap into emergent anything.

It's just better at being an LLM. Even "show your working" points to complex causal chains, not actual inductive reasoning as I see it.
And that's actually a really honest answer. Whereas someone of the opposite opinion might say that parroting, in the general copying-a-template sense, actually generalizes to all observable behaviours, because templating systems can be Turing-complete or something like that. It's templates all the way down, including complex induction, so long as there is a meta-template that matches on its symptoms and can be chained.
Induction is a hard problem, but humans can skip the need for infinite compute time (I don't think we have any reason to believe humans have infinite compute) and still give valid answers, because there is some (meta-)structure to be exploited.

Whether machines / neural networks can architecturally exploit this same structure is the truer question.
> this stuff isn't making a qualitative leap into emergent anything.
The magical missing ingredient here is search. AlphaZero used search to surpass humans, and the whole Alpha family from DeepMind is surprisingly strong, but narrowly targeted. AlphaProof uses LLMs and Lean to solve hard math problems. The same kind of problem-solving CoT data is now being used to train current reasoning models, and they get much better results. The missing piece was search.
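To make "adding search" concrete, here is a minimal sketch of a best-first search over candidate reasoning chains. The `propose_steps`, `score` and `is_solved` functions are hypothetical stand-ins of my own; in an AlphaProof-style system they would be an LLM proposing next steps and a verifier such as Lean checking and scoring them.

```python
import heapq

def propose_steps(chain):
    """Return candidate next reasoning steps for a partial chain (toy stand-in for an LLM)."""
    return [chain + [f"step{len(chain)}a"], chain + [f"step{len(chain)}b"]]

def score(chain):
    """Heuristic value of a partial chain; higher is more promising (toy stand-in for a verifier)."""
    return len(chain) - 0.1 * sum(s.endswith("b") for s in chain)

def is_solved(chain):
    """Stand-in for a checker accepting the chain as a complete solution."""
    return len(chain) >= 3

def best_first_search(max_expansions=50):
    """Expand the most promising partial chain first instead of sampling one chain greedily."""
    frontier = [(-score([]), [])]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, chain = heapq.heappop(frontier)
        if is_solved(chain):
            return chain
        for nxt in propose_steps(chain):
            heapq.heappush(frontier, (-score(nxt), nxt))
    return None

print(best_first_search())
```

The point of the sketch is only the shape of the loop: generation proposes, a scorer ranks, and search decides which partial solutions get more compute, which is what the pure next-token-sampling picture of an LLM leaves out.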
I'm sure both of you know this, but "stochastic parrot" refers to the title of a research article that contained a particular argument about LLM limitations that had very little to do with parrots.