
> humans do not need 10,000 examples to tell the difference between cats and dogs

Well, maybe. We view things in three dimensions at high fidelity: viewing a single dog or cat actually ends up being thousands of training samples, no?
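
Very roughly, as a back-of-envelope sketch in Python (the sampling rate is assumed for illustration, not a measured property of human vision):

    # How many "training frames" is one minute of looking at one animal?
    fps = 30                 # assumed effective visual samples per second
    seconds = 60
    frames = fps * seconds
    # Each view differs in pose, lighting, and distance, so it acts more
    # like an augmented training sample than an exact duplicate.
    print(frames)            # 1800 views from a single minute of observation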



Yes, but we do not call a couch in a leopard print a leopard, because we understand that the print is secondary to the function.


Hah. My toddler gladly calls her former walking aid toy a "lawn mower". Random toys become pie and cakes she brings to us to eat.


I'm not sure it's as simple as you say. The first time my very young son saw a horse, he made the ASL sign for 'dog'.

He had only ever seen cats and dogs in his life previous to that.


Did he require 9,999 more examples of horses before learning the difference?


In another comment I replied that high-fidelity 3D viewing does end up amounting to thousands of training samples, so the answer is yes.


I'm deeply skeptical that an AI trained on (effectively) thousands of images of one horse will do very well at recognizing horses in general.


I'll double down with you on this.

Then train the AI on binocular video of a thoroughbred and see if it can still recognize a draft horse or a quarter horse as a horse...


Are you suggesting that if a group of kids were given a book of zoo animals before going to the zoo, they would have difficulty identifying any new animals, because they have only seen one picture of each?


I think that's an interesting question, and a possible counter to my argument.

Certainly kids learn and become better at extrapolation and need fewer and fewer samples in general as they get more life experience.


But we have a lot more sensory input and context to verify all of that.

If you kept training LLMs with all that data, it would be interesting to see what the results would be.


Eh, still doesn’t hold up. I really don’t think there are many psychologists working on the posited mechanism of simple NN-like backprop learning (aka conditioning, I guess). As Chomsky reminds us every time we let him: human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction. We definitely employ principles and patterns that are far more complex (more “emergent”?) than linear regression.

Tho I only ever did undergrad stats, so maybe ML isn’t even technically linear regression at this point. Still, hopefully my gist is clear.
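
For what it’s worth, it isn’t: modern networks stack affine maps with nonlinearities in between, which is exactly what keeps them from collapsing into a single linear regression. A minimal NumPy sketch (shapes and weights are arbitrary, chosen only to show the structure):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))           # 4 inputs with 3 features each
    W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

    hidden = np.maximum(0, x @ W1 + b1)   # ReLU, the nonlinear step
    y = hidden @ W2 + b2                  # remove the ReLU and the two layers
                                          # collapse into one affine map,
                                          # i.e. plain linear regression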


>human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction

This isn't an accurate comparison imo, because we're mapping language onto a world model that was built through a ton of trial and error.

Children don't understand language at six months old; there seems to be a minimum amount of experience with physics and the world required before language can click for them.


> Chomsky reminds us every time we let him: human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction.

Chomsky's arguments about "poverty of the stimulus" rely on non-probabilistic grammars. Norvig discusses this here: https://norvig.com/chomsky.html

> In 1967, Gold's Theorem showed some theoretical limitations of logical deduction on formal mathematical languages. But this result has nothing to do with the task faced by learners of natural language. In any event, by 1969 we knew that probabilistic inference (over probabilistic context-free grammars) is not subject to those limitations (Horning showed that learning of PCFGs is possible).
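
As a toy illustration of what a probabilistic context-free grammar looks like (the grammar and probabilities below are invented for the example, and this only parses with a hand-written PCFG via NLTK, whereas Horning's result concerns learning one from data):

    import nltk

    # Rule probabilities for each left-hand side must sum to 1.
    grammar = nltk.PCFG.fromstring("""
        S -> NP VP [1.0]
        NP -> Det N [0.7] | N [0.3]
        VP -> V NP [1.0]
        Det -> 'the' [1.0]
        N -> 'dog' [0.5] | 'cat' [0.5]
        V -> 'sees' [1.0]
    """)
    parser = nltk.ViterbiParser(grammar)
    for tree in parser.parse("the dog sees the cat".split()):
        print(tree)  # the most probable parse, annotated with its probability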

If I recall correctly, human toddlers hear about 3-13 million spoken words per year, and the higher ranges are correlated with better performance in school. Which:

- Is a lot, in an absolute sense.

- But is still much less training data than LLMs require.

Adult learners moving between English and Romance languages can get a pretty decent grasp of the language (C1 or C2 reading ability) with about 3 million words of reading. Which is obviously exploiting transfer learning and prior knowledge, because it's harder in a less related language.
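
Putting rough numbers on that gap (treating spoken words and LLM tokens as comparable, and using the commonly cited ~300 billion token figure for GPT-3-scale training, both simplifications):

    # Human exposure, using the high end of the 3-13M words/year range above.
    human_words = 13e6 * 10          # ~1.3e8 words heard by age ten
    llm_tokens = 3e11                # ~300B tokens, as reported for GPT-3
    print(llm_tokens / human_words)  # roughly 2300x more training data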

So yeah, humans are impressive. But Chomsky doesn't really seem to have the theoretical toolkit to deal with probabilistic or statistical learning. And LLMs are closer to statistical learning than to Chomsky's formal models.



