I think there is a critical aspect of human visual learning which machine learning can't replicate because it is prohibitively expensive. When we look at things as children, we are not just looking at a single snapshot. When you stare at an object for a few seconds, you have practically ingested hundreds of slightly varied images of that object. This gets even more interesting when you take into account that the real world is moving all the time, so you are seeing so many things from so many angles. This is simply not doable with compute.
Then explain blind children? Or blind and deaf children? There's obviously some role the senses play in development, but there are clearly capabilities at play here that are drastically more efficient and powerful than what we have with modern transformers. While humans learn through example, they clearly need far fewer examples to generalize from and reason against.
> Then explain blind children
I was only talking about vision tasks as an example. You can extend the idea to any sense.
> While humans learn through example, they clearly need far fewer examples to generalize from and reason against.
The human brain has been developing over millennia; machines start from zero. What if this few-example learning is just an emergent capability of any "learning function" given enough compute and training?
I think my point is that communication is the biggest contributor to brain development more than anything and communication is what powers our learning. Effective learners learn to communicate more with themselves and to communicate virtually with past authors through literature. That isn’t how LLMs work. Not sure why that would be considered objectionable. LLMs are great but we don’t have to pretend like they’re actually how brains work. They’re a decent approximation for neurons on today’s silicon - useful but nowhere near the efficiency and power of wetware.
Also as for touch, you’re going to have a hard time convincing me that the amount of data from touch rivals the amount of content on the internet or that you just learn about mistakes one example at a time.
There are so many points to consider here, I'm not sure I can address them all.
- Airplanes don't have wings like birds, but they can fly, and in some ways they are superior to birds (in some ways not).
- Human brains may be doing some analogue of sample augmentation, which yields several times more effective training samples per real input state of the environment. This is done in ML too.
- Whether that input data is text or embodied is largely irrelevant to cognition in general, but it may be necessary for solving problems in a particular domain (text-only vs. sighted vs. blind).
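The sample-augmentation point can be sketched in code. This is a minimal toy, assuming image-like numpy inputs; the specific jitters (flips, small shifts, noise) are my illustrative choices, not anything named in the thread:

```python
# A minimal sketch of sample augmentation: one "real" input image is
# expanded into several slightly varied training samples, loosely
# analogous to the many views an eye gets while staring at an object.
import numpy as np

rng = np.random.default_rng(0)

def augment(image: np.ndarray, n_views: int = 8) -> list:
    """Produce n_views jittered copies of a single H x W image."""
    views = []
    for _ in range(n_views):
        view = image.copy()
        if rng.random() < 0.5:            # random horizontal flip
            view = view[:, ::-1]
        shift = int(rng.integers(-2, 3))  # small horizontal translation
        view = np.roll(view, shift, axis=1)
        view = view + rng.normal(0.0, 0.01, size=view.shape)  # sensor noise
        views.append(view)
    return views

image = rng.random((8, 8))
views = augment(image)
print(len(views))  # 8 augmented samples from one real input
```

One real observation becomes several near-duplicate training samples, which is the "multiple more equivalent samples per real input" trade being described above.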
> Airplanes don't have wings like birds, but they can fly, and in some ways they are superior to birds (in some ways not).
I think you're saying exactly what I'm saying. Human brains work differently from LLMs, and the OP comment that started this thread claims they work very similarly. In some ways they do, but there are very clear differences. While clarifying examples in the training set can improve human understanding and performance, it's pretty clear we're doing something beyond that: just from a power-efficiency perspective, humans consume far less energy for significantly more performance, and it's pretty likely we need less training data.
To be honest, I don't really care whether they work the same or not. I just like that they do work, and I find that interesting.
I don't even think people's brains work the same as each other's. Half of people can't even visually imagine an apple.
Neural networks seem to notice and remember very small details, as if they have access to signals from early layers. Humans often miss the minor details. There's probably a lot more signal normalization happening in brains, which limits calorie usage but introduces artifacts into the features.
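A toy sketch of one thing "signal normalization" could mean in network terms. Layer normalization is my assumption here (the comment names no specific mechanism); it rescales each activation vector to zero mean and unit variance, so absolute magnitude, one kind of small detail, is discarded before later layers ever see it:

```python
# Layer normalization throws away absolute-scale information:
# two inputs with the same pattern but different magnitudes become
# indistinguishable after normalization.
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standard layer normalization over a 1-D activation vector."""
    return (x - x.mean()) / (x.std() + eps)

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([10.0, 20.0, 30.0])  # same pattern, 10x the magnitude

# After normalization the absolute-scale "detail" is gone.
print(np.allclose(layer_norm(x1), layer_norm(x2)))  # True
```

This is only an analogy for the calorie-saving normalization the comment speculates about, not a claim about how cortex actually works.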
I don't think this is necessarily a property neural networks can't have; I think it could be engineered in. For now, though, it seems like we're making a lot of progress even without efficiency constraints, so nobody cares.