I've come around to thinking of our modern "AI" as a lossy compression engine for knowledge. When you ask a question, it is just decompressing a tiny portion of that knowledge and displaying it for you, sometimes with compression artifacts.
This is why I am not worried about the "AI Singularity" like some notable loudmouth technologists are. At least not with our current ML technologies.
That is exactly how I think about it. It's lossy compression. Think about how many petabytes of actual information any of these LLMs were trained on, then look at the size of the resultant model. It's orders of magnitude smaller. It got smaller by clipping the high-frequency bits of some multi-billion-dimensional graph of knowledge, the same basic thing you do with other lossy compression algorithms like JPEG or MP3.
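For what it's worth, here's a toy sketch in Python/numpy of the "clip the high-frequency bits" move the JPEG/MP3 analogy is pointing at. It's obviously not how LLM training actually works, just the general lossy-compression idea: transform, throw away most of the coefficients, reconstruct something close but not exact.

```python
# Toy illustration of JPEG/MP3-style lossy compression:
# keep only the low-frequency components of a signal, drop the rest.
import numpy as np

rng = np.random.default_rng(0)

# A "signal": a smooth trend plus fine-grained detail (the "high frequency bits").
t = np.linspace(0, 1, 1024)
signal = np.sin(2 * np.pi * 3 * t) + 0.2 * rng.standard_normal(t.size)

# Transform to the frequency domain and discard everything but the
# lowest-frequency coefficients -- an orders-of-magnitude size reduction.
coeffs = np.fft.rfft(signal)
keep = 16                      # keep 16 of the 513 coefficients (~3%)
compressed = coeffs.copy()
compressed[keep:] = 0

# Decompression reconstructs something close to the original, but the
# fine detail is gone -- those are the "compression artifacts".
reconstructed = np.fft.irfft(compressed, n=signal.size)
error = np.sqrt(np.mean((signal - reconstructed) ** 2))
print(f"kept {keep}/{coeffs.size} coefficients, RMS error = {error:.3f}")
```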
These LLMs are just lossy compression for knowledge. I think the sooner that idea gets surfaced, the sooner people will find ways to train models with fixed, pre-computed lookup tables of knowledge categories and association properties… basically taking a lot of the randomness out of the training process and getting more precise about which dimensions of knowledge and facts are embedded into the model.
… or something like that. But I don’t think this optimization will be driven by the large, well-funded tech companies. They are too invested in flushing money down the drain with more and more compute. Their huge budgets blind them to other ways of doing the same thing with significantly less.
The future won’t be massive large language models. It’ll be “small language models” custom-tuned to specific tasks. You’ll download or train a model that has an incredible understanding of Rust and Django but won’t know a single thing about plate tectonics or apple pie recipes.
Why wouldn't we have a small language model for Python programming now, though?
That is an obvious product. I suspect the reason we don't have a small Python language model is that the fine-tuned model is no better than the giant general-purpose model.
If that is the case, it is not a good sign. It even makes me wonder whether we are really compressing knowledge at all, or whether this is just a hack that creates the illusion of compressing knowledge.
With a bit (OK, a lot) of reinforcement learning that prioritizes the best chains of thought, this compression engine becomes a generator of the missing training data on how to actually think about something, instead of trying to come up with the answer right away the way internet text data suggests it should.
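A minimal sketch of what "prioritize the best chains of thought" can look like in practice, roughly the rejection-sampling flavour of the idea. Everything here (sample_cot, score, fine_tune) is a hypothetical placeholder, not any real library's API:

```python
# Sketch: sample several chains of thought per prompt, keep only the best one,
# and treat the survivors as new "how to think about it" training data.
from typing import Callable, List, Tuple

def generate_training_data(
    prompts: List[str],
    sample_cot: Callable[[str], str],    # model emits one chain of thought + answer
    score: Callable[[str, str], float],  # verifier/reward: how good is this chain?
    samples_per_prompt: int = 8,
) -> List[Tuple[str, str]]:
    new_data = []
    for prompt in prompts:
        candidates = [sample_cot(prompt) for _ in range(samples_per_prompt)]
        best = max(candidates, key=lambda c: score(prompt, c))
        new_data.append((prompt, best))
    return new_data

# The outer loop (hypothetical): the model generates the missing reasoning data,
# is fine-tuned on it, then generates better data, and so on.
# for _ in range(rounds):
#     data = generate_training_data(prompts, model.sample_cot, verifier.score)
#     model = fine_tune(model, data)
```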
That's the current ML technology. What you've described is the past. About four years in the past, to be precise.
If you think that "compression" somehow means "non-intelligent", consider this:
The best compression of data that is theoretically achievable (see Kolmogorov complexity) is an algorithm that approximates the process that produces the data. And which process produces texts on the internet? The activity of the human brain. (I've described it a bit sloppily: we are really dealing with the probability distribution of the data, not the data itself, but the general idea still holds.)
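One standard way to make the "probability distribution, not the data" point precise is Shannon's source-coding view rather than Kolmogorov complexity proper: a model q doubles as a compressor whose expected code length is the entropy of the true distribution plus the KL gap between the two, so a better predictor is literally a better compressor. Roughly:

```latex
% Expected code length of an optimal prefix code built from model q,
% measured under the true data distribution p:
\[
  \mathbb{E}_{x \sim p}\bigl[-\log_2 q(x)\bigr]
  = H(p) + D_{\mathrm{KL}}(p \,\|\, q)
  \ge H(p),
  \qquad \text{with equality iff } q = p.
\]
```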
Using chain-of-thought removes the constraint that the resultant algorithm must use a fixed amount of compute per output token.
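A back-of-the-envelope way to see that (numbers entirely made up): with a fixed model, the cost of a forward pass per token is roughly constant, so total compute scales with how many "thinking" tokens the model chooses to emit before answering.

```python
# Made-up numbers, just to show that chain-of-thought turns "compute per answer"
# into something the model controls by emitting more or fewer reasoning tokens.
flops_per_token = 1e12       # hypothetical cost of one forward pass per token
direct_answer_tokens = 5     # blurt out the answer immediately
cot_tokens = 400             # reason step by step first, then answer

print(f"direct answer:    {direct_answer_tokens * flops_per_token:.1e} FLOPs")
print(f"chain-of-thought: {cot_tokens * flops_per_token:.1e} FLOPs")
```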