I've come around to thinking of our modern "AI" as a lossy compression engine for knowledge. When you ask a question, it is just decompressing a tiny portion of that knowledge and displaying it for you, sometimes with compression artifacts.
This is why I am not worried about the "AI Singularity" like some notable loudmouth technologists are. At least not with our current ML technologies.
That is exactly how I think about it. It's lossy compression. Think about how many petabytes of actual information any of these LLMs were trained on, then look at the size of the resultant model. It's orders of magnitude smaller. It got smaller by clipping the high-frequency bits of some multi-billion-dimensional graph of knowledge, the same basic thing you do with other lossy compression algorithms like JPEG or MP3.
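For what it's worth, here's a toy sketch in Python/numpy of the "clip the high-frequency bits" move the JPEG/MP3 analogy is pointing at. It's obviously not how LLM training actually works, just the general lossy-compression idea: transform, throw away most of the coefficients, reconstruct something close but not exact.

```python
# Toy illustration of JPEG/MP3-style lossy compression:
# keep only the low-frequency components of a signal, drop the rest.
import numpy as np

rng = np.random.default_rng(0)

# A "signal": a smooth trend plus fine-grained detail (the "high frequency bits").
t = np.linspace(0, 1, 1024)
signal = np.sin(2 * np.pi * 3 * t) + 0.2 * rng.standard_normal(t.size)

# Transform to the frequency domain and discard everything but the
# lowest-frequency coefficients -- an orders-of-magnitude size reduction.
coeffs = np.fft.rfft(signal)
keep = 16                      # keep 16 of the 513 coefficients (~3%)
compressed = coeffs.copy()
compressed[keep:] = 0

# Decompression reconstructs something close to the original, but the
# fine detail is gone -- those are the "compression artifacts".
reconstructed = np.fft.irfft(compressed, n=signal.size)
error = np.sqrt(np.mean((signal - reconstructed) ** 2))
print(f"kept {keep}/{coeffs.size} coefficients, RMS error = {error:.3f}")
```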
These LLMs are just lossy compression for knowledge. I think the sooner that idea gets surfaced, the sooner people will find ways to train models with fixed, pre-computed lookup tables of knowledge categories and association properties… basically taking a lot of the randomness out of the training process and getting more precise about which dimensions of knowledge and facts are embedded into the model.
… or something like that. But I don’t think this optimization will be driven by the large, well-funded tech companies. They are too invested in flushing money down the drain with more and more compute. Their huge budgets blind them to other ways of doing the same thing with significantly less.
The future won’t be massive large language models. It’ll be “small language models” custom-tuned to specific tasks. You’ll download or train a model that has an incredible understanding of Rust and Django but won’t know a single thing about plate tectonics or apple pie recipes.
Why wouldn't we have a small language model for Python programming now, though?
That is an obvious product. I suspect the reason we don't have a small Python language model is that the fine-tuned model is no better than the giant general-purpose model.
If that is the case, it is not a good sign. It even makes me wonder whether we are really compressing knowledge at all, or whether this is just a hack that creates the illusion of compressing knowledge.
With a bit (OK, a lot) of reinforcement learning that prioritizes the best chains of thought, this compression engine becomes a generator of the missing training data on how to actually think about something, instead of trying to come up with the answer right away the way internet text data suggests it should.
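A minimal sketch of what "prioritize the best chains of thought" can look like in practice, roughly the rejection-sampling flavour of the idea. Everything here (sample_cot, score, fine_tune) is a hypothetical placeholder, not any real library's API:

```python
# Sketch: sample several chains of thought per prompt, keep only the best one,
# and treat the survivors as new "how to think about it" training data.
from typing import Callable, List, Tuple

def generate_training_data(
    prompts: List[str],
    sample_cot: Callable[[str], str],    # model emits one chain of thought + answer
    score: Callable[[str, str], float],  # verifier/reward: how good is this chain?
    samples_per_prompt: int = 8,
) -> List[Tuple[str, str]]:
    new_data = []
    for prompt in prompts:
        candidates = [sample_cot(prompt) for _ in range(samples_per_prompt)]
        best = max(candidates, key=lambda c: score(prompt, c))
        new_data.append((prompt, best))
    return new_data

# The outer loop (hypothetical): the model generates the missing reasoning data,
# is fine-tuned on it, then generates better data, and so on.
# for _ in range(rounds):
#     data = generate_training_data(prompts, model.sample_cot, verifier.score)
#     model = fine_tune(model, data)
```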
That's the current ML technology. What you've described is the past. About four years in the past, to be precise.
If you think that "compression" somehow means "non-intelligent", consider this:
The best compression of data that is theoretically achievable (see Kolmogorov complexity) is an algorithm that approximates the process that produces the data. And which process produces texts on the internet? The activity of the human brain. (I've described it a bit sloppily: we are really dealing with the probability distribution of the data, not the data itself, but the general idea still holds.)
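One standard way to make the "probability distribution, not the data" point precise is Shannon's source-coding view rather than Kolmogorov complexity proper: a model q doubles as a compressor whose expected code length is the entropy of the true distribution plus the KL gap between the two, so a better predictor is literally a better compressor. Roughly:

```latex
% Expected code length of an optimal prefix code built from model q,
% measured under the true data distribution p:
\[
  \mathbb{E}_{x \sim p}\bigl[-\log_2 q(x)\bigr]
  = H(p) + D_{\mathrm{KL}}(p \,\|\, q)
  \ge H(p),
  \qquad \text{with equality iff } q = p.
\]
```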
Using chain-of-thought removes the constraint that the resultant algorithm must use a fixed amount of compute per output token.
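A back-of-the-envelope way to see that (numbers entirely made up): with a fixed model, the cost of a forward pass per token is roughly constant, so total compute scales with how many "thinking" tokens the model chooses to emit before answering.

```python
# Made-up numbers, just to show that chain-of-thought turns "compute per answer"
# into something the model controls by emitting more or fewer reasoning tokens.
flops_per_token = 1e12       # hypothetical cost of one forward pass per token
direct_answer_tokens = 5     # blurt out the answer immediately
cot_tokens = 400             # reason step by step first, then answer

print(f"direct answer:    {direct_answer_tokens * flops_per_token:.1e} FLOPs")
print(f"chain-of-thought: {cot_tokens * flops_per_token:.1e} FLOPs")
```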