> It's not an unreasonable definition, if you are aware of Kolmogorov complexity and Solomonoff induction
I was already familiar with Kolmogorov Complexity, but not _Solomonoff's theory of inductive inference_ - I skimmed the Wikipedia article just now and I think/hope I get the general gist of it - so thank you for that.
> Intelligence is intimately connected to the ability to predict and model things
I agree with that statement - but I am unsure how this relates to other aspects of intelligence - or even to other notions of intelligence entirely. If we're redefining exactly what the word "Intelligence" means, aren't we just moving the goalposts?
...so I'm unsure if the word "Intelligence" in your post can/should be read as a layperson would understand it, or if it's a specific, well-defined term in this field? (Like how "complexity" in algorithms does not refer to how convoluted and unmaintainable a program is - which is what a layperson would picture when a software engineer tells them "this program has high computational complexity".)
> and it turns out that data compression is _also_ connected to the ability to predict and model things.
This is where I have difficulty following your argument:
As a preface: my academic understanding of data compression only covers things like algorithms in the LZ family, entropy coding, and (lossy and lossless) domain-specific compression schemes (like JPEG/DCT, MP3/FLAC, etc). I am aware of very interesting results coming from present-day AI approaches, like using LLMs fed only on compressed data - but those are completely outside my scope of understanding (and LLMs are still very spooky to me).
Does it matter if a scheme is lossless or lossy? Surely a lossless compression system needs to be deterministic and mechanical? If so, what room is there for "intelligence" in a rigid, statically-defined system?
Take entropy coding for example - or even simpler probability-based compression schemes like Morse code (e.g. "E" has a high probability, so it has a very short symbol length). I just can't see the connection from things like entropy coding to "modelling". Suppose I have a large file (CSV?) of meteorological data for a specific region over time. If it's plaintext then I expect I can compress it better using gzip than by using a system that can (somehow!) identify some structure in the weather sensor data (the "model", right?) and then use that as the basis for a better domain-specific compression - doing that means adding extra metadata to the source data to describe the structure the system identified, and then hoping this approach beats a comparatively "dumb" approach like DEFLATE. And even then, assuming that employing that "model" really does result in smaller compressed output, how is that an example of the system having a general "intelligence"?
I was thinking of lossless compression; lossy is another can of worms on top of lossless. Lossless compression works, in principle, by the realization that the probability distribution of all possible input strings is not flat. Some strings are more probable, and some are less probable. The reason some strings are more probable is that they result not from random processes, but from some process or algorithm that generates them, so they have inherent structure. A random string would be one with no structure detectable by any algorithm smaller than the string itself. If the probability distribution of the input strings is not flat, then we can use entropy coding to describe the common case - non-random data - in fewer bits than the input.
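To make that concrete, here's a rough Python sketch (my own toy example, nothing from your post): it measures the Shannon entropy of a piece of structured text - the lower bound an entropy coder can approach - and compares it to the cost of a "flat" code that treats every symbol as equally likely.

```python
import math
from collections import Counter

def bits_per_symbol(text: str) -> float:
    """Shannon entropy of the empirical symbol distribution, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

structured = "the rain in spain falls mainly on the plain " * 100
used_alphabet = len(set(structured))

print(f"entropy of the text:          {bits_per_symbol(structured):.2f} bits/symbol")
print(f"flat code over used alphabet: {math.log2(used_alphabet):.2f} bits/symbol")
print("flat code over all bytes:     8.00 bits/symbol")
```

The more skewed the distribution, the bigger the gap between the entropy and the flat-code cost - that gap is what an entropy coder cashes in on.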
However, the difference between specific compression algorithms and a "general compression algorithm" is the assumptions they make. Most compression algorithms don't consider the probability distribution of the "full" set of input strings; they instead divide the input string into predictably sized chunks and consider the distribution of those chunks. This is way simpler, and lets you compress somewhat well while still using a rather "static" distribution (like Morse code), or only simple algorithms (like adaptive Huffman coding) to adapt the distribution to the input data.
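Here's a hedged sketch of that adaptive idea (again my own toy code, loosely in the spirit of adaptive Huffman/arithmetic coding rather than any particular library): the model starts out flat over all 256 byte values and updates its counts as it scans the input, so the ideal code length -log2(p) shrinks for the symbols the input actually uses, and no distribution has to be shipped up front.

```python
import math
from collections import defaultdict

def adaptive_cost_bits(data: bytes) -> float:
    """Ideal coded size under a simple adaptive byte model (Laplace-smoothed counts)."""
    counts = defaultdict(lambda: 1)   # every byte value starts with an implicit count of 1
    total = 256
    bits = 0.0
    for b in data:
        bits += -math.log2(counts[b] / total)  # cost of this byte under the current model...
        counts[b] += 1                         # ...then adapt the model
        total += 1
    return bits

msg = b"abracadabra" * 50
print(f"static flat 8-bit cost: {len(msg) * 8} bits")
print(f"adaptive model cost:    {adaptive_cost_bits(msg):.0f} bits")
```

It still only looks at one byte at a time, though - which is exactly the "predictable chunk" restriction described above.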
But if we don't restrict ourselves to the world of "compressing a stream, message by message", and instead allow "intelligent" compression that can use any computable means to achieve a smaller compressed size and can adapt to the whole context of the input, we can see that "message-by-message" entropy coding is only a subset of what we can do. (And we of course _also_ have that subset in our toolbox.) The true challenge of compression then becomes finding and representing approximations to the "full" distribution of the input data. That involves things like identifying structure in weather data! If the input is large enough to start with, the more complex model might be worth it. And if it isn't, then we can intelligently just decide to use DEFLATE.
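To tie it back to your weather example, here's a toy experiment (my own construction, with made-up synthetic data, so take the exact numbers with a grain of salt). The "model" is about as dumb as they come - predict each reading from the previous one and keep only the residuals - and the only "metadata" needed is that the decoder knows the scheme.

```python
import math
import random
import zlib

random.seed(0)
# Synthetic hourly temperatures with a daily cycle plus a little sensor noise.
temps = [round(15 + 8 * math.sin(2 * math.pi * h / 24) + random.gauss(0, 0.3), 1)
         for h in range(24 * 365)]

# The "plaintext CSV" version of the data.
csv_text = "\n".join(f"{h},{t}" for h, t in enumerate(temps)).encode()

# The "model": predict each reading from the previous one, keep only the residuals.
tenths = [int(round(t * 10)) for t in temps]
residuals = [tenths[0]] + [b - a for a, b in zip(tenths, tenths[1:])]
residual_bytes = b"".join(r.to_bytes(2, "big", signed=True) for r in residuals)

print(f"plaintext CSV + DEFLATE: {len(zlib.compress(csv_text, 9))} bytes")
print(f"delta model + DEFLATE:   {len(zlib.compress(residual_bytes, 9))} bytes")
```

In this toy setup the residual stream should come out meaningfully smaller than DEFLATE over the raw text, though the exact ratio depends on how noisy the data is - and a smarter model (say, one that learned the daily cycle) would shrink the residuals further still.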
> what room is there for "intelligence" in a rigid, statically-defined system?
But as we can see from the above, surely an intelligence that is tasked with compressing stuff doesn't need to be "rigid"? The intelligence is in the compressor, not in the compressed result. The compressed result might be assembly code that unpacks itself by doing arbitrary computations, and to achieve good results, the generator of that code must be sufficiently intelligent.
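In that spirit, a silly but concrete illustration (Python instead of assembly, and entirely my own toy example): the "compressed result" below is just a short program whose output is a megabyte of structured digits. The program is a few dozen bytes because it captures the process that generates the data, which no symbol-by-symbol entropy code could get anywhere near.

```python
# The "decompressor" is itself a tiny program; running it reproduces the data.
program = 'print("".join(str(n % 7) for n in range(1_000_000)))'
output = "".join(str(n % 7) for n in range(1_000_000))

print(f"size of the program: {len(program)} bytes")
print(f"size of its output:  {len(output)} bytes")
# exec(program) would print the same million characters.
```

Finding that kind of generating program for arbitrary data is exactly the hard (in general, uncomputable) part - which is where the "intelligence" lives.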