Every program is a compressed representation of its output. This idea comes from Kolmogorov complexity, which you encounter in any CS course on complexity theory.
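A minimal sketch of this idea (the string and program here are my own illustrative choices, not from any particular theory text): a short program can serve as a drastically shorter description of a long, regular output.

```python
# Kolmogorov-complexity intuition: the description (program) can be far
# shorter than the data it produces.
output = "ab" * 500_000             # a 1,000,000-character string
program = 'print("ab" * 500_000)'   # a ~21-character program producing it
print(len(program), len(output))    # 21 vs 1000000
```

The length of the shortest such program is (informally) the Kolmogorov complexity of the output; highly regular data has low complexity.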
So, a neural network being a compressor/decompressor is nothing special.
Note, however, that with a context window of 1000 binary units, the truth table has 2^1000, or roughly 10^301, possible entries. Somehow, your LLM neural network is the result of compressing an input space of around 10^301 possibilities, nearly all of which it could never have seen. To compress a JPEG, you at least have access to the original image, not just two pixels of it.
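The scale claim is easy to check directly (the 1000-unit binary window is the assumption carried over from the paragraph above):

```python
# Number of distinct inputs for a 1000-position binary context window.
n_inputs = 2 ** 1000

# Order of magnitude in base 10: a number with d digits is about 10**(d-1).
order = len(str(n_inputs)) - 1
print(order)  # 301
```

So the input space is about 10^301 entries, while any training corpus covers only a vanishing fraction of it.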
Anyway, the philosophical debate is whether you believe programs can think, that is, whether machine intelligence is meaningful at all by definition. Some say yes, others say no. When humans think, are our abstractions and ideas not themselves a kind of compression?