"39. Re graphics: A picture is worth 10K words - but only those to describe the picture. Hardly any sets of 10K words can be adequately described with pictures."
It's the pigeonhole principle; there are only a few long videos possibly encodable as short programs because there are only a few short programs in the first place. To get compression performance, one has to target an ever smaller subset of possible videos, which eventually starts becoming an AI-complete problem.
Sure. 2^4096 is 10^1233. Let's just look at dialogue. Even if you limit yourself to boring 5-word sentences with 2,000 possible words for each position (subject verb preposition adjective object), 5^2000 = 8.7 * 10^1397 which means in the very first sentence you've got 10^164 times as many videos as you could possibly index with only 4096 bits.
Late addition: I thought I fixed all the stupid math problems before I posted this, but it's still totally wrong. Even leaving aside the fact the English doesn't have 2,000 prepositions, which I just glossed over :)
A five-word sentence with 1000 options per word isn't 5^1000 but only 1000^5 = 10^15. If we break the movie into 5-second blocks we get 48 of them in a 4-minute movie so (10^15)^48 = 10^720 different movies, which is not bad but we're still 10^513 away. There are a lot more variations we could consider - different actors, costumes, sets, framing, color grading etc. and I think it's plausible that we could come up with enough features. Heck if you talk twice as fast, you could get (10^15)^(48*2) = 10^1440. But it's a lot bigger than I made it out to be.
Obviously it would be AI-complete. I didn't know that term, that's what I meant by currently impossible to code. I just learned my favorite term ever, thanks for that!.
Although disappointing, you seem to have the correct answer for my question.
It's the pigeonhole principle; there are only a few long videos possibly encodable as short programs because there are only a few short programs in the first place. To get compression performance, one has to target an ever smaller subset of possible videos, which eventually starts becoming an AI-complete problem.