
Encrypted data is not very compressible. Compression algorithms exploit statistical regularities in the bitstream, whereas the whole point of encrypting something is to get a result with no statistical regularity whatsoever.

If you compress first and then encrypt, you can then get smaller file sizes than the input, because the compression algorithm can work on the plaintext.
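To make the ordering concrete, here is a rough sketch, not anything Apple ships. The "cipher" is a toy SHA-256 counter keystream XORed over the data, which I made up purely so the example runs with the standard library; it is not secure and only stands in for real encryption. The key and the sample plaintext are also invented.

```python
import hashlib
import zlib


def toy_keystream_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (illustration only, NOT secure)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


# Highly repetitive, made-up plaintext so the compressor has something to find.
plaintext = b"the quick brown fox jumps over the lazy dog\n" * 2000
key = b"hypothetical-key"

# Order 1: compress the plaintext, then encrypt the (small) result.
compress_then_encrypt = toy_keystream_xor(zlib.compress(plaintext, 9), key)

# Order 2: encrypt first, then try to compress the noise-like ciphertext.
encrypt_then_compress = zlib.compress(toy_keystream_xor(plaintext, key), 9)

print("plaintext:            ", len(plaintext))
print("compress then encrypt:", len(compress_then_encrypt))
print("encrypt then compress:", len(encrypt_then_compress))
```

The first order should come out as a tiny fraction of the input, while the second should stay at roughly the input size, since deflate finds nothing to match in the keystream-masked bytes.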



And iOS doesn't do that?

Let me get this conversation straight. Someone says that it's strange that Android is so large. Someone else says iOS is big too. Then someone comes in to say that they are not large, they just suck at getting the compress/encrypt issue right. No one, other than obtino, thinks that it was simply a mistake that Xuzz put 'then' in his/her post?


It's compressed after encryption, so the effect of the compression is minimal. I believe that my post was correct: due to the effects of the encryption, the compression is almost useless, rendering the large iOS file size essentially equivalent to the size when installed to disk. That's all.


This is correct, and yet at the same time I believe it is incorrect. I am totally willing to believe I'm wrong here, though (as it has been two years since I actually did this process manually). I will explain. ;P

So, it is my understanding that an IPSW file is a ZIP archive containing a number of files, the largest one being where the main filesystem is stored. This file is encrypted, and does not compress very well at all.

However, that file is itself a dmg (Apple disk image) file, which is a compressed file format: a dmg is a compressed HFS+ image. Therefore, the encryption is happening after the compression.

Therefore, I do not believe it is accurate to claim that this is key to the problem. While it is humorous that the files are being compressed, encrypted, and then compressed again, that is not what is causing them to fail to compress: the first compression should work.

Instead, if we go one level deeper, we can ask the question "what is Apple even storing on this filesystem", and the answer is "maybe one or two hundred megabytes of executable code, and a few hundred megabytes of graphics".

The images are stored as PNG and JPEG: file formats that are already compressed. We therefore would not expect the version in the final output file to be much smaller than that on the filesystem. These files are, in essence, being compressed, compressed, encrypted, and compressed. ;P
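A quick way to see the "already compressed" effect, as a sketch: PNG uses deflate internally, so I use plain zlib output as a stand-in for an image file here. The input is synthetic bytes from a seeded generator, not real image data.

```python
import random
import zlib

rng = random.Random(0)

# Synthetic, moderately compressible data: only 8 distinct byte values.
raw = bytes(rng.choice(b"abcdefgh") for _ in range(200_000))

once = zlib.compress(raw, 9)    # stand-in for "the PNG on the filesystem"
twice = zlib.compress(once, 9)  # stand-in for "the dmg/zip layer on top"

print("raw:  ", len(raw))
print("once: ", len(once))
print("twice:", len(twice))
```

The first pass should shrink the data substantially; the second pass should leave it at about the same size, because the output of the first pass has very little redundancy left for deflate to find.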

The executable code, meanwhile, really doesn't compress well with algorithms like deflate: while it has reasonably low entropy, its encoding looks irritatingly random to algorithms that are looking for sequences of bytes (or bits) that are actually identical, especially over small window sizes.

The problem is that you may see "add one, compare, branch if equal" all over the place, but it is "add one (to X), compare (with Y), branch if equal (to Z)", which breaks up the nice sequence. Even just reorganizing the data bits based on the instruction encoding then helps /tremendously/.
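Here is a toy model of that effect, under heavy assumptions: a made-up instruction set with one-byte opcodes and one-byte operands, and operands drawn at random from small ranges. Real machine code is messier, and this is not how any of the cited compressors work; it only shows that separating the fields gives deflate an easier job than the interleaved stream.

```python
import random
import zlib

rng = random.Random(1)

# Hypothetical one-byte opcodes; values chosen so they never collide with operands.
ADD, CMP, BEQ = 0xA0, 0xB0, 0xC0

n = 20_000
xs = bytes(rng.randrange(4) for _ in range(n))    # X: one of just a few values
ys = bytes(rng.randrange(16) for _ in range(n))   # Y: a slightly wider range
zs = bytes(rng.randrange(64) for _ in range(n))   # Z: short branch offsets

# "Machine code" as laid out in memory: ADD x, CMP y, BEQ z, repeated.
interleaved = bytearray()
for x, y, z in zip(xs, ys, zs):
    interleaved += bytes((ADD, x, CMP, y, BEQ, z))
interleaved = bytes(interleaved)

# Same information, reorganized into one stream per field.
opcodes = bytes((ADD, CMP, BEQ)) * n
split_size = sum(len(zlib.compress(s, 9)) for s in (opcodes, xs, ys, zs))
interleaved_size = len(zlib.compress(interleaved, 9))

print("interleaved:", interleaved_size)
print("split:      ", split_size)
```

In the split layout the opcode stream is a pure repetition and each operand stream gets its own Huffman table, so the total should come out smaller than compressing the interleaved bytes directly.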

It is also often the case that X is one of just a few numbers, Z is one of a small range (loops aren't usually that large), etc. Normal algorithms, however, look for "exactly this", not "something similar to this with an offset" (or even switching to a general integer encoder); again, minor details, but it breaks deflate.
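The "similar, but with an offset" point can be sketched with delta coding, which is my choice of illustration and not something the thread or the cited paper prescribes. The data is a hypothetical list of 32-bit addresses that grow by small steps: no two values are identical, so deflate sees few exact repeats, but the differences between neighbours come from a tiny set.

```python
import random
import struct
import zlib
from itertools import accumulate

rng = random.Random(2)

# Hypothetical addresses: start at 0x1000 and advance by a small step each time.
steps = [rng.choice((4, 8, 12, 16)) for _ in range(20_000)]
targets = list(accumulate(steps, initial=0x1000))


def pack_u32(values):
    """Pack integers as little-endian unsigned 32-bit words."""
    return struct.pack(f"<{len(values)}I", *values)


# Delta coding: keep the first value, then store each difference to the previous one.
deltas = [targets[0]] + [b - a for a, b in zip(targets, targets[1:])]

# The transform is lossless: a running sum restores the original values.
restored = list(accumulate(deltas))

raw_size = len(zlib.compress(pack_u32(targets), 9))
delta_size = len(zlib.compress(pack_u32(deltas), 9))

print("raw words:  ", raw_size)
print("delta words:", delta_size)
```

After the transform the stream consists of a handful of repeating words, which is exactly the kind of "identical sequence" deflate is built to find, so the delta-coded version should compress noticeably better than the raw one.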

...and, indeed, there are better compression algorithms out there already that are designed to handle code well. I swear Google even had some cool stuff for this, but I'm not finding it right now :(. Regardless, a quick (silly) citation for validity:

"""While we have not addressed the compression of machine code, others have shown that it is possible to compress machine code by a factor of 3 using a specially tuned version of a conventional compressor [Yu96] and by as much as a factor of 5 using a compressor that understands the instruction set [EEF +97]."""

-- http://www.usenix.org/event/usenix99/full_papers/wilson/wils...

So, yeah: I think the key problem is that Apple is not wasting disk space on the device. And, when you put it that way, it is obvious: why would Apple waste 700MB of flash on a 32GB device, space the user would probably really love to be storing music in, when they only have 100MB of entropy?

The answer is: "they wouldn't", and so (modulo the further compressibility of binaries, an interesting and partially open academic problem) the result is that most of the data on the filesystem is already compressed images/audio, and therefore compressing, encrypting, and even compressing again, doesn't matter to the result.



