
I had an idea a while back about treating large files as insanely large hex integers, dividing that by a huge known prime, and basically storing three smaller integers that plug into the expression "ax + b", where x is the known prime, a is the quotient from the division, and b is the remainder. Plugging those numbers back into the expression would give you the huge integer representing the original file.

I'm assuming the known primes would be in a shared database somewhere, so you could just store a prime's index instead of some 512-digit number.

The process could be repeated if 'a' were still too big, theoretically enabling some extreme compression.

You'd need a way to handle extremely large ints, and from the few minutes I thought about it, you'd need at least twice the original file size free for compression/decompression.

I'm also pretty sure my idea is full of shit, but there are more details than in the article.
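
In Python terms, the round trip would look something like this (just a sketch; PRIMES and the function names are made-up stand-ins for the shared prime database):

    # Treat the file as one huge integer n, pick a known prime p from the
    # shared table, and store (a, prime_index, b) such that n = a*p + b.
    PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23]   # stand-in for the real database

    def compress(n, prime_index):
        a, b = divmod(n, PRIMES[prime_index])    # n = a*p + b
        return a, prime_index, b

    def decompress(a, prime_index, b):
        return a * PRIMES[prime_index] + b

    # Round trip: the file's bytes read as one big integer.
    n = int.from_bytes(b"some file contents", "big")
    a, i, b = compress(n, 8)                     # use the prime at index 8 (23)
    assert decompress(a, i, b) == n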



Let's try your algorithm with the number 25 (11001 in binary).

One formulation is 2(11)+3. The primes up to 25 are 2, 3, 5, 7, 11, 13, 17, 19, 23, so we'll use the zero-based index 4 to represent the prime 11.

To lay this out, we need to represent three numbers: 2, 4, and 3. Even ignoring the delimiters you'd inevitably need to mark where each number starts and stops, the bit string would look like 10 100 11.

Using 11 as the prime, your compression would require laying out 1010011 (seven bits) to represent 11001 (five bits), which means it's actually anti-compression.
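
You can sanity-check those bit counts with Python's bit_length():

    # 25 = 2*11 + 3, and 11 is the prime at zero-based index 4.
    n, a, index, b = 25, 2, 4, 3
    print(n.bit_length())                                        # 5 bits: 11001
    print(a.bit_length() + index.bit_length() + b.bit_length())  # 2+3+2 = 7 bits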


Like I said, probably full of shit, but I think of odd stuff in the shower.

With really, really huge numbers (like the aforementioned 7GB file), you might see something better, sort of like how most compression methods have a size below which the header is bigger than the data being compressed.

Kinda doubting it now, though.


I wrote a little Python program that lets you try it with an arbitrary integer. It assumes that whenever you have to lay out a zero, it costs one bit (for the 0).

This times out quickly if you use too big of a number, but you can play around with integers here: http://codepad.org/4mnHI9E3
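
It goes roughly like this (a simplified sketch, not the exact script from the codepad link, so the specific numbers it reports may differ a bit): for a given n it tries every prime up to n, splits n = a*p + b, and compares the bits needed for (a, prime index, b) against writing n directly.

    # For each prime p <= n, write n = a*p + b and count the bits needed for
    # (a, index, b), charging one bit for a zero. Report the cheapest split
    # versus n's own bit length.

    def primes_up_to(limit):
        """Plain sieve of Eratosthenes."""
        sieve = bytearray([1]) * (limit + 1)
        sieve[0:2] = b"\x00\x00"
        for i in range(2, int(limit ** 0.5) + 1):
            if sieve[i]:
                sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
        return [i for i, is_prime in enumerate(sieve) if is_prime]

    def bits(x):
        return max(1, x.bit_length())      # a zero still costs one bit

    def best_split(n):
        best = None
        for index, p in enumerate(primes_up_to(n)):
            a, b = divmod(n, p)
            cost = bits(a) + bits(index) + bits(b)
            if best is None or cost < best[0]:
                best = (cost, p, a, b)
        return best

    n = 25
    cost, p, a, b = best_split(n)
    print(n.bit_length(), "bits direct vs", cost, "bits as (a, index, b), p =", p)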

Based on what I see, it compresses poorly for small numbers and then reaches parity. At 1,000,000 (tested on my machine, not Codepad, since it times out), you get one bit of profit, but 1,000,005 is back at parity, suggesting that it's not very stable.

Also, this assumes the decompressor can make sense of the absolute smallest representation you could lay these numbers out with, which is basically not going to happen. The additional padding would destroy any potential savings you got.


I think what got me onto it in the first place was the representation of the largest known primes (http://primes.utm.edu/largest.html#biggest).

The current largest known prime is (2^57885161)-1, which has 17,425,170 decimal digits, yet can obviously be written down in far less space.

Maybe adding exponents would help. aX^n + b, perhaps?
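
For that record prime, the aX^n + b form is basically free to write down; a quick sketch of the sizes involved:

    # M = a * X**n + b with a=1, X=2, n=57885161, b=-1, i.e. 2**57885161 - 1.
    a, X, n, b = 1, 2, 57885161, -1
    compact_bits = sum(abs(v).bit_length() for v in (a, X, n, b))  # ~30 bits total
    full_bits = n                        # 2**n - 1 is exactly n bits long
    print(compact_bits, "bits vs", full_bits, "bits")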


(this is the point where I hope hope HOPE everyone in this thread is joking)


Well, I did just represent that record prime, which would take 7,235,646 bytes to write to disk in binary, in 14 bytes ("(2^57885161)-1"), so it's either joking or sorcery.


25 is a small number. Would it make a difference with (much) larger ones?



