
I had an idea a while back about treating large files as insanely large hex integers, dividing that by a huge known prime, and basically storing three smaller integers that plug into the expression "ax + b", where x is the known prime, a is the quotient from the division, and b is the remainder. Plugging those numbers back into the expression would give you the huge integer representing the original file.

I'm assuming the known primes would be in a shared database somewhere, so you could just store a prime's index instead of some 512-digit number.

The process could be repeated if 'a' were still too big, theoretically enabling some extreme compression.

You'd need a way to handle extremely large ints, and from the few minutes I thought about it, you'd need at least twice the original file size free for compression/decompression.

I'm also pretty sure my idea is full of shit, but there are more details than in the article.
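
In Python terms, the round trip would look something like this (just a sketch; PRIMES and the function names are made-up stand-ins for the shared prime database):

    # Treat the file as one huge integer n, pick a known prime p from the
    # shared table, and store (a, prime_index, b) such that n = a*p + b.
    PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23]   # stand-in for the real database

    def compress(n, prime_index):
        a, b = divmod(n, PRIMES[prime_index])    # n = a*p + b
        return a, prime_index, b

    def decompress(a, prime_index, b):
        return a * PRIMES[prime_index] + b

    # Round trip: the file's bytes read as one big integer.
    n = int.from_bytes(b"some file contents", "big")
    a, i, b = compress(n, 8)                     # use the prime at index 8 (23)
    assert decompress(a, i, b) == n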



Let's try your algorithm with the number 25 (11001 in binary).

One formulation is 2(11)+3. The primes up to 25 are 2, 3, 5, 7, 11, 13, 17, 19, 23, so we'll use the zero-based index 4 to represent the prime 11.

To lay this out, we need to represent three numbers: 2, 4, and 3. Even ignoring the delimiters you'd inevitably need to mark where each number starts and stops, the bit string would look like 10 100 11.

Using 11 as the prime, your compression would require laying out 1010011 (seven bits) to represent 11001 (five bits), which means it's actually anti-compression.
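
You can sanity-check those bit counts with Python's bit_length():

    # 25 = 2*11 + 3, and 11 is the prime at zero-based index 4.
    n, a, index, b = 25, 2, 4, 3
    print(n.bit_length())                                        # 5 bits: 11001
    print(a.bit_length() + index.bit_length() + b.bit_length())  # 2+3+2 = 7 bits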


Like I said, probably full of shit, but I think of odd stuff in the shower.

With really, really huge numbers (like the aforementioned 7GB file), you might see something better, sort of like how most compression methods have a size below which the header is bigger than the data being compressed.

Kinda doubting it now, though.


I wrote a little Python program that lets you try it with an arbitrary integer. It assumes that whenever you have to lay out a zero, it costs one bit (for the 0).

This times out quickly if you use too big of a number, but you can play around with integers here: http://codepad.org/4mnHI9E3
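
It goes roughly like this (a simplified sketch, not the exact script from the codepad link, so the specific numbers it reports may differ a bit): for a given n it tries every prime up to n, splits n = a*p + b, and compares the bits needed for (a, prime index, b) against writing n directly.

    # For each prime p <= n, write n = a*p + b and count the bits needed for
    # (a, index, b), charging one bit for a zero. Report the cheapest split
    # versus n's own bit length.

    def primes_up_to(limit):
        """Plain sieve of Eratosthenes."""
        sieve = bytearray([1]) * (limit + 1)
        sieve[0:2] = b"\x00\x00"
        for i in range(2, int(limit ** 0.5) + 1):
            if sieve[i]:
                sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
        return [i for i, is_prime in enumerate(sieve) if is_prime]

    def bits(x):
        return max(1, x.bit_length())      # a zero still costs one bit

    def best_split(n):
        best = None
        for index, p in enumerate(primes_up_to(n)):
            a, b = divmod(n, p)
            cost = bits(a) + bits(index) + bits(b)
            if best is None or cost < best[0]:
                best = (cost, p, a, b)
        return best

    n = 25
    cost, p, a, b = best_split(n)
    print(n.bit_length(), "bits direct vs", cost, "bits as (a, index, b), p =", p)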

Based on what I see, it compresses poorly for small numbers and then reaches parity. At 1,000,000 (tested on my machine, not Codepad, since it times out), you get one bit of profit, but 1,000,005 is back at parity, suggesting that it's not very stable.

Also, this assumes the decompressor can make sense of the absolute smallest representation you could lay these numbers out with, which is basically not going to happen. The additional padding would destroy any potential savings you got.


I think what got me onto it in the first place was the representation of the largest known primes (http://primes.utm.edu/largest.html#biggest).

The current largest known prime is (2^57885161)-1, which has 17,425,170 decimal digits, yet can obviously be written down in far less space.

Maybe adding exponents would help. aX^n + b, perhaps?
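
For that record prime, the aX^n + b form is basically free to write down; a quick sketch of the sizes involved:

    # M = a * X**n + b with a=1, X=2, n=57885161, b=-1, i.e. 2**57885161 - 1.
    a, X, n, b = 1, 2, 57885161, -1
    compact_bits = sum(abs(v).bit_length() for v in (a, X, n, b))  # ~30 bits total
    full_bits = n                        # 2**n - 1 is exactly n bits long
    print(compact_bits, "bits vs", full_bits, "bits")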


(this is the point where I hope hope HOPE everyone in this thread is joking)


Well, I did just represent that record prime, which would take 7,235,646 bytes to write to disk in binary, in 14 bytes ("(2^57885161)-1"), so it's either joking or sorcery.


25 is a small number. Would it make a difference with (much) larger ones?



