
Big numbers are fun. Here's some stupid stuff I thought of while reading about this.

The number is 2^74,207,281 - 1. Naively stored like a normal integer in memory, that's roughly 75 MB of RAM just to store the number. Compare that to your 64-bit time_t and imagine how long a duration of time could be represented by such a large number. Stored on 3.5" floppy disks for some reason, that's roughly 50 disks just to store the number.

Edit: Duh, bits, not bytes. I leave my mistake here for posterity.
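For what it's worth, the corrected arithmetic (assuming standard 1.44 MB floppies), as a quick Python sanity check:

  p = 74_207_281         # the number is exactly p bits long
  print(p / 8 / 1e6)     # ~9.3 MB, not 75 MB (the bits-vs-bytes slip)
  print(p / 8 / 1.44e6)  # ~6.4 floppies, not ~50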




Not unless you gzip it. Since the number is just the binary 1 repeated a bunch of times, it should gzip pretty well.


gzipped it's a 9043 byte file when the source is the number expressed exactly as bits. gzipping that 9043 byte file yields a 129 byte file, which gzip can't compress any further. Base64 of the file:

  H4sICBPHnlYCA2EuYmluegDt0LENQQEUQNHnK7xINF8rLIAVbKH88VU0ZlATK4gt1DQ6vURMoBID
  iN4CknOa29/BNrN72U+LZj2eL1fxPDU6EXE+3CbvUTEMAAAAAAAAAODfPfq9/DZfx6petw2B32Z5
  XZT31mYXH9vDv0RTIwAA


Neato! Verified with:

  echo "H4sICBPHnlYCA2EuYmluegDt0LENQQEUQNHnK7xINF8rLIAVbKH88VU0ZlATK4gt1DQ6vURMoBIDiN4CknOa29/BNrN72U+LZj2eL1fxPDU6EXE+3CbvUTEMAAAAAAAAAODfPfq9/DZfx6petw2B32Z5XZT31mYXH9vDv0RTIwAA" | base64 --decode | gzip -d | gzip -d | xxd


I see some redundancy there. The letter "A" appears many times in a row.

Also it could be compressed even further. All you need is a computer program that prints "1", 74 million times.


It's already been concisely expressed as 2^74,207,281 - 1, so you don't need a computer program that prints "1" 74 million times.

You just need a computer language that can interpret bignum math equations.
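In Python, say, the whole "decompressor" is one expression:

  n = 2 ** 74207281 - 1  # or (1 << 74207281) - 1, which is effectively instant
  print(n.bit_length())  # 74207281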


That would take ages to compute. Printing one is faster. And might produce the smallest file size if it's done in assembly.


but the metric was the most concise way to store the data


So I tried it and it's 72135 bytes using tar czf (created by `fs.writeFileSync('/tmp/file', Buffer.alloc(74207281, '1'))`)

Not that it's the most efficient representation at all; 74207281 fits in 4 bytes.


This thread is officially dumb. One guy miscalculates 75 Mbit as ~75 MiB, another guy fills a ~70 MiB file with ASCII "1"s.


Strange but very useful things could emerge from just mucking about and poking around...


Hehe, I didn't mean to sound negative! Agree with the other poster that it's quite fun.

Oh, and I also forgot to mention that the guy used `tar` for compressing one file. I'll see myself out.


I'm glad I started this stupid thread, and this is my favorite comment in it.


dumb, but fun


M49 (for the 49th Mersenne prime) fits in 2 bytes! Or even better just 49! 6 bits! Assuming your decompression algorithm is optimized for Mersenne primes. :)


It can fit in a single bit if your algorithm is optimized for "primes that are equal to M49" :)


You could optimize the single bit away by assuming that there would be a bit.



Great, I've just discovered the 50th Mersenne prime, M50. Decompression takes a lot of time.


Repeated exactly 74,207,281 times, to be precise...


No, the number 5 is prime, yet in binary it is 101. The new prime number could have many zeroes interspersed with ones, so you don't know how many 1 digits it has.

EDIT: I have realized my mistake: a Mersenne prime is 2^n - 1, so its binary representation is all 1s. However, I will leave this here for posterity.


It's a Mersenne prime, so it is 1 less than some power of 2, which means every digit of its binary representation is a 1.


But 5 is not 2^n-1


5 is a prime number, true, but not a Mersenne prime. The Mersenne prime 31 = 2^5 - 1 has 5 as its exponent n [1].

[1] http://mathworld.wolfram.com/MersennePrime.html


Turns out run-length encoding it is more or less what you get if you store it as 2^74,207,281 - 1. I imagine this is probably how they actually store it; note that 2^n == 1 << n, so you can just store the exponent and the subtracted 1 as the pair (74207281, 1) and then do normal two's complement math.


If you're talking about storing it in computer memory during calculation, then no, it's written out. The fast MOD operation you list isn't used; it's an FFT using a weight factor that automatically does the mod operation while squaring a number. You start with S[0] = 4 and iterate S[i+1] = S[i]^2 - 2 mod M_p. If S[p-2] = 0, then M_p = 2^p - 1 is prime.
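For reference, here's that recurrence in plain code: a minimal Python sketch relying on native bignums, nothing like the FFT-weighted squaring GIMPS actually uses:

  def lucas_lehmer(p):
      """True iff M_p = 2**p - 1 is prime, for an odd prime exponent p."""
      m = (1 << p) - 1
      s = 4                    # S[0] = 4
      for _ in range(p - 2):   # p - 2 squarings
          s = (s * s - 2) % m  # S[i+1] = S[i]^2 - 2 mod M_p
      return s == 0            # M_p is prime iff S[p-2] == 0

  print([p for p in (3, 5, 7, 11, 13) if lucas_lehmer(p)])  # [3, 5, 7, 13]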

So it's 74,207,279 squarings using FFT, each likely working on more than 75 MiB, since there will likely be lots of leading zeros (we're probably close to the maximum number of digits double precision allows; round-off errors dominate unless you start using really small bases and even greater-length FFTs).

That's not counting the two or three compositeness tests that GIMPS uses before starting the Lucas-Lehmer test to prove primality.

For several years I've been working on a program to test the primality of Generalized Fermat numbers, using NTT instead of FFT. NTT is exact (integer math), and the GFNs are likely more "dense" with primes than the Mersenne numbers. They also have a faster form to test, although the primality test has a 50% chance of a false negative, so you have to run it a few times to be sure a number isn't prime. That's where Mersenne numbers are easier to test.


It's the sort of thing that's so repetitive you end up gzipping the gzip, because the "repeat N times" code doesn't support going that high (DEFLATE back-references cap out at a match length of 258 bytes).


75 megabits: about 9.5 MB, seven 3.5" floppies, or 5-10 selfies.


You're right! Thanks! I told you it was stupid :P


How many t-shirts?


Someplace used to print Mersenne primes as posters. IIRC they were using a font of around 2 points. Probably given up by now...


The person died, and the art with him :( http://www.mersenneforum.org/showthread.php?t=19100


I remember those. They either came with a loupe, or mentioned explicitly to the buyer that you needed one to see individual numbers. Always wanted to get one. It looked like an all grey poster if you didn't know what it was.


75 MB compresses neatly to under 12 bytes, which I say because the string "2^74207281-1" is totally unambiguous (Wolfram Alpha will evaluate it for you no questions asked, no guesswork, though obviously it only approximates the output). So the Kolmogorov complexity of the 75 MB is 96 bits or less.


Looks like you just communicated it in 16 bytes though.


It is far too many bits to be used for time representation. 512 bits is all that's needed to store every Planck time (10^-44 seconds) from the Big Bang until the heat death of the universe (10^100 years).
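A back-of-the-envelope check of that bit count (taking the Planck time as ~5.4 x 10^-44 s):

  import math

  seconds = 1e100 * 365.25 * 24 * 3600  # 10^100 years in seconds, ~3.2e107
  ticks = seconds / 5.39e-44            # Planck times elapsed, ~6e150
  print(math.log2(ticks))               # ~501 bits, so 512 is plenty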


Sure, that's fine if all you want is to measure the time from the Big Bang until the last matter in the universe has evaporated into photon soup. But what if you want the time until those photons are evenly distributed? An article I read a while back estimated that duration as 10^10^78 years, which is absolutely mind-boggling and remains the largest number I have ever seen with a real-world physical application.

(It may never happen, though; according to Wikipedia, random quantum fluctuations may be sufficient to produce another Big Bang in a mere 10^10^56 years.)


That makes sense for durations, but a time_t is actually a timestamp. A proper time can have the same units as a duration, but is only useful in conjunction with information about the trajectory of the clock measuring the time. If space is continuous, the amount of information to specify the trajectory is limited only by the capabilities of the measurement apparatus. Of course, for all we know at this point spacetime is quantized; so we should probably hold off on standardizing the time format to end all time formats.


Or you create a new compression scheme just for Mersenne primes, where you store only the exponent. You'll be able to compress the entire number into just 32 bits for several decades at least!
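Something like this hypothetical codec, sketched in Python (the names `compress`/`decompress` are made up):

  import struct

  def compress(p):
      """Store a Mersenne prime as its 32-bit exponent."""
      return struct.pack('<I', p)       # 4 bytes

  def decompress(blob):
      (p,) = struct.unpack('<I', blob)
      return (1 << p) - 1               # rebuild 2^p - 1

  assert decompress(compress(74207281)) == 2 ** 74207281 - 1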


Since the exponent is prime, you should be able to compress it further than that by simply noting that n is the kth prime number. Supposedly, as n goes to infinity, the probability that n is prime drops to zero, so that should compress handily.

Decompressing it, on the other hand, would be pretty slow. Are primality tests O(n) or O(n log n)? That puts decompression up to O(n^2) without a precomputed lookup table.


https://en.wikipedia.org/wiki/Prime_number_theorem#Approxima...

So with the kth prime ~= k log k, the density of primes is 1 / log k. Yes, that drops to zero, but very slowly.

You can use the upper bound on the kth prime to build a sufficiently large sieve, which can be done in O(n).
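A sketch of that decompressor in Python, sizing the sieve with the bound p_k < k(ln k + ln ln k), which holds for k >= 6:

  import math

  def kth_prime(k):
      """Recover the k-th prime by sieving up to a PNT-based upper bound."""
      limit = 15 if k < 6 else int(k * (math.log(k) + math.log(math.log(k)))) + 1
      sieve = bytearray([1]) * (limit + 1)
      sieve[0] = sieve[1] = 0
      for i in range(2, int(limit ** 0.5) + 1):
          if sieve[i]:
              sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
      count = 0
      for n, is_prime in enumerate(sieve):
          count += is_prime
          if count == k:
              return n

  print(kth_prime(25))  # 97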


If you don't care about computation time, then why stop there?

You can encode the Mth Mersenne prime with the integer M.


But then I have to keep patching the code every time a new one comes out. Too hard!


You don't. You can write a program that converts between m and M_m. It just takes a long time to convert.
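A sketch of such a converter in Python; it terminates, but the Lucas-Lehmer step makes it hopeless beyond small m (`lucas_lehmer` repeats the sketch from earlier in the thread):

  def lucas_lehmer(p):
      m = (1 << p) - 1
      s = 4
      for _ in range(p - 2):
          s = (s * s - 2) % m
      return s == 0

  def nth_mersenne_prime(m):
      found, p = 0, 2
      while True:
          # Lucas-Lehmer needs an odd exponent, so special-case p = 2 (M_2 = 3)
          if p == 2 or lucas_lehmer(p):
              found += 1
              if found == m:
                  return (1 << p) - 1
          p += 1
          # the exponent of a Mersenne prime must itself be prime
          while any(p % d == 0 for d in range(2, int(p ** 0.5) + 1)):
              p += 1

  print(nth_mersenne_prime(5))  # 8191 = 2^13 - 1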


I think there's a related mathematical paradox (the Berry paradox) about the smallest number that cannot be described by an English phrase of fewer than 93 characters.


That's 75 megabits, not megabytes.



