
I wonder why the author ignored the option of compression in the post. Even with simple gzip DEFLATE compression, those 10MB of plain text could shrink to roughly a 1MB archive, possibly less, meaning that a compressed 10MB payload could fit far more than 135K records.


It isn't 10MB of plain text though, it's 10MB of binary SQLite database. I agree that compression would be useful here, but I don't think a simple gzip DEFLATE would be.

I was curious so I compressed that torrent db with a few different methods:

  11.1MB 11116544B dump.sqlite
  10.2MB 10155419B dump.csv
   6.6MB  6573399B dump.sqlite.gz
   6.6MB  6565771B dump.zip
   5.6MB  5616842B dump.rar

gzip is certainly suitable in this situation; I stand corrected.
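For anyone wanting to reproduce the gzip number, a minimal sketch using only Python's stdlib (the zip and rar entries would need their respective external tools); the sample data here is a made-up stand-in for the dump:

```python
import gzip

def gzip_ratio(data: bytes, level: int = 9) -> float:
    """Compressed-to-original size ratio using gzip's DEFLATE at max level."""
    return len(gzip.compress(data, compresslevel=level)) / len(data)

# Repetitive text like torrent names/descriptions compresses very well
sample = b"ubuntu-24.04-desktop-amd64.iso,5.9GB\n" * 10_000
print(f"{gzip_ratio(sample):.3f}")  # well below 1.0 for repetitive input
```

To measure a real file, read it in binary mode and pass the bytes to `gzip_ratio`.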


Back of envelope:

The 10MB estimated size came from [100 bytes per row] * [100k rows].

50 of the bytes per row were "description", which should compress well (2-3x, I'd guess).

40 bytes per row were the IPFS ID/hash, IIUC. I assumed this is like a Git hash, 40 hex chars, which is really just 20 bytes of entropy.
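The hex-vs-raw point is easy to check: any 40-character hex string decodes to 20 raw bytes (the digest below is just an illustrative SHA-1 value, not an actual IPFS ID):

```python
# Illustrative 40-char hex digest; real IPFS IDs may use a different encoding
hex_id = "da39a3ee5e6b4b0d3255bfef95601890afd80709"
raw = bytes.fromhex(hex_id)
print(len(hex_id), len(raw))  # 40 hex chars -> 20 raw bytes
```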

He also estimated 14 bytes for the size (stored as a string representation of a decimal integer, up to 1e15 - 1, or 1PB?). That's about 50 bits or 6-7 bytes, as a binary integer. Sizes wouldn't be uniformly distributed though so it would compress to even fewer bytes.
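Checking that figure: a value just under 1e15 needs 50 bits, which fits in 7 bytes as a binary integer:

```python
# Largest size the estimate allows for (just under 1 PB, per the comment above)
max_size = 10**15 - 1
bits = max_size.bit_length()
print(bits, (bits + 7) // 8)  # 50 bits -> 7 bytes as a binary integer
```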

So if SQLite were smart (or one gzips the whole db file, like you did), it makes sense that a factor of 2 or so is reclaimable.



