
Reminds me of http://base91.sourceforge.net/.

We could go further, straight to Base8000!



Already exists: https://github.com/qntm/base65536

It's actually pretty useful for packing binary data densely in Unicode-aware environments like Twitter, which makes me wonder whether Unicode support is now universal enough for an encoding like this to replace MIME/base64 in email.


Okay, I have seen this 10 times or so while comparing various binary-to-text encodings, and basE91 is the only one without a format description, so it's probably time to look at the source code directly. Amazingly, it turns out to be the only binary-to-text encoding I have ever seen that groups its input into a varying number of bits. More specifically:

* The input bytes are packed in reverse order (e.g. 1A 2B 3C is packed as 0x3C2B1A), unlike most other binary-to-text encodings. The final leftover bits are padded with leading zeros.

* A pair of basE91 characters encodes a number from 0 through 8280. The first character is the least significant: `AB` encodes 91, not 1.

* 91^2 = 8281 > 2^13 = 8192, so groups of 13 bits are read and encoded as two basE91 characters, least significant first. But that's not always the case: if the lowermost 13 bits are 88 or less, a group of 14 bits is read instead, because the resulting value (0..88 or 8192..8280) is still below 91^2. As a result, the first 8281 - 8192 = 89 values (0..88) and the last 89 values (8192..8280) actually encode 14 bits, and this includes the all-zero case. On uniformly random input each pair therefore consumes 13 + 89/8192 ≈ 13.011 bits on average, for an overhead of about 22.97% (16 / 13.011 - 1), just above the 22.93% (16 / lg 8281 - 1) that two basE91 characters could carry at best. The overhead drops to 14.29% (16 / 14 - 1) when all bits are zero.
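To make the rules above concrete, here is a minimal Python sketch of the encoder, plus a matching decoder so it can be round-trip tested. It mirrors the structure of the reference C implementation as I understand it; the exact alphabet string and the tail handling (emitting one or two final characters) are assumptions taken from that implementation rather than from any format spec, and the names `base91_encode`/`base91_decode` are hypothetical. It also computes the overhead figures for this scheme.

```python
import math

# Assumed to match the table in the reference basE91 implementation:
# A-Z, a-z, 0-9, then 29 punctuation characters (91 in total).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!#$%&()*+,./:;<=>?@[]^_`{|}~\""
)
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}


def base91_encode(data: bytes) -> str:
    out = []
    queue = 0  # pending input bits; each new byte is placed above them
    nbits = 0  # number of valid bits in queue
    for byte in data:
        queue |= byte << nbits  # reverse order: 1A 2B 3C -> 0x3C2B1A
        nbits += 8
        if nbits > 13:
            value = queue & 0x1FFF  # lowermost 13 bits
            if value > 88:
                queue >>= 13
                nbits -= 13
            else:
                # Low 13 bits are 0..88: take 14 bits (0..88 or 8192..8280).
                value = queue & 0x3FFF
                queue >>= 14
                nbits -= 14
            out.append(ALPHABET[value % 91])   # least significant first
            out.append(ALPHABET[value // 91])
    if nbits:
        # Leftover bits, implicitly padded with leading zeros.
        out.append(ALPHABET[queue % 91])
        if nbits > 7 or queue > 90:
            out.append(ALPHABET[queue // 91])
    return "".join(out)


def base91_decode(text: str) -> bytes:
    out = bytearray()
    queue = 0
    nbits = 0
    value = -1  # -1 means "waiting for the first character of a pair"
    for ch in text:
        if ch not in DECODE:
            continue  # skip characters outside the alphabet
        digit = DECODE[ch]
        if value < 0:
            value = digit
            continue
        value += digit * 91
        queue |= value << nbits
        # Same 13-vs-14-bit decision the encoder made for this pair.
        nbits += 13 if (value & 0x1FFF) > 88 else 14
        while nbits > 7:
            out.append(queue & 0xFF)
            queue >>= 8
            nbits -= 8
        value = -1
    if value >= 0:
        out.append((queue | value << nbits) & 0xFF)
    return bytes(out)


RANDOM_OVERHEAD = 16 / (13 + 89 / 8192) - 1   # about 0.2297 (uniform random input)
IDEAL_OVERHEAD = 16 / math.log2(91 ** 2) - 1  # about 0.2293 (best case for 2 chars)
ZERO_OVERHEAD = 16 / 14 - 1                   # about 0.1429 (all-zero input)

print(base91_encode(b"\x5b\x00"))  # ABA  (the low 13 bits are 91, so "AB")
print(base91_encode(b"\x00" * 7))  # AAAAAAAA  (56 bits = 4 groups of 14)
```

The single `nbits > 13` check per byte is enough because `nbits` never exceeds 13 after a group is consumed, so adding one byte can trigger at most one group.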

It reminds me of Ascii85 [1], which had shorthands for all-zero groups and all-space groups, but this one is more general. Speaking of generality, is a binary-to-text encoding based on arithmetic coding perhaps viable now?
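For comparison, Python's standard `base64` module implements Ascii85 and shows how narrow its shorthand is: `z` only replaces a complete 4-byte all-zero group, and the btoa-style `y` for four spaces has to be enabled with `foldspaces=True`. This is just an illustration using the stdlib; other Ascii85 implementations may differ in details.

```python
import base64

# Ascii85 encodes each 4-byte group as 5 characters (85**5 > 2**32).
# A complete all-zero group is folded into the single character "z".
print(base64.a85encode(b"\x00\x00\x00\x00"))      # b'z'
print(base64.a85encode(b"\x00\x00\x00\x00Man "))  # b'z9jqo^'

# The btoa-style "y" for four spaces must be requested explicitly.
print(base64.a85encode(b"    "))                  # b'+<VdL'
print(base64.a85encode(b"    ", foldspaces=True)) # b'y'

# A trailing partial zero group is written out in full, not as "z".
print(base64.a85encode(b"\x00\x00\x00"))          # b'!!!!'
```

By contrast, basE91's 14-bit case applies to any position in the bit stream whenever the low 13 bits happen to be small, not just to aligned all-zero groups.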

[1] https://en.wikipedia.org/wiki/Ascii85#btoa_version



