It's fantastic to see so much fascinating work in compression these days after a fair amount of stagnation. I know there has been a lot of specialized codec work, but the recent discoveries are very much for wide use.
You now have LZ4, Brotli, zstd, Snappy, lzfse and lzma, all pretty useful, practical codecs.
Brotli is interesting though. At level 1 it can be an easy replacement for zlib, with fairly higher compression than zlib at a similar speed.
Compared with lzma, it can handily beat it, but with even slower compression (at the levels they published in the benchmark); on the other hand it has much higher decompression speed, meaning it's very good for distribution work. It would be interesting to see the compression ratio and time for the levels between 1 and 9.
It's actually a much easier replacement for zlib than lzma is, for some uses. The benchmark shows only levels 1, 9 and 11. It seems that it can handily beat lzma, but at the cost of compression speed (I wonder which uses more memory). Then again, its decompression speed is so much better, making it a perfect choice for distribution.
What truly surprises me though is the work of Yann Collet. A single person so handily beating Google's Snappy. Even his zstd work looks groundbreaking. When I read a couple of weeks ago that he was a part-time hobby programmer, I just didn't know how to be suitably impressed.
Am I reading right that Apple's lzfse is based on a mix of LZ + zstd?
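Out of curiosity, that level sweep is easy to check for yourself; a rough sketch, assuming the third-party Python brotli bindings and the standard zlib module, on whatever corpus file you care about:

```
# Rough sketch: compression ratio and time for brotli quality levels 1-9
# versus zlib, on an arbitrary corpus. Assumes "pip install brotli".
import time
import zlib
import brotli

with open("corpus.html", "rb") as f:   # any representative test file
    data = f.read()

def measure(name, compress):
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name:12s} ratio={len(data) / len(out):5.2f} time={elapsed * 1000:8.1f} ms")

measure("zlib -6", lambda d: zlib.compress(d, 6))
for q in range(1, 10):
    measure(f"brotli -q{q}", lambda d, q=q: brotli.compress(d, quality=q))
```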
Research in compression hardly ever stalled; it just wasn't getting much publicity because it wasn't coming from Google. For example, looking at Brotli, it appears to be based on ideas very similar to those explored by the PAQ project [1] not so many years ago.
The web has definitely had a problem with legacy restrictions driving thinking. We lost nearly a decade with Internet Explorer stagnation and I think that caused a lot of people to assume that the odds of something new going mainstream were too low.
Now we're in a very different place where people upgrade software much faster than in the past and that's likely to start showing up in projects like this where it's actually plausible to think of something like Brotli seeing widespread usage within 1-2 years.
Ohh, I know there has been a lot of work on exotic codecs for a long while. I'm not an expert, but I have looked up what's out there fairly often. What I'm saying is that most of them had been too slow or too specialised to be used by most people. But in recent times those exotic codecs have been transformed and tuned into a lot of practical codecs.
FSE is an implementation of the new ANS entropy coding, also used e.g. in LzTurbo 1.2. It's surprising that, while still using Huffman coding, Brotli is slowly approaching its performance:
I'm curious about the viability of the following approach. Could a generic static dictionary be held on the client side, such that the compressor/decompressor can use it efficiently? This would avoid the need to send the dictionary along with the compressed package every time. Even at an 80/20 or 90/10 success rate (with a fallback to the worst case, of course), wouldn't this be a great advancement and reduce massive network load? With modern hard drive sizes, many could spare a few gigabytes, which could be dialed up or down depending on how optimized you'd want it. I would think we could identify particular binary patterns based on the user's transfer habits (e.g. downloading text vs. application vs. audio vs. video) and have different dictionaries optimized to that usage (e.g. a Roku would just store dictionaries optimized for video).
Yup, that's called SDCH (shared dictionary compression for HTTP, colloquially "sandwich"), which was first proposed in 2009 by Google. Brotli supports SDCH, but it's an additional component.
Is there anything out there that lets you have "pluggable" dictionaries? I have a client-server communication that often sends 90% of the same words. I'd like to build this dictionary on the server and ship it down to the clients.
I prototyped something myself, and SDCH for the browser was the closest I could find.
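For prototyping that, plain zlib already supports preset dictionaries, so you don't strictly need SDCH for the server-to-client case; a minimal sketch using Python's standard zlib module (the dictionary bytes below are just a made-up illustration):

```
# Minimal sketch of a "pluggable" preset dictionary with plain zlib.
# Both sides must use the exact same dictionary bytes; building it from
# your most common words and shipping it to clients is up to you.
import zlib

# Hypothetical dictionary built offline from frequent protocol strings.
shared_dict = b'{"status": "ok", "user_id": , "timestamp": , "error": null}'

message = b'{"status": "ok", "user_id": 42, "timestamp": 1441234567}'

# Server side: compress with the preset dictionary.
comp = zlib.compressobj(level=9, zdict=shared_dict)
packed = comp.compress(message) + comp.flush()

# Client side: decompress with the same dictionary.
decomp = zlib.decompressobj(zdict=shared_dict)
assert decomp.decompress(packed) + decomp.flush() == message

print(len(message), "->", len(packed), "bytes with the shared dictionary")
```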
Note that this bug originally seems to have been called something like "Implement LZMA compression", and the early comments are about that; it's some way in before someone says "oh, and Google have this new Brotli thing which might be even better".
Mozilla can be quite slow with certain features; for example work on U2F support hasn't even begun one year after a ticket was made (https://bugzilla.mozilla.org/show_bug.cgi?id=1065729). I'm not exactly holding my breath.
They've already tried to land the code to add brotli support but had to back it out because of an Android build failure. They're moving pretty quick on this. It helps that they already had brotli for WOFF so there is less review required.
I would be in favor of any new technology for the web requiring TLS. It provides both a carrot (of new features) and a stick (of falling behind competitors) for people to get off their asses and secure the web from a whole host of attacks.
Even if your page doesn't have sensitive information on it, an insecurely loaded page provides an attacker the avenue to inject potentially malicious code. This will be the case until the entire web is HTTPS-enabled.
CRIME: TLS compression can reveal private headers, like auth cookies. Fixed by turning off TLS compression. Not applicable to HTTP because HTTP never had header compression.
BREACH: compression of a response body that contains (a) something attacker-controlled and (b) something private and unchanging can reveal that secret, provided (c) the response length is visible to an attacker. Doesn't require HTTPS.
If an attack applied, it would be one like BREACH. Which isn't surprising: this is a direct replacement for "Accept-Encoding: gzip / Content-Encoding: gzip" and so we should expect it to be in the same security situation.
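To make the BREACH-style length leak concrete, here's a tiny illustration of the principle (not an actual attack) using Python's zlib: a guess that matches a secret elsewhere in the same compressed body comes out measurably shorter.

```
# Illustration of the compression length oracle behind CRIME/BREACH:
# a correct(ish) guess duplicates the secret, so the body compresses better.
import zlib

secret = b"csrf_token=QJX7R2PKM4V8TZ1DWY6B"   # made-up secret in the page

def compressed_size(guess: bytes) -> int:
    page = b"<html>" + secret + b"<p>search: " + guess + b"</p></html>"
    return len(zlib.compress(page, 9))

for guess in (b"csrf_token=M3W9ZTA1FKBQ7XC5HJN2",   # wrong guess
              b"csrf_token=QJX7R2PKM4V8TZ1DWY6B"):  # matching guess
    print(guess.decode(), "->", compressed_size(guess), "bytes")
# The matching guess yields a smaller compressed response; observing such
# length differences, one guessed character at a time, is the essence of BREACH.
```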
> Unlike other algorithms compared here, brotli includes a static dictionary. It contains 13’504 words or syllables of English, Spanish, Chinese, Hindi, Russian and Arabic, as well as common phrases used in machine readable languages, particularly HTML and JavaScript.
They could at least have measured the other algorithms with the same dictionary for fairness.
"We hope that this format will be supported by major browsers" - if I'm not mistaken it should also be implemented by all major webservers. In that regard I'm hoping Opera open sources or sells their OMPD proxy platform. Someone made a chrome extension which interfaces with their network and the speed is amazing. OMPD is a tradeoff and breaks certain javascript and css and is different from Opera Turbo, I am not sure what compression Turbo uses currently. Does anyone know if Brotli gets implemented in the google server and proxy compression? http://browsingthenet.blogspot.nl/2014/09/chrome-data-compre...
> I'm hoping Opera open sources or sells their OMPD proxy platform.
They won't open source it: the compression tech has for a number of years accounted for a large proportion of Opera's (browser) income. I also don't actually know what OMPD is. Opera Mini server? That's tied sufficiently closely to Presto that it'll only ever get released if Presto does.
The more recent Opera Turbo and the Opera Mini 11 for Android "high-compression mode" (though I have no idea how that really differs from Opera Turbo! [edit: per @brucel it is Opera Turbo]) are certainly available for licensing; Yandex Browser supports Opera Turbo, for example.
Disappointing that they aren't dissolving the umlaut and are instead stripping it. 'Broetli' would be equivalent to 'Brötli', yet they remove it (and thus change the sound) to create 'Brotli'.
This way, very much umlaut-capable, but not at all swiss-capable Germans won't be tempted to pronounce it in a terribly wrong and insulting imitation of some swiss dialect. Serious international tensions will be avoided by clearly separating the name from the language it is derived from. If your company motto was "don't be evil", you surely would not want your compression algorithms to cause naval standoffs on Lake Constance!
Also, simply dropping those dots where they actually belong is refreshingly post-metal-umlaut.
Er, no, the transliteration of "ö" to "oe" does not change the sound. This is considered a feature of the orthography of German, and you see this kind of thing in, e.g., crossword puzzles.
I meant transliteration to English. No English speaker will pronounce the o with the umlaut, so you have to pick how you want English speakers to pronounce your word from the two alternatives above. Bro-li sounds much better to me than bro-ee-li.
Sometimes they do, sometimes they don't. It doesn't really have any consistent English pronunciation, since it appears in a bunch of different morphological contexts, and in loanwords from at least five or six different languages. Some examples: phoenix (1 syllable, 'ee'), Zoe (2 syllables, 'oh-ee'), Joe (1 syllable, 'oh'), Joel (either 1 or 2 syllables, depending on schwa insertion), canoe (1 syllable, 'oo'), Dostoevsky (2 syllables, 'oh-eh' or 'oh-yeh'), coed (2 syllables, 'oh-eh'), etc.
If I saw something like Broetli and didn't immediately recognize the Germanic origin, I could see myself easily misanalyzing it as Bro-etli and pronouncing it that way.
Really like how big and influential companies are working on foundational technologies like compression, which smaller companies would have no chance of popularising. Earlier this summer Apple introduced lzfse. But Brotli seems even more amazing, as its compression ratio seems to match that of lzma. Wonderful.
If Google wants me to personally adopt this outside of WOFF2 (WOFF2's major change is the switch to Brotli over deflate), then they should submit a patch to nginx adding modules named brotli and brotli_static, plus a gzip-like command-line tool for Brotli.
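In the meantime, the command-line half of that wish is only a few lines with the Python brotli bindings; a rough gzip-like sketch (the script name, the .br suffix and the quality-11 default are just assumptions):

```
#!/usr/bin/env python3
# Rough gzip-like wrapper around the Python brotli bindings:
#   brotlize.py file        -> writes file.br
#   brotlize.py -d file.br  -> writes file back out
import sys
import brotli

def main(argv):
    decompress = "-d" in argv
    for path in (a for a in argv if not a.startswith("-")):
        with open(path, "rb") as f:
            data = f.read()
        if decompress:
            out_path = path[:-3] if path.endswith(".br") else path + ".out"
            out = brotli.decompress(data)
        else:
            out_path = path + ".br"
            out = brotli.compress(data, quality=11)
        with open(out_path, "wb") as f:
            f.write(out)
        print(f"{path} -> {out_path} ({len(data)} -> {len(out)} bytes)")

if __name__ == "__main__":
    main(sys.argv[1:])
```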
Yes and no. It is very common to simply strip umlauts and dashes when converting to ASCII compatible writing.
I even do it with my name most of the times. A Chinese official that sees Ø in my passport won't understand why I write 'OE', and might even start questioning if it is the same name, but 'O' never fails.
Airlines can read the gibberish in the bottom of my passport and see that the transliteration is actually 'OE', but most places don't have a scanner for that.
I have an actual question.
Who can name all the decompressors that outperform Brotli both in compression ratio and decompression speed?
As I see it, that's the ultimate goal, to boost I/O via transparent decompression.
Reading the PDF paper, I see they use a Xeon with a price tag in the range of $560 for benchmarking; how can this be reconciled with hot phrases like boosting browsing on mobile devices?!
Internet Scenario Benchmark (browser plugin simulation):
html8 : 100MB random html pages from a 2GB Alexa Top sites corpus.
number of pages = 1178
average length = 84886 bytes.
The pages (length + content) are concatenated into a single html8 file, but compressed/decompressed separately.
This avoids the caching scenario seen in other benchmarks, where small files are processed repeatedly in the L1/L2 cache, showing unrealistic results.
size: 100,000,000 bytes.
Single thread in memory benchmark
cpu: Sandy Bridge i7-2600K at 4.2 GHz, all with gcc 5.1, Ubuntu 15.04
LzTurbo benefits a lot from bigger blocks; in the above example it uses 4 MB blocks on a single thread, and look how fast it is. If 16 threads were used, what would the outcome be... maybe 0.100 sec?!
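For anyone wanting to reproduce the shape of that methodology (per-page compression rather than one big block), a rough sketch, with zlib standing in for whichever codec is under test and made-up page contents:

```
# Sketch of the "concatenated corpus, processed per page" methodology:
# every page is compressed and decompressed on its own, so per-page
# overheads are included (with a real 100 MB corpus this also avoids
# everything staying hot in the L1/L2 cache).
import time
import zlib   # stand-in codec; swap in brotli/zstd/etc. for real tests

pages = [b"<html>page one</html>" * 500,   # made-up page contents
         b"<html>page two</html>" * 800]

t0 = time.perf_counter()
compressed = [zlib.compress(p, 9) for p in pages]
t1 = time.perf_counter()
restored = [zlib.decompress(c) for c in compressed]
t2 = time.perf_counter()

assert restored == pages
total_in = sum(len(p) for p in pages)
total_out = sum(len(c) for c in compressed)
print(f"ratio {total_in / total_out:.2f}, "
      f"compress {t1 - t0:.3f}s, decompress {t2 - t1:.3f}s")
```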
You miss the point: your list is meaningful only for general compression cases, whereas the Brotli thread (where we are) is all about boosting textual decompression while using few resources (mostly RAM), as in the case of web browsers.
As one of the co-authors (Jyrki) commented:
```
For more clarity on the situation, you could compare LZMA, LZHAM and brotli at the same decoding memory use. Possibly values between 1-4 MB (window size 20-22) are the most relevant for the HTTP content encoding. Unlike in a compression benchmark, there are a lot of other things going on in a browser, and the allocated memory at decoding time is a critically scarce resource.
```
Source:
http://google-opensource.blogspot.bg/2015/09/introducing-bro...
The goal is to receive those 812,392,384 bytes in our browser as quickly as possible; in the first case the winner is the 77,286,010-byte compressor, in the second the winner is the 90,239,627-byte compressor, yes?
In the first case: transfer_time + decompression_time = 77 s + 7 s = 84 s
In the second case: transfer_time + decompression_time = 9 s + 1 s = 10 s
Now you see that even in the web-browsing scenario the best performer is not established, right?
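In other words, the winner flips with the assumed link speed; a small sketch of that trade-off arithmetic (the decompression speeds below are made-up illustrative figures, chosen only to roughly mirror the numbers above):

```
# Total delivery time = transfer time + decompression time. Which
# compressor "wins" depends on the link speed, so even for the
# web-browsing scenario there is no single best performer.
ORIGINAL = 812_392_384            # bytes we want to end up with in the browser

candidates = {                    # (compressed size, decode speed MB/s) - made up
    "stronger, slower-to-decode codec": (77_286_010, 120),
    "lighter, faster-to-decode codec":  (90_239_627, 900),
}

for link_mb_per_s in (1, 10, 100):
    print(f"link speed {link_mb_per_s} MB/s:")
    for name, (size, decode_mb_per_s) in candidates.items():
        total = size / (link_mb_per_s * 1e6) + ORIGINAL / (decode_mb_per_s * 1e6)
        print(f"  {name:34s} {total:7.1f} s")
```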
I'm just wondering what it's like for a data compression expert, whether in algorithms or coding. It must be really wonderful and energising to see such interest and momentum in one's field.
I wonder if anyone working in the field would have a comment on that.
It's strange to see an "introducing" post for an algorithm that's already included in a deployed technology (WOFF2), especially one that doesn't point that out.
The fundamentals of the CRIME attack work with any compression algorithm. That's why HTTP 2.0 doesn't use compression for headers and sends deltas instead.
>What's the point of mentioning a fictional compression startup as a comment on the algorithm made by Google?
A joke, or a pop culture reference. Stating a shared experience that others may also glean some enjoyment out of, or to create the illusion of a shared community/experience to help maintain the illusion that we are not just meaningless souls adrift in an indifferent, eternal void, whose incomprehensible size only reinforces the meaninglessness of our existence.
It's a kind of spam to me, as I still, even after your explanation, don't see it as any more related than spam is to eggs and bacon. So please, user csimy, "could you do the egg, bacon, spam and sausage without the spam then?" (1)
1) A reference to Monty Python, which in this context demonstrates the relativity of the "shared experience." Even though it gave electronic spam its name!
Then flag/downvote it. Commenting about how unproductive an unproductive discussion is just further propagates the original unproductive discussion [1].
On one hand I understand it; pop culture references have been a thing for ages and ages. On the other hand, I don't know, because many take it to an extreme (see: Reddit).
Considering that Google's biggest press release of the year included a reference to that particular fictional world[1], and that we're talking about compression technology, I would say it's appropriate.
> The higher data density is achieved by a 2nd order context modeling, re-use of entropy codes, larger memory window of past data and joint distribution codes.
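As a loose illustration of what second-order context modelling buys (this is a generic order-2 byte model, not Brotli's actual context mapping), compare the empirical entropy with and without conditioning on the previous two bytes:

```
# Loose illustration of order-2 context modelling (not Brotli's scheme):
# keeping symbol statistics per "previous two bytes" context gives much
# sharper distributions, which entropy-code into fewer bits per byte.
from collections import Counter, defaultdict
import math

data = open("corpus.html", "rb").read()    # any sample text

order0 = Counter(data)
order2 = defaultdict(Counter)
for i in range(2, len(data)):
    order2[data[i - 2:i]][data[i]] += 1

def entropy_bits(tables):
    """Bits needed if each symbol cost -log2 p(symbol | its context)."""
    bits = 0.0
    for counts in tables:
        total = sum(counts.values())
        bits += sum(c * -math.log2(c / total) for c in counts.values())
    return bits

print("order-0 bits/byte:", entropy_bits([order0]) / len(data))
print("order-2 bits/byte:", entropy_bits(order2.values()) / len(data))
# (This ignores the cost of describing the model itself, which real
# formats must pay via entropy-code descriptions or adaptive coding.)
```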
But it's actually, surprisingly, arguably, a valuable metric which, for whatever reason, had never been produced "in the real world" before, until the show requested it from actual researchers:
It's a terrible metric that fails basic dimensional analysis. Anytime you see a ratio of logs, you should already be suspicious. Say algorithm A is twice as fast as algorithm B. Then we'll have alpha * (r/R) * log(T)/log(t). The alpha * (r/R) part is constant, so the only interesting part is log(T)/log(t) = log(2t)/log(t) = log(2)/log(t) + 1. We can make this apparently unitless number take any value we want just by changing the unit used to measure time (or, if a unit were fixed, by changing the amount of data used to test).
It's sort of vaguely superficially sensible, but the idea of being able to reduce comparison of compression algorithms (which fundamentally trade off speed for compression ratio) to a 1-dimensional value is laughable. Charles Bloom's Pareto frontier charts (http://cbloomrants.blogspot.com/2015/03/03-02-15-oodle-lz-pa...) are one of the more reasonable options.
The metric divides compression ratio by log compression time. (Ignore for now the normalization to a standard compressor.) This is:
r / log(T)
That doesn't seem like a good metric, because it overvalues changes in T. For example, say we currently manage 10% compression (ratio = 100/90 = 1.11) and it takes us 16 ms. Using log base 2, that's
r / logT = 1.11 / log(16) = 0.2775
Now we have two proposals. One brings us from 10% compression to 55% compression (ratio = 100/45 = 2.22), while the other drops compression time to 4 ms:
r / logT = 2.22 / log(16) = 0.555
r / logT = 1.11 / log( 4) = 0.555
But improving compression by 5x matters more than improving speed by 4x. Take this to the extreme: a compressor that exits immediately leaving its input unchanged has the best possible score here, despite being useless.
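Both objections are easy to see numerically with the simplified r / log(T) form from the example above (made-up figures, log base 2 as above):

```
# The simplified metric r / log2(T): it scores a 4x speedup the same as
# doubling the ratio here, and its value depends on the time unit chosen.
import math

def score(ratio, time_value):
    return ratio / math.log2(time_value)

print(score(1.11, 16))     # 0.2775  baseline (16 ms)
print(score(2.22, 16))     # 0.555   double the ratio
print(score(1.11, 4))      # 0.555   same score for merely being 4x faster

# Same compressor, same physical speed, different unit for T:
print(score(1.11, 16))     # T measured in milliseconds
print(score(1.11, 0.016))  # T measured in seconds: the score even flips sign
```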
Sure, but "lol what's the weissman score HASHTAG SILICONVALLEYTHESHOW \m/" doesn't lead me to think that the poster was legitimately asking for the score so they could compare.
Yeah, it was a pretty painful post, but I would want to see the score. Its performance over gzip would be a more intuitive reference point than the '26% over Zopfli.'
"At Google, we think that internet users’ time is valuable, and that they shouldn’t have to wait long for a web page to load."
Says the company whose business model consists of convincing publishers to put ads on their websites, thus slowing down page load times. The same company that lets you integrate web elements like scripts and web fonts from their servers, once again making your webpage slower to load.
I know I'm exaggerating a bit, but I really hate these company-mission pitches that people in big companies constantly use to open their technical communications.
"At Walmart, we think that users' money is valuable, and that they shouldn't have to pay extra for their toilet paper."
Does the fact that Walmart charges money for their products make the above sentence disingenuous? Of course Walmart charges money, that's how their company works. That doesn't mean they can't try to reduce waste or inefficiency in their system.
Another thing too - I've found that other advertising platforms' ads are much more obnoxious. Google does a pretty decent job of keeping their ads from popping up in your face in technicolor, which I can appreciate.
You are exaggerating, and your rant is not fair or relevant. They are being honest with that sentence, and this is potentially a big contribution to the software world and the Internet.
I wasn't diminishing the technology's importance in any way.
My point was actually the opposite: why spoil a good communication about a technological breakthrough with such lazy and fake-sounding company propaganda?
"At Google, we think that internet users’ time is valuable, and that they shouldn’t have to wait long for a web page to load."
There is plenty of evidence of a significant drop in user numbers with increasing page load times.
An advertising company has to ensure that their ads are not only displayed but that there is also a reasonable chance that they will be noticed. Otherwise it becomes a pointless exercise, advertisers leave and revenues drop.
Therefore what is really going on here is Google trying to squeeze in more advertising before annoying the users to the point of leaving the page.
Yes, I agree, advertising in itself is inimical to users' time, and thus the above sentence can be seen as deeply hypocritical.
I dunno. Ads on Google search don't delay your results in any noticeable way. Wouldn't it be great if all ads were like that, given that ads are currently the only viable option for a wide variety of sites to stay in business?
The answer is a resounding yes, but I was just criticising the hypocritical nature of such a sentence, which is totally gratuitous in a post made to communicate a technological advancement. It sounds as fake as corporate bullshit can be.
Why would anyone trust this kind of corporate BS? It baffles me that you even try to take what they say at face value.
For me, it's akin to state propaganda in non-democratic countries: everyone knows that whatever authorities say and what the truth is are two different things, so there's little point to even analysing the official message (except maybe for humor factor etc.)
If they thought my time was valuable, they would not put colored strips at the top of every screen that say "Switch to chrome" or "Switch to GMail" or even after I've switched to Chrome, "Chrome is not your default browser." If they thought my time was valuable, they wouldn't be wasting it with their constant desperate pleas to browse the internet in precisely the way they want. If they really valued my time, they wouldn't throw random context switches into every single one of their web properties for whatever is the product du jour.
If they wanted my page to load faster, they would work as hard on making their stupid website add-ons load fast as they worked on getting everyone on earth to install them. Running local mirrors of ajax.googleapis.com and fonts.googleapis.com, along with hijacking analytics and doubleclick and returning 0-byte files, is the best thing I ever did for my poor parents' satellite internet connection.
Internet users' time (and site loading speeds) are very much second- or third-class items on Google's list of things to give a shit about. I'm absolutely not questioning these decisions; Google is an advertising agency and must prioritize this over my convenience! But pretending they're some kind of altruistic charity working for the common good is disingenuous and slightly offensive.
It also creates the problem where some of the more gullible people actually believe google, a huge faceless global company, gives a shit about anyone in particular, which is demonstrably untrue.
eh, between Chrome DevTools, PageSpeed Insights, SERP hits based on load time, SPDY/HTTP2, WebM, new compression algorithms, etc etc I would say Google has done more than a little to help reduce wait times for web pages.
Gmail has a loading bar because it has a massive storage backend, not because it's sending markup to your browser. Comparable products like HN favorite FastMail also have loading progress bars.
Yes, actually the goal of all companies is to make as much profit as possible. I know, it annoys me too; for example, Apple says their mission is to "leave the world better than how we found it", but I'm skeptical about that ;-)
It can be used in several situations; one could be:
The acts of a person imply that the person's motivation is not to use a product or comment on its qualities, but to bash the producer/creator using the context of the product. So the ulterior motive is to punish the creator; the actual product is only a medium. (grape: product - vineyard keeper: producer)
Sorry, it sounded a bit harsh. I also agree that a dry, more technical blog entry would be better, but I think it is not a big deal considering the importance of the product.
"We hope that this format will be supported by major browsers in the near future, as the smaller compressed size would give additional benefits to mobile users, such as lower data transfer fees and reduced battery use."
Is this the same story as every time G. invents something: implement it in Chrome and then gain an advantage, because it gets taken as a standard while other browsers haven't implemented it yet?
There doesn't seem to be any technical downside to implementing it, and likely significant upside. It's Apache-licensed, which grants both copyright and patent rights covering the code. The spec is published as an Internet Draft, scheduled to become an Informational RFC. What's not to like? I'm not one to drink the Google kool-aid, but seriously, what else would you like Google to do?
Firefox's Brotli support is already done. Chrome's isn't.
By the way, new features are generally created this way. They are added to browsers way before they are standardized. You see, convincing the other browser vendors isn't easy. You need very compelling arguments.
Take WebP, for example. It provides massive benefits. Especially if you can use lossy RGBA instead of PNG32 (easily 80% smaller). And yet, Mozilla shows little interest in implementing it.