
Wait, so Google took a bunch of JPGs, re-compressed them with a lossy format, and claims to have gotten them 40% smaller with no loss in quality. Either the lossiness in the format is entirely in aspects of the image that the JPG compression already removed and they're just better at compressing the residue (possible, but I am skeptical), or else the image is altered, but in a way that Google claims is subjectively a sidegrade in quality. I'm not putting much faith in their quality claims until I see side-by-side comparisons of a JPG and WebP compression of the same TIFF (or other uncompressed) image at the same compression ratio. A double-blind study is probably too much to ask, but it would be nice.



That's what the article says, but they're probably restating something and got it horribly wrong.

What probably happened was that they took images and compressed them to WebP and JPEG and compared the sizes.


No, I would believe Google recompressed a whole ton of JPEGs as a test. Not the most scientific test of a codec, but as a Web company they are interested in improving image serving more than image authoring.

As for the recompression, you can losslessly squeeze a JPEG by at least 15% (e.g. StuffIt came out with this a few years back, claiming higher numbers but mainly at low bitrates I think). In the lossy camp, H264's intra-frame encoding significantly outdoes both JPEG and JPEG2000.

WebP is probably similar to H264 intra. As for recompression vs. straight compression, this probably has little effect to the extent JPEG and WebP are "compatible", e.g. both being block-based transforms. It would be unfair, on the other hand, to run a very different codec after JPEG and compare it to WebP after JPEG, because the other codec might be working hard to encode the JPEG artifacts.


> It would be unfair, on the other hand, to run a very different codec after JPEG

Exactly. I found Google’s “study” pretty sketchy, given the lack of concrete detail about this. http://code.google.com/speed/webp/docs/c_study.html

Here’s some of what I wrote in an email to a friend:

Since they’re dealing with an arbitrary collection of already-encoded images, there are likely artifacts (e.g. along JPEG block boundaries) that take up extra space in JPEG 2000. And while they have a big sample size, they don’t relate the metric they used (PSNR) to noticeable quality degradation.

There's a graph of size distribution (which would be a lot more readable if they binned the sizes and showed a histogram instead of a sea of overlapping red plus signs), but the compression percentages aren't related to those sizes in any way: did large images compress better relative to JPG/JP2 than small ones did?

Looking at the size distribution, a large percentage of these images are absolutely tiny, the kind of images that, as far as I know, JPEG 2000 was never intended for. For very tiny images the overhead of the data storage container ends up dominating the file size. I don’t know anything about the relative container overhead of JPG/JP2/etc., but it would be good to include that in any discussion.
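
Something like the following is the per-size-bin breakdown I'd want to see. It's only a rough Python sketch with made-up numbers, since the study doesn't publish its raw per-image data:

    import numpy as np

    # Hypothetical per-image file sizes in bytes; not data from the study.
    orig_sizes = np.array([1.2e3, 4.5e3, 80e3, 250e3, 1.1e6])
    webp_sizes = np.array([1.1e3, 3.0e3, 45e3, 160e3, 0.7e6])

    gain = 1.0 - webp_sizes / orig_sizes            # fraction saved per image
    bins = np.array([0, 10e3, 100e3, 1e6, np.inf])  # tiny / small / medium / large

    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (orig_sizes >= lo) & (orig_sizes < hi)
        if mask.any():
            print(f"{lo:>9.0f}-{hi:<9.0f} bytes: mean gain {gain[mask].mean():.1%}"
                  f" over {mask.sum()} images")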

It seems to me like the WebP images have their color profiles stripped. Is that an inherent part of WebP? If so, I hope Google doesn’t encourage people dealing with photographs to adopt it in large numbers. Browsers are just finally getting to the point where proper color management of images is expected; no need to regress there.


WebP is lossy. Google provides a gallery of images here: http://code.google.com/speed/webp/gallery.html

Basically, they are using predictive coding to achieve good lossy compression. However, I agree, I would also like to see double blind studies on the quality degradation.


It's great that they have the sample gallery, but without a lossless source (i.e. a control group) to compare against, the gallery they show counts for nothing. I grabbed the source images; I'll see if I can do a jpg/webp/png side-by-side with some analytical data as well.


Update: Here's some very basic file comparisons. Special thanks to Erik Anderson of http://divineerror.deviantart.com for the lossless images.

In the folder are the original image, the compressed versions in both jpg and webp, and an enhanced difference map between each compressed version and the lossless image.

http://jjcm.org:8081/webp

Basic analysis shows that right now webp has better preservation of luminance, but at the expense of hue/color. I'll have a blog post up in a bit with a myriad of file tests, difference maps, percentage differences, and hue offsets.
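
For the curious, this kind of comparison can be approximated with a minimal Pillow/NumPy sketch like the one below. The file names are placeholders, and the WebP/JPEG outputs are assumed to have been decoded back to lossless files first:

    import numpy as np
    from PIL import Image

    def load_rgb(path):
        return np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)

    original = load_rgb("lossless_original.png")
    recompressed = load_rgb("decoded_webp_or_jpg.png")

    # Per-pixel absolute difference, scaled up so small errors are visible.
    diff = np.abs(original - recompressed)
    Image.fromarray(np.clip(diff * 8, 0, 255).astype(np.uint8)).save("diff_map.png")

    # Mean deltas: per-channel RGB error plus a Rec. 601 luminance error.
    mean_rgb_delta = diff.mean(axis=(0, 1))
    luma = lambda img: img @ np.array([0.299, 0.587, 0.114])
    mean_luma_delta = np.abs(luma(original) - luma(recompressed)).mean()
    print("mean RGB delta:", mean_rgb_delta)
    print("mean luminance delta:", mean_luma_delta)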


Awesome set of pictures. I can live with the squares of WebP better than the hugely-visible gradients of JPEG, methinks. Subjectively, I'd say those images show it to be definitely superior, by a pretty large margin.

Very impressive. I can easily live with the 2x decode / 8x encode with those results.

edit: though, if there's no alpha capabilities, count me out. Yes, that's my deciding factor.


"We plan to add support for a transparency layer, also known as alpha channel in a future update."

http://blog.chromium.org/2010/09/webp-new-image-format-for-w...


Don't jump on it yet; alpha capability isn't the only thing it's missing. After some analysis, it doesn't appear to support color profiles either, greyscale being the biggest casualty here. About to upload a sample black-and-white image, and you'll see the issues there.


I was wondering if profiles might have been the reason for some of the larger hue/sat differences (over the whole image).

Will gladly keep looking, it's interesting either way :) Thanks!


Update 2: Here's the blog post along with mean delta values for both RGB and Luminance from the source image: http://news.ycombinator.com/item?id=1746621


You shouldn't get any kind of percentage difference in the color channels at scales larger than the 8x8 blocks in the image. If the entire image is offset, that most likely indicates a bug in your conversion process.


They are claiming better compression at constant PSNR: http://code.google.com/speed/webp/docs/c_study.html


But they do not provide SSIM comparisons; they only compare PSNR.

Here is "How to cheat with codec comparisons" (http://x264dev.multimedia.cx/?p=472), which explains (partially) why PSNR isn't that great for judging image quality.

SSIM: http://en.wikipedia.org/wiki/Structural_similarity
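
For reference, computing both metrics on a decoded image pair is straightforward with scikit-image. This is only a minimal sketch: the file names are placeholders, and the channel_axis argument assumes a recent scikit-image release.

    import numpy as np
    from PIL import Image
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    ref = np.asarray(Image.open("lossless_original.png").convert("RGB"))
    test = np.asarray(Image.open("decoded_output.png").convert("RGB"))

    # Higher is better for both; SSIM tracks perceived quality more closely.
    psnr = peak_signal_noise_ratio(ref, test)
    ssim = structural_similarity(ref, test, channel_axis=-1)
    print(f"PSNR: {psnr:.2f} dB   SSIM: {ssim:.4f}")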


I agree that we should expect side-by-side examples to support the claims, but my first thought is that they're making an analysis of the JPEG compression and using improvements in computing power to encode the information more efficiently. Consider that JPEG uses one (or more) Huffman tables and one (or more) quantization tables. (I'm getting this from Wikipedia, IANA compression expert.)

What if you analyzed all those images and came up with a composite Huffman code that was more efficient than the best-guess defaults from the original spec? Then you did some magic on the quantization table to make the most common values correspond to the lowest numbers, relying on processing power to decode the compressed quantization table before you started?
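
As a toy sketch of that "composite table" idea (not JPEG-specific; the coefficient symbols below are made up), building one Huffman code from frequencies aggregated over a whole corpus looks something like this:

    import heapq
    from collections import Counter

    def huffman_code_lengths(freqs):
        """Code length per symbol for a Huffman code built from `freqs`."""
        heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freqs.items())]
        heapq.heapify(heap)
        lengths = dict.fromkeys(freqs, 0)
        tiebreak = len(heap)
        while len(heap) > 1:
            c1, _, syms1 = heapq.heappop(heap)
            c2, _, syms2 = heapq.heappop(heap)
            for sym in syms1 + syms2:   # each merge costs these symbols one bit
                lengths[sym] += 1
            heapq.heappush(heap, (c1 + c2, tiebreak, syms1 + syms2))
            tiebreak += 1
        return lengths

    # Aggregate (made-up) coefficient symbols over a corpus, then derive one
    # shared table tuned to that corpus instead of the spec's default guess.
    corpus = Counter()
    for image_symbols in (["EOB", "0/1", "0/1", "1/2"], ["EOB", "0/1", "2/3"]):
        corpus.update(image_symbols)
    print(huffman_code_lengths(corpus))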


Why bother wasting time analyzing Huffman tables when you own a video codec? JPEG is mostly an MPEG-1 keyframe. WebP is exactly a VP8 keyframe. VP8 is better than MPEG1, so there's no need to change anything when you can just use that decoder.

There are still many inefficiencies in VP8, though, and to a lesser extent in H.264, when dealing with very large images. One is that the same texture can be repeated in different areas of the same image, but prediction only happens from neighboring pixels, so the codec can't reuse that texture during compression. Some solutions are in the JCT-VC/H.265 proposals and are usually called "extended texture prediction".


I note that WinZip 14 does decode JPGs and reencode them smaller, but without further loss of quality:

"The main trick those three programs use is (partially) decode the image back to the DCT coefficients and recompress them with a much better algorithm then default Huffman coding." - http://www.maximumcompression.com/data/jpg.php


Most lossy image compression systems use a transform that does not reduce the data size but makes the transformed data more compressible. This means the final layer of compression is a traditional compressor. These, like the transforms themselves, get improved upon over time. JPEG is old.

When I did some experiments with various compression techniques, I found that DCT with an LZMA back end compared quite well to newer compression systems.
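
Roughly the shape of that experiment, as an illustrative sketch only (blockwise DCT, crude uniform quantization, LZMA over the quantized coefficients; the quantizer step is arbitrary and this is nowhere near a real codec):

    import lzma
    import numpy as np
    from PIL import Image
    from scipy.fft import dctn

    img = np.asarray(Image.open("original.png").convert("L"), dtype=np.float64)
    h, w = (img.shape[0] // 8) * 8, (img.shape[1] // 8) * 8
    img = img[:h, :w]                  # crop to a multiple of the block size

    quant_step = 16                    # crude uniform quantizer; real codecs
    blocks = []                        # use per-frequency quantization tables
    for y in range(0, h, 8):
        for x in range(0, w, 8):
            coeffs = dctn(img[y:y + 8, x:x + 8], norm="ortho")
            blocks.append(np.round(coeffs / quant_step).astype(np.int16))

    payload = np.concatenate([b.ravel() for b in blocks]).tobytes()
    compressed = lzma.compress(payload)
    print(f"raw pixels: {h * w} bytes, quantized DCT + LZMA: {len(compressed)} bytes")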


If you take jpegs from old cameras and recompress them into jpeg, you can get a remarkable reduction in storage. ... But I can't tell whether they even used such a "recompression to jpeg" control case in coming up with the 39% figure.
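
Such a control case is easy to run with Pillow, for what it's worth (the file names are placeholders and quality 85 is an arbitrary choice):

    import os
    from PIL import Image

    src, dst = "old_camera_shot.jpg", "recompressed.jpg"
    Image.open(src).save(dst, "JPEG", quality=85, optimize=True)
    print(f"{os.path.getsize(src)} -> {os.path.getsize(dst)} bytes")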



