I don't know if it's fair to assume quality means theoretical quality over perceptual quality in this context. They took the time to make sure perceptual quality was quantitatively consistent, so they get a pass in my book.
I'd be angry, however, if they used the word "lossless", as that implies no _information_ is lost.
You're also overlooking the fact that they did some relatively interesting things to detect when someone posts a PNG of something that could be a JPEG, so they can serve a lossy image instead of the larger PNG.
Read the article. Short answer: no. While your image content breakdowns are accurate, it's not actually that simple in practice, as things like screenshots are PNGs.
They've also got to handle photos with borders, screenshots, and such, which may or may not work better as PNGs. And anyhow, why wouldn't they want to handle un-photo-like photos and un-logo-like logos optimally too?
You misread the article. Setting the JPEG quality is their starting point. He then goes on to describe reducing size from that starting point without further quality loss.
Granted, initial reading of the title might lead you to assume he means "from upload quality", but I don't think it's intentionally misleading.
SSIM being a global metric is not sufficient in many cases. Images with soft gradients tend to be over-compressed.
MozJPEG is a good improvement. But its trellis cost model causes noticeable blurriness on fine details [1]. Compare with the original [2], Guetzli [3] or my Optimage [4].
So, 'Without (Visual) Quality Loss' is such a stretch.
The biggest difference that I see is that the leaf veins are clearer in the original image than in the other three, and that the MozJPEG one did lose some of the finest details.
Seconded. I can only see differences if I change my monitor settings and zoom into the image. So at the original size as shown, without changing my monitor's configuration, there is no clear loser/winner.
> This adds up to an average image file size reduction of around 30%, which we applied to our largest and most common image resolutions, making the website faster for users and saving terabytes a day in data transfer.
I love these types of endeavors because the ROI is pretty clear. So what are the cost ($$) savings?
When I looked at the code for PIL a few years ago, the part I was looking at (resizing) was a real mess with some significant bugs. Has Pillow gotten any better?
I never bothered, because 1. PIL appeared to be stagnant, 2. without being part of the community I could easily make breaking changes since I judged the existing behavior to be incorrect, and 3. I had other things to do. I have seriously considered going back to see if there's anything I could contribute - I haven't looked at it in years, and it would take a while to identify the code I had looked at before.
(Author here.) It wasn't available when I started this project; butteraugli was (which Guetzli uses internally), but in my opinion they're both relatively new and need a bit more rigorous review than I can provide. I ultimately chose SSIM not because it's the most accurate metric in all cases (there have been plenty of advancements since it was published), but because the paper I linked comparing several alternatives showed that SSIM works fine for this use case (all things equal except the number of JPEG artifacts) while being much faster, and that was enough proof for me to keep it simple.
In a batch scenario today, one of these could make sense instead of SSIM and/or mozjpeg, but definitely do your own comparisons at equivalent file sizes; required reading: https://kornel.ski/en/faircomparison
We do set a pretty high lower bound to avoid the edge cases inherent in the algorithm, and as a result dynamic quality was only a small portion of the 30% savings. I wouldn't be surprised if, with an offline workload, a more CPU-intensive algorithm could do much better with a lower lower bound.
I found that SSIM scores for images were relatively stable across resolutions (same general graph shape, just translated down on the Y-axis). This is mentioned at line 36 of the example dynamic quality code. So for speed we actually generate the candidate images, and compute and compare the SSIMs, at a much lower resolution than the final image. I'm not sure whether this would hold true for something like butteraugli, which would help open up the possibility of more real-time workloads.
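For the curious, the core of the idea looks roughly like this (a stripped-down sketch, not the linked example code; it uses scikit-image's SSIM for brevity, and the bounds, threshold and thumbnail size here are illustrative):

```python
# Rough sketch of dynamic quality selection: binary-search the JPEG quality,
# judging each candidate by SSIM against the original, with both images
# downscaled for speed. Threshold, bounds and thumbnail size are illustrative.
from io import BytesIO

import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity


def _encode(im, quality):
    buf = BytesIO()
    im.save(buf, format="JPEG", quality=quality, optimize=True, progressive=True)
    buf.seek(0)
    return Image.open(buf)


def _ssim_small(a, b, size=(400, 400)):
    # SSIM curves keep their shape across resolutions, so comparing small
    # grayscale thumbnails is much cheaper and picks nearly the same quality.
    a = np.asarray(a.resize(size, Image.LANCZOS).convert("L"))
    b = np.asarray(b.resize(size, Image.LANCZOS).convert("L"))
    return structural_similarity(a, b, data_range=255)


def pick_quality(im, lo=80, hi=85, threshold=0.986):
    """Lowest quality in [lo, hi] whose SSIM vs. the original stays above threshold."""
    best = hi
    while lo <= hi:
        q = (lo + hi) // 2
        if _ssim_small(im, _encode(im, q)) >= threshold:
            best, hi = q, q - 1
        else:
            lo = q + 1
    return best
```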
Because it's HORRIBLY slow. Mozjpeg is actually usable for on-the-fly processing, you can run it right in a request handler. Guetzli is waaaay too slow.
Agreed. As a photographer, I use and love Guetzli, and it offers a bona fide size reduction of more than 30% over typical JPEG compression algorithms at a similar visual quality level.
But it's extremely computationally-intensive, taking over a minute to compress a single web-resolution image on my i7 laptop.
I can't see it being practical in a high-volume server scenario.
I'm surprised they didn't try losslessly optimizing PNGs with optipng and advancecomp. Optipng in particular can result in some pretty significant size reductions in my experience.
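In a batch job it's just a couple of subprocess calls, roughly like this (a sketch; it assumes both binaries are on PATH, and the flags are the usual aggressive ones, so check them against your versions):

```python
# Losslessly recompress PNGs in place with optipng, then advpng.
# Assumes both tools are installed; flags are from memory, so verify
# against the man pages for your versions.
import subprocess
from pathlib import Path


def optimize_png(path):
    subprocess.run(["optipng", "-o7", str(path)], check=True)
    subprocess.run(["advpng", "-z", "-4", str(path)], check=True)


for png in Path("images").rglob("*.png"):
    optimize_png(png)
```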
They specifically mentioned that they have PNGs of logos that they avoid converting if they're <300kb. Optipng would work well on the images that they keep in PNG format.
They should try just compressing them with mozjpeg anyway, since mozjpeg has deringing (https://calendar.perfplanet.com/2014/mozjpeg-3-0/); it's actually not bad for text, cartoons, and other things with high-contrast edges.
Wait, that's what progressive means... where do you select that? Is that something you select in, say, GIMP?
>Progressive JPEG images load from more blurry to less blurry. The progressive option can easily be enabled in Pillow
What is Pillow, Python... hmm... so loading a small version stretched out to full size with blur, then in the background loading a full-size copy to replace the blurred small image, that is not progressive loading...?
If I were looking into progressive loading and had implemented a CDN... would this still work? Is it Python-specific? I use PHP for scripting... maybe an excuse to actually build something with Go rather than the hello-world examples (assuming you can use Go).
Edit: also is progressive loading something that happens once or does a script have to do that every time the photo is pulled?
A Python library for manipulating images, like ImageMagick and the like.
> so loading a small version stretched out to full size with blur, then in the background loading a full-size copy to replace the blurred small image, that is not progressive loading...?
Progressive JPEG is closer to a refinement of previous blocks: it doesn't "add blur", it starts with smaller, less precise blocks and then refines them. Progressive JPEG actually tends to be smaller than non-progressive (as opposed to e.g. progressive PNG).
> Edit: also is progressive loading something that happens once or does a script have to do that every time the photo is pulled?
There is no script, progressive JPEG is part of the core spec.
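If you're saving images with Pillow (as the quoted article does), it's just a flag at encode time; a minimal sketch (the filenames and quality value here are placeholders):

```python
# Re-save an image as a progressive JPEG with Pillow; the browser then
# renders coarse scans first and refines them as more data arrives.
from PIL import Image

im = Image.open("photo.jpg")
im.save("photo_progressive.jpg", format="JPEG", quality=85,
        progressive=True, optimize=True)
```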
If all JPEGs were progressive, clients on low-bandwidth networks could request just a percentage of the file and save on the total data needed to render a page. This could be useful in areas where 2G is the norm.
In this age of devices with different DPIs and pinch-to-zoom, it would be nice if the browser fetched only as many progressive scans as required for the image's current display resolution.
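You can sort of fake this today with a Range request and decoding whatever arrives; a toy sketch (assumes the server honors Range, the JPEG is progressive, and the 50 KB byte budget and URL are placeholders):

```python
# Fetch only the first chunk of a progressive JPEG and decode whatever
# scans made it through.
from io import BytesIO

import requests
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True  # let Pillow decode a partial file

resp = requests.get("https://example.com/photo_progressive.jpg",
                    headers={"Range": "bytes=0-49999"}, timeout=10)
preview = Image.open(BytesIO(resp.content))
preview.load()  # decodes the scans that are present
preview.save("preview.png")
```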
I remember some dial-up ISPs doing almost this a few years ago. A friend of mine lived in a rural area, and his dialup provider would show a low-res version of the image. If he clicked on the image, it would load the full version. I think that they would hijack the webpage and do their special image cache thing on a proxy server. It was a really weird experience, but also kind of cool.
Back in the days of slow internet access, nearly every non-animated image was a jpg and loading from blurry to clear was the norm. It's part of the file format.
The idea was that you'd see the image get progressively better as each pass was completed, thus the term "progressive". But in practice the browsers decided not to display anything until the image was finished loading anyway.
It appears the situation isn't static, thanks for the link! It would have been interesting to see older versions of Chrome and Firefox tested too - I'm wondering now if my recollections are faulty.
This site simulates a slow download of a progressive JPEG so you can test how your browser renders them. The page is from 2007 so its claim that IE doesn't render them progressively appears to be out of date.
I clearly remember watching the image clarity crawling down photos on dial-up. Otherwise, you'd be staring at a blank screen for 2 minutes, instead of a gradually-improving image.
(Since you mentioned Chrome and Firefox in another comment, I'm not thinking of browsers that new).
Pillow is a fork of PIL -- the Python Imaging Library -- which had become unmaintained ('PIL' is in 'pillow'). It's the go-to image-manipulation library in Python.
Yes, it's one way of storing the image data in a single JPEG file. It doesn't need multiple files and it doesn't need any extra code. It's usually only visible when you're loading the file over a slow link. Old software might not show the intermediate steps, but it should still be able to display the image once it's 100% loaded.
proceeds to explain how they set JPEG quality to 80-85%