I don't know if it's fair to assume quality means theoretical quality over perceptual quality in this context. They took the time to make sure perceptual quality was quantitatively consistent, so they get a pass in my book.
I'd be angry, however, if they used the word "lossless", as that implies no _information_ is lost.
You're also overlooking the fact that they did some relatively interesting things to detect when someone posts a PNG of something that could be a JPEG, so they can serve a lossy image instead of the larger PNG.
Read the article. Short answer: no. While your image content breakdowns are accurate, it's not actually that simple in practice, as things like screenshots are PNGs.
They've also got to handle photos with borders, screenshots, and such, which may or may not work better as PNGs. And anyhow, why wouldn't they want to handle un-photo-like photos and un-logo-like logos optimally too?
You misread the article. Setting the JPEG quality is their starting point. He then goes on to describe reducing size from that starting point without further quality loss.
Granted, initial reading of the title might lead you to assume he means "from upload quality", but I don't think it's intentionally misleading.
SSIM being a global metric is not sufficient in many cases. Images with soft gradients tend to be over-compressed.
MozJPEG is a good improvement. But its trellis cost model causes noticeable blurriness on fine details [1]. Compare with the original [2], Guetzli [3] or my Optimage [4].
So, 'Without (Visual) Quality Loss' is such a stretch.
The biggest difference that I see is that the leaf veins are clearer in the original image than in the other three, and that the MozJPEG one did lose some of the finest details.
Seconded. I can only see differences if I change my monitor settings and zoom into the image. So at the original size as shown, without changing my monitor's configuration, there is no clear loser/winner.
> This adds up to an average image file size reduction of around 30%, which we applied to our largest and most common image resolutions, making the website faster for users and saving terabytes a day in data transfer.
I love these types of endeavors because the ROI is pretty clear. So what are the cost ($$) savings?
When I looked at the code for PIL a few years ago, the part I was looking at (resizing) was a real mess with some significant bugs. Has Pillow gotten any better?
I never bothered, because 1. PIL appeared to be stagnant, 2. without being part of the community I could easily make breaking changes since I judged the existing behavior to be incorrect, and 3. I had other things to do. I have seriously considered going back to see if there's anything I could contribute - I haven't looked at it in years, and it would take a while to identify the code I had looked at before.
(Author here.) It wasn't available when I started this project; butteraugli was (which Guetzli uses internally), but in my opinion they're both relatively new and need a bit more rigorous review than I can provide. I ultimately chose SSIM not because it's the most accurate metric in all cases (there have been plenty of advancements since it was published), but because the paper I linked comparing several alternatives showed that SSIM works fine for this use case (all things equal except the number of JPEG artifacts) while being much faster, and that was enough proof for me to keep it simple.
In a batch scenario today, one of these could make sense instead of SSIM and/or mozjpeg, but definitely do your own comparisons at equivalent file sizes; required reading: https://kornel.ski/en/faircomparison
We do set a pretty high lower bound to avoid the edge cases inherent in the algorithm, and as a result dynamic quality was only a small portion of the 30% savings. I wouldn't be surprised if, with an offline workload, a more CPU-intensive algorithm could do much better with a lower lower bound.
I found that SSIM scores for images were relatively stable across resolutions (same general graph shape, just translated down on the Y-axis). This is mentioned at line 36 of the example dynamic quality code. So for speed we actually generate the candidate images, and compute and compare the SSIMs, at a much lower resolution than the final image. I'm not sure whether this would hold true for something like butteraugli, which would help open up the possibility of more real-time workloads.
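For the curious, the core of the idea looks roughly like this (a stripped-down sketch, not the linked example code; it uses scikit-image's SSIM for brevity, and the bounds, threshold and thumbnail size here are illustrative):

```python
# Rough sketch of dynamic quality selection: binary-search the JPEG quality,
# judging each candidate by SSIM against the original, with both images
# downscaled for speed. Threshold, bounds and thumbnail size are illustrative.
from io import BytesIO

import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity


def _encode(im, quality):
    buf = BytesIO()
    im.save(buf, format="JPEG", quality=quality, optimize=True, progressive=True)
    buf.seek(0)
    return Image.open(buf)


def _ssim_small(a, b, size=(400, 400)):
    # SSIM curves keep their shape across resolutions, so comparing small
    # grayscale thumbnails is much cheaper and picks nearly the same quality.
    a = np.asarray(a.resize(size, Image.LANCZOS).convert("L"))
    b = np.asarray(b.resize(size, Image.LANCZOS).convert("L"))
    return structural_similarity(a, b, data_range=255)


def pick_quality(im, lo=80, hi=85, threshold=0.986):
    """Lowest quality in [lo, hi] whose SSIM vs. the original stays above threshold."""
    best = hi
    while lo <= hi:
        q = (lo + hi) // 2
        if _ssim_small(im, _encode(im, q)) >= threshold:
            best, hi = q, q - 1
        else:
            lo = q + 1
    return best
```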
Because it's HORRIBLY slow. Mozjpeg is actually usable for on-the-fly processing, you can run it right in a request handler. Guetzli is waaaay too slow.
Agreed. As a photographer, I use and love Guetzli, and it offers a bona fide size reduction of more than 30% over typical JPEG compression algorithms at a similar visual quality level.
But it's extremely computationally-intensive, taking over a minute to compress a single web-resolution image on my i7 laptop.
I can't see it being practical in a high-volume server scenario.
I'm surprised they didn't try losslessly optimizing PNGs with optipng and advancecomp. Optipng in particular can result in some pretty significant size reductions in my experience.
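In a batch job it's just a couple of subprocess calls, roughly like this (a sketch; it assumes both binaries are on PATH, and the flags are the usual aggressive ones, so check them against your versions):

```python
# Losslessly recompress PNGs in place with optipng, then advpng.
# Assumes both tools are installed; flags are from memory, so verify
# against the man pages for your versions.
import subprocess
from pathlib import Path


def optimize_png(path):
    subprocess.run(["optipng", "-o7", str(path)], check=True)
    subprocess.run(["advpng", "-z", "-4", str(path)], check=True)


for png in Path("images").rglob("*.png"):
    optimize_png(png)
```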
They specifically mentioned that they have PNGs of logos that they avoid converting if they're <300kb. Optipng would work well on the images that they keep in PNG format.
They should try just compressing them with mozjpeg anyway, since mozjpeg has deringing (https://calendar.perfplanet.com/2014/mozjpeg-3-0/); it's actually not bad for text, cartoons, and other things with high-contrast edges.
Wait, that's what progressive means... where do you select that? Is that something you select in, say, GIMP?
>Progressive JPEG images load from more blurry to less blurry. The progressive option can easily be enabled in Pillow
What is Pillow, Python... hmm... so loading a small version stretched out to full size with blur, then in the background loading a full-size copy to replace the blurred small image, that is not progressive loading...?
If I were looking into progressive loading and had implemented a CDN... would this still work? Is it Python-specific? I use PHP for scripting... maybe an excuse to actually build something with Go rather than the hello-world examples (assuming you can use Go).
Edit: also is progressive loading something that happens once or does a script have to do that every time the photo is pulled?
A Python library for manipulating images, like ImageMagick and the like.
> so loading a small version stretched out to full size with blur, then in the background loading a full-size copy to replace the blurred small image, that is not progressive loading...?
Progressive JPEG is closer to a refinement of previous blocks: it doesn't "add blur", it starts with smaller, less precise blocks and then refines them. Progressive JPEG actually tends to be smaller than non-progressive (as opposed to e.g. progressive PNG).
> Edit: also is progressive loading something that happens once or does a script have to do that every time the photo is pulled?
There is no script, progressive JPEG is part of the core spec.
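If you're saving images with Pillow (as the quoted article does), it's just a flag at encode time; a minimal sketch (the filenames and quality value here are placeholders):

```python
# Re-save an image as a progressive JPEG with Pillow; the browser then
# renders coarse scans first and refines them as more data arrives.
from PIL import Image

im = Image.open("photo.jpg")
im.save("photo_progressive.jpg", format="JPEG", quality=85,
        progressive=True, optimize=True)
```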
If all JPEGs were progressive, clients on low-bandwidth networks could request just a percentage of the file and save on the total data needed to render a page. This could be useful in areas where 2G is the norm.
In this age of devices with different DPIs and pinch-to-zoom, it would be nice if the browser fetched only as many progressive scans as required for the image's current display resolution.
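You can sort of fake this today with a Range request and decoding whatever arrives; a toy sketch (assumes the server honors Range, the JPEG is progressive, and the 50 KB byte budget and URL are placeholders):

```python
# Fetch only the first chunk of a progressive JPEG and decode whatever
# scans made it through.
from io import BytesIO

import requests
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True  # let Pillow decode a partial file

resp = requests.get("https://example.com/photo_progressive.jpg",
                    headers={"Range": "bytes=0-49999"}, timeout=10)
preview = Image.open(BytesIO(resp.content))
preview.load()  # decodes the scans that are present
preview.save("preview.png")
```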
I remember some dial-up ISPs doing almost this a few years ago. A friend of mine lived in a rural area, and his dialup provider would show a low-res version of the image. If he clicked on the image, it would load the full version. I think that they would hijack the webpage and do their special image cache thing on a proxy server. It was a really weird experience, but also kind of cool.
Back in the days of slow internet access, nearly every non-animated image was a jpg and loading from blurry to clear was the norm. It's part of the file format.
The idea was that you'd see the image get progressively better as each pass was completed, thus the term "progressive". But in practice the browsers decided not to display anything until the image was finished loading anyway.
It appears the situation isn't static, thanks for the link! It would have been interesting to see older versions of Chrome and Firefox tested too - I'm wondering now if my recollections are faulty.
This site simulates a slow download of a progressive JPEG so you can test how your browser renders them. The page is from 2007 so its claim that IE doesn't render them progressively appears to be out of date.
I clearly remember watching the image clarity crawling down photos on dial-up. Otherwise, you'd be staring at a blank screen for 2 minutes, instead of a gradually-improving image.
(Since you mentioned Chrome and Firefox in another comment, I'm not thinking of browsers that new).
Pillow is a fork of PIL -- the Python Imaging Library -- which had become unmaintained ('PIL' is in 'pillow'). It's the go-to image-manipulation library in Python.
Yes, it's one way of storing the image data in a single JPEG file. It doesn't need multiple files and it doesn't need any extra code. It's usually only visible when you're loading the file over a slow link. Old software might not show the intermediate steps, but it should still be able to display the image once it's 100% loaded.
proceeds to explain how they set JPEG quality to 80-85%