A 106 kilobyte color profile for a 3 kilobyte image file
107 points by sbierwagen on Nov 5, 2011 | 32 comments



From the article:

"This is not exactly an unknown problem. Google constantly harps on optimizing images, but Tumblr blindly reuses images that its users hand it. This makes me sad."

There are tons of valid uses for the color profile and other metadata in PNG files. Generally, I don't want services dicking with this stuff (in some cases I don't care, but I don't want the service to decide when I do or don't because it is bound to decide wrong some of the time, so they should just leave my data alone).

Onus should be on the content developer to save the image in a proper format and strip all metadata that isn't required. I don't want services I use stripping this stuff willy-nilly. Perhaps I want accurate color profile data in my PNGs so my tumblr followers can make accurate prints. Perhaps I want geolocation EXIF data in there because my blog is read by people who want to see this data, etc.

tl;dr -- Don't listen to this guy, Tumblr (or anyone else). He makes the classic mistake of "because something isn't important to me, it has no use".


There is no possible reason why you would need metadata for any kind of thumbnail, much less a user icon.

In any event, tumblr already resizes all large images to a maximum of 1280 pixels on either axis. If you were very concerned with maintaining images in their original state, this would be driving you up a wall. Good thing it's trivially easy to use alternative image hosting sites, like flickr or photobucket, on tumblr.


There's metadata and there's metadata. Sure, EXIF, IPTC and XMP data is usually entirely useless for web images. But colour profile information actually has the information needed to correctly render the colours in the image.

You could argue that small thumbnails don't matter. I would agree for small sizes, but as the size increases, it becomes important to preserve the ICC profile data. With bandwidth having become cheap, and processing power having become abundant, these days it's often appropriate to downscale images in the browser; so instead of storing thumbnails as 40x40, 60x60, 100x100 etc. depending on the use case, just store one flexible version (say, 150x150) and let the browser scale it. At that resolution, I would keep the ICC data.


Well, except that this particular color-profile was for the sRGB color space, which is exactly what every browser and image-viewer on the planet defaults to if no color-profile is supplied. It'd be like shipping an ASCII text file with a Latin1-to-Unicode mapping table.


Indeed, if the colour space is sRGB you can remove it, of course.
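
For readers wondering what "removing it" involves at the file level, here is a minimal sketch that measures how many bytes each PNG chunk type occupies and rewrites the file without its iCCP (embedded ICC profile) chunk. The function names are illustrative, not from any library, and the code does not check whether the profile really is sRGB, so it is only appropriate when you already know that.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png_bytes):
    """Yield (chunk_type, data) for each chunk in a PNG byte string."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        yield ctype, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def make_chunk(ctype, data):
    """Serialize one chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def chunk_sizes(png_bytes):
    """Map chunk type to total bytes in the file, 12 bytes of overhead included."""
    sizes = {}
    for ctype, data in iter_chunks(png_bytes):
        name = ctype.decode("ascii")
        sizes[name] = sizes.get(name, 0) + len(data) + 12
    return sizes


def drop_chunks(png_bytes, unwanted=(b"iCCP",)):
    """Return a copy of the PNG without the unwanted chunk types."""
    out = [PNG_SIGNATURE]
    for ctype, data in iter_chunks(png_bytes):
        if ctype not in unwanted:
            out.append(make_chunk(ctype, data))
    return b"".join(out)
```

Calling chunk_sizes on a file first shows whether the profile is worth worrying about; in the article's case the iCCP entry would dwarf the IDAT entry.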


Do any web browsers actually use that information?


Yep. Safari (since 2.0), Chrome (since 15, I believe), Firefox (supported since 3.0, but only enabled in 3.5 and later) and IE (since 9.0) all use embedded ICC metadata. No idea about other browsers.


Well, in the case of tumblr, most users are not technically proficient enough to know what's going on there. Plus, they certainly don't care about tumblr's costs, unless their pictures load noticeably slowly. And even then... well, let's just say most tumblrs are not exactly carefully-maintained gardens. So by preserving that data, tumblr is likely serving tons of wasted bytes, slower than necessary, and harming most of its users' experience. Consider mobile users with limited data plans - they likely don't feel much sympathy with your argument.

Compromise: tumblr receives the image, compresses it, and shows the result to the user saying "how does it look?", as well as a disclosure triangle to show the list of optimizations it made, noting that if the picture looks wrong, to click it. When that disclosure triangle opens, it also offers a button saying something to the effect of "use what I uploaded, damn it!"

The vast majority who don't care click right through, those who do have a simple interface to see exactly what's going on with their image, and X thousands of pages can load quicker and save millions of people's bandwidth (and tumblr a bunch of money). Mobile users don't waste money on useless data and their images load much faster.


I learnt a lot about PNG from that, thank you.

What is a sane solution for users without clue? Does the website just need a good PNG optimizer and hope nothing breaks too bad, and if it does it's only an image?


You can just run Pngcrush[1] in a directory full of images, and it'll optimize 'em all. Trivially easy to automate.

1: https://secure.wikimedia.org/wikipedia/en/wiki/Pngcrush


pngout does an even better job. He got it to 3.54 kilobytes, but pngout got it to 2.9K.

For ultimate optimization do pngout, then optipng, then advpng - in that order! Each one sometimes reduces it even more than the one before, but only if you run them in that order.
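
A rough sketch of automating that order over a directory, assuming pngout, optipng and advpng are on the PATH and that each rewrites the file in place when given a single filename. The flags are my assumption (the most thorough settings of optipng and advpng), and the function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Order suggested in the comment above; the flags are assumptions.
OPTIMIZERS = [
    ["pngout"],
    ["optipng", "-o7"],
    ["advpng", "-z", "-4"],
]


def build_commands(png_path):
    """Return one command line per optimizer, in the order they should run."""
    return [tool + [str(png_path)] for tool in OPTIMIZERS]


def optimize_directory(directory, run=subprocess.run, which=shutil.which):
    """Run every installed optimizer, in order, on each PNG in directory."""
    for png_path in sorted(Path(directory).glob("*.png")):
        for cmd in build_commands(png_path):
            if which(cmd[0]) is None:
                continue  # tool not installed; skip it
            # check=False: a non-zero exit may only mean "no improvement".
            run(cmd, check=False)
```

Work on copies of the images rather than the originals, since the tools overwrite their input.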


As a side note, you don't need to use secure.wikimedia.org any more. Changing the protocol on any of the Wikimedia sites to HTTPS works fine and doesn't have any mixed content errors.


HTTPS Everywhere is still redirecting to secure.wikimedia.org, so that's probably where any remaining links there are coming from.


Do browsers use the color profile for displaying the image? If so, wouldn't this affect the appearance of the image?


Some do, some don't. In most cases it will end up being converted to sRGB, so you're better off doing that conversion yourself.


There is a test to check if your browsers support it. http://www.color.org/version4html.xalter Chrome has no support at all, which is annoying really.


Chrome seems to support that, based on latest Chrome on mac.


Chrome 15 on OSX Lion here. Doesn't work at all.


Latest chrome on windows 7 here doesn't.


Safari, Firefox, Opera and even IE use and adhere to ICC profiles in images. No idea what Chrome or other browsers do.


The main reason I'd see for originally including PNG color space metadata is to correct for the inconsistent gamma between OS X 10.5 and earlier (gamma 1.8) and Windows (gamma 2.2). Snow Leopard and Lion have switched to 2.2, so the problem is gradually going away, but web designers before 2009 had to worry that the colors on their Macs did not match the colors on PCs.

The real question is why throw in the whole 100K including the color space and all, when only a gamma value would suffice?
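
To put a number on "only a gamma value": PNG has a dedicated gAMA chunk that stores the encoding gamma multiplied by 100000 as a single 4-byte integer. The sketch below builds one; gama_chunk is an illustrative name, and it assumes you pass the display gamma the image was prepared for.

```python
import struct
import zlib


def gama_chunk(display_gamma=2.2):
    """Build a PNG gAMA chunk for an image prepared for the given display gamma.

    The chunk stores the encoding gamma (1 / display gamma) times 100000
    as a 4-byte big-endian unsigned integer.
    """
    value = round(100000 / display_gamma)
    data = struct.pack(">I", value)
    crc = zlib.crc32(b"gAMA" + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + b"gAMA" + data + struct.pack(">I", crc)
```

The whole chunk, overhead included, is 16 bytes, compared with the roughly 106 kilobyte profile in the article. In a real file it has to appear before the PLTE and IDAT chunks.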


What I've found is that the more a system tries to do colors "correctly" the more likely they will get fubared. I remember the bad old days in the 90's when Adobe Photoshop was the worst tool for editing web images because it would screw up the gamma correction every time.

'Stupid' programs like the gimp would deliver images that looked ok because they followed the hippocratic oath: first do no harm.


It's pretty easy to fix using ImageMagick, which is what most apps use to resize images. At the command line it's the -strip option, and in RMagick it's the strip! method. This does remove all metadata, but for thumbnails like this that's probably desirable.


ImageMagick also has a -thumbnail option that you can use in place of -resize:

> This is similar to -resize, except it is optimized for speed and any image profile, other than a color profile, is removed to reduce the thumbnail size. To strip the color profiles as well, add -strip just before or after this option.

http://www.imagemagick.org/script/command-line-options.php#t...
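
A minimal sketch of calling that from a script, assuming the ImageMagick 6 style convert binary is on the PATH (newer installs may expose it as magick instead). The function names and the 150x150 default are illustrative.

```python
import subprocess


def thumbnail_command(src, dst, size="150x150", strip=True):
    """Build the convert command line for a thumbnail, optionally stripped."""
    cmd = ["convert", src, "-thumbnail", size]
    if strip:
        cmd.append("-strip")  # also drops the color profile
    cmd.append(dst)
    return cmd


def make_thumbnail(src, dst, **kwargs):
    """Run ImageMagick; raises CalledProcessError if convert fails."""
    subprocess.run(thumbnail_command(src, dst, **kwargs), check=True)
```

Passing strip=False keeps the color profile, which matters if the source image is not already in sRGB.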


Having colour tables in image files is a bad idea for these reasons:

1) For small images, the table is bigger than the raw image itself, as this article amply demonstrated.

2) For poor quality images, colour quantisation does not save anything.

3) Good quality images with subtle colours are degraded by it.


I don't think you fully understand what these "color tables" are. They are ICC profiles, and they have a purpose. They do not degrade colors. They enable your browser (if it is a modern browser) to display color accurately.

In the example in the article, the embedded profile does nothing at all, because the default for the web is already sRGB. If your image, on the other hand, is in some other color space, such as Adobe RGB (1998), the embedded profile allows the image to be displayed correctly in modern browsers. Without that profile, the colors will be wrong. Of course, in older browsers, the colors will be wrong regardless of the embedded profile, because most older browsers do not support color management. That is why, for display on the web, images should already be converted to sRGB. Most photo sites do this automatically.


Are you thinking of palettes here?


It's not just colour profile metadata. JPEG images are often loaded with EXIF, IPTC and XMP metadata that is mostly useless for serving as part of web pages. Our users kept uploading images where the metadata were in the order of megabytes, exceeding the sizes of the JPEGs themselves, so we started stripping it. Not sure what was in them, possibly garbage generated by faulty software.
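
One way such stripping can be done without re-encoding the picture is to walk the JPEG header segments and drop the metadata ones. The sketch below is an assumption about the approach, not what the commenter's site actually ran: it removes APP1 (EXIF and XMP), APP13 (Photoshop/IPTC) and COM segments and keeps everything else, including APP2, where ICC profiles live. It assumes a well-formed file with no padding bytes between segments.

```python
import struct

# APP1 = EXIF/XMP, APP13 = Photoshop/IPTC, COM = free-text comment.
# APP0 (JFIF) and APP2 (ICC profile) are deliberately kept.
DROP_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(jpeg_bytes, drop=DROP_MARKERS):
    """Return the JPEG with the selected header segments removed."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    out = [b"\xff\xd8"]
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:  # start of scan: copy the compressed data as-is
            out.append(jpeg_bytes[pos:])
            return b"".join(out)
        # The 2-byte length counts itself but not the 2-byte marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker not in drop:
            out.append(jpeg_bytes[pos:pos + 2 + length])
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

One caveat: the EXIF orientation tag goes away with the rest of APP1, so photos that rely on it may display rotated afterwards.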


Really interesting subject, since it's one most people don't pay much attention to. Here is a relevant article which explains how to create the smallest possible PNG: http://garethrees.org/2007/11/14/pngcrush/


And I can think of a nasty thing to do with this information...

One can take this sRGB color profile and jam it full of 'complex color information', knowing that most sites won't do much with it. I'd pack maybe 3-4 MB of data in there (or whatever fits under the size cap) and link the hell out of it.

Who'd expect an avatar image to DOS a site?


> Who'd expect an avatar image to DOS a site?

26.media.tumblr.com is an Akamai server, and I suspect it takes more than large files to bring them down.

Yes, you can wreak all sorts of havoc doing this, but I doubt you can actually take down Tumblr in any meaningful way, so it's not really a DOS attack.


My favourite observed abuse of image metadata was an exploit for some forum, in which the (IIRC) EXIF chunk contained a bunch of PHP exploit payload code, and the image itself was uploaded as part of the attack. The forum checked the image for validity, and of course it was valid. The actual flaw required getting the site to treat the image as a script file, at which point everything before the <?php was skipped over and poof, owned.
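
Since validity checks don't help here, one common mitigation (my suggestion, not something from the story above) is to never let the uploader influence the stored filename or extension, so the server has no reason to treat the file as a script. A minimal sketch, with an assumed whitelist of image signatures and an illustrative function name:

```python
import secrets

# Magic bytes for the image types we are willing to store (assumed whitelist).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}


def stored_name(upload_bytes):
    """Pick a random server-side filename with an extension we chose ourselves."""
    for magic, ext in SIGNATURES.items():
        if upload_bytes.startswith(magic):
            return secrets.token_hex(16) + ext
    raise ValueError("unsupported image type")
```

The payload can still sit inside the file, so this only helps if the upload directory is also configured to serve files statically rather than execute them.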



