For a quick test, I ran it over ~1.3GB of JPEG pictures I had locally; the final result is 810MB, 66% of the original size, which is very impressive considering it's lossless. It only deals with JPEG files though: no PNG, no ISO, no ZIP, no formats other than JPEG.
If someone can do this for video files, that will be Pied Piper coming to life.
The reason this can be done with JPEGs is that the compression hasn't been updated. There are folks who have used H.264 for compressing images, and WebP uses VP8 compression, both with much better results than JPEG.
Lepton is cool because it helps make existing technology a whole lot better, but what we actually need is a better image format.
You wouldn't see the same leap for videos, because people have been working hard to make those great for a while, while JPEG has been left to rot.
All that is true, but it misses out a key point: Lepton is risk-free.
I could re-encode all of my JPEG photos with a better codec, but the problem is that then they have gone through a lossy compression scheme twice: once with JPEG, and once with the new codec. This could ruin the image quality on some photos, giving them nasty artefacts.
The newer codecs might even be hindered further by the fact that they are being asked to compress an image that's been JPEG'd. I don't think any of the codecs you mentioned are tuned to work their best with image data that already has been butchered by JPEG. They expect to be given the 'pure' original image. This may well cause the codec to perform worse than expected.
Using Lepton gives none of these risks, since jpeg<->lepton is lossless. I could throw away all the JPEGs afterwards, safe in the knowledge that if I want to go back, I can.
Also, it's not really true that JPEG has been left to rot. There are lots of programs available that greatly improve upon the basic JPEG compression, while still outputting a JPEG file that any compliant decoder can read. And there's an even simpler way to shrink your photo file sizes - just drop the JPEG quality level slightly. Most cameras and programs default to a very high quality value. Dropping the default by even a tiny amount can produce remarkably smaller files, and you'll probably never notice the miniscule image quality loss.
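To make that concrete, here's a quick Pillow sketch (filenames are placeholders; note that re-encoding an existing JPEG is itself lossy, so this really only applies when you're exporting from the original source):

```python
import os
from PIL import Image  # pip install Pillow

# Hypothetical source file; substitute one of your own originals.
img = Image.open("photo.png").convert("RGB")

# Re-export at a few quality settings and compare the resulting file sizes.
for quality in (95, 90, 85, 80):
    out = f"photo_q{quality}.jpg"
    img.save(out, "JPEG", quality=quality)
    print(quality, os.path.getsize(out), "bytes")
```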
I suspect that JPEG is so popular because it is 'good enough'. Most people and programs don't care so much about squeezing out smaller files, so they'll keep using JPEG regardless.
> FLIF is a novel lossless image format which outperforms PNG, lossless WebP, lossless BPG, lossless JPEG2000, and lossless JPEG XR in terms of compression ratio.
PNG was not designed to be used for photograph-like images. The rest of those were not designed to be lossless formats, the lossless version is just a tacked-on afterthought.
Very unsurprising to find a codec that can beat those.
In my own tests, I found FLIF generally beats PNG (PNG Crush/Optipng both in brute-force mode) for comics (greyscale, majority white) as well, but by a less significant margin.
It's also worth noting that there aren't many other lossless formats, so it's still a valid comparison. I'm sure neither TIFF nor RAW outperform FLIF either.
That's not a valid comparison for TIFF. TIFF is a container which supports multiple compression algorithms. You could plug FLIF in as a compression algorithm and use it directly.
It also loads progressively and has a novel feature that lets a client determine how much detail to render, then stop loading any more data, all while using the same file.
This would be a totally awesome feature if the quality of a truncated FLIF came anywhere close to files tailored to that size in "real" lossy formats. Unfortunately, according to that example page, truncated FLIF falls far behind. It might find a niche where data reduction and scale reduction fall together (the last example).
The lack of processing comparisons raises that question pretty loudly, and it's definitely important in a mobile world. There's more to performance than size
Indeed. In one project where I converted jpegs to webp I saw around 70% savings at similar quality. This was on Android where it's fully supported. On the web pretty much only Chrome supports it, I'm not sure why Firefox, IE and Safari are hesitant.
This [1] is from 2013, so I don't know how it holds up, but I found it in the criticism section of the Wikipedia article for webp [2].
edit: This [3] is a follow-up from 2014 to the article/study from 2013.
TL;DR: "We consider this study to be inconclusive when it comes to the question of whether WebP and/or JPEG XR outperform JPEG by any significant margin. We are not rejecting the possibility of including support for any format in this study on the basis of the study’s results. We will continue to evaluate the formats by other means and will take any feedback we receive from these results into account."
Where did you read that? Since Apple has yet to commit to H.265 (due to insane patent licensing payments), I am hoping they jump on board with the Alliance for Open Media.
It wouldn't get much smaller. The reason DVDs have such a high bitrate compared to Handbrake rips isn't that MPEG2 is inefficient; it's that there's a keyframe twice a second. Most movie rips have a keyframe every 10 seconds.
It's mostly to let you fast forward, but there is a technical issue there. MPEG2 decoders aren't all mathematically identical, so what happens is the picture tends to drift away from the real thing after a while, and there's hacks like frequent keyframes and flipping the smallest DCT coefficient to get around it…
I've always wondered if it would be possible to losslessly convert the MPEG2 DCT coefficients, motion vectors etc to the equivalent subset of h.264 and then take advantage of the better prediction and entropy encoding of the later standard.
Maybe you could even go further and actively remove keyframes (in a fully reversible way, just keep a record of where they were)
I'm not sure how much you would save, and for losslessly archiving DVDs you might be better off creating a special format like Lepton for mpeg2
The first part of your question was posed on Doom9 in 2009. It's the opinion of Dark Shikari, longtime lead x264 developer, that it would be possible [1].
Not sure what you mean by the keyframe removal, though. Such an act would be lossy, would significantly impair any P-frames or B-frames (unless you majorly modify them), and frankly doesn't sound very reversible. Mind elaborating?
What I mean is transforming I-frames into equivalent P-frames (or even B-frames) that decode to the exact same pixels. The transformation might result in something that is bigger than a directly encoded P-frame (which can tolerate some small errors) but I suspect the new P-frame will be smaller than the original I-frame.
There is no restriction that P/B-frames only reference I-frames, so you don't even need to touch those frames.
For conversion back to MPEG2 it would be ideal to detect or mark the original I-Frames, so you can convert them back (a simpler transformation). But you could also pick any random P/B-frame and convert it to an I-frame with little issue.
Why bother trying to losslessly compress your DVDs when they're already a low quality, compressed MPEG2 source? The small amount of content that isn't available in higher quality won't take up that much space left as is.
In the case you're being facetious, maybe you could clarify exactly how big your video archive is and what sort of compression you're trying to achieve on it?
The article (and the person you're replying to) is referring to lossless compression. Converting to H264 and AAC may maintain a high quality at a lower bitrate but they're definitely not lossless.
While there's a lossless H.264, it doesn't fit the implied use-case proposed in the thread. Using lossless H.264 to recompress lossy MPEG-2 would be akin to using PNG to recompress a JPEG -- silly and wasteful.
Sure, it's an inexact analogy because PNG is not a block-based DCT coder, but all lossless H.264 does is set the quantizer as absurdly high as it needs to go to losslessly encode a particular frame. Starting from a compressed source, this is not a good recipe for achieving a space-saving result.
The JPEG file format is pretty detailed; there are lots of places in the file where you could indicate that the image has been compressed with a different algorithm. But it wouldn't really help, because older decompressors would still not be able to interpret the new codec. They would be able to parse the file and read the metadata in it - image width and height, timestamps, camera details, and so on - but they would have no way to decompress the data.
PNG is a very nice, extensible container format. In principle you can use it to store arbitrary image data, and still make use of all established metadata fields.
In practice you probably shouldn't do that without changing the file extension and the magic bytes to avoid confusion among users and poorly written software.
While PNG (and libpng) is reasonable for its intended purpose as a GIF replacement for simple RGB and greyscale graphics, it doesn't come close to the power and flexibility of TIFF, from sample format and depth, to sample count, image orientation, tiling and many additional features.
TIFF is showing its age at this point, with some high-end scientific applications switching to HDF5. But PNG simply isn't featureful enough for this type of data.
Thanks for testing this! For archival purposes, be sure to check the exit code of the lepton binary after compressing each JPEG: the default parameters only support a subset of JPEGs (you need to pass -allowprogressive and -memory=2048M -threadmemory=256M to support a wider variety of large or progressive JPEG files). Lepton will not write images that it is unable to compress using the settings provided. In those cases be sure to keep the original JPEG.
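If you're scripting that, here's a minimal sketch of the "check the exit code" advice (assuming a `lepton` binary on PATH that takes the flags above followed by input and output paths; adjust to your build):

```python
import subprocess
from pathlib import Path

def try_compress(jpeg: Path) -> bool:
    """Attempt to compress one JPEG; return True only if lepton succeeded.
    Keep the original whenever this returns False."""
    lep = jpeg.with_suffix(".lep")
    result = subprocess.run(
        ["lepton", "-allowprogressive", "-memory=2048M", "-threadmemory=256M",
         str(jpeg), str(lep)],
        capture_output=True,
    )
    return result.returncode == 0 and lep.exists()

# "photos" is a placeholder directory name.
for jpeg in Path("photos").rglob("*.jpg"):
    if not try_compress(jpeg):
        print("kept original:", jpeg)
```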
Hi Daniel, very cool project! I am interested in hearing your thoughts on alternative image formats / compression methods.
WebP (also based on VP8) is very promising for lossless and lossy compression. Have you considered using it to compress PNGs in the same way Lepton is compressing JPGs? Odds are it wouldn't be bit-perfect though (despite being pixel-perfect).
Also interested in hearing about the tradeoffs between server-side decoding and client-side. Not to keep focusing on it, but WebP has native support in Chrome and javascript decoders for everything else.
I think it is a very exciting time for image formats with several promising new ones on the way (WebP, FLIF, maybe BPG but possible legal issues).
I'm running over several terabytes and on the 5k images I've done so far (3.6GB), the compression has been 0.78x, which is damn good for lossless compression!
EDIT: 17GB now (my server is kinda slow) and it's still holding at 0.78x.
Yes and no. It's not a lossless way to encode an image, but the Lepton step itself doesn't add any loss that wasn't already present in the JPEG to begin with, and it generates a smaller file overall.
Someone just had to mention that _stupid_ show. Seriously. By empowering that show, you're just making fun of yourself and all the other programmers on here.
I really admire Dropbox for open sourcing this; it shows their commitment.
Saving almost a quarter of space for most images stored is something that truly gives a competitive edge. (I say most because people probably primarily have JPEG images).
Especially considering how many images are probably stored on services like Dropbox.
IMO, it shows that they don't plan to make any money from this, don't consider it a large competitive advantage, and think that the largest advantage of open sourcing it is that 'free' open source volunteers will improve the software.
It's interesting that they don't consider it a large enough competitive advantage.
Either that or they are using this to attract engineers.
> Either that or they are using this to attract engineers.
On a related note (I can't speak for Dropbox specifically) there are many engineers who desire their work to be open source for their own motivations, and when it's not a significant business risk to do so, nice companies will allow it.
So it works on two fronts, as a "hey, we'll let you open source your stuff" and a "hey, we've got people here who care about and contribute to open source."
You also open source if it's in your best interest for something to become the 'standard'.
At Google there is regret that they didn't open source a lot of stuff, because it ends up being reproduced outside of Google in some form. Companies that open source have an advantage in hiring and in the overall advancement of their product, by adopting the open source version in place of whatever their internal version was.
You also see it in facebook's open source initiatives, like react, buck, haxe and so on.
Without taking anything away from their generous announcement, it's very much in their interest for .lep files to become a standard so that they do not have to transcode to jpeg when users are viewing files on DropBox. I'm happy they released it.
Wait, since when is Facebook pushing for Haxe? I tried looking it up, but I'm not finding much. It would be really cool (I love Haxe). But I'm guessing you only went a bit over the top with that particular example.
Then why do you admire them? Would you also admire them if you were their investor? Dropbox management is obligated by law to act in the best interests of their shareholders, i.e. to make them as much profit as possible.
It's more likely that they have released it because of some profit-seeking interest. They are not charity.
> Dropbox management is obligated by law to act in the best interests of their shareholders, i.e. to make them as much profit as possible.
Can you cite the law which you believe makes that obligation?
The reason you can't is that it's not actually a legal requirement, and the reason is obvious: it's hard to say what the best interest is over all but the shortest time frame:
Dropbox's management might argue that they benefit more from open-source than they're giving away, that this kind of favorable attention will help them hire the top engineers who make far larger contributions to their bottom-line, that the pricing models are complex enough that this just doesn't matter very much, etc. Absent evidence that they're acting in bad faith, it's almost impossible to say in advance whether those arguments are right or wrong.
"Dropbox management is obligated by law to act in the best interests of their shareholders, i.e. to make them as much profit as possible." - no they are not.
Even if that was their obligation (which it's not, as others have pointed out), there's no reason to believe that keeping the algorithm proprietary would be more profitable. They're not handing a critical capability to their competitors (competing with Dropbox isn't fundamentally about cheap storage). They don't have the infrastructure to licence the algorithm (who should they licence it to? For how much? Under which terms? Those are not simple questions to answer.)
Releasing it freely allows it to work its way into browsers, allowing Dropbox to serve up the smaller images directly to users, saving money on bandwidth. It allows other users to contribute improvements. It markets Dropbox as a desirable place to work for engineers.
I admire them because they certainly considered the pros and cons of doing so, and decided that they feel secure enough in their market position and prefer to give this back to the open source community.
Pretty much every modern company saves tremendous amounts of money from open source (Linux and upwards in the stack), and so the OSS community should rightly be considered a STAKEholder.
The share vs stakeholder obsession in the space of large companies and corporations represents a lot that's wrong with our current markets.
--
Also, the only upside to open sourcing this is getting others involved in development. I just tested on 10k images, and the promise on both compression rate and bit parity after decompression holds true.
Seems to be a pretty stable product, so that motivation is probably minuscule.
There's also the upside of attracting other talented developers to come work at Dropbox if Lepton is representative of the kind of project they might be working on.
Does it make a difference? Can't public companies do the same at shareholder meetings, or declare what the intentions of the company are before going public? For (a poor) example, Google declared at IPO they would never pay dividends.
A similar approach has also been developed for use with zpaq, a compression format which stores the decompression algorithm as bytecode in the archive:
[The configuration] "jpg_test2" by Jan Ondrus compresses JPEG images (which are already compressed) by an additional 15%. It uses a preprocessor that expands Huffman codes to whole bytes, followed by context modeling.
http://mattmahoney.net/dc/zpaqutil.html
Backblaze storage is $0.005/GB/Month = $5k/PB/Month.
The GitHub repo has 7 authors, perhaps costing Dropbox $200k/year each and taking most of a year, so roughly $1M to develop this system.
So this might pay for itself after 200PB*Months, assuming Dropbox's storage costs are the same as Backblaze's prices, and assuming CPU time is free. (TODO: estimate CPU costs...)
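Spelling out that arithmetic (same assumptions as above; Python used as a calculator):

```python
# $0.005/GB/month at Backblaze prices -> dollars per petabyte-month
cost_per_pb_month = 0.005 * 1_000_000   # = $5,000/PB/month
dev_cost = 1_000_000                    # rough guess: ~7 engineers for most of a year

# Petabyte-months of *saved* storage needed to pay back the development cost
break_even = dev_cost / cost_per_pb_month   # = 200 PB-months
print(cost_per_pb_month, break_even)
```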
Of course, advancing the state of the art has intrinsic advantages, but again, it's interesting to look at the purely financial point.
I think you may be crunching the numbers the wrong way...
Dropbox is valued at $10b (regardless of what you think of that number, someone was willing to buy at that price). But they only have 1800 employees. If by some miracle 80% of them are engineers that is $7M per engineer. All this for a "pure software" business. No inventory, no manufacturing, they only recently started doing their own operations.
Investors would be telling them "more engineers" => "more software" => "better Dropbox". So now you have to go out and snap up engineers in Silicon Valley, which is very very hard. But you don't offer the perk of a Google/Apple line on a resume nor the compensation of e.g. Microsoft. What do you do?
Answer: you offer people the chance to work on cutting edge breakthrough technology. Not only that but it's all open source! This is a dream job for some people. They will turn down every other gig for this one.
It doesn't matter that PB/mo is chump change because an investor doesn't know that, they only know the engineering headcount. If by some miracle an investor does know this is a waste of time, then you just point to this HN post and observe how many developers are interested in this technology and how it is attracting developer mindshare that can be exploited for additional hires down the road.
I'm not saying it's rational–it's not. But I think there were strong incentives to greenlight a project like this, even if the actual cost savings were zero.
Lepton can decompress significantly faster than line-speed for typical consumer and business connections. Lepton is a fully streamable format, meaning the decompression can be applied to any file as that file is being transferred over the network. Hence, streaming overlaps the computational work of the decompression with the file transfer itself, hiding latency from the user.
In my humble opinion, dropbox's image gallery webapp is considerably faster than any other I've seen, especially when compared to imgur.
> I don't think they're compressing this and decompressing it client side.
The speed quotes made it sound like client-side was a concern. Why would you go to all the effort of devising a new image compression format saving 20%+ storage and on the wire, and not have it decompressed client-side, especially when you control the client?
My rough understanding, as someone who works at Dropbox and knows some of the people who worked on this (but isn't directly involved), is that this currently only runs on our servers. The perf requirements are primarily that we don't want to slow down syncing / downloads significantly - and also want to keep the CPU cost under control. As is, the savings in storage space should easily pay for the extra compute power required.
The typical encoding/decoding rates are measured with a Xeon... my guess is that lower powered phones/netbooks will be considerably slower. I'd trade off a 20% longer sync time in exchange for not running my laptop battery down.
They will once Silicon Valley takes them to task for it. Just like season 1 pretty much wiped "making the world a better place" from the lingo of SV companies.
> To encode an AC coefficient, first Lepton writes how long that coefficient is in binary representation, by using unary. [...] Next Lepton writes a 1 if the coefficient is positive or 0 if it is negative. Finally, Lepton writes the absolute value of the coefficient in standard binary. Lepton saves a bit of space by omitting the leading 1, since any number greater than zero doesn't start with zero.
The wording almost implies that this is novel, but it is actually Gamma coding [1], which in the signal compression community is often called Exp-Golomb coding [2]. I wonder why this is not acknowledged, considering that they mention the VP8 arithcoder instead.
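For the curious, here's a toy transcription of the quoted scheme, which is indeed just gamma / Exp-Golomb-style coding (illustrative only, not Lepton's actual bitstream):

```python
def encode_coefficient(c: int) -> str:
    """Encode a nonzero coefficient as: unary bit-length, sign bit,
    magnitude in binary with the implicit leading 1 dropped."""
    magnitude = abs(c)
    length = magnitude.bit_length()
    unary = "1" * length + "0"             # bit-length written in unary
    sign = "1" if c > 0 else "0"           # 1 = positive, 0 = negative
    mantissa = format(magnitude, "b")[1:]  # every positive number starts with 1
    return unary + sign + mantissa

# e.g. encode_coefficient(5) == "1110" + "1" + "01"
```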
So, am I to read this as to mean that when I send Dropbox a JPEG, they are behind the scenes further compressing it using Lepton? Then when I request it back, they are re-converting it back to JPEG?
That's the idea behind the algorithm, yes. And since it's lossless, every original bit is preserved. The same idea could be applied on the Desktop client instead of on the server, which would save 22% of the bandwidth as well and make syncing faster.
This is amazing, we've been struggling with JPG storage and fast delivery at my lab (terabytes and petabytes of microscopy images). We'll be running tests and giving this a shot!
jpg is a weird format to be storing microscopy images, no? Usually end up in some sort of bitmap TIFF (or their Zeiss/etc. proprietary format) from what I've seen.
So the weird thing is that we're not only a lab, but also a web tool. We have the files backed up in a standard format in one place, but delivering 1024x1024x128 cubes of images over the internet has been tricky. We don't need people to always view them at full fidelity, just good enough.
We tried JPEG2000, which was better quality per file size, but the web worker decoder was slower than the JPEG one, adding seconds to the total download/decode time.
EDIT: We're currently doing 256x256x256 (equivalent to a 4k image) on eyewire.org. We're speeding things up to handle bigger 3D images.
EDIT2: If you check out Eyewire right now, you might notice some slowdown when you load cubes, that's because we're decoding on the main thread. We'll be changing that up next week.
The size difference between "high quality" jpeg and lossless compressed TIFFs can be quite significant - so it might not be feasible to use eg. TIFF for archiving. And jpegs might very well be "good enough".
Eg. I just tested on a small 10MP Sony ARW raw image - the raw file is 7MB, the camera jpeg is 2.3MB, and a tiff compressed with LZW is 20MB (uncompressed 29MB). The raw tiff run through lzma is 9.3MB. Either way, if ~7MB is the likely lossless size and the JPEG is good enough at ~2.3MB, that's a pretty big difference if we're talking petabytes, not megabytes.
(I'll get around to testing lepton on the jpegs shortly)
DNGs and RAWs aren't generally (AFAIK) uncompressed. But ideally they're losslessly compressed. They're all(?) "TIFF files" - but AFAIK saying something is a valid TIFF, is almost as helpful as saying something is "a file".
Apparently Sony uses some kind of lossy compression for its files - I just tested with a jpeg2000 encoder on the same file above, and the size of the j2k file is approximately the same as the ARW: 7MB. Btw, the lep-file is 1.7MB.
Note that the uncompressed (flat) PPM file is 29MB as is the uncompressed TIFF - but simply running the TIFF through lzma reduces the size to 9.3MB. So ~7MB isn't that far off.
[ed: And while the lep-file was 1.7MB, shaving a bit off the original jpg, mozjpeg with defaults+baseline created a jpeg (at q=75, per default) 472k in size. Lep managed to shave a bit off that too - ending up with PPM->mozjpeg->lepton resulting in a 359K file. The (standard) progressive mozjpeg ended up at 464K.
This is not quite apples to apples, though, I think the comparable quality setting for mozjpeg would probably be 90 to 95 or so -- ending up around 1.6MB. But for this particular (rather crappy) image - I couldn't readily tell any difference.]
Raw files store only one channel per pixel. A TIFF has been demosaiced and stores 3 channels per pixel.
On top of that, most sensible raw formats only store 12 or 14 bits per pixel, instead of 16.
And then most are compressed, some losslessly and some lossily (like the infamous Sony format that packs it down to an average of 8 bits per pixel but does exhibit artifacts).
Some TIFF-derived container is the most common representation. But note even these could use JPEG/J2K if desired.
Most microscopy images are stored uncompressed or with lossless compression. But unfortunately this doesn't scale with newer imaging modalities. Here are two examples:
Digital histopathology. Whole-slide scanners can create huge images e.g. 200000x200000 and larger. These are stored using e.g. JPEG or J2K in a tiled BigTIFF container, with multiple resolution levels. Or JPEG-XR. When each image is multiple gigabytes, lossless compression doesn't scale.
SPIM involves imaging a 3D volume by rotating the sample and imaging it from multiple angles and directions. The raw data can be multiple terabytes per image and is both sparse and full of redundant information. The viewable post-processed 3D image volume is vastly smaller, but also still sparse.
For more standard images such as confocal or brightfield or epifluorescence CCD, lossless storage is certainly the norm. You don't really want to perform precise quantitative measurements with poor quality data.
We're storing them as RAWs for our backups, but it's cheaper and faster to store them compressed and transmit them without a transformation step. I'll keep FLIF in mind, but it looks kind of unstable reading the website?
Yes, the FLIF format is still being fully fleshed out. I would actually give WebP a shot instead of JPG. It offers better compression and already has native implementations in Chrome and on Android.
This looks very useful for archiving, but as others have pointed out, less useful for web development. I did lots of research on image compression for a book recently and found quite a few helpful tools.
jpeg-archive [^1] is designed for long term storage and you can still serve the images over the web. imageflow [^2] has just been kickstarted and looks really promising for use with ASP.NET Core.
mozjpeg is also showing progress and if FLIF takes off then that will be great. Scalable images would be fantastic. No more resizing and all the security issues that brings [^3].
The technical rigor in these recent Dropbox blog posts is admirable. Seems like an impressively talented engineering team (or maybe just good at marketing :)
I'm interested in a 'super lossy' (deep learning based?) compression. You should be able to compress movies down to a screenplay and a few stage directions.
I started something like this long ago but didn't get very far with it. This was before "deep learning" and I was groping in the dark, but I think the concept is sound up to a point.
The idea was to train a neural net and build up a database of features (maybe on the order of 1-10 GB, or whatever is just small enough to ship) to estimate the missing details from downscaled and extremely over-compressed JPEGs. If it worked, I think it would also improve the quality of all the 10-20 year old images out there where the uncompressed source is long gone. Sort of a Blade Runner-style "enhance" tool, but of course it would only be filling in aesthetically plausible details.
Google Photos already has an incredibly lossy compression, right? Images organized under Dog, Cat, etc. :) Might take a bit more effort to get the extra 999 words.
Sounds a bit similar to the fractal compression experiments[0][1] that were eventually repurposed in stuff like Perfect Resize. IIRC, it worked a bit like RLE, but by generating an IFS on the fly that could be used to recreate the image, instead of just storing pixel sequences.
The reasons given for Rust as stated in https://blogs.dropbox.com/tech/2016/06/lossless-compression-... would seem valid here too:
> For Dropbox, any decompressor must exhibit three properties:
>
> 1. it must be safe and secure, even against bytes crafted by modified or hostile clients,
> 2. it must be deterministic—the same bytes must result in the same output,
> 3. it must be fast.
I really do wish that Rust would provide nice alignment guarantees (eg 32 byte) without depending on customizing the allocator, and builtin, safe, SIMD instructions
> I really do wish that Rust would provide nice alignment guarantees (eg 32 byte) without depending on customizing the allocator, and builtin, safe, SIMD instructions
Same could be said about C/C++. I'm guessing the answer is much simpler: the author(s) are comfortable with C++. And they probably don't deploy much rust code yet.
Hmm, I'll have to try it on WMS/TMS tile storage for web cartography. That uses JPEG files also, but with fixed sizes like 256x256. Maybe the predictor needs to be tuned for that, because aerial imagery is a bit different from smartphone photos.
Would someone be kind enough to explain how one could store the compressed .lep files, serve them, and then have the browser render them, without using a JavaScript library?
You would either need a JavaScript library (which doesn't exist yet) or you would need to convince browser vendors to support lepton files natively.
In Dropbox's case they control both client and server and can just compress/decompress in the dropbox client. Using lepton on websites wasn't really Dropbox's goal (but it would be cool if a sufficiently fast JavaScript library existed).
I wonder if they use a rolling checksum too, to avoid duplicating a complete file if only a few bytes shifted (for example adding a line of text at the beginning of a file).
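For context, the "weak" rolling checksum rsync uses for exactly this purpose can be slid one byte at a time with constant work per step; a toy sketch (not Dropbox's actual chunking code):

```python
def rolling_checksums(data: bytes, window: int):
    """Yield an rsync-style weak checksum for every window position,
    updating it incrementally instead of rehashing each window."""
    MOD = 1 << 16
    a = sum(data[:window]) % MOD
    b = sum((window - i) * data[i] for i in range(window)) % MOD
    yield (b << 16) | a
    for i in range(window, len(data)):
        old, new = data[i - window], data[i]
        a = (a - old + new) % MOD
        b = (b - window * old + a) % MOD
        yield (b << 16) | a
```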
It probably wouldn't hit the most important cases either; dedup is typically most powerful & valuable on large media files, software packages, disk ISOs, and the like, which do not frequently have arbitrary text inserted at the start of the file!
rANS is really cool, but I think we tried it too early in the lepton research phase while some of the other ideas were still brewing...
We might revisit rANS now that v1.0 is out!
Interesting that they are using VP8 to compress the JPEGs. This is one degree of separation away from Google's WebP [1]. It would be interesting to see how they stack up (WebP has a lossy and lossless mode).
(EDIT: removed wording that Lepton produces files that conform to the JPEG spec. It doesn't. It losslessly compresses into a custom format, that losslessly decompresses into a JPEG)
Lepton uses the arithmetic coder [1] from VP8. Using arithmetic coding instead of Huffman encoding to get better compression was always an option in JPEG, but it has been historically avoided due to patents [2].
Compared to VP8-Intra, the compression used in lossy WebP, JPEG is missing the prediction step, usually called 'filtering' [3], which is the single largest contributor to WebP's compression outperforming JPEG.
Reading through the Lepton blog post, it seems they're using a different method of prediction, based on observations about typical gradients and correlations between AC and DC coefficients. VP8 uses a more 'traditional' approach of predicting your neighboring pixels, which was borne out of run-length encoding, but also very applicable to video's moving macroblocks. A comparison would indeed be enlightening.
Just a random idea: could machine learning algorithms that do object recognition help to improve the compression of images or videos? Maybe a lossy algorithm could compress away "irrelevant" things. This way a high resolution frame might have lower resolution objects inside, but it would be ok because the important part of the content is preserved.
Absolutely, yes. There is a very deep link between compression and "understanding". I think we have every reason to believe that networks that can understand/"explain away" the content/statistics of a scene ought to be able to compress them better.
I presume someone (or likely many people) are working on exactly this.
Very cool! If anyone who comes across this particular thread knows about papers/research being written about this topic, I'd be very interested to learn more.
There is an award-winning compressor that uses many statistical models (and a 3-layered dense neural network) to compress the data losslessly: http://www.byronknoll.com/cmix.html
Overall I think we are yet to see the full potential of deep learning unleashed on data compression. For example the neural network in cmix compressor is quite primitive compared to modern architectures. Someone will certainly find a way to do better than that!
Oh, what I was imagining was more like this: if I was watching a news broadcast, all I care about is the news anchor and the main on-screen graphic/text. You could compress down other things like the table or the background, or write an algorithm that selectively streams resolutions tied to the object streamed instead of the frame streamed. Is that a viable form of "compression" using existing computer vision tech?
Autoencoders are a long known approach to use neural networks for lossy compression. Some papers about using these models as aids for compression are googleable.
Are we able to take the EXE and run it on JPEGs from a Windows machine without issue? (I haven't tried it myself, but probably will using the instructions I saw on Github a moment ago and report back...it looks like it should work though):
https://github.com/dropbox/lepton/releases
However I am wondering about the two different EXEs available on the page (one has an avx prefix that I need to try and figure out). If anyone has info on that that'd be useful.
I could see this potentially being useful for schools and businesses doing quite a bit of digital scanning so I'm going to try running some tests using it and some images I think we have available somewhere.
I want this working seamlessly on the file system. That means: I see JPGs, but they are Lepton-compressed in reality. It would be a great use case for existing file servers. How could this be (theoretically) achieved on a Linux machine?
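One theoretical route is a small FUSE layer that exposes each stored .lep file under a .jpg name and shells out to the lepton binary on read. A rough, read-only sketch using the fusepy package; there's no cache eviction or write support, and the `lepton in.lep out.jpg` invocation follows the project README, so treat it as a starting point rather than finished code:

```python
#!/usr/bin/env python3
"""Read-only FUSE sketch: a directory of .lep files appears as .jpg files,
decompressed on demand via the `lepton` binary. Requires fusepy
(pip install fusepy) and lepton on PATH."""
import os
import subprocess
import sys
import tempfile

from fuse import FUSE, Operations


class LeptonFS(Operations):
    def __init__(self, root):
        self.root = root
        self.cache = {}  # virtual .jpg path -> decompressed JPEG bytes

    def _real(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def _decompress(self, path):
        if path not in self.cache:
            with tempfile.TemporaryDirectory() as td:
                out = os.path.join(td, "out.jpg")
                # `lepton input.lep output.jpg` per the project README.
                subprocess.run(["lepton", self._real(path)[:-4] + ".lep", out],
                               check=True, capture_output=True)
                with open(out, "rb") as f:
                    self.cache[path] = f.read()
        return self.cache[path]

    def readdir(self, path, fh):
        names = [".", ".."]
        for name in os.listdir(self._real(path)):
            # Present compressed files under a .jpg name.
            names.append(name[:-4] + ".jpg" if name.endswith(".lep") else name)
        return names

    def getattr(self, path, fh=None):
        if path.endswith(".jpg") and not os.path.exists(self._real(path)):
            st = os.lstat(self._real(path)[:-4] + ".lep")
            size = len(self._decompress(path))  # expensive; cache better in practice
        else:
            st = os.lstat(self._real(path))
            size = st.st_size
        return {"st_mode": st.st_mode, "st_nlink": 1, "st_size": size,
                "st_uid": st.st_uid, "st_gid": st.st_gid,
                "st_atime": st.st_atime, "st_mtime": st.st_mtime,
                "st_ctime": st.st_ctime}

    def read(self, path, size, offset, fh):
        if path.endswith(".jpg") and not os.path.exists(self._real(path)):
            return self._decompress(path)[offset:offset + size]
        with open(self._real(path), "rb") as f:
            f.seek(offset)
            return f.read(size)


if __name__ == "__main__":
    # usage: python leptonfs.py /dir/with/lep/files /mountpoint
    FUSE(LeptonFS(sys.argv[1]), sys.argv[2], foreground=True, ro=True)
```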
Some statistics. TL;DR: Lepton indeed provides about 22% size savings.
When running Lepton over JPEG photos downloaded from Flickr, I got about 23.37% size savings.
When running Lepton over JPEGs generated by mozjpeg (default settings: -quality 75, progressive) from JPEG photos downloaded from Flickr, I got 22.63% size savings.
The mozjpeg output is about 3.83 times smaller than the original JPEG photo, on average.
Just because you know another thing with that name doesn't make it illegal to use it; the word has a meaning of its own anyway. There is also a CMS called Lepton. Pretty sure this will soon be the most popular product with this name anyway.
It's not illegal, it's just dumb. Sure, you could implement your entire C++ application inside the `std` namespace, and I'm sure it'd work fine, but you /shouldn't/. If you're going to start a project, at least google the name first.
Just because it messed with your own personal namespace - you know that a lepton is a particle and not a thermal camera, right? Camera marketers just got to you first. The real problem is polluting useful words to market things in the first place.
They work because while most possible data is unstructured, most real-world data is highly structured. On average over all possible inputs, any compression scheme must have 1x compression.
Consider images of slides for a presentation that are text on a flat background. If you know the value of the pixel just to the left of the current pixel, then if you guess that the current pixel will be the same, you will be right most of the time. This is obviously not true for random noise. Consider a really simple compression scheme where a pixel is stored as a single '1' bit if it is the same color as the previous pixel; otherwise the color is stored as usual, with an additional '0' bit prepended. When you guess wrong, you pay a tax of 1 bit, but when you guess right you save N-1 bits, where N is the number of bits per pixel.
For random noise, this will grow the input quite a bit, but for simple flat-shaded graphics it will shrink the input quite a bit.
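Spelled out in code, that toy scheme looks like this (illustrative Python, not a real codec):

```python
def encode(pixels, bits_per_pixel=24):
    """Emit a single '1' bit when a pixel matches its predecessor,
    otherwise a '0' bit followed by the full pixel value."""
    out = []
    prev = None
    for p in pixels:
        if p == prev:
            out.append("1")                                     # right guess: 1 bit
        else:
            out.append("0" + format(p, f"0{bits_per_pixel}b"))  # wrong guess: 1 + N bits
        prev = p
    return "".join(out)

# Flat-shaded slide: mostly repeated pixels, big savings.
print(len(encode([0xFFFFFF] * 100)))  # 124 bits vs 2400 uncompressed
# Random noise: the 1-bit "tax" on nearly every pixel makes it grow instead.
```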
On average, any lossless compression scheme must have >= 1x compression. Lossy compression schemes can have less (since they're discarding information), and lossless compression schemes certainly can average out to > 1x (this is trivially observable by taking any 1x-average scheme and adding a fixed-length header to the output).
It's very easy to make a JPEG file that will not compress at all. Luckily it might look like snow from a television set rather than a typical image produced by a camera.
We live in a world where it is common for blue sky to occupy a portion of the frame and green grass to occupy another portion of the frame. Since images captured of our world exhibit repetition and patterns, there are opportunities for lossless compression that focuses on serializing deviations from the patterns.
Impressive work, Daniel! Do I understand correctly that any image prediction for which the deltas are smaller in absolute value than the full JPEG/DCT coefficients would offer continued compression benefits? As in, if you could "name that tune" to predict the rest of the entire image from the first few pixels, the rest of the image would be stored for close to free (and if not, it essentially falls back to regular JPEG encoding).
If that's the case, then not only could we rely on the results of everything we've decompressed so far to use for prediction (which is like one-sided image in-painting), but we also could store a few bits of semantic information (e.g. from an image-net-based CNN, from face detection) about the content of the original image before re-compression, and use that semantic information for prediction as well via some generative model. All of this would obviously be trading computation for storage/bandwidth, but it this seems like an exciting direction to me. Again, nice work.
Hi GrantS:
That's pretty close to how it works. We always use the new prediction, even where it's worse than JPEG though (very rarely), to stay consistent.
As for having the mega-model that predicts all images better: well, it turns out that with the lepton model you only lose a few tenths of a percent by training the model from scratch on each image individually. We have a test case for training a global model in the archive (it's https://github.com/dropbox/lepton/blob/master/src/lepton/tes... ). That trains the "perfect" lepton model on the current image, then uses that same model to compress the image (it's not meant to be a fair test, but it gives us a best-case scenario for potential gains from a model that has been trained on a lot of images), and in this case it doesn't gain much, even in a controlled situation like the test suite.
However the idea you mention here may still be a good idea for a hypothetical model--but we haven't identified that model yet.
I think you are explaining how both lossy and lossless compression work without explaining how this does it differently.
I can't read the article due to technical constraints, but I understand that e.g. JPEG has a lossy quantisation pass followed by a lossless encoding/compression pass over the result of the first stage. If they're reproducing a bit-identical result to the input JPEG, it must be a (very good) optimisation of the latter stage. [How'd I do?]
JPEG has a very poor lossless encoding stage; this is well known. Several archiving tools (most notably winRAR) already do some lossless reencoding of JPEG.
> Luckily it might look like snow from a television set rather than a typical image produced by a camera.
That would actually compress rather well. A hard to compress random image would not look like TV snow (white dots, with space between them), but rather randomly colored dots that are continuous in the image.
Random link: "I am trying to design a cloth that, from the point of view of a camera, is very difficult to compress with JPG, resulting in big-size files (or leading to low image quality if file size is fixed)."
No compression algorithm works on completely random data. Fortunately (for compressors) there are in practice almost no completely random data sets people care about in the world. Almost all data is correlated in some way. In pictures you can do a good job of predicting the colours of neighbouring pixels (they will, most of the time, be a similar colour).
In this case they are making use of the fact that JPEGs store a certain mathematical formulation of a picture. It turns out JPEG doesn't store that mathematical formulation very well, so you can squash it losslessly into a better formulation, then later turn it back into the original JPEG.
There were techniques discussed as part of the JPEG-2000 efforts where just reordering coefficients before doing the entropy coding would gain you a good deal of compression (though at the expense of the block based nature of JPEG).
It's always good to see new techniques out in the open.
jpegoptim has lossless and lossy modes. In lossless mode it preserves all pixels, but it doesn't preserve the file itself. Lossless jpegoptim is comparable to Lepton. In general, Lepton tends to give better improvements, because it uses a different output file format. How much better depends on your input files; you should try both.
I'd say 22% for Lepton and 5% for jpegoptim, based on fading past memories of mine.
I wonder if Dropbox is going to do steganography detection before using Lepton compression, because "pixel-for-pixel identical" is obviously not the same as "byte-for-byte".
You store the delta to the estimate. If the estimate is bad (e.g. because someone designed content for maximum surprise of the estimator function), compression rate goes down, not quality.
>For the standard test image the new LenPEG 3 compresses the image so efficiently that data storage space is actually freed on the computer right up to the entire capacity of the storage devices