An analysis of Facebook photo caching

gresrun · on May 4, 2014

Most interesting is a new caching algorithm that outperforms LFU and LRU:

S4LRU: Quadruply-segmented LRU. Four queues are maintained at levels 0 to 3. On a cache miss, the item is inserted at the head of queue 0. On a cache hit, the item is moved to the head of the next higher queue (items in queue 3 move to the head of queue 3). Each queue is allocated 1/4 of the total cache size and items are evicted from the tail of a queue to the head of the next lower queue to maintain the size invariants. Items evicted from queue 0 are evicted from the cache.

Paper: http://www.cs.cornell.edu/~qhuang/papers/sosp_fbanalysis.pdf

twic · on May 4, 2014

One way to think about this is that there is a single queue, but that new items are added three-quarters of the way in. When an item is accessed, it is promoted to the quartile point two back from its current position.

I think that makes it clearer that it is essentially a less aggressive LRU; being less aggressive shifts its focus to a longer timescale.

It also lets you think about other ways to tweak LRU; what if instead of promoting an item to a particular quarter point, you just moved it up N places in the list? If you set N to three eighths of cache size, that would be roughly equivalent to S4LRU, but with a smoother response. You could then think about varying N according to where the item starts in the list (making it tougher to climb to the very top, say), or according to some other property of the item (making it easier for small items to climb, say).

Another way to think about this is that it is 4Q-LRU, but with items that fall off the higher queues being demoted to lower queues, rather than being evicted:

http://www.vldb.org/conf/1994/P439.PDF

It'd be interesting to see a comparison between SnLRU, nQ-LRU, and ARC. The comparisons to FIFO, LFU, and LRU in the paper really only serve to establish that SnLRU is interesting, not that it is a great algorithm.

Terretta · on May 4, 2014

> Most interesting is a new caching algorithm

S4LRU (or 4QLRU mentioned in the sibling comment) is not new.

The concept is one of several in a bag of tricks the Advection video delivery network used to manage HD video edge caches since early to mid 2000s. Not discussed in this paper, we also stored the content of each of our S4LRU-like queues in storage levels each capable of higher concurrency: origin storage cluster, local NAS, host JBOD HDD, host SSD, host ramdisk. We didn't use LRU, but something that works notably better for video, still not found in papers.

HD video files are very large compared to available storage for cache queues, and moving HD content among layers is expensive and slow. LRU causes a lot of churn and efficiently distributing large content across "collaborative" collections of edge servers with comparatively smaller caches presents a tough packing problem.

As a self-funded video delivery network bootstrapping off our own organic revenue, we had to solve these in order to wholesale to CDNs private labeling and reselling our video delivery at huge scale. Our combination of approaches let us beat all published papers on cache efficiency (as of mid 2000s), using a set of computationally inexpensive heuristics.

It's been cool to watch CloudFlare independently discover and publish ideas from that bag of tricks, while the traditional CDNs continue to operate less efficiently. I was surprised by this paper's discovery that FB is still not using what this paper calls "collaborative" edge caching. One day the big guys may wake up and find themselves no longer competitive.

jon-wood · on May 4, 2014

That is interesting. Just to check I've understood this correctly are they using a generational garbage collector for their caches? That's the sort of idea which once written down seems pretty obvious, but until now hasn't occurred to me.

wheaties · on May 4, 2014

Ttheir origin cache handled but 5% of all traffic and did very little to help the system. The big takeaway should be to understand the usefullness of every level, both in terms of added cost in maintaining, complexity and equipment so that you know where to invest your energy and what sorts of gains you cab have from that investment. Very nice write up.

taspeotis · on May 4, 2014

Previous discussion https://news.ycombinator.com/item?id=7320885

canadev · on May 4, 2014

Why is it that an infinite sized cache has < 100% hit ratio?

lclarkmichalek · on May 4, 2014

A hit is defined as the event of "the thing I requested was in the cache". Assuming you have already put as many items as you can in your cache, with an infinite sized cache, everything will fit in the cache, so you will never have an occurrence of the event "the thing I requested was not in the cache", giving you a P(hit) of 1.

Of course, this ignores cache invalidation, population and I'm sure some other fun problems Facebook get to deal with.

joosters · on May 4, 2014

Presumably they are imaging an infinite sized cache but it starts off empty. In their test trace, only 85% of the requests are for files that are ever accessed more than once, so the other 15% can never be a cache hit.

tbagman · on May 4, 2014

Because of compulsory misses -- i.e., the first request to an object is never in the cache.

sbierwagen · on May 4, 2014

Interesting that they spend so much time thinking about photo caching, considering that they recompress them down to quality 70 JPEGs, making them not worth looking at in the first place. Everybody involved with that decision at Facebook should be prosecuted for crimes against image quality.

gcr · on May 4, 2014

I'm not sure. I'd rather have photos show up quickly (eg. flipping through a slideshow) and then be able to get better quality.

Has anyone looked into server-side support for progressive JPEG images? Eg. if a browser is downloading 30 images in parallel, then the server will only send enough data to display each image coarsely before it finishes sending the entire image file?

Has anyone made an image viewer that treats flipping through the image collection differently than viewing a single image? Eg. for many online image viewers (most "gallery" plugins), if you click "NEXT NEXT NEXT" quickly, the browser eventually just shows the spinning throbber and not the image because it's busy downloading the high-quality version of the image several clicks back and i have to wait to catch up. I'd much rather have the browser stop the download when I click NEXT and immediately scale up the thumbnail that it already has; that way, at least I know what the image is going to be about.

Or: Couldn't we distribute thumbnails for a 30,000-image gallery as 30,000 consecutive frames in a .webm or H264 video? With special sorting techniques, I'm sure we could take advantage of a bunch of shared image structure that way.