Most interesting is a new caching algorithm that outperforms LFU and LRU:
S4LRU: Quadruply-segmented LRU. Four queues are maintained at levels 0 to 3. On a cache miss, the item is inserted at the head of queue 0. On a cache hit, the item is moved to the head of the next higher queue (items in queue 3 move to the head of queue 3). Each queue is allocated 1/4 of the total cache size and items are evicted from the tail of a queue to the head of the next lower queue to maintain the size invariants. Items evicted from queue 0 are evicted from the cache.
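The description above translates fairly directly into code. Here's a minimal sketch of S4LRU using one OrderedDict per level; the class and method names are mine, not from the paper, and updates to an existing key are treated as hits (the paper's description doesn't specify that case):

```python
from collections import OrderedDict

class S4LRU:
    """Sketch of quadruply-segmented LRU as described above.

    Four LRU queues at levels 0..3. A miss inserts at the head of
    queue 0; a hit moves the item to the head of the next higher
    queue (capped at queue 3). Items evicted from the tail of a
    queue cascade to the head of the next lower queue; overflow
    from queue 0 leaves the cache entirely.
    """

    def __init__(self, capacity, levels=4):
        self.seg_cap = capacity // levels  # each queue gets 1/4 of the space
        self.queues = [OrderedDict() for _ in range(levels)]
        self.level_of = {}                 # key -> queue index

    def _insert(self, level, key, value):
        # Place key at the head of `level`, demoting tail items
        # downward to maintain each queue's size invariant.
        while level >= 0:
            q = self.queues[level]
            q[key] = value
            q.move_to_end(key, last=False)  # front of the dict = queue head
            self.level_of[key] = level
            if len(q) <= self.seg_cap:
                return
            key, value = q.popitem(last=True)  # overflow: demote the tail item
            level -= 1
        del self.level_of[key]  # fell off queue 0: evicted from the cache

    def get(self, key):
        if key not in self.level_of:
            return None
        level = self.level_of[key]
        value = self.queues[level].pop(key)
        self._insert(min(level + 1, 3), key, value)  # promote one level
        return value

    def put(self, key, value):
        if key in self.level_of:
            self.queues[self.level_of[key]].pop(key)
            self._insert(min(self.level_of[key] + 1, 3), key, value)
        else:
            self._insert(0, key, value)  # new items start in queue 0
```

With a capacity of 8, each queue holds 2 items, so inserting three items on misses pushes the oldest out of queue 0 and out of the cache, while a hit walks an item up one level at a time.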
One way to think about this is that there is a single queue, but that new items are added three-quarters of the way in. When an item is accessed, it is promoted to the quartile point two boundaries ahead of its current position, toward the front of the queue.
I think that makes it clearer that it is essentially a less aggressive LRU; being less aggressive shifts its focus to a longer timescale.
It also lets you think about other ways to tweak LRU: what if, instead of promoting an item to a particular quartile point, you just moved it up N places in the list? If you set N to three-eighths of the cache size, that would be roughly equivalent to S4LRU, but with a smoother response. You could then think about varying N according to where the item starts in the list (making it tougher to climb to the very top, say), or according to some other property of the item (making it easier for small items to climb, say).
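That "move up N places" variant can be sketched with a single list. This is purely illustrative of the idea in the paragraph above, not anything from the paper; the linear `index` scan is fine for a sketch but a real cache would need an O(1) structure:

```python
class MoveUpLRU:
    """Hypothetical single-queue LRU variant: a hit shifts the item
    N slots toward the front rather than to a fixed quartile point."""

    def __init__(self, capacity, n):
        self.capacity = capacity
        self.n = n            # promotion distance per hit
        self.order = []       # index 0 = front (safest from eviction)
        self.values = {}

    def get(self, key):
        if key not in self.values:
            return None
        i = self.order.index(key)              # O(capacity); sketch only
        self.order.pop(i)
        self.order.insert(max(0, i - self.n), key)  # move up N places
        return self.values[key]

    def put(self, key, value):
        if key in self.values:
            self.values[key] = value
            self.get(key)                      # treat an update as a hit
            return
        if len(self.order) >= self.capacity:
            del self.values[self.order.pop()]  # evict from the back
        # as in S4LRU, new items enter three-quarters of the way in
        self.order.insert((3 * len(self.order)) // 4, key)
        self.values[key] = value
```

Setting n to three-eighths of the capacity gives roughly S4LRU-like behavior; varying n per item, as suggested above, would only require making it a function of the index or the item.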
Another way to think about this is that it is 4Q-LRU, but with items that fall off the higher queues being demoted to lower queues, rather than being evicted.
It'd be interesting to see a comparison between SnLRU, nQ-LRU, and ARC. The comparisons to FIFO, LFU, and LRU in the paper really only serve to establish that SnLRU is interesting, not that it is a great algorithm.
S4LRU (or 4QLRU mentioned in the sibling comment) is not new.
The concept is one of several in a bag of tricks the Advection video delivery network has used to manage HD video edge caches since the early-to-mid 2000s. Something not discussed in this paper: we also stored the content of each of our S4LRU-like queues in storage tiers of increasing concurrency: origin storage cluster, local NAS, host JBOD HDD, host SSD, host ramdisk. We didn't use LRU, but something that works notably better for video, still not found in papers.
HD video files are very large compared to the storage available for cache queues, and moving HD content among layers is expensive and slow. LRU causes a lot of churn, and efficiently distributing large content across "collaborative" collections of edge servers with comparatively smaller caches presents a tough packing problem.
As a self-funded video delivery network bootstrapping off our own organic revenue, we had to solve these problems in order to wholesale our video delivery to CDNs that private-labeled and resold it at huge scale. Our combination of approaches let us beat all published papers on cache efficiency (as of the mid 2000s), using a set of computationally inexpensive heuristics.
It's been cool to watch CloudFlare independently discover and publish ideas from that bag of tricks, while the traditional CDNs continue to operate less efficiently. I was surprised by this paper's discovery that FB is still not using what this paper calls "collaborative" edge caching. One day the big guys may wake up and find themselves no longer competitive.
That is interesting. Just to check I've understood this correctly: are they essentially using a generational garbage collector for their caches? That's the sort of idea which, once written down, seems pretty obvious, but until now hadn't occurred to me.
Paper: http://www.cs.cornell.edu/~qhuang/papers/sosp_fbanalysis.pdf