Show HN: Stash, a graph-based cache for Node.js and Redis

doug1001 · on March 3, 2012

My sense is that you are on to something. I say that because a decades-old technique (though by no means primitive or outdated) used in operating system caches (and more recently, though far less often, in web apps) is a Bayesian filter. The motivation was not explicitly connected to resolving dependencies (afaik) but rather to the more general problem of how to exploit prior state (the last resource requested by a client) to predict the next state (the likelihood that a given resource will be requested next). Of course, this technique can be implemented as a directed acyclic graph, or "Bayesian network". In any event, I like your idea, and I'm not aware of the same technique having been applied to the same problem, but my knowledge of server-side web dev is below the mean.
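To make the "prior state predicts next state" idea concrete, here is a minimal sketch that counts which resource tends to follow which. It is only a first-order transition-count model, far simpler than a real Bayesian network, and the class name, method names, and URLs are all made up for illustration.

```javascript
// Hypothetical sketch: estimate P(next resource | previous resource) from
// observed request order. A first-order transition-count model, NOT a full
// Bayesian network; all names here are assumptions for illustration.
class NextRequestPredictor {
  constructor() {
    this.counts = new Map(); // previous resource -> Map(next resource -> count)
    this.last = null;        // last resource seen (a single client is assumed)
  }

  // Record one request and bump the count for the (previous -> current) transition.
  record(resource) {
    if (this.last !== null) {
      if (!this.counts.has(this.last)) this.counts.set(this.last, new Map());
      const row = this.counts.get(this.last);
      row.set(resource, (row.get(resource) || 0) + 1);
    }
    this.last = resource;
  }

  // Relative frequency of `next` following `prev`; 0 if `prev` was never seen.
  probability(prev, next) {
    const row = this.counts.get(prev);
    if (!row) return 0;
    let total = 0;
    for (const c of row.values()) total += c;
    return (row.get(next) || 0) / total;
  }

  // Most frequently observed successor of `prev`, or null if unknown.
  predict(prev) {
    const row = this.counts.get(prev);
    if (!row) return null;
    let best = null;
    let bestCount = 0;
    for (const [next, c] of row) {
      if (c > bestCount) { best = next; bestCount = c; }
    }
    return best;
  }
}

const predictor = new NextRequestPredictor();
const requests = ["/post/1", "/post/1/comments", "/post/1",
                  "/post/1/comments", "/post/1", "/user/7"];
for (const r of requests) predictor.record(r);

const likelyNext = predictor.predict("/post/1");
const likelihood = predictor.probability("/post/1", "/post/1/comments");
```

A cache could use `likelyNext` to decide what to prefetch or keep warm. A real deployment would presumably track state per client rather than in one shared `last` field.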

tlack · on March 4, 2012

This problem has been bouncing around in my head since I first started using memcache. Figuring out how to easily and correctly invalidate cache items is an interesting problem.

Perhaps instead of explicitly defining dependencies, there could be some way of modeling your cached data as some kind of set of objects which would then understand their own graph. As you can see I haven't quite figured out how it would work but this is how I've been thinking about it lately. Like a simple ORM, in some way.

Something like the following terrible pseudocode:

  cached_comment(key) = new cached_item(function(key){ query("select * from comments where..")});

  cached_post(key) = new cached_item(cached_comment, function(key){query("...")});
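One way the pseudocode above could be fleshed out is sketched below, purely as an illustration. `CachedItem`, the in-memory `db`, and the `queries` counter are hypothetical stand-ins for a real cache and data store, and the sketch assumes both caches are keyed by the same post id so that invalidating the comments for a post can cascade to the cached post.

```javascript
// Hypothetical sketch: cache entries that know which other entries are built
// from them. Assumes every related entry shares the same key (a post id here).
class CachedItem {
  // loader(key) fetches fresh data; dependsOn lists the CachedItems this one reads.
  constructor(loader, dependsOn = []) {
    this.loader = loader;
    this.values = new Map();
    this.dependents = [];
    for (const dep of dependsOn) dep.dependents.push(this);
  }

  // Return the cached value, loading it on a miss.
  get(key) {
    if (!this.values.has(key)) this.values.set(key, this.loader(key));
    return this.values.get(key);
  }

  // Drop the entry and cascade to everything built from it.
  // Note: not cycle-safe; mutual dependencies would recurse forever.
  invalidate(key) {
    this.values.delete(key);
    for (const d of this.dependents) d.invalidate(key);
  }
}

// Stand-in for a real database; `queries` counts simulated round trips.
const db = { posts: { p1: "Hello" }, comments: { p1: ["first"] } };
let queries = 0;

const cachedComments = new CachedItem((postId) => {
  queries++;
  return [...db.comments[postId]];
});

const cachedPost = new CachedItem((postId) => {
  queries++;
  return { title: db.posts[postId], comments: cachedComments.get(postId) };
}, [cachedComments]);

cachedPost.get("p1");            // miss: loads the post and its comments
cachedPost.get("p1");            // hit: no new queries
db.comments.p1.push("second");   // the underlying data changes
cachedComments.invalidate("p1"); // cascades to the cached post
const fresh = cachedPost.get("p1");
```

Declaring the dependency once, at construction time, is what lets `invalidate` on the comments reach the post without the caller knowing the graph.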

nkohari · on March 4, 2012

This was actually my original plan, but I took the easy way out for now. :)

We use MongoDB as our data store, and between the UUIDs used as primary document keys and arrays of UUIDs representing links between documents, we could (reasonably) easily determine the dependencies automatically.
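As a rough illustration of what "determine the dependencies automatically" might look like, the sketch below scans a document's top-level fields for UUID strings and arrays of UUIDs. It is not Stash's code; the document shape, the field names, and the `findDependencies` helper are all assumptions.

```javascript
// Hypothetical sketch: derive cache dependencies from UUID links in a document.
// Assumes links are stored as UUID strings (or arrays of them) in top-level fields.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Collect every UUID the document links to, skipping its own primary key.
function findDependencies(doc, idField = "_id") {
  const deps = new Set();
  for (const [field, value] of Object.entries(doc)) {
    if (field === idField) continue;
    const candidates = Array.isArray(value) ? value : [value];
    for (const v of candidates) {
      if (typeof v === "string" && UUID_RE.test(v)) deps.add(v);
    }
  }
  return [...deps];
}

// Made-up example document: a post linking to its author and its comments.
const post = {
  _id: "11111111-1111-4111-8111-111111111111",
  title: "Show HN",
  author: "22222222-2222-4222-8222-222222222222",
  comments: [
    "33333333-3333-4333-8333-333333333333",
    "44444444-4444-4444-8444-444444444444",
  ],
};

const postDependencies = findDependencies(post);
```

Each returned UUID would become an edge in the dependency graph, so the explicit declarations could be replaced by a scan whenever a document is cached.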

AffableSpatula · on March 5, 2012

I did some research on this a couple of years ago - the aim was to come up with a mechanism that extended HTTP so any existing caching infrastructure could pick it up and implement it. I wrote up a summary of the research here: http://restafari.blogspot.com/2010/04/link-header-based-inva...

Since then Mark Nottingham and I have written this up as an Internet-Draft:

http://tools.ietf.org/html/draft-nottingham-linked-cache-inv...

gsiener · on March 3, 2012

Very cool. We've just started implementing OLAP-cube-style caching using sorted sets in Redis, and the speed gains are impressive. Checking this out now.

klahnakoski · on March 4, 2012

I am unfamiliar with the general problem you are solving, but I am interested in graph dependencies in general. May I ask: how many nodes are in a typical graph? Is there a possibility of cycles?

Thanks

nkohari · on March 4, 2012

There would be one node for each cacheable item. In practice, that's likely one node for each cacheable record in your data store, plus one node for each cacheable collection.

Cycles are reasonably possible, if two cached items depend upon each other. For example, if you store the text of the last comment a user made on HN inside the "user" cache entry, and the user edited their most-recent comment, you would have to invalidate not only the comment but also the user.

Stash treats dependencies as a DAG (directed acyclic graph). During traversal when a node is invalidated, it's aware of the potential for cycles and won't backtrack over paths it has already examined.
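The traversal described above can be sketched with a visited set, which is what keeps a cycle from looping forever. This is an in-memory illustration only, not Stash's actual implementation (which sits on Redis); the `invalidate` function, the `dependents` map, and the keys are hypothetical.

```javascript
// Hypothetical sketch: cycle-safe invalidation over a dependency graph.
// `dependents` maps a cache key to the keys that must be invalidated with it.
function invalidate(cache, dependents, startKey) {
  const visited = new Set();
  const stack = [startKey];
  while (stack.length > 0) {
    const key = stack.pop();
    if (visited.has(key)) continue; // already handled: do not backtrack
    visited.add(key);
    cache.delete(key);
    for (const next of dependents.get(key) || []) {
      if (!visited.has(next)) stack.push(next);
    }
  }
  return visited; // every key that was invalidated
}

// The user entry embeds the user's last comment, and the comment entry
// depends on the user, so these two form a cycle.
const cache = new Map([
  ["user:1", { lastComment: "hi" }],
  ["comment:9", { text: "hi", author: "user:1" }],
  ["post:3", { commentIds: ["comment:9"] }],
  ["user:2", { lastComment: null }],
]);
const dependents = new Map([
  ["comment:9", ["user:1", "post:3"]],
  ["user:1", ["comment:9"]],
]);

const invalidated = invalidate(cache, dependents, "comment:9");
```

Editing the comment removes the comment, the user, and the post from the cache, while the unrelated `user:2` entry survives.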

villagefool · on March 4, 2012

Where can I find an example of how to use this kind of caching when serving pages with Node? Sorry, this is the first time I'm hearing about this...