Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: Stash, a graph-based cache for Node.js and Redis
56 points by nkohari on March 3, 2012 | hide | past | favorite | 8 comments
Last night, while trying to figure out the best way to implement caching in my app, I had an idea for a dirt-simple caching system based on a dependency graph.

The premise I started with is that one of the hardest things to manage in a cache is dependencies between entities. In order to cache items effectively, you inevitably have to duplicate "child" data inside of "parent" entries. Then, when a child is changed, you have to invalidate the child and its parents, and its parents' parents, and so on.

To try to help this, I hacked together a simple Node.js library called Stash, which models the cached values as a graph. When you invalidate an item, Stash will walk the graph and invalidate any items that depend on the item that was marked as invalid.

The code is available here:

https://github.com/nkohari/stash

I'm not suggesting that this is by any means a revolutionary idea; it just started as a mental exercise and now I'm wondering if there's any value to continuing to improve it as a library.

I'm also interested in what you find to be the most difficult part about caching, and how Stash could be improved to help.

Any feedback is appreciated, but bear in mind this is just a few hours worth of work and it has quite a few rough edges. Thanks!




My sense is that you are on to something. The reason i say that is that a decades-old technique (though by no means primitive or out-dated) used in operating system caches (and more recently in web apps, but far less often) is a Bayesian filter. The motivation was not explicitly connected to resolving dependencies (afaik) but rather the more general problem of how to exploit prior state (last resource requested by client) to predict future, or next state (the likelihood that a given resource will be requested next). Of course, this technique can be implemented as a directed acyclic graph, or "Bayesian Network". in any event, i like your idea, and i'm not aware of the same technique having been applied to the same problem, but my knowledge of server-side web dev is below the mean.


This problem has been bouncing around in my head since I first started using memcache. Figuring out how to easily and correctly invalidate cache items is an interesting problem.

Perhaps instead of explicitly defining dependencies, there could be some way of modeling your cached data as some kind of set of objects which would then understand their own graph. As you can see I haven't quite figured out how it would work but this is how I've been thinking about it lately. Like a simple ORM, in some way.

Something like the following terrible psuedocode:

  cached_comment(key) = new cached_item(function(key){ query("select * from comments where..")});

  cached_post(key) = new cached_item(cached_comment, function(key){query("...")});


This was actually my original plan, but I took the easy way out for now. :)

We use MongoDB as our data store, and between the UUIDs used as primary document keys and arrays of UUIDs representing links between documents, we could (reasonably) easily determine the dependencies automatically.


I did some research on this a couple of years ago - the aim was to come up with a mechanism that extended HTTP so any existing caching infrastructure could pick it up and implement it. I wrote up a summary of the research here: http://restafari.blogspot.com/2010/04/link-header-based-inva...

Since then Mark Nottingham and myself have written this up as an internet draft:

http://tools.ietf.org/html/draft-nottingham-linked-cache-inv...


Very cool. We've just started implementing OLAP cube type caching using sorted sets in Redis and the speed gains are impressive. Checking this out now.


I am unfamiliar with the general problem you are solving for, but I am interested in graph dependencies in general. May I ask, How many nodes are in a typical graph? Is there a possibility of cycles?

Thanks


In this case, there would be one node for each cacheable item. In this case, it's likely one node for each cacheable record in your data store, plus one node for each cacheable collection.

Cycles are reasonably possible, if two cached items depend upon each other. For example, if you store the text of the last comment a user made on HN inside the "user" cache entry, and the user edited their most-recent comment, you would have to invalidate not only the comment but also the user.

Stash treats dependencies as a DAG (directed acyclic graph). During traversal when a node is invalidated, it's aware of the potential of cycles and won't backtrack over paths its already examined.


Where can an example for how to use such cacheing in when serving pages using node be found? Sorry, first time I am hearing about this...




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: