Snappy Dashboards with Redis (togo.io)
66 points by avand on Oct 15, 2012 | 44 comments



This doesn't make any sense to me. Why store the values in Redis and use Ruby to do simple math? If you're using Postgres already, just store the stats in Postgres and use an aggregate or window function to grab the stats. (And collect those stats with triggers, in the first place.) If you're using Mongo already, just grab your stats with a map reduce query.
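
For example, something like this (a rough sketch; the Click model and column names are my assumptions, not from the article):

  # Per-customer daily click counts from a plain SQL aggregate
  Click.where(customer_id: customer.id)
       .group("date_trunc('day', created_at)")
       .count
  # => a hash of day => count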


Because they use MongoDB, and as its usage increases we'll see more and more people continue to "discover" ways to do things that RDBMSes solved in the '80s.

Likewise, this has nothing intrinsically to do with Redis; it could be done with any regular ole key-value store.


It looks like one of the things they're counting is clicks, so they could potentially have some pretty large datasets.

I don't know how well Mongo's map-reduce works, but in Postgres, COUNT(*) does not perform well for very large tables (e.g. 100 million rows). You wouldn't want to be doing a COUNT(*) once per minute for each customer that had their dashboard open on a plasma screen.

Of course, there are other solutions to that problem: generate the counts on some more feasible schedule and cache them; have a read replica used for analytics queries; shard by customer and have no large customers.

I don't know whether their scale strictly requires the Redis solution, but in any case there are situations where it's not as simple as "throw it in Postgres and use an aggregate function".

[1] "star" instead of an asterisk to avoid HN thinking I'm trying to write in italics.


Note that recent versions of Postgres can now answer COUNT against indices, so there's no need to do a full table scan.


This certainly helps, but if your indices are several GB in size, even an index scan is a nontrivial expense.


To be fair, we may have over-engineered this solution. I've been doing some work with compound indexes with Mongo and they seem to be performing really well. Maybe I'll have to write a guest post for MongoHQ too ;)

Appreciate the time and thoughtful response.


The article said they could rebuild the counts if they needed to, so something about each click is being stored. If you were using Postgres, you'd just set up a trigger on that table to increment the click counts (stored in another table) as appropriate. No aggregate function needed.
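
Roughly like this (an untested sketch; the clicks table, the click_counts rollup table, and all column names are invented for illustration):

  # Rails migration installing a Postgres trigger to maintain the counts
  class AddClickCountTrigger < ActiveRecord::Migration
    def up
      execute <<-SQL
        CREATE FUNCTION bump_click_count() RETURNS trigger AS $$
        BEGIN
          -- assumes the (customer_id, day) row exists; real code would upsert
          UPDATE click_counts
             SET count = count + 1
           WHERE customer_id = NEW.customer_id
             AND day = date_trunc('day', NEW.created_at);
          RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;

        CREATE TRIGGER clicks_count AFTER INSERT ON clicks
          FOR EACH ROW EXECUTE PROCEDURE bump_click_count();
      SQL
    end

    def down
      execute "DROP TRIGGER clicks_count ON clicks"
      execute "DROP FUNCTION bump_click_count()"
    end
  end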


Certainly you can implement this same pattern without Redis. Triggers in Postgres would be a reasonable way to do it. I didn't say you can't do this in Postgres, I said you can't do it with COUNT().

It does indeed sound like they're storing every click: that's precisely why using aggregate functions would be expensive.


You absolutely can do it without COUNT. Just increment a counter value, the same way they're doing it in Redis.


Why was this downvoted? Adding 1 to a number field is just as atomic an operation as an increment in Redis.
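
i.e. a single statement like this (sketch; the model name is assumed):

  # One atomic UPDATE; no read-modify-write race and no COUNT(*)
  ClickCount.where(customer_id: id, day: Date.today)
            .update_all("count = count + 1")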


Presumably it was downvoted (not by me) because it's refuting a claim I didn't make:

samstokes: I said you can't do it with COUNT()

guywithabike: You absolutely can do it without COUNT

Then we agree.


Agreed, this seems to fall into the bucket of "rediscovering" something that was solved decades ago.


It's the blog of a hosted Redis service.

Make sense?


I completely agree. I worry about how everyone introduces all these new technologies into their stack when their existing tools work fine.

Introducing new services/technologies just complicates things at both the application and ops layers (even though the author did say he was using RedisToGo).


We definitely could have done this in Postgres.

Our API is a Rails project, and though we do a lot of work that's "off the rails," so to speak, we try to follow convention where possible. A convention that's worked well for us is to use Postgres only for database-backed models in the app. So when it came time to solve this problem, Redis seemed like a good choice.
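
For the curious, the pattern boils down to roughly this (a simplified sketch, not our exact code; key names are illustrative):

  # Bump a per-day counter when an event comes in
  redis.incr "clicks:#{customer_id}:#{Date.today}"

  # Read a 30-day series back in one round trip
  keys = (0..29).map { |n| "clicks:#{customer_id}:#{Date.today - n}" }
  counts = redis.mget(*keys).map(&:to_i)  # nils (missing days) become 0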

Map/reduce isn't an option here. First, it's too slow. Second, it's asynchronous by nature, which doesn't work well when trying to load a webpage. We did consider using it in the background, however, to generate the counter-caches.

Thanks for reading!


This makes a lot of sense to me. Why bother your main DB with a lot of simple "queries" like these when:

1) They're so easy to move out to something else

2) Your main DB probably runs on very expensive hardware, unnecessarily expensive for things like this. This probably doesn't need an expensive SAN backing it. Not all data is created equal.

3) Redis does this faster

4) Redis is simpler

Of course there are some advantages to storing everything in the same DB, but to me this seems like a good example of "use the right tool for the job". But I haven't actually tried this in Redis, so what do I know...


I suppose it really depends on the use case. For example, there probably aren't any tables corresponding to an API call. If I wanted to collect stats on different API calls and other things not related to my models, I'd prefer to avoid a database count update and would rather make a single INCR call to a Redis key.


In general, absolutely. Redis tends to have very high write IO compared to your average DB, though, so that's a definite advantage when recording very frequent actions.


This is cool and, I believe, a good strategy for many things.

> If Redis isn’t available, for whatever reason, we could rebuild the gaps from the canonical data in Mongo. We’ve never had to do this.

I'm not sure this is so straightforward.

So, this is a guest post by Sqoot hosted by togo.io, which owns redistogo.com. Can anyone explain redistogo.com's pricing? I understand the value of not running your own service dependencies, but the redistogo.com prices seem really high to me. I was under the impression Redis is fairly easy to manage. What kinds of operational tasks does redistogo.com perform for a Redis instance that would warrant such high prices?


I pinged the guys over at RedisToGo to get you some more clarity on the pricing.

Speaking for ourselves, we've historically preferred to have someone else manage our infrastructure. Setting up Redis is fairly trivial, I agree. However, setting up a secure machine out on the internet and making sure it's always there is less trivial. Rather than juggle an ops/engineer role, we just double down on engineering. I'm sure at some point this may need to change. Hopefully, when that day comes we can afford a sysadmin!


This is a pretty standard way to handle counts and doesn't seem to have much to do with redis. In an SQL database you'd just do it with a trigger and your application wouldn't need to know a thing about it.


I've never used SQL triggers. We host our Postgres database with Heroku, and as a result I think I've ruled out database-level solutions.

On a related note, it feels good to know that everything our app needs to run is in the code base. Back in my C# days, I remember relying on stored procedures that were configured manually at the database level. Rails fights that with migrations and the callback chain too, so I guess that thinking has sunk in for me.

Thanks for the comment.


> We host our Postgres database with Heroku. As a result, I think I've ruled out database level solutions.

What does hosting with Heroku have to do with using a trigger?

> On a related note, it feels good to know that everything our app needs to run is in the code base.

Except the database schema and any additional indexes you need to make it not perform terribly. All basic setup, just like creating triggers.

> Rails fights that with migrations and the callback chain too, so I guess that thinking has sunk in for me.

The problem with doing stuff like this in a callback is that an additional query is sent to the database, which can be a big performance problem if the insert load is high. I generally agree that complex logic should be avoided in triggers and is better left in the application code in most cases, but incrementing a counter is about as simple as it gets.


Well thanks for the insights. I'll keep triggers in mind when I come across a problem like this in the future.


For the record, Heroku doesn't affect your ability to use triggers. It's possible our ancient "shared database" infrastructure simply didn't support it, but the new starter tier plans certainly do.


Sounds like the typical caching approach to me: "avoid making big queries by caching your data in a RAM-based key-value store". This has been common knowledge among webdevs since LiveJournal introduced memcached back in 2003. Or am I missing something?


This works great. One thing I've started doing is caching those MGETs in memcached. Redis is fast, but it's also single-threaded and, depending on how you're using it, can become CPU-bound, causing timeout errors while a busy Redis instance handles lots of writes/reads. So, similar to MySQL, I've started guarding multiple Redis reads with a single memcache GET. Feels crazy, but maybe correct?
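
Roughly this shape, in case it helps anyone (a sketch using the dalli gem; method and key names are made up):

  # Serve the series from memcached; only fall through to Redis on a miss
  def counts_for(customer_id, days = 30)
    memcache.fetch("counts:#{customer_id}:#{Date.today}", 60) do  # 60s TTL
      keys = (0...days).map { |n| "clicks:#{customer_id}:#{Date.today - n}" }
      redis.mget(*keys).map(&:to_i)
    end
  end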


I'm surprised that Redis cannot handle your traffic. What are you throwing at it? Or are you on EC2, with frequent BGSAVEs or AOF enabled on an EBS volume?


Rackspace Cloud - no AOF, only BGSAVEs. Currently my save frequency is:

  save 1900 1
  save 1300 10
  save 160 10000
Perhaps there's a better way for me to tune this? Most of the data I store in Redis is temporal, so I don't mind losing it. But I do store stats on feature usage for reporting in my admin dashboard, similar to what this article describes, and that stuff I'd like to keep around. Even so, if I lost a few hours or even a day, I'm not going to lose sleep.

I should add that I also use Redis to handle some pretty large calculations: ZRANGEs for distance calculations, intersections, and so on. I did some benchmarking and found that, at least in Ruby (1.9.3), it was more efficient to load the sets into Redis and do the sorting and intersecting there: fewer GC hits and faster sorts. I'm thinking it might be good, for these one-off frequent compute tasks, to run a Redis instance with no save at all, especially considering I expire/delete the keys immediately after running my calculations.


Have you tried running the redis-benchmark tool to see how your figures compare to other published stats?


I've been working on a dashboard that uses Redis as well. Just wondering: why not take advantage of sets/lists/zsets for your date-related keys? With lists you can do an easy LRANGE instead of that loop of GETs you're doing now.

Also, if you don't do this already, look into using bitsets to track users. As long as user IDs are integers, it's really easy and saves a lot of space.
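
For anyone curious, the bitset trick looks like this (sketch; key names are just illustrative):

  # Flip one bit per user id when user 123 does something today
  redis.setbit "active:#{Date.today}", 123, 1

  # Daily actives = number of set bits
  redis.bitcount "active:#{Date.today}"

  # Weekly uniques: OR the daily bitsets together, then count
  days = (0..6).map { |n| "active:#{Date.today - n}" }
  redis.bitop "or", "active:week", *days
  redis.bitcount "active:week"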


Nothing specific about redis or new here. Of course if you can easily cache a pre-calculated value, that'll save you time and CPU.


Well, to be fair, the cached counts are incremented in Redis through the INCR command, so there's no pre-calculation here. It can also be interesting to aggregate float metrics using the INCRBYFLOAT command.
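
e.g. rolling up revenue per day (sketch; the key name is illustrative):

  redis.incrbyfloat "revenue:#{Date.today}", 19.99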


Why not store & update count in database. Rather Redis. Also I could never appreciate the redistogo. I can get much powerful redis instance on EC2 for much lesser price.


nitpick: "The values are almost inconsequential since THEIR just numbers" should be either THEY ARE or THEY'RE


Good god. I can't believe I missed that. Thanks. I'll make sure that gets fixed.


This is exactly what I'm building http://www.instahero.com to solve. You don't have to build your own infrastructure, just write the relevant bit of code (or select a template) and you have a dashboard.


Thanks for sharing. I agree with @bunkat here (though I didn't read the whole page). I find the language around analytics apps (including big boys like Mixpanel) to be generally vague. If you're catering to a primarily developer audience, you may consider just showing me how easy it is. StatHat does a good job of this.



I think your value prop could be honed a bit. I tend to read more on a site than most people will, but I still couldn't get through the wall of text without losing interest and leaving. An example of 'write this code and get this!' would have kept me around.


That sales letter converts better than the actual page, but you're right. Here's some sample code: http://www.instahero.com/blog/2012/10/11/using-instahero-gai...


I use Fnordmetric, a Ruby/Redis dashboard. Works great.


Please always include a link: https://github.com/paulasmuth/fnordmetric


That's slick.



