NoSQL Databases: What, Why and When (ontwik.com)
106 points by ahmicro on March 31, 2011 | 17 comments



The slide that contrasts RDBMSs and NoSQL DBs and asks "A step backwards?" reminds me of the "MapReduce: a major step backwards" rebuttal by David DeWitt and Michael Stonebraker (copy of the article here: http://craig-henderson.blogspot.com/2009/11/dewitt-and-stone...)


Interesting article.

The author compares MapReduce to Teradata. Serious question: could a system on the scale of Google's various "apps" run on a Teradata database?


"Google Scale" is not really all that big of a problem as long as competent people are involved. Google's approach is to basically accept that everything is O(N) and simply through enough hardware at the problem that it's not an issue. RDBMS's often let you do things as O(log N) but they risk O(N^2) or even O(N^N) so your developers need to know what they are doing.

PS: The median developer at any large company is practically incompetent, so it's probably a very good trade-off once you have data centers of that size. However, for comparison, Slashdot ran on 4 fairly cheap machines for a long time.


Opinion: traditional databases suck and so do NoSQL ones, for the exact opposite reasons.

A better solution would be one that gives developers fine-grained control over "scalable and loose" vs "less-scalable and tight", rather than all-or-nothing in either direction.

That is to say: I want full ACID on my "financial transactions" collection and maximum scalability for my "chat messages" one -- but I want them in the same overall system, accessed via the same API, managed via the same tools, and with knobs that can be inexpensively modified on the running system.
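Purely as a sketch of what such knobs might look like (this is hypothetical syntax -- no system I know of ships exactly this), imagine per-table consistency settings in plain DDL, retunable on a live system:

    -- Hypothetical DDL: strict settings for money, loose ones for chat
    CREATE TABLE financial_transactions (id BIGINT PRIMARY KEY, amount DECIMAL(12,2))
      WITH (consistency = 'serializable', replication = 'synchronous');

    CREATE TABLE chat_messages (id BIGINT PRIMARY KEY, body TEXT)
      WITH (consistency = 'eventual', replication = 'async');

    -- ...and adjustable without downtime
    ALTER TABLE chat_messages SET (consistency = 'quorum');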


I think finer-grained consistency models (as well as some other options) would be a step forward for RDBMS systems. However, that doesn't get around the fact that the relational model is not always the best fit for your data. This is especially true if your data is semi-structured. I work with the MongoDB/10gen team and we have a lot of users who enjoy working with the document model, and have seen drastic reductions in both the amount of code and time-to-production from it.


Just a vocabulary/definitions nit: data modelling (relational, document, data-structure, whatever) is a separate concern from consistency.

I may want ACID properties but over a document-structured store, for example, and there's nothing contradictory in such a desire...

In fact, as a continuation of my "mix and match" consistency pipe dream, I'd like to mix and match data models within the same system, too! Strongly typed w/ integrity constraints; bag-o-data; etc...


But you can do that. SET TRANSACTION ISOLATION LEVEL SERIALIZABLE (or REPEATABLE READ) in the sessions doing your financial transactions, and in the sessions doing your chat messages, set READ UNCOMMITTED (and AUTOCOMMIT ON if you fancy). Easy. This feature has been around at least since the 90s in Sybase and its descendants.
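Roughly, in the Sybase/SQL Server dialect (exact syntax and supported levels vary by vendor; table names are illustrative):

    -- session handling financial transactions: full isolation
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;
    UPDATE balances SET amount = amount - 100 WHERE account_id = 42;
    INSERT INTO ledger (account_id, delta) VALUES (42, -100);
    COMMIT;

    -- session handling chat messages: dirty reads are fine here
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    INSERT INTO chat_messages (room_id, body) VALUES (7, 'hi');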


Sure, but you can't take advantage of that looseness to scale a collection across a crapton of cheap nodes, either.

(Which would be the main driving factor for abandoning ACID in the first place.)


You nailed it. The downside of NoSQL is that you have to deal with some proprietary API. SQL and tables can be quite useful for development purposes on some apps.

So check out H-Store[1], now commercialized as VoltDB[2]. You get full ACID while staying with SQL, plus extremely fast performance and near-linear scale-out.

In my own tests (a call-completion workload: save the record and debit the customer balance), I was easily able to push over 100,000 transactions/sec on a 3-server cluster (low-end, sub-$1000 servers). This is while maintaining full ACID and using SQL.

1: http://hstore.cs.brown.edu/

2: http://voltdb.com/


It's an interesting approach, though let's be careful about how we define scaling -- with "full ACID" there are always operations that will make the system go pear-shaped, no matter how cleverly designed the system is.
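For instance, a single serializable transaction that touches rows living on different partitions forces every node involved to coordinate (e.g. via two-phase commit) before it can commit, no matter how fast each node is individually (schema is illustrative):

    BEGIN TRANSACTION;
    -- the two accounts may hash to different nodes, so the cluster
    -- must agree on the outcome before COMMIT can return
    UPDATE accounts SET balance = balance - 100 WHERE id = 17;
    UPDATE accounts SET balance = balance + 100 WHERE id = 1000017;
    COMMIT;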

Personally, I like transactions, but I'm not in love with SQL; additionally, I'd like to have the options of:

- Deploying systems, or portions of a system, that continue working when widely distributed over cheap, unreliable networks and running on cheap, unreliable hardware.

- Supporting large datasets (more slowly) but paying $100/terabyte instead of $LOTS/terabyte.

Of course, everything has its downsides -- which is why my stated desire for a do-everything DB is a bit of a pipe dream.


Well,

As far as I can tell, it would be easier to add layers of indexes, data control, schema control, transactions and so forth on top of something like Tokyo DB than it would be to take apart a SQL database.


Easier, yes. But would it be better?

Layering transactions atop a key-value store would be very difficult to do efficiently, for example.


Apologies for the extremely flat and boring tone of voice and the somewhat rushed comparison of the actual products. Next time I'll do better, promise :)

Some notes to introduce the slides: http://www.alberton.info/nosql_databases_what_when_why_phpuk...


I was at this conference and attended this talk; it was one of the better presentations.


Very nice set of slides.


Even better if you watch the video to get the commentary...


+1 on that; the commentary is excellent and concise. If you want a brief intro to concepts in distributed DBs, this is it. Grab that coffee and turn up the speakers.



