Can we please stop with these ridiculous "SQL is utterly hopeless" and SQL vs NoSQL articles? Saying that SQL is not scalable, full stop, is ignoring decades of real-world evidence. There's a right tool for every job, and SQL isn't always that tool, but articles like this aren't helpful in the slightest.
It's not that SQL databases do not scale per se; they just can't scale arbitrarily.
Most applications today manage pretty well with nothing more than sharding and replication. Sure, it's not the optimal solution, and it's not easy - but it gets the job done.
Facebook, at the very core (behind all the layers of cache), runs arguably the largest MySQL landscape in the world, and they seem to be doing pretty well.
Until Heroku comes up with a better solution (or a better question), TFA is no more than flamebait.
I can't count how many Rails fanboys have written this same article. What do they think big companies, banks, government agencies -- organizations with huge databases -- use? What kind of database do the RoR developers think the real world runs on?
Claiming that SQL databases don't scale, especially using MySQL as the example, is simply ignorant. Oracle, DB2, SQL Server all scale to handle databases orders of magnitude bigger than anything the most successful RoR app is running.
>> What do they think big companies, banks, government agencies -- organizations with huge databases -- use? What kind of database do the RoR developers think the real world runs on?
CICS is a transaction processor. TPF and z/OS are operating systems. CICS applications will typically be accessing a relational DBMS such as DB2 or Oracle.
TPF blurs the boundaries between a traditional database and an operating system. The OS itself has built-in support for transactions, locking of records, processing of record streams and other database-y things. So while strictly speaking, yes, TPF is an OS, it is an OS that is designed to be a database, and would be useless for most other purposes.
Sure, code still needs to be run on top of it for specific reasons, but I submit for consideration that this is true of SQL databases too, and of the data persistence layer of any NoSQL-based app.
Similarly, CICS may have a relational DB in the loop, or it may be pointing at some sort of TPF-based system, or it could be pointing at a simple pile of COBOL records stored in z/VSE. The point is that CICS is the transaction manager and provides the interface to the data, much the same way an SQL layer over a relational DB does.
Well, they don't scale indefinitely, and you do run into Brewer's theorem at some point. While you're right in saying that if you have enough cash and horsepower to throw at SQL databases you can get a very long way, there is some point where anything which must be consistent is going to have an impact on availability.
This gets especially critical once you stop assuming that data can be centralised. Once your required transactional context needs to span large areas (geographically or logically), the overhead of this is considerable. 2PC itself does not scale indefinitely, and while interesting research is being done in alternative transactional approaches, it would be tough to argue that any are acceptable solutions to the problem.
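To make the 2PC overhead concrete, here is a minimal in-memory sketch of a two-phase commit coordinator. The `Participant` class and `two_phase_commit` function are hypothetical names for illustration only; a real implementation would also need durable logs, timeouts and recovery, none of which is shown.

```python
class Participant:
    """Hypothetical resource manager taking part in a two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote. A real participant would hold its locks from here on.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1: one round trip to every participant to collect votes.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: a second round trip to tell everyone to commit.
        for p in participants:
            p.commit()
        return True
    # Any "no" vote aborts the transaction on every participant.
    for p in participants:
        p.rollback()
    return False
```

Every participant is contacted twice and waits on the slowest vote, which is roughly why the cost grows as the transactional context spreads across more, and more distant, nodes.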
Also, your example of banks is perhaps not an ideal one. While banks may indeed run some very large databases, they don't run one HUGE database which is expected to be always consistent. Things like ATM systems etc. run based on eventual consistency and tolerance favouring availability, precisely because a simple horsepower approach to a transactional database (whether SQL or not) wouldn't currently work in this domain.
Nothing scales indefinitely, not even RoR. But that wasn't what the article complained about. If the author had written "SQL Databases Don't Scale Indefinitely" I wouldn't have any complaint about it.
My point is that there are plenty of real-world examples of SQL databases that are scaled way beyond any RoR web app's needs, so flatly claiming they don't scale is wrong.
The solution banks (and other large organizations) have arrived at -- using multiple databases with different transactional and availability requirements -- IS a solution to scaling large databases. There's no requirement that SQL databases be monolithic. Even the original article describes using a MySQL master for updates and one or more slaves for reads. That's a valid scalability solution if your application can live with the possibility of stale data in the slaves.
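The stale-read trade-off in a master/slave setup can be sketched with a toy in-memory model. `Master` and `Replica` are hypothetical classes, not MySQL's replication API, and `catch_up` stands in for whatever replication lag a real deployment has.

```python
class Master:
    """Accepts all writes and keeps an ordered log for replicas to replay."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Serves reads from its own copy, which may lag behind the master."""

    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # number of log entries replayed so far

    def catch_up(self):
        # Replay only the log entries this replica has not seen yet.
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)

    def read(self, key):
        return self.data.get(key)
```

A read that lands between a `write` on the master and the next `catch_up` returns the old value. That is the stale data the application has to be able to live with.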
Hundreds of tiny systems that all integrate together in various complex ways? I mean, it's not like you can "mysqldump bankofamerica_production > backup.dump" or anything, right?
Wow, a lot of dislike of SQL, and a bit biased in their description of scaling SQL and sharding. Again, use the right tool for the right purpose, not what you think is intrinsically better.
Our app does double entry accounting, and considering the relational structure of accounting ledgers as well as that every entry needs to be posted in an ACID manner, we don't feel comfortable building out, say, a banking system in NoSQL.
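As a rough illustration of what posting an entry in an ACID manner means here, a minimal sketch using Python's built-in `sqlite3` module. The `journal` table, its columns and the `post_entry` helper are assumptions made up for this example, not our actual schema.

```python
import sqlite3


def open_ledger():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE journal ("
        " entry_id INTEGER NOT NULL,"
        " account TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL)"  # positive = debit, negative = credit
    )
    return conn


def post_entry(conn, entry_id, lines):
    """Post all lines of one journal entry, or none of them."""
    if sum(amount for _, amount in lines) != 0:
        raise ValueError("debits and credits must balance")
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO journal (entry_id, account, amount_cents) VALUES (?, ?, ?)",
            [(entry_id, account, amount) for account, amount in lines],
        )
```

If any line fails (for example a NULL account), the whole entry is rolled back, so the ledger never holds half of a debit/credit pair.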
Yes, SQL is painful to scale, but it can be manageable. We partition it in a scoped-schema way (e.g. each company account/tenant has its own set of tables on one of the shards, placed on the server with the fewest consumed resources or, if it's a tie, by a coin toss). Scaling it in an arbitrary manner, such as by names beginning A to M, N to Q, etc., simply seems like a bad way to do it. If your app is like a 37 Signals app where it doesn't typically require access to another account's data, this might be a good way to do it.
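The placement rule can be sketched in a few lines. The function names, the `load_by_shard` and `directory` dictionaries, and the idea of a single numeric load figure per shard are all assumptions for illustration.

```python
import random


def pick_shard(load_by_shard, rng=random):
    """Return the shard with the lowest load; break ties at random."""
    lowest = min(load_by_shard.values())
    candidates = sorted(s for s, load in load_by_shard.items() if load == lowest)
    return rng.choice(candidates)  # the coin toss when several shards tie


def place_tenant(tenant, load_by_shard, directory, tenant_cost=1, rng=random):
    """Assign a new tenant to a shard and record it in the lookup directory."""
    if tenant in directory:
        return directory[tenant]  # tenants stay put once placed
    shard = pick_shard(load_by_shard, rng)
    directory[tenant] = shard
    load_by_shard[shard] += tenant_cost
    return shard
```

Because every tenant's tables live on exactly one shard, a request only needs the directory lookup to find its database, and no query has to cross shards.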
One of the programmers I hired was dead set on trying to use CouchDB for everything. I told him don't try to use it for everything--it's good to use for the audit trail and document revisions on invoices, orders, etc, but not on actual double entry journals.
The #1 way to get relational databases to scale is to either teach your application developers to really really understand relational databases and performance trade offs or don't allow them to be responsible for the data model.
In my experience I have seen more atrocities committed due to developers using a relational database as a file-system, an XML document, a key value store or some variation in between.
When people complain about scaling in relational databases, they usually mean that the database is rigid and not easily adapted to changes. That is valid, and it is why you cannot apply software development practices to database design. With database design you have to plan for the future and think out how your solution will grow so as not to paint yourself into a corner. Further, in relational design there is no such thing as premature optimization; if one thinks there is, one has already failed at setting up a robust relational database architecture.
At one of my start-ups we supported massive amounts of traffic from Hotels.com, Orbitz, Travelocity and Expedia, all looking for pricing and allotment each time someone hit their front door, and we did it all on applications that were backed by relational databases. We never saw Google or Facebook size traffic, but we had a constant load of more than any one of those travel sites and many times the load of all of them combined.
A solid application architecture and a sound relational model can scale quite well; you just have to have good people who understand each discipline.
It's useful to differentiate between "SQL Databases do not scale" and "SQL Databases do not _cost effectively_ scale". The second argument is more accurate.
Vertical scaling of a DB is definitely an option for many people and has been used to scale many applications. However, the cost curve associated with buying bigger and bigger hardware is super-linear; doubling CPU & Memory in a single system leads to more than doubling the hardware cost. This can be problematic for many businesses whose revenue growth is exceeded by cost growth of the database.
Sharding is also an option for scaling, leveraged to great success by Facebook, Yahoo, and many others. However as the article points out, sharding prevents the developer from using many of the features that make a relational database a productive development environment. There are lots of foot guns that emerge in a sharded SQL environment and if you have not set up your development constraints appropriately, you can slow the pace of development considerably. This again leads to a cost problem because the incremental costs of adding features grows as you add more things like sharding around your database.
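One example of such a foot gun: a report that would be a single `GROUP BY` on one database turns into a scatter-gather across shards. The sketch below uses in-memory `sqlite3` databases as stand-in shards. The `orders` table and the assumption that orders are sharded by something other than customer are made up for illustration.

```python
import sqlite3
from collections import Counter


def make_shard(rows):
    """Create one stand-in shard holding a slice of the orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, total_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def revenue_by_customer(shards):
    """Send the same GROUP BY to every shard, then merge in the application."""
    merged = Counter()
    query = "SELECT customer, SUM(total_cents) FROM orders GROUP BY customer"
    for conn in shards:
        for customer, subtotal in conn.execute(query):
            # The same customer can appear on several shards, so add subtotals.
            merged[customer] += subtotal
    return dict(merged)
```

Sums merge easily. Joins, uniqueness constraints and cross-shard transactions do not, and each one becomes application code that someone has to write and maintain.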
SQL is not useless and not hopeless. In a large number of cases, SQL is the right solution. However, the techniques used to scale SQL tend to be options only for very large budget organisations. NoSQL solutions tend to be more cost effective in their scaling approach (scale out vs. scale up) without crippling the developers' productivity. For these reasons, NoSQL solutions tend to be the better choice for the cost-conscious.
"SQL" is not a classification of a database, it's a query language. Some of the databases which use SQL do not scale well or easily, and some databases which don't use SQL don't scale well either.
What Adam means to say is "some of the semantics defined by SQL are hard to get correct while maintaining scalability." Hard is very different from impossible, and there is a large number of very smart people currently solving this problem very well. There is another large number of smart people trying to solve the problem by ignoring the hard parts of the semantics. It seems they will likely come up with something very fast and scalable, but ultimately less useful for certain things which are done easier with proper SQL semantics. In fact, it's not clear in all cases that they are even making something that's more performant: cf. http://sergeitsar.blogspot.com/2011/01/mongodb-vs-clustrix-c...
Saying "SQL Databases Don't Scale" is like saying "oil paintings on wood aren't appealing". Not all oil-on-wood paintings are good, but some are, and some tempera-on-fresco paintings suck too. The logic is simply invalid.
>> When hundreds of companies and thousands of the brightest programmers and sysadmins have been trying to solve a problem for twenty years and still haven’t managed to come up with an obvious solution that everyone adopts, that says to me the problem is unsolvable.
I take issue with the assumption here. What you want, which is obviously a "free/open MySQL-ish thing that scales indefinitely", is not a problem people have been working that hard to solve until quite recently.
To put it in perspective, 20 years ago, many banks offered no access to your money outside open hours and the internet was not a thing normal people used.
Today, if you go to Oracle or IBM, you'll find that they'll be happy to help you solve your enormous problem at great profit to them. The thing that's changed in the last few years is that Web 2.0 guys want the same power (or more) for a tiny fraction of the cost.
This is a good thing. This is exciting. People recognize that the status quo sucks and are working hard on change. The solution will probably involve some SQL, and probably some other tools as well. Don't be such a downer. ;-)
Isn't that what Oracle RAC is for? If you are managing that much data, and you don't want to go the specialty DB route (nosql, for example) RAC scales.
It's also hellishly complicated, but so is the problem.
Right, I feel like sometimes when someone says relational databases don't scale, they mean that free databases don't scale without a lot of work. Scaling with Oracle is fast and easy (in comparison to the alternatives); that is why people still pay that kind of money for it. I personally find Oracle the company distasteful, but the facts are the facts and Oracle does scale.
Sometimes I wonder... if three Fortune 500 companies just spent the same amount as their Oracle budget on Postgres devs, they could probably have a RAC competitor in three years.
Then, just another billion on transition costs, right? :)
I agree, there are no technical reasons that Postgres could not be as easy or as good at scaling as Oracle. Someone just has to put the money and effort in.
Oracle RAC may come closest to addressing most of these issues. Of course, it is pricey. However, it fulfills pretty much all the constraints mentioned here: application transparency while scaling (sharding does not involve changing the app), horizontal scalability (add nodes as needed), and failover for both read and write transactions. It sounds like a sales pitch for RAC, but it seems to be geared to deal with these kinds of scenarios.
RAC uses a shared disk; it doesn't eliminate the single point of failure. The shared disk also adds a contention point, such that RAC often stops scaling after a handful of nodes. Even the first few don't give you linear scalability without changes to the application and extensive tuning.
Sure SQL databases can scale. Oracle for instance has a parallel query option, but it costs real money and needs a lot of planning and configuration.
NoSQL databases scale much more easily. Just add a new node and you are done. +1. Many of them are also freeware. +1 again. Many of them are also faster, because no parsing is done. +1. Many of them don't have and don't need locking. +1. No locks mean no write waiting. +1.
It is baffling to me that the founder of Heroku, a service built specifically around automated web scalability, would write an article on how SQL doesn't scale when he uses Postgres as the de facto database for all of his customers.