> Maybe shitty toy database systems like MySQL While appropiate to define in few...

spookthesunset · on Nov 8, 2019

Any database that would let you disable constraints on a session basis is a toy database. Such an operation doesn’t even make sense because at some point the relational integrity has to be enforced for the entire table. You can’t just have parts of a table be relationally correct. That is like saying 1 + 1 = 3. It is a completely illogical statement.

However I would not at all be surprised to learn MySQL supports such a thing. Which supports my assertion it is a toy used (or at least installed by) people who have no understanding of relational database architecture.

evanelias · on Nov 9, 2019

So are you asserting that the following products are all built on top of a "toy" database, and their engineers have no idea what they're doing:

Facebook, YouTube, Wikipedia, Pinterest, Slack, GitHub, Etsy, Yelp, LinkedIn, Shopify, Dropbox, Wordpress, Wix, Tumblr, Square, Uber, Booking.com, Box, Venmo, SendGrid, Okta, SurveyMonkey, WePay, Alibaba, SoundCloud, among countless others...

An alternative view is that your statements are incorrect. Do you have much direct experience with high-volume OLTP database workloads, or are you basing your views of MySQL on something else?

spookthesunset · on Nov 9, 2019

Once you are stuck with MySQL it is very, very, very hard to get an organization to switch--not only from a technical standpoint but a political one.

I bet you any competent engineer who knows their shit about DB in those companies regrets using MySQL. I bet their code is full of hacks, crappy schemas, and all kinds of work arounds because they chose mysql. I've seen it in every company that uses MySQL. The lengths people go to avoid schema changes is astonishing.

It is much, much better to start with a real database like PostgreSQL because whatever you pick is going to be what your entire org uses from now until eternity.

evanelias · on Nov 9, 2019

Cool, so I'm going to assume that means your answer to my question of "Do you have much direct experience with high-volume OLTP database workloads?" is "no". Given your "bet" as well as comments about schema change difficulty, I'm also going to assume you did not click through to my profile...

evanelias · on Nov 9, 2019

And to clarify, I'm not saying that to toot my own horn. My point was, I primarily work on open source schema management and related consulting. I talk to large companies about MySQL schema changes literally every single day. The comment about "the lengths people go to avoid schema changes is astonishing" simply does not gel with reality among large-scale MySQL users.

As for the random blind accusations about bad code, engineer incompetence, etc that's just rude, mean-spirited, and misinformed. I personally know a lot of exceptional database engineers who work on MySQL-related infrastructure at the companies I listed above. Why crap on other people's work that you haven't seen and know nothing about?

ngrilly · on Nov 9, 2019

Agreed. I think MySQL is still dragging the bad reputation it got in the early 2000s, which is unfair considering how much it improved. And I write that I someone that used to hate MySQL for all its shortcuts. I have one app in production based on MySQL. I have been thinking of switching to PostgreSQL for years. But the truth is that, the more MySQL improves, the less the switch is justified :) The engineering effort put by Google, Facebook, Oracle, etc. in MySQL and InnoDB during the last ten years is impressive.

ngrilly · on Nov 9, 2019

Qualifying MySQL as a "toy" database in 2019 is obviously wrong.

But I think most companies you mentioned don't use MySQL in the usual way, as they would use a "standard" enterprise database like Oracle, SQL Server or PostgreSQL.

These companies don't use MySQL directly. They use it indirectly as the storage component of a larger architecture. For example, YouTube uses Vitess "over" MySQL.

Companies like Instagram are known to do something similar with PostgreSQL.

My point is that maybe you and the parent comment are not thinking about the same use case.

evanelias · on Nov 9, 2019

> These companies don't use MySQL directly. They use it indirectly as the storage component of a larger architecture.

Yes and no. Often it's both. I say this first-hand, having performed significant work on the database tier for two of the companies I listed, and consulted for several others.

For example, while Facebook's largest db tier goes through a dao / writethru cache, there's plenty of other use-cases that are direct MySQL usage.

And in any case, why does it matter if there's another layer involved? It's still MySQL powering mission-critical global-scale use-cases. And for example with YouTube, literally the primary benefit of Vitess is that your application can treat it as a single normal unsharded MySQL installation, so those interactions are still very MySQLy.

ngrilly · on Nov 9, 2019

> For example, while Facebook's largest db tier goes through a dao / writethru cache, there's plenty of other use-cases that are direct MySQL usage.

I didn't know about that. That's interesting!

> And in any case, why does it matter if there's another layer involved?

I was writing that in the context of the parent comment about "disabling constraints". I can see why disabling constraints makes sense in a sharded environment, with an intermediate layer like Vitess. But the benefit of disabling constraints is less clear when using MySQL directly in a non-sharded environment.

Since you're here, I'd like to ask why you would use MySQL over PostgreSQL in a new project nowadays?

Regarding MySQL, the two main advantages I can think of are that tables are organized as clustered index (instead of a heap in PostgreSQL, which can be an advantage or a drawback depending on the workload) and the replication tooling.

On the other hand, PostgreSQL has a lot of useful features that I miss in MySQL: table elimination/join removal (exists in MariaDB but not in MySQL), indexes bitmap scan (to combine indexes efficiently), partial indexes, transactional DDL, LISTEN/NOTIFY, materialized views, row-level security, table functions like generate_series.

evanelias · on Nov 11, 2019

To explain more re: FB and having another layer on top of MySQL, there are a bunch of separate sharded MySQL tiers there. It's split by workload -- for example, the access pattern, schema, and sharding key differs completely between the main social graph, Messenger data, ad market, financial transaction data, etc. And then there's also the internal MySQL database-as-a-service, which allows any engineer to provision one or many databases for any other purpose. Overall, some of these things have services on top that use MySQL more as low-level storage, and others use MySQL in a more traditional fashion.

re: "disabling constraints", that's kind of orthogonal. The large MySQL users simply don't create foreign key constraints in the first place; there's nothing to disable :) Whereas MySQL's ability to disable constraints for a single session is a feature intended to make things like logical dump/restore easier and faster, schema management easier, etc. Without that feature, these tools would need to construct a dependency graph and create tables in a specific order (and/or defer FK creation until after the tables), which is needlessly complex if the tables are new/empty, and very slow if restoring a logical dump which is already known to be referentially consistent.

As for MySQL vs Postgres, IMO both databases are close enough in major functionality that for many use-cases it's best to just go with what you already know, can hire for, and can operate. There are special cases where one is better than the other, for example personally I'd go with MySQL for social networking / UGC / very high volume OLTP, and go with Postgres for use-cases where solid geospatial, OLAP, or fulltext are core requirements and/or there's a desire to minimize the number of different data stores.

Ideally with MySQL you're just using it for OLTP, and deferring to separate systems for OLAP, fulltext search, etc. In a way that's "more UNIXy" but it's also potentially an operational headache.

In terms of specific feature comparison, you already have a great list there. A couple other things on the MySQL side I'd mention are InnoDB's buffer pool (smarter caching than relying on the OS cache as pg does) as well as the existence of MyRocks storage engine (LSM-based system offering better compression than pretty much anything else of comparable performance level for OLTP).

That all said -- Postgres is an awesome database, and I'd say that Postgres is more closely aligned with the textbook definition of a proper relational database. But then again I'd also say something similar about FreeBSD (vs Linux) for server operating systems, yet for practical purposes I always go with Linux anyway :)

ngrilly · on Nov 12, 2019

Thanks for sharing your experience here!

> Without that feature, these tools would need to construct a dependency graph and create tables in a specific order (and/or defer FK creation until after the tables)

This problem alone justifies the ability to temporarily disable constraints. I notice PostgreSQL, which is a toy too ^__^, offers something similar.

> Go with Postgres for use-cases where solid geospatial, OLAP, or fulltext are core requirements and/or there's a desire to minimize the number of different data stores

I agree that PostgreSQL is a really good match for these use cases.

Great point about Linux and FreeBSD ;)

ngrilly · on Nov 12, 2019

PostgreSQL is not a toy database either, and yet you can do:

    ALTER TABLE <tablename> DISABLE TRIGGER ALL

Or:

    SET session_replication_role = 'replica'