Django zero-downtime migrations for Postgres that respect database locks

stavros · on Oct 23, 2018

So basically what this does is avoid locking the database while migrating? Obviously (and the README mentions this) the two releases have to be backwards- and forwards-compatible, so with this backend you avoid locking for other processes while migrating.

Very useful, especially for long migrations, but I would like to see a bit more detail about how this library achieves this, what the caveats are, etc.

EDIT: Never mind, there's a comprehensive "how it works" section farther down, I just needed to scroll far enough. This is very useful.

tbicr · on Oct 23, 2018

I found only two caveats:

- it doesn't use transactions, so if migration will down, then you will need to fix state manually (one point of improvement), however you cannot run `CREATE INDEX CONCURRENTLY` in transaction.

- it can use `CHECK (column IS NOT NULL)` constraint for safe `NOT NULL` replacement for huge tables, that isn't compatible with standard django behavior.

So all this cases highlighted in README.

aidos · on Oct 23, 2018

Looks like this handles a bunch of cases around locking you can run into, which seems a noble goal. I like the approach of making a matrix of all the migration operations and figuring out how to work around the locks, kudos.

There are definitely some wonderful ways to mess things up when starting out with postgres migrations. Nothing quite like the surprise you get the first time you rewrite every tuple on a table with 20M rows because you added a column with a default value (no longer an issue with the latest postgres).

As pointed out in the docs, your code must be prepared to support the schema both before and after the migration. Not that I use django anyway, but normally I'm more worried about the interlacing of the code and db changes to keep things running smoothy. Migrate a bit, release some code, migrate some more etc.

bmdavi3 · on Oct 23, 2018

This is cool.

Another alternative would be to use postgres savepoints, which are like transactions inside transactions, as a wrapper around each migration. You can do the same thing - set lock_timeout and catch errors when those values are exceeded, and try the transaction again.

Provide an option to run the handful of operations that can't be run inside a transaction as an escape hatch, and then you can retain the ability to run migrations inside transactions, which is usually a good thing.

tbicr · on Oct 23, 2018

Unfortunately savepoints live only in transaction and ACCESS EXCLUSIVE will be released only on whole transaction commit/rollback, so look like no benefits with savepoints for schema migrations to compare with plain transaction.

rbanffy · on Oct 23, 2018

This is awesome. It's not a silver bullet and won't solve all case. You'll still need to design in degraded modes for your application for when parts of it become unusable (that's you should be doing anyway), but this covers a lot of your likely needs.

etaioinshrdlu · on Oct 23, 2018

Is there any equivalent for django+mysql? I historically found more tools for online schema changes in the mysql sphere, such as gh-ost ( https://github.com/github/gh-ost )

rantanplan · on Oct 23, 2018

gh-ost is hardly useful for anyone outside of github.

It's predicated on the fact that you don't use foreign keys. Now why would someone use MySQL without FKs... is beyond me, but I'm sure they have their reasons.

MySQL has this https://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl.ht... for Online Schema migration and

MariaDB has ALTER ONLINE https://mariadb.com/kb/en/library/alter-table/

They both work with triggers + ghost tables, so they don't need transactions.

sciurus · on Oct 23, 2018

It's not just gh-ost. pt-osc is much safer to use when you don't have to worry about updating foreign keys.

Here are some of Github's reasons for not using them: https://github.com/github/gh-ost/issues/331

I think this is pretty normal for heavy MySQL users, honestly. When I was at Eventbrite we didn't use them for the same reasons.

rantanplan · on Oct 23, 2018

I've read about their reasons, I just didn't want to open a discussion about it :D

Deciding to implement constraints on the application/business layer is something that github can probably do, in the same vein that Facebook can create their own PHP compiler.

I just think that you can't use these examples as effective arguments on whether someone should use FKs or not. I guess whatever brute-force solutions these companies would apply, on any given problem, would probably work one way or the other.

orf · on Oct 23, 2018

Rails does not use foreign keys by default.

rantanplan · on Oct 23, 2018

What does "by default" mean?

If you want to have referential integrity there's no other choice. Otherwise you create tables that have no relation to each other. No one stops you from doing this, but then you probably don't want an RDBMS in the first place.

orf · on Oct 23, 2018

Rails by default does not add a foreign key constraint at the database level when defining a relationship[1]. And once you have a fairly large rails app that uses this default it's somewhat tricky to migrate.

It's madness, I know.

1. https://api.rubyonrails.org/classes/ActiveRecord/ConnectionA...

rantanplan · on Oct 23, 2018

WTF!

Thanks for that tidbit, I have no XP with Rails, but I would never imagine they do something so atrocious.

ubernostrum · on Oct 23, 2018

Django's ORM will create DB-level foreign keys on databases/table engines that support them. But it doesn't create many other types of database-level validation (mostly just UNIQUE and NOT NULL constraints).

Some of that is compatibility reasons; not every DB Django supports would allow creating all the desired types of constraints in the DB, so Django has no choice but to ship application-layer enforcement.

You can work around that with manually-created migrations (and I've worked some places that did this, in order to get particular constraint types that were needed at the DB level), but I don't know of a good generic solution to that problem.

Django 2.2 is going with a somewhat-manual approach to allow specifying richer types of constraints, so maybe that'll lead to progress.

lloeki · on Oct 23, 2018

The rationale is that non-trivial conditions on ActiveRecord model associations can be quite dynamic (possibly implying ruby code evaluation), therefore constraints are best handled at the model level, or else the constraints would only be partially checked at the db level, or even conflicting with the models (e.g if a constraint is to be enforced based on some condition evaluated at runtime, like a simple if clause, or when using STI or polymorphism).

rantanplan · on Oct 23, 2018

I have no clue what you're talking about. That's an honest statement, not trying to be confrontational.

If you have constraints that can be enforced by the DB, you simply use the DB's constraints, because the DB guarantees they're gonna work 100% of the times and your data will be correct.

If they are more dynamic or require custom business logic... well you do it on the application layer. That's what everyone does.

I mean you can probably implement your own transactions, that doesn't mean that you should. And if you do, then just admit that there's no point in using an off-the-shelf RDBMS.

matthewmacleod · on Oct 23, 2018

Well, Rails is kind of an off-the-shelf solution to building web apps, so it makes some compromises to present a uniform process and API for some actions. One of those trade-offs is that the relationship between database tables in ActiveRecord is handled at the model/application layer instead of the database layer.

This has some benefits - it allows for a uniform definition of relationships regardless of database backend, allows for constraints that can’t be expressed by the database itself, and allows constraints to be used as a first-class concept for things like presenting error messages to users. But it also means that data integrity is not guaranteed - modifying records concurrently or outside of the application can result in a data model that’s valid according to the database schema but not according to the application model.

FWIW I exclusively use Postgres as my Rails database backend these days, and foreign key relationships are extremely easy to include in migrations. This still requires a companion definition in application code so that good error messages can be presented, but that seems acceptable to me. I’d hope that this eventually becomes the default for databases that support these keys.

rantanplan · on Oct 23, 2018

Sure. I'm all for options and pluralism.

I am just that crazy person that believes that default options should err on the safety/integrity/consistency side.

I also love abstractions, but abstractions can't change their underlying fundamental reality. So I would like to be the one who makes the compromises and I'd like those compromises to be explicit rather than implicit.

nicolasMLV · on Oct 23, 2018

Rails uses foreign keys by default, usually I generate a migration file with `rails generate migration AddThisToThing this:references`

Then this line is generated `add_reference :things, :this, foreign_key: true`

and then we call the migration, and there is the foreign key.

orf · on Oct 23, 2018

Is it not more accurate to say "by default rails generates migrations with the foreign key argument set to a non default value of true"?

By default the relationship constructors do not create foreign keys, if they did it would be redundant to include it in the generated output surely? At least that's what the docs seem to say.

pqdbr · on Oct 23, 2018

You can create your models anyway you want, but the “default” is by doing something like this:

rails g model Post user:references

Doxin · on Oct 23, 2018

I'm not even sure this is possible in mysql. Last time I checked mysql still couldn't do DDL statements in a transaction.

lloeki · on Oct 23, 2018

Last time I checked MySQL even silently commits out of the transaction and proceeds with the remainder of the queries.

Here are the "implicit commit" statements:

https://dev.mysql.com/doc/refman/5.7/en/implicit-commit.html

Doxin · on Oct 25, 2018

Honestly I don't see a single good reason for any new project to use mysql over postgres. mysql is almost universally terrible at actually being a database.

fredley · on Oct 23, 2018

In Django it doesn't, it just dumps you out with a fucked-up semi-migration that takes you 20 minutes to get back to a consistent state...

leowoo91 · on Oct 23, 2018

Afaik mysql migrations doesn't have transactions in Django.

orf · on Oct 23, 2018

There have been some messages on the development mailing list about adding support for create index concurrently when using Postgresql, which is a big pain point for larger migrations.

It's a bit tricky as you cannot use it in a transaction though

tbicr · on Oct 23, 2018

Constraints creation also tricky because in transaction it take ACCESS EXCLUSIVE lock (eg. lock table on READ/WRITE) and spend time to constraint validation (CHECK, NOT NULL, FOREIGN KEY), so lock will be released only on transaction commit/rollback.

crescentfresh · on Oct 23, 2018

This is amazing. My conundrum is we don't use Django lol. Wish this sort of thoroughness existed in standalone pg migration tooling.

ggregoire · on Oct 23, 2018

Same here. Our backend team uses Python, SQLAlchemy and PG. They are looking for a tool to handle migrations and schema versionings. Any recommendations?

aidos · on Oct 23, 2018

Alembic. It's made by zzzeek (Mike Bayer), who created SQLA and it's hackable to do custom migrations.

We use it by changing the models in code and then having Alembic automatically create a suitable migration for us. Then we change the migration as required (adding steps for data migrations etc). It's really simple:

    alembic revision --autogenerate -m 'changing something...'

That generates a python file with an upgrade and downgrade function. You can then customise fully as you require.

Then on the db server you can run:

    alembic upgrade head  # or downgrade etc

crescentfresh · on Oct 23, 2018

We use the "forward only" approach, and use a combination of .sql and .py migration scripts and use mschematool[1] to tie the room together.

[1] https://github.com/aartur/mschematool

fredley · on Oct 23, 2018

This is indeed incredible. My conundrum is we don't use pg :-(