
Can someone briefly explain how ZDD (zero-downtime deployment) works in general?

I guess both versions of the app must be running simultaneously, with new traffic being routed to the new version of the app.

But what about DB migrations? Assuming the app uses a single database, and the new version of the app introduces changes to the DB schema, the new app version would modify the schema during startup via a migration script. However, the previous version of the app still expects the old schema. How is that handled?




The first step is to decouple migrations from deploys: you want manual control over when the migrations run, contrary to many frameworks' default of running migrations when you deploy the code.

Secondly, each code version has to work with the current schema and the schema after a future migration, making all code effectively backwards compatible.

Your deploys end up being something like:

- Deploy new code that works with current and future schema

- Verify everything still works

- Run migrations

- Verify everything still works

- Clean up the acquired technical debt (the code that worked with the schema that no longer exists) at some point, or run out of runway and it won't be an issue
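To make that concrete, here's a minimal sketch (assuming Postgres and a hypothetical users table) of the kind of additive change this allows. The new code has to tolerate the new column being absent (before the migration runs) as well as present (after it):

    -- additive "expand" step: old code simply ignores the new column
    ALTER TABLE users ADD COLUMN email_normalized text;

    -- much later, during cleanup, once no deployed code reads the old column:
    -- ALTER TABLE users DROP COLUMN email;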


There's a little bit more to it. Firstly, you can deploy the migration first, as long as it's forwards compatible (i.e. old code can still read from it). That migration needs to be zero downtime; it can't, for example, rewrite whole tables or otherwise lock them, or requests will time out. Doing a whole new schema is one way to do it, but not always necessary. In any case you probably then need a backfill job to fill up the new schema with data before possibly removing the old one.
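A sketch of such a backfill (assuming Postgres and a hypothetical users.email_normalized column being introduced), run in small batches so it never holds locks for long:

    UPDATE users
       SET email_normalized = lower(email)
     WHERE id IN (SELECT id
                    FROM users
                   WHERE email_normalized IS NULL
                   LIMIT 1000);
    -- repeat until zero rows are updated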

There's a good post about it here: https://rtpg.co/2021/06/07/changes-checklist.html


We recently wrote about how we decided to handle deleting tables etc. https://bigbinary.com/blog/rails-8-deleting-tables-columns-u...


I thought it was correct to run the DB migrations for the new code first, then deploy the new code. While making sure that the DB schema is backwards compatible with both versions of the code that will be running during the deployment.

So maybe there's something I'm missing about running DB migrations after the new code has been deployed - could you explain?


I'm not the person you've asked, but I've worked in devops before.

It kinda doesn't matter which you do first. And if you squint a little, it's effectively the same thing, because the migration will likely only become available via a deployment too.

So yeah, the only thing that's important is that the DB migration can't cause an incompatibility with any currently deployed version of the code - and if it would, you'll have to split the change so it doesn't. It'll force another deploy for the change you want to do, but it's what you're forced to do if maintenance windows aren't an option. Which is kinda a given for most B2C products.
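For example (a sketch, assuming Postgres and a hypothetical rename of users.username to users.handle), an incompatible change like a column rename gets split into compatible pieces spread across deploys:

    -- deploy 1 + migration: add the new column; code writes both, still reads the old one
    ALTER TABLE users ADD COLUMN handle text;
    UPDATE users SET handle = username WHERE handle IS NULL;  -- backfill (batched in practice)

    -- deploy 2: code reads and writes only the new column

    -- deploy 3 (cleanup): ALTER TABLE users DROP COLUMN username;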


This is a very good explanation, no judgment and simply educational. Appreciated.

Though I'm still surprised that some people run DB alterations on application startup. I've never seen it in real life.


We do this. It has worked very well for us.

There are a couple of fundamental rules to follow. First, don't put something that will have insane impact into the application deploy changes. 99% of the DB changes are very cheap and very minor. If the deploy is going to be very expensive, then just don't do it; we'll do it out of band. This has not been a problem in practice with our 20ish person team.

Second, it's kind of like double-entry accounting. Once you've committed the change, you cannot go back and "fix it". If you did something really wrong (i.e. see above), then sure, but if not, you commit a correcting entry instead, because you don't know who has recently downloaded your commit and run it against their database.

The changes are a list of incremental steps that the system applies in order, if they have not been applied before. So they are treated as, essentially, append only.

And it has worked really well for us, keeping the diverse developers who deploy against local databases in sync with little drama.

I've incorporated the same concept in my GUI programs that stand up their own DB. It's a very simple system.
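A minimal sketch of the bookkeeping such a system needs (Postgres syntax, hypothetical names):

    CREATE TABLE IF NOT EXISTS schema_changes (
        id         integer PRIMARY KEY,               -- position in the append-only change list
        applied_at timestamptz NOT NULL DEFAULT now()
    );

    -- on startup, for each change N not yet recorded, run it and record it in one transaction:
    -- BEGIN;
    --   ...change N's statements...
    --   INSERT INTO schema_changes (id) VALUES (N);
    -- COMMIT;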


The main challenge I have noticed with that approach is maintaining the sequencing across different branches being worked upon by different developers - solvable by allocating/locking the numbers from a common place. The other is rolling back multiple changes for a given view/stored proc where, say, each change added a separate column - if only one is rolled back, how do you automate that? Easily done manually though.


I will say that stored procs are specifically messy, and we did not have many of those. They had a tendency to really explode the change file. With DDL, you can fix a table column in isolation. Fixing a typo in a 100 line stored proc is another 100 lines. And we certainly didn't have multiple people working on the same proc at the same time.

We had no real need to address that aspect, and I would do something more formal with those if I had to, such as having a separate file with the stored proc, and simply a note that it has changed in the change file. I mean, that's a bit of a trick.


> Though I'm still surprised that some people run DB alteration on application start up

I think I've seen it more commonly in the Golang ecosystem, for some reason. Also not sure how common it is nowadays, but I've seen lots of deployments (contained in Ansible scripts, Makefiles, Bash scripts or whatever) where the migration+deploy is run directly in sequence automatically for each deploy, rather than as discrete steps.

Edit: Maybe it's more of an educational problem than anything else, where learning resources either don't specify when to actually run migrations, or straight up recommend running migrations on application startup (one example: https://articles.wesionary.team/integrating-migration-tool-i...)


It makes things somewhat easier if your app is smallish and your workflow is something like GitHub Actions automatically deploying all commits on main to Fly or Render.


At my company, running DB migrations on startup was a flag that was enabled for local development and disabled for production deploys. Some teams had it enabled for staging/pre-production deploys, and a few teams had it turned on for production deploys (although those teams only had infrequent, minor changes like adding a new column).

Personally I found the idea of having multiple instances running the same schema update job at the same time (even if locks would keep it from actually happening in practice) concerning, so I always had it disabled for deploys.
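For what it's worth, one common way to make that safe (a sketch, assuming Postgres) is to take an exclusive advisory lock before applying anything, so only one instance runs the migrations while the others wait:

    SELECT pg_advisory_lock(42);    -- 42 is an arbitrary application-chosen lock key
    -- ...apply any pending migrations...
    SELECT pg_advisory_unlock(42);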


So if you add any constraints/data, you can't rely on them being there until version n+2, or you need to have 2 paths: 1 for the old data, 1 for the new?


Effectively yes. Zero downtime deployments with database migrations are fiddly.
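As an illustration of why constraints need that extra version: adding a NOT NULL-style constraint safely (a sketch, assuming Postgres and a hypothetical users.phone column) takes several steps, and only code shipped after the last one can rely on the constraint existing:

    ALTER TABLE users ADD COLUMN phone text;                -- nullable; version n starts writing it

    -- after backfilling the old rows:
    ALTER TABLE users ADD CONSTRAINT phone_not_null
        CHECK (phone IS NOT NULL) NOT VALID;                -- instant; existing rows not checked yet
    ALTER TABLE users VALIDATE CONSTRAINT phone_not_null;   -- scans the table without blocking writes

    -- only version n+1 (or later) may assume every row has a phone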


This is the way


Migrations have to be backwards compatible so the DB schema can serve both versions of the app. It's an extra price to pay for having ZDD or rolling deployments, and something to keep in mind. But it's generally done by all the larger companies.


Others have described the how part if you do need truly zero downtime deployments, but I think it's worth pointing out that for most organisations, and most migrations, the amount of downtime due to a db migration is virtually indistinguishable from zero, particularly if you have a regional audience, and can aim for "quiet" hours to perform deployments.


> the amount of downtime due to a db migration is virtually indistinguishable from zero

Besides, once you've run a service for a while that has acquired enough data for migrations to take a while, you realize that there are in fact two different types of migrations. "Schema migrations" which are generally fast and "Data migrations" that depending on the amount of data can take seconds or days. Or you can do the "data migrations" when needed (on the fly) instead of processing all the data. Can get gnarly quickly though.

Splitting those also allows you to reduce maintenance downtime if you don't have zero-downtime deployments already.
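A sketch of the "on the fly" variant (assuming a hypothetical pair of columns where the new value may not have been backfilled yet): the read path simply falls back to deriving it from the old data:

    SELECT COALESCE(email_normalized, lower(email)) AS email_normalized
      FROM users
     WHERE id = 123;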


Very much so, we handle these very differently for $client.

Schema migrations are versioned in git with the app, with up/down (or forward/reverse) migration scripts and are applied automatically during deployment of the associated code change to a given environment.

SQL data migrations are stored in git so we have a record, but they are never applied automatically, always manually.

The other thing we've used along these lines is having one or more low-priority jobs added to a queue to apply some kind of change to records. These are essentially still data migrations, but they're written as part of the application code base (as a Job) rather than in SQL.


Lengthy migrations don't matter. What matters is whether they hold long locks or not. Data migrations might take a while but they won't lock anything. Schema migrations, on the other hand, can easily do so, like if you add a new column with a default value: on some engines (and older Postgres versions) the whole table must be rewritten and it's locked for the entire time.
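A common precaution (a sketch, assuming Postgres) is to cap how long DDL may wait for its lock, so a blocked migration fails fast instead of queueing behind long-running transactions and blocking everything else behind it:

    SET lock_timeout = '5s';
    ALTER TABLE users ADD COLUMN plan text DEFAULT 'free';
    -- on Postgres 11+ a constant default like this is metadata-only; on older
    -- versions or other engines it could rewrite and lock the whole table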


Schema migrations can be quite lengthy, mostly if you made a mistake earlier. Some things that come to mind are changing a column’s type, or extending VARCHAR length (with caveats; under certain circumstances it’s instant).
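For instance (Postgres, hypothetical tables):

    -- increasing a varchar's length is a metadata-only change and effectively instant
    ALTER TABLE users ALTER COLUMN name TYPE varchar(500);

    -- changing int to bigint rewrites the whole table and locks it while doing so
    ALTER TABLE orders ALTER COLUMN id TYPE bigint;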


Not OP, but I would consider this a data migration as well. Anything that requires an operation on every row in a table would qualify. Really, changing the column type is just a built-in form of a data migration.


Most apps are not affected by DB migrations, in the sense that migrations are run before the service starts the web server during boot. The database might block traffic for other already-running connections though, in which case you have a problem with your database design.


I haven't tried it but it looks like Xata has come up with a neat solution to DB migrations (at least for postgres). There can be two versions of the app running.

https://xata.io/blog/multi-version-schema-migrations


Yes, both versions must be running at some point.

The load balancer starts accepting connections on Server2 and stops accepting new connections on Server1. Then, Server1 disconnects when all of its connections are closed.

It could be different Servers or multiple Workers on one server.

During that window, as the other comments said, migrations have to be backwards compatible.


Strong Migrations helps with writing migrations that are safe for ZDD deploys. We use it in our Rails app; it catches quite a few potential footguns. https://github.com/ankane/strong_migrations



