Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Wouldn’t a filesystem backup of a running SQL database be corrupted when you try to restore it?


Whenever the filesystem is being snapshotted atomically, no. That would be effectively same state as from a power-cut at that exact moment.

It is correct though, that trying to do it with something like 'cp' on a live database will most likely be corrupt.


Not if the filesystem itself can do atomic snapshots. The problem can happen if you try to copy the files or even the device when it's being written to. But if you create a snapshot and copy that, it would be consistent.

You can of course the up in a dirty state where some new transaction was started but not committed, but any non-toy database should be able to recover from that. (You'll lose the transaction of course)


it’s equivalent to backing up a server after a hard power off.

as long as the filesystem supports some kind of journaling (aka it’s not ancient) and the database is acid compliant, there shouldn’t be any major issues beyond a slow startup.

but keep in mind that you may lose acid compliance by fiddling with the disk flushing configuration, which is a common trick to raise write speed on write choked databases. if that’s the case you may lose some transactions

your database documentation should have this information.


Mainly if you restore a previous version of the on-disk data while the DBMS is still running I would think. Because then the idea that the running DBMS has of the data no longer matches what’s on disk.

But if you stop the running DBMS before you restore the previous version on disk, and then start the DBMS again it should be able to continue from there.

After all that’s one of the key selling points of a bonafide DBMS like PostgreSQL, that it’s supposedly very good at ensuring that the data on disk is always consistent, so that when your host computer suddenly stops running at any point in time (power outage, kernel crash, etc) the data on disk is never corrupted.

If data is corrupted by restoring from a random point in time in the past, that should be considered a serious bug in PostgreSQL.


It depends on the DBMS, its configuration, the host OS, its configuration, and many other details. Snapshotting running databases is possible and often "just works," but you should always verify that before relying on this functionality in production.


> you should always verify that before relying on this functionality in production

Yep :)

At said company where I was using ZFS snapshots for the servers I administered, I additionally had a nightly cron job to dump the db using the tools that shipped with the DBMS. Just in case something with the ZFS snapshots fricked itself :)


Smart.

I did a daily dump of my prod database to my local workstation, needed it for prod corruption issue because ops was only doing weeklies.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: