The state of backup tech is surprisingly bad, and runs OS deep. Even with modern...

asmor · on Sept 15, 2023

The simultaneous advantage and disadvantage of Linux is that you need to pick such a filesystem or work around the limitations, but it's your choice. The OS underneath really should not be responsible. Apple solves it with APFS snapshots and Microsoft has Volume Shadow Copy (which requires NTFS or ReFS).

I personally use compose for all my services now and back up my compose.yaml by stopping the entire stack and running a restic container that mounts all volumes in the compose.yaml.[1] It's not zero downtime, but it's good enough, and it's extremely portable since it can restore itself.
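The stop, back up, restart sequence described above can be sketched roughly as follows. This is not the script from the linked gist, just a minimal illustration. The `backup_stack` and `build_backup_commands` helpers and the `/data/<volume>` mount layout are hypothetical, the volume names must be the real Docker volume names (compose normally prefixes them with the project name), and the restic repository is assumed to be configured through the `RESTIC_REPOSITORY` and `RESTIC_PASSWORD` environment variables and reachable from inside the container.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Official restic image on Docker Hub; its entrypoint is the restic binary.
RESTIC_IMAGE = "restic/restic"


def build_backup_commands(
    compose_file: str, volumes: Sequence[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return the (stop, backup, start) command lines without running them."""
    compose = ["docker", "compose", "-f", compose_file]
    mounts: List[str] = []
    for name in volumes:
        # Mount each named volume read-only under /data/<name> (assumed layout).
        mounts += ["-v", f"{name}:/data/{name}:ro"]
    backup = [
        "docker", "run", "--rm", *mounts,
        # Pass the repository settings through from the caller's environment.
        "-e", "RESTIC_REPOSITORY", "-e", "RESTIC_PASSWORD",
        RESTIC_IMAGE, "backup", "/data",
    ]
    return compose + ["stop"], backup, compose + ["start"]


def backup_stack(
    compose_file: str,
    volumes: Sequence[str],
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Stop the stack, back up its volumes, and always start it again."""
    stop, backup, start = build_backup_commands(compose_file, volumes)
    run(stop, check=True)
    try:
        run(backup, check=True)
    finally:
        # Restart the services even if the backup step failed.
        run(start, check=True)
```

Calling `backup_stack("compose.yaml", ["myproject_db-data"])` would run the three commands in order; the `run` parameter exists only so the sequence can be exercised without Docker.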

[1]: https://gist.github.com/acuteaura/61f221ada67f49193bc1f93955...

adobrawy · on Sept 15, 2023

A filesystem snapshot of a database is no different from a database crash. Databases are designed to handle crashes well (WAL, etc.).

ntolia · on Sept 15, 2023

Unfortunately, this is not true. You need to grab all the DB files (WAL, etc.) in a consistent manner. You can't grab them while writes are in progress. There are ways though. Look at what Kanister does with its recipes to get consistent DB snapshots to get a sense of the complexity needed to do it "right."

Freaky · on Sept 16, 2023

> Unfortunately, this is not true. You need to grab all the DB files (WAL, etc.) in a consistent manner. You can't grab them while writes are in progress.

Perhaps you could be more specific, because the former is exactly what a filesystem snapshot is meant to do, and the latter is exactly what an ACID database is meant to allow assuming the former.

> Look at what Kanister does with its recipes to get consistent DB snapshots

I looked at a few examples and they mostly seemed to involve running the usual database dump commands.

donmcronald · on Sept 16, 2023

I’ve always assumed a snapshot is no worse than power loss and that any decent database should handle it ok. Is that wrong?

Freaky · on Sept 16, 2023

FreeBSD had a pretty decent option in the base system two decades ago - FFS snapshots and a stock backup tool that would use them automatically with minimal effort, dump(8). Just chuck `-L` at it and your backups are consistent.

Now of course it's all about ZFS, so there's at least snapshots paired with replication - but the story for anything else is still pretty bad, with you having to put all the fiddly pieces together. I'm sure some people taught their backup tool about their special named backup snapshots sprinkled about in `.zfs/snapshot` directories, but given the fiddly nature of it I'm also sure most people just ended up YOLOing raw directories, temporal-smearing be damned.

I know I did!

I finally got around to fixing that last year with zfsnapr[1]. `zfsnapr mount /mnt/backup` and there's a snapshot of the system - all datasets, mounted recursively - ready for whatever the backup tool of the year is.

I'm kind of disappointed that when I mentioned it over on the Practical ZFS forum, the response was not "why didn't you just use <existing solution everyone uses>", but "I can see why that might be useful".

Well, yes, it makes backups actually work.

> Also, it's unclear to me what happens if you attempt a snapshot in the middle of something like a database transaction or even a basic file write. Seems likely that the snapshot would still be corrupted

A snapshot is a point-in-time image of the filesystem. Any ACID database worth the name will roll back the in-flight transaction, just as it would if you issued it a `kill -9`.
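As a rough illustration of that rollback behaviour, here is a small sketch using SQLite in WAL mode. Copying the database file and its `-wal` file stands in for a filesystem snapshot, which is only a fair stand-in here because nothing else writes while the copy runs; a real snapshot gets that consistency from the filesystem. The function name and file names are made up for the example.

```python
import os
import shutil
import sqlite3
import tempfile


def snapshot_during_transaction():
    """Copy a live SQLite database mid-transaction and read the copy."""
    workdir = tempfile.mkdtemp()
    live = os.path.join(workdir, "live.db")
    snap_dir = os.path.join(workdir, "snap")
    os.mkdir(snap_dir)

    # isolation_level=None means we control transactions explicitly.
    conn = sqlite3.connect(live, isolation_level=None)
    conn.execute("CREATE TABLE t (v TEXT)")
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("INSERT INTO t VALUES ('committed')")

    # Open a transaction and leave it uncommitted while we "snapshot".
    conn.execute("BEGIN")
    conn.execute("INSERT INTO t VALUES ('in flight')")

    # Stand-in for an atomic snapshot: copy the database and its WAL.
    for suffix in ("", "-wal"):
        src = live + suffix
        if os.path.exists(src):
            shutil.copy2(src, os.path.join(snap_dir, "live.db" + suffix))

    conn.execute("COMMIT")
    conn.close()

    # Opening the copy triggers normal crash recovery from the WAL.
    restored = sqlite3.connect(os.path.join(snap_dir, "live.db"))
    rows = restored.execute("SELECT v FROM t").fetchall()
    restored.close()
    shutil.rmtree(workdir)
    return rows
```

The copy should contain only the committed row: the in-flight insert never received a commit record, so recovery ignores it.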

For other file writes, that's really down to whether or not such interruptions were considered by the writer. You may well have half-written files in your snapshot, with the file contents as they were in between two write() calls. Ideally this will only be in the form of temporary files, prior to their rename() over the data they're replacing.
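The temporary-file-then-rename() pattern mentioned above looks roughly like this in Python. It is a minimal sketch: `atomic_write` is a hypothetical helper name, and it assumes the temporary file can live in the same directory as the target so the rename stays on one filesystem.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or new file, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Push the contents to disk before the rename makes them visible.
            os.fsync(tmp.fileno())
        # Atomic on POSIX: the name switches to the new contents in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A snapshot taken at any moment then contains either the complete old file or the complete new one, plus possibly a stray `.tmp-` file that a restore can ignore.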

For everything else - well, you have more than one snapshot backed up, right?

[1]: https://github.com/Freaky/zfsnapr

thesh4d0w · on Sept 15, 2023

> Also, it's unclear to me what happens if you attempt a snapshot in the middle of something like a database transaction or even a basic file write. Seems likely that the snapshot would still be corrupted.

You just quiesce the database first. Any decent backup engine has support to talk to a DB and pause / flush everything.

cientifico · on Sept 15, 2023

For databases, in my experience, it is better to dump the whole database and back up the dump, for the reasons you explained.
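A dump-then-back-up flow might look something like the following, using SQLite purely as a stand-in; a server database would use its own tool (`pg_dump`, `mysqldump`, and so on). The helper names are made up for the example.

```python
import sqlite3


def dump_to_sql(conn: sqlite3.Connection) -> str:
    """Serialize the whole database as SQL text that a backup tool can store."""
    return "\n".join(conn.iterdump())


def restore_from_sql(sql_text: str) -> sqlite3.Connection:
    """Rebuild a database from a dump into a fresh in-memory connection."""
    restored = sqlite3.connect(":memory:")
    restored.executescript(sql_text)
    return restored


def demo_roundtrip():
    """Create a small database, dump it, restore it, and return the restored rows."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("bob",)])
    source.commit()

    dump = dump_to_sql(source)
    source.close()

    restored = restore_from_sql(dump)
    rows = restored.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    restored.close()
    return rows
```

The dump is plain text, so the backup tool only ever sees a self-consistent file rather than live database pages.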

viraptor · on Sept 15, 2023

Depends on your database size, type, change rate, etc. Dumping the database to a file is fine for toy and small cases, but not for a 1+TB store that's under heavy writes.