
I am done with the cloud as a backup. I use ZFS incremental replication of snapshots, automated with Sanoid and Syncoid, to my own off-site box. So glad I don't have to mess with the file level anymore. Syncoid in pull mode does not even need to know the ZFS encryption keys (zero trust!).
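
Roughly, the setup looks like this (dataset names, retention numbers and the raw-send option are just illustrative, not my exact config):

    # /etc/sanoid/sanoid.conf on the source box: take and prune snapshots automatically
    [tank/projects]
        use_template = production
        recursive = yes

    [template_production]
        hourly = 36
        daily = 30
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes

    # On the off-site box (pull mode): replicate as raw, still-encrypted streams,
    # so this side never needs the encryption keys.
    # --sendoptions=w should map to 'zfs send -w' (raw); check your syncoid version.
    syncoid -r --sendoptions=w root@laptop:tank/projects backuppool/laptop/projects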



If you are only using those two tools, then you only have a system for replication (and snapshots), but not a backup system.

If there is a data corruption bug in ZFS, it will propagate to your remote and corrupt data there.

I hope you have something else in place besides those two tools.


Yes (although ZFS is pretty stable, it is always worth remembering not to put all your eggs in one basket).

My fallbacks are:

- an external drive that I connect once a year and just rsync-dump everything

- for important files, a separate box running borg/borgmatic [1] with deduplication; this is updated once in a while (a rough sketch of both fallbacks follows below)
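
Stripped down, those two fallbacks amount to something like this (paths and repo locations are placeholders; borgmatic just wraps the borg call in a config file):

    # yearly dump to the external drive (archive mode, keep hardlinks/ACLs/xattrs)
    rsync -aHAX --delete /home/me/ /mnt/external/dump/home/

    # occasional deduplicated archive of the important stuff to the separate box
    borg create --stats --compression zstd /mnt/backupbox/borg-repo::'important-{now}' ~/important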

Just curious: do you have any reason to believe that such a data corruption bug is likely in ZFS? It seems like saying that ext4 could have a bug, so you should also store stuff on NTFS just in case (which I don't think makes sense).

[1]: https://github.com/borgmatic-collective/borgmatic


Good further comment on the subject [1].

[1]: https://www.reddit.com/r/zfs/comments/85aa7s/comment/dvw55u3...


It's funny that you link to a comment from 6 years ago. Just a month after that there was a pretty big bug in ZFS that corrupted data.

https://github.com/openzfs/zfs/issues/7401

Corresponding HN discussion at the time: https://news.ycombinator.com/item?id=16797644


Yes, I read that. It underscores the 3-2-1 rule: 3 copies of the data, on 2 different media (where ZFS can be one of the two), 1 of them off-site.

I think it makes sense and thank you for the sensible reminder.


I am being downvoted; maybe I should explain myself better:

Most filesystems designed years ago do not cope well with the current trend towards ever-increasing file counts. There are robust filesystems that can deal with petabytes, but most have a tough time with googolplexian numbers of files. I am talking about all the git directories, the venv folders for my 100+ projects that each require their own unique dependencies (a single venv is usually 400 to 800k files), the 5000+ npm packages needed to build a simple website, or the local GIS datastores split across hundreds of thousands of individual files.

Yes, I may not need to back all of those up. But I want to keep my projects together, sorted in folders, not split up by filesystem or backup requirements. That means sooner or later I need something like rsync to back things up somewhere. However, rsync and its colleagues have to walk the entire directory tree and compare metadata (or checksums) for individual files. This takes time: a typical rsync scan on my laptop (SSD) with 1.6 million files takes about 5 minutes.
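
A dry run shows the cost of that scan directly (paths are placeholders):

    # walks the whole tree and compares, but transfers nothing
    time rsync -a --dry-run --stats /home/me/projects/ /mnt/backup/projects/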

With ZFS, this is history, and that is the major benefit to me. With block suballocation [1] it has no problems with a high number of small files (see a list of filesystems that support block suballocation here [2]). And I don't have to mess with the file level at all: I can create snapshots and they transfer immediately and incrementally, replicating everything off-site, without me having to deal with the myriad requirements of volumes, filesystems and higher-level software.
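
Under the hood it is just this (syncoid automates it; snapshot and dataset names are placeholders):

    # take a snapshot (atomic, essentially free)
    zfs snapshot tank/projects@2024-06-01

    # send only the blocks changed since the previous snapshot, as a raw
    # (still-encrypted) stream, and receive it unmounted on the off-site box
    zfs send -w -i tank/projects@2024-05-01 tank/projects@2024-06-01 | \
        ssh backupbox zfs receive -u backuppool/projects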

If I really need ext4 or XFS (for example), I create a ZFS volume (zvol) and format it with whatever filesystem I want, with all the ZFS features still available underneath (compression, encryption, incremental replication, and deduplication if you want it).
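
For example (size, property and names are arbitrary):

    # create a 100 GiB block device backed by ZFS, then put ext4 on top of it
    zfs create -V 100G -o compression=lz4 tank/extvol
    mkfs.ext4 /dev/zvol/tank/extvol
    mkdir -p /mnt/extvol
    mount /dev/zvol/tank/extvol /mnt/extvol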

Yes, perhaps this has nothing to do with the _cloud_ as such (e.g. rsync.net offers ZFS snapshot storage). But the post was about rclone, which is what my reply was directed at.

[1]: https://en.wikipedia.org/wiki/Block_suballocation

[2]: https://en.wikipedia.org/wiki/Comparison_of_file_systems





