Too complicated to even try to read. Rsync is great, but I've switched to Borg for backups. Borg isn't perfect, but it is a higher-level approach to backups, as it were. Hetzner recently dropped the price of their Storage Box backup product to about 2 euro per TB per month, and Borg works nicely with it. Borg encrypts all the backup contents and conceals the metadata on the backup server, yet you can (with the encryption passphrase) mount a backup archive as a read-only file system through FUSE and browse it with normal file navigation. It is impressive.
Make sure your borg repos are being copied properly and in full!
I had a horrible realisation that my borg backups were timing out on the offsite copy, meaning the offsite backup I thought I had was effectively non-existent. The heartbreaking error message was `Inconsistency detected. Please run "borg check [repository]"`, although this is likely "beyond repair".
Of course, if you're following the 3-2-1 rule you're probably fine, but aye, I'm treating borg repos as delicately as I would a striped RAID now. I'm also making sure `borg check` actually comes back successfully before I consider the backup complete :)
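A minimal Python sketch of the "only call it done once `borg check` passes" idea. It assumes borg 1.x exit codes (0 success, 1 warning, 2 error), that the passphrase is supplied out of band (for example via the `BORG_PASSPHRASE` environment variable), and `backup_and_verify` is a hypothetical helper name. The command runner is injectable so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence

# Anything with subprocess.run's calling convention; injectable for testing.
Runner = Callable[..., subprocess.CompletedProcess]


def backup_and_verify(repo: str, archive: str, paths: Sequence[str],
                      verify_data: bool = False,
                      run: Runner = subprocess.run) -> bool:
    """Hypothetical helper: create an archive, then succeed only if `borg check` passes.

    Assumes borg 1.x exit codes: 0 = success, 1 = warning, 2 = error.
    """
    create = run(["borg", "create", f"{repo}::{archive}", *paths])
    # A warning (rc 1, e.g. a file changed while being read) still produced an archive.
    if create.returncode not in (0, 1):
        return False

    check_cmd = ["borg", "check"]
    if verify_data:
        # Also read and verify every data chunk: much slower, much more thorough.
        check_cmd.append("--verify-data")
    check_cmd.append(repo)
    check = run(check_cmd)
    # Only a clean check counts as a finished backup.
    return check.returncode == 0
```

The `False` path is the one worth wiring into whatever alerting you already use, so a failed check gets noticed rather than sitting in a log.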
Also, if you're in a small company, startup, etc., set yourself up a 3-2-1+ backup system. The + is the miracle backup that makes you look good.
> I had a horrible realisation that my borg backups were timing out ... meaning the ... backup I had was ...
A backup that isn't tested is not a true backup; it is a disappointment waiting to be found! This can happen with any backup tool, my hand-crafted¹ rsync-based scripts included.
Testing isn't hard to set up if you don't mind the final step being manual. Each snapshot has a checksum file, and every day one snapshot is picked and rechecked; any difference indicates bit-rot on the storage medium or something accidentally getting deleted or modified. After the main backup script creates the newest copy, it makes a list of files not touched since the backup started and pushes it up. The backup site checksums those files and sends the results back so they can be compared. Any difference in these checksums triggers an email to an account that makes my phone shout in a distressed manner. The manual part is occasionally reviewing the results myself, because not getting an email could mean either that all is well or that something has broken so badly that the checks aren't running at all².
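The checksum-manifest scheme can be sketched in Python roughly like this. It is an illustration under assumptions, not the actual scripts described: it uses SHA-256, keys files by their path relative to the snapshot root, and the helper names (`build_manifest`, `untouched_since`, `compare`) are hypothetical. The same code can run on either end; you compute one manifest locally, one at the backup site, and diff them.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, only: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Map relative path -> SHA-256 for every file under root (or just `only`).

    Files listed in `only` that no longer exist are left out, so compare()
    reports them as missing.
    """
    if only is None:
        rels = [str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()]
    else:
        rels = [r for r in only if (root / r).is_file()]
    return {rel: sha256_file(root / rel) for rel in sorted(rels)}


def untouched_since(root: Path, start_time: float) -> List[str]:
    """Relative paths of files whose mtime predates the start of this backup run."""
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < start_time
    )


def compare(expected: Dict[str, str], actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Diff two manifests; any non-empty list is worth an alert."""
    return {
        "missing": sorted(expected.keys() - actual.keys()),
        "extra": sorted(actual.keys() - expected.keys()),
        "changed": sorted(k for k in expected.keys() & actual.keys()
                          if expected[k] != actual[k]),
    }
```

In the flow described, `untouched_since` would run on the primary once the new snapshot lands, that list travels to the backup site, `build_manifest(snapshot_dir, only=the_list)` runs there, and the two manifests go through `compare`; a non-empty result is what triggers the email.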
For specific systems like my mail server, I have a replica VM, not visible to the Internet at large, that wipes itself and restores from the latest off-site backup. I look at it occasionally to check that it is running and has the messages I've recently sent and received. As a bonus, this VM could quickly be made publicly available and take over with a few firewall and DNS changes should the main mail server or its host physically die. Even if it never takes over, its existence proves that the restore method is reliable should I need to restore the primary. Some extra automated checks could be added here too, but again there comes a point where writing the checks takes more time than just doing them manually³, and I'd still check manually out of paranoia anyway.
[1] If I'm honest, “string together” would be much more accurate than “crafted”.
[2] I could automate that a bit too, but then that automation still needs verifying occasionally. It quickly becomes double-checks all the way down, and making sure you check manually now and then is a far, far more maintainable system.
[3] If these were business services rather than personal ones, the balance between automated procedures and manual checks might shift somewhat.
> Live and learn.. and learn again later when you let yourself slip :)
Definitely. I'm as careful as I am due to past issues, either my own or those of others that I've witnessed. Seeing the look on someone's face when they ask “You know about these things, you can do something to get it back right? Right?!”, and having to let them down…
Borg has a command (`borg check`) to test backup metadata. I'm not sure exactly what it does, but it is pretty slow: about 80 minutes to check a 1.6TB backup, depending on how busy the backup server is. Another approach might be to mount the backup as a FUSE filesystem and monitor it with something like tripwire. I just thought of that a minute ago and haven't looked into it, so I don't know if it is workable. At the end of the day, you have to occasionally do a full restore to another system and test everything.
Fully agree, Borg is awesome. I wrote up how I'm using it here[1]. In short, I use borg to back up to a local machine, and that machine uses rclone to copy the backups to an off-site S3 bucket. I've had multiple occasions to restore stuff successfully, and I never have to wonder whether my data is retrievable.
I used to do something similarly complex to the OP, using rsync over ssh with --link-dest so that unchanged files shared inodes (hard links), which let me easily keep N days of backups with file-level deduplication [1].
Then I moved to Borg and haven't looked back. It reaches the same end goal in a better way, is much faster, and is easier to work with overall [2].
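For reference, the rsync `--link-dest` pattern looks roughly like this as a Python sketch. Naming snapshot directories by ISO date and the helper names are assumptions for illustration. Pruning simply deletes whole old snapshot directories, which is safe because a hard-linked file survives as long as any newer snapshot still links to it.

```python
import datetime as dt
from pathlib import Path
from typing import List, Optional


def _is_iso_date(name: str) -> bool:
    """True if a directory name parses as a YYYY-MM-DD snapshot name."""
    try:
        dt.date.fromisoformat(name)
        return True
    except ValueError:
        return False


def rsync_snapshot_cmd(source: str, dest_root: Path, today: dt.date,
                       previous: Optional[Path]) -> List[str]:
    """Build an rsync command that writes today's snapshot under dest_root.

    With --link-dest, files unchanged since `previous` are hard-linked to it
    (sharing inodes) instead of being copied again.
    """
    cmd = ["rsync", "-a"]
    if previous is not None:
        # An absolute path avoids --link-dest being resolved relative to the destination.
        cmd.append(f"--link-dest={previous}")
    cmd += [source, str(dest_root / today.isoformat())]
    return cmd


def snapshots_to_prune(names: List[str], keep: int) -> List[str]:
    """Return every dated snapshot except the newest `keep` (ISO dates sort chronologically)."""
    dated = sorted(n for n in names if _is_iso_date(n))
    if keep <= 0:
        return dated
    return dated[:-keep]
```

A source like `user@host:/srv/` makes rsync go over ssh, its default remote shell; the trailing slash copies the directory's contents rather than the directory itself.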
Yes, the bigger ones are around 2 euro per TB. They also have Storage Share at around 3 euro/TB in the bigger sizes. Those have more features (they run Nextcloud) and are themselves backed up several times a day.
I used rsync 20 years ago, switched to Borg, then switched to Restic. It’s nice to send backups to destinations like B2 or S3 without any special requirements. But the Hetzner Storage Box is a good deal.
Now all I need is better client-side software for these things, something I could use on my mom's computer. Vorta is not that good, although I commend the effort.
Until the UI side is sorted, my mom's laptop stays on Backblaze.
As a small film production, we have approx. 120 TB to back up. That's a lot of money per month; if you check AWS, it is a fortune (half the value of a car) per month.
The script here really only makes sense for servers you physically own. You wouldn't accept SSH access from a key stored on a plain VPS, right? Also, this script doesn't seem to encrypt the data at all! That's very dangerous on a VPS.
> You wouldn't accept SSH access from a key located on a plain VPS, right?
With Borg, I use ssh -A from my laptop to kick off backups. I can think of some other schemes, like using multiple user accounts on a single Storage Box (Hetzner supports that and I believe offers an API) so that different VPSes can't clobber each other's backups, old backups become read-only, etc. It might be interesting to add some finer access controls on the server side. Borg supports an append-only mode, but right now I believe that's only a config option rather than a security setting.
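On a server where you control `~/.ssh/authorized_keys`, Borg's documentation describes pinning a key to `borg serve` with a forced command; passing `--append-only` there applies the restriction per key rather than only as a repository config setting. A small Python sketch that generates such a line follows. The helper name, key and paths are hypothetical, and whether Hetzner's Storage Box honours forced commands is an open question.

```python
import shlex


def borg_authorized_keys_line(pubkey: str, repo_path: str, append_only: bool = True) -> str:
    """Build an OpenSSH authorized_keys entry that only lets this key run `borg serve`.

    Hypothetical helper. `restrict` (OpenSSH 7.2+) disables forwarding, PTY
    allocation, etc.; the forced command confines the key to one repo path.
    """
    args = ["borg", "serve", "--restrict-to-path", repo_path]
    if append_only:
        # The client can add archives but cannot delete or prune existing ones.
        args.append("--append-only")
    command = " ".join(shlex.quote(a) for a in args)
    # authorized_keys wraps the command in double quotes; escape any embedded ones.
    command = command.replace('"', '\\"')
    return f'command="{command}",restrict {pubkey.strip()}'
```

One key per VPS, each restricted to its own repository path, gives roughly the "VPSes can't clobber each other's backups" property without needing separate accounts.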
I've only recently started using Borg so I'm not really familiar with its intricacies yet. There are some things I would change but it is mostly well thought out, imho.
You can't run arbitrary shell commands on the Storage Box server. They have the Borg binary (1.17 last time I looked) installed on the server and officially support it. They seem to recognize borg commands specially.