Beware that this script only uses rsync with the "--archive" flag.
This may be enough for some users, but "--archive" does not copy all file metadata, so it may cause surprises.
rsync must be invoked with "--archive --xattrs --acls" to guarantee complete file copies.
Unfortunately, all the command-line file-copying utilities available for UNIX-like operating systems use default options that do not copy most file metadata, and they usually need many non-default flags to make complete copies.
Nevertheless, rsync is the best of the alternatives: when invoked correctly it accurately copies all file metadata, even between different file systems and different operating systems, in cases where other copying utilities may lose metadata without giving the user any error or warning.
ZFS snapshots also have the added benefit of not taking up duplicate space. And, should a Linux ransomware ever come into existence, snapshots help against it.
Also have a look at --numeric-ids, otherwise it translates UIDs back and forth according to real user names - which will end up in a terrible mess, especially when you are restoring the backup from a live "CD" (which has different UID ←→ username mapping than the target system).
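Putting the flags from these comments together, a full-fidelity invocation might look like the sketch below (the paths are throwaway temp directories; --hard-links is included as well, since "--archive" does not preserve hard links either):

```shell
# Illustrative: copy with as much metadata as rsync can carry.
src=$(mktemp -d); dst=$(mktemp -d)
printf 'hello\n' > "$src/file.txt"

rsync --archive --hard-links --acls --xattrs --numeric-ids "$src"/ "$dst"/

cat "$dst/file.txt"
```

For a remote backup the destination would be something like user@host:/backup/, and both ends need an rsync built with ACL and xattr support.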
Most modern file systems have been updated to support equivalent metadata.
However, many old versions of file systems lacked support for things like high-resolution timestamps, extended attributes or access-control lists. Because such support was introduced much later, a lot of archiving, backup and copying programs lose such metadata, at least with their default options.
While equivalent metadata exists on Linux XFS and ext4, Windows NTFS, FreeBSD UFS and ZFS, etc., each platform has different APIs for accessing file metadata (and on Linux the API may vary even between the many available file systems), so only a few programs, like rsync, are able to copy everything while taking care to use the correct API to replicate the metadata without losing information.
For years I had a custom script sync my ~/.ssh directory on my primary workstation to my laptop, to pick up new keys and config changes. It failed after I switched from Ubuntu to Fedora, and I was surprised to discover --xattrs fixed it.
I've been doing rsync-based backups of close to a thousand systems for ~20 years, most notably for a long time I backed up the python.org infrastructure, and I have quite a few thoughts on this. I also have a battle-tested rsync wrapper that I'll point to below.
- Backups should be automatic, requiring attention only when needed. This script's philosophy seems to be "Just do your best, mail a log file, and rely on the user to figure out if something didn't work". Even for home backups, this is just wrong.
- As an example of the above: This script notes that it fails if a backup takes more than 24 hours.
- The "look for other rsyncs running" part of the code is an odd way of approaching locking, but for a single personal "push" backup I guess it is ok.
- As the filename implies, the goal is to rsync to a zfs destination, and it will take a zfs snapshot as part of this. It is easy to customize for another backup destination; I've had people report customizing it for their own laptop backups, for example to an rsync.net destination.
- It goes out of its way to detect when rsync has failed and log that.
- It does do "inplace" rsyncs, which dramatically save space if you have large files that get appended to (logfiles, ZODB databases).
- This is part of a larger system that manages the rsyncs of multiple systems, both local and remote. Things like alerting are done if a backup of a system has failed consistently for many days.
- In the case that there are no failures, there is no e-mail sent, meaning the user only gets actionable e-mails.
The hardlink trick only works for fairly small data sets. Issues include: managing hard links takes a lot of overhead, especially on spinning discs, and large files being appended to use a ton of space (a 4GB file with 1K appended every day uses 128GB to store 14 dailies, 6 weeklies, and 12 monthlies). ZFS is a pretty good destination for rsync, as the equivalent snapshots will use only about 4GB.
> - In the case that there are no failures, there is no e-mail sent, meaning the user only gets actionable e-mails.
I've always thought this isn't the right approach.
How do you know if the email server is borked or you commented out the script in cron to debug it and forgot to put it back in?
Either there's a weekly status report confirming things have been green, or you can place cron checks like healthchecks.io (you can self-host it).
Also, with large volumes it's much better to use 'zfs send' if both ends have ZFS, because ZFS already knows which blocks have changed and doesn't have to scan for changes on each run, as other tools like rsync do.
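For reference, an incremental send/receive between two pools looks roughly like this (dataset and host names are made up, and the receiving side must already have the @yesterday snapshot):

```shell
# Only blocks changed since @yesterday cross the wire
zfs snapshot tank/data@today
zfs send -i tank/data@yesterday tank/data@today | \
    ssh backuphost zfs receive backup/data
```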
I get your point, and different people have different things they are comfortable with. Some of that also depends on your environment, in my case I've had this script running nightly backups across hundreds of machines over a dozen years. Sending hundreds of e-mails a day into my inbox isn't going to be workable. But if you have one or two machines, maybe it is. I still don't think so, but again different people.
Things I have done to ensure reliability (again, this core script has been running for a dozen or more years):
- Nagios monitoring of backups: An active check from a monitoring server that alerts if no recent successful backups.
- "paper path" monitoring of e-mail: Send an e-mail to an external mailbox and have an active check in Nagios that reports if it has not seen an e-mail recently.
- With hundreds of machines, we were in the management interface enough (not daily, but at least monthly) that we would tend to notice before TOO long if something was out of whack.
- Regular backup audits: We would perform quarterly backup audits of the important machines, we had a whole workflow for those, which would also give us confidence that the backups were running as expected and that if something got out of whack it didn't go too long. Many of these depend on your definition of "too long".
As far as "zfs send", I totally agree. However, even today I have very few machines other than my backup machines that are running ZFS, so that's not really an option for these backups.
I wish this misguided notion regarding the "unofficial strict mode" would go away, as `set -e` is suboptimal (and can be very tedious to deal with on top of that): https://mywiki.wooledge.org/BashFAQ/105
It makes sense to opt into errexit for select blocks/sections in scripts and under certain circumstances, but having it default-on is a recipe for quite a bit of head-scratching in the future.
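A minimal sketch of that opt-in pattern: errexit enabled only inside a subshell, so a failure aborts the critical block without killing the whole script.

```shell
# errexit scoped to one block via a subshell
(
    set -e
    false                  # first failing command aborts only this subshell
    echo "never reached"
)
echo "script continues, block exit=$?"
```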
Using -e isn't an excuse not to understand how it works and when it doesn't. It's so that the default in most situations is to exit on failure, as that's likely what you want to do. That leads to terser scripts which are hopefully easier to write correctly and to understand. I'd argue that opting into exit-on-failure for standalone commands is the right default.
No, absolutely wrong. The ”right default” is to catch return codes and provide actionable, clear and consistent error messages through an error-exit function.
If I'm writing a bash script I run manually and it's more than a handful of functions, I agree. I'm there to babysit it and it's probably complicated.
If the bash script is run on thousands of containers where it's not possible to babysit it, my number one job is to stop immediately when an error happens and surface that error to any monitoring system.
I agree about that distinction, but still, won’t that error need to be formatted? Is it safe to rely on logging picking up on the error or does the ”simple script” solution imply that there is monitoring for the script exit code?
# Make sure no one else is using rsync
pro_on=$(ps aux | grep -c rsync)
A better way to do that is with the flock utility.
(
flock -n 9 || exit 1 # Critical section: allow only one process.
...single thread shell script
) 9> ~/.empty_lock_file
Note that the flock utility is specific to Linux, but POSIX mkdir() is atomic and could be more portable.
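A portable sketch of that mkdir-based lock (the lock path here is a throwaway temp location; a real script would use a fixed, well-known path):

```shell
# mkdir is atomic: only one process can create the directory
lockdir="$(mktemp -d)/rsync-backup.lock"
if mkdir "$lockdir" 2>/dev/null; then
    trap 'rmdir "$lockdir"' EXIT   # release the lock on any exit
    echo "lock acquired"
    # ... single-instance backup work goes here ...
else
    echo "another instance holds the lock" >&2
    exit 1
fi
```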
"${SOURCES[@]}"
POSIX shells do not support arrays. Iterating with read over a here document is more portable.
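A sketch of that portable replacement: no array, just read over a here-document (the paths are examples):

```shell
# POSIX sh has no arrays; a here-document carries the source list
count=0
while IFS= read -r src; do
    [ -n "$src" ] || continue          # skip blank lines
    printf 'backing up: %s\n' "$src"
    count=$((count + 1))
done <<EOF
/etc
/home
/var/log
EOF
echo "listed $count sources"
```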
minutes=$(($minutes - 1))
POSIX specifies that the $ prefix on a variable name can be omitted in an arithmetic expression.
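So the quoted line can be written portably as:

```shell
minutes=10
minutes=$((minutes - 1))   # POSIX: no $ prefix needed inside $(( ))
echo "$minutes"
```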
ECHO="/bin/echo"
Many shell scripts never use echo, and this is a good idea. 'NEVER use echo like this. According to POSIX, echo has unspecified behavior if any of its arguments contain "\" or if its first argument is "-n".' http://www.etalabs.net/sh_tricks.html
Perhaps use this instead, in a subshell to avoid stomping on variables:
myecho () ( z=''; for x; do printf "$z%s" "$x"; z=' '; done; printf '\n'; )
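To see what this buys you, compare a printf-based replacement (the name safe_echo is made up) on exactly the inputs echo mishandles:

```shell
# printf has no unspecified behavior for "-n" or backslashes
safe_echo() { printf '%s\n' "$*"; }

safe_echo "-n"       # prints the literal string -n, not nothing
safe_echo 'a\nb'     # prints a\nb with the backslash intact
```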
#!/usr/bin/env bash and set -u are always good ideas.
There are cases where you don't want -e enabled, such as when you want to make sure your script makes the best attempt to continue operating even through unknown failures.
Using pipefail makes it more likely your script will fail unexpectedly and without a known cause. You have to check PIPESTATUS to see which command in a string of pipes failed and then report on it. This is often pointless, because usually just checking the output of the last pipe will tell you whether you got what you wanted.
When your script does fail unexpectedly, you'll want to re-run it with at least tracing enabled, so the third line should be something like a conditional "set -x".
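For reference, bash records each pipeline stage's exit status in the PIPESTATUS array (this demo assumes bash, since PIPESTATUS is not POSIX):

```shell
# Without pipefail the pipeline "succeeds" (cat exits 0);
# PIPESTATUS still shows which stage failed
false | grep -c rsync | cat > /dev/null
echo "stage exits: ${PIPESTATUS[*]}"   # false=1, grep (no match)=1, cat=0
```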
> Using pipefail makes it more likely your script will fail unexpectedly and without a known cause. You have to check PIPESTATUS to see which command in a string of pipes failed and then report on it. This is often pointless, because usually just checking the output of the last pipe will tell you whether you got what you wanted.
Really? In this script's "ps aux | grep -c rsync" for example, if "ps" fails, you'll just get 0 without the grep failing.
(Speaking of that line: chasil's completely right that it's much better to use "flock" than "ps" for locking...)
> There are cases where you don't want -e enabled, such as when you want to make sure your script makes the best attempt to continue operating even through unknown failures.
There are scripts where you don't want any failure to kill the script. You would have to wrap nearly every line in that kind of exception handling, and if you miss one, the script fails. It is much simpler to omit -e in that case
I've been doing my scripts for decades on various, numerous Linux and UNIX systems, real and virtual machines, and have never used env in the shebang nor "set -euo pipefail", and have had zero issues.
Saying "The first two lines of the script are already wrong;" is wrong. Is that better? IDK, maybe. But "#!/bin/bash" works fine.
Does it matter? MacOS is using some ancient ass 3.x bash from 2007. The location of bash is the least of your worries when it comes to portability. There's no guarantee any of the commands in your script (a) exist on the system or (b) work with the same options and flags that the script uses. I don't even know how many "netcat" commands I've seen in the wild. You could be running in some BusyBox or pared-down Docker container. Blindly running a script that hasn't been deliberately crafted to work on your system is just asking for trouble.
As a system engineer for almost 15 years, I've spent a good number of years handling deployments, dependencies, environment evaluations and all the little minutiae that's required to run large, complex distributed systems at scale. This is fine to ignore if it's your personal box and no one else is working on it.
However, I've seen it happen quite frequently in systems that were designed with container-like 'chroot-lite' prod setups, where the system bash and the deployed environment may contain different bashes.
The different bash is the one that was tested in the test env with the automation. There may even be multiple different versions of an interpreter on the system with multiple app environments running.
This was a pretty common way to package apps before easy access to containers and container managers.
This is why we have industry best practices, so that people who don't understand why something exists can just follow the best practices and we don't all have to be experts in things outside of our direct field.
> This is why we have industry best practices, so that people who don't understand why something exists can just follow the best practices and we don't all have to be experts in things outside of our direct field.
We don't have industry best practices for shell scripts, no matter what consultants / HN commenters with strong opinions say. You can see in this thread that folks disagree about what the best practices are. It's worth paying attention to folks' rationale for their opinions—I learned some caveats about "-e" from following the links here. But the talk about "table stakes" and "industry best practices" is (extended bleep). Those don't exist.
Different projects/companies may have their own best practice guides that make sense in their environment. senko mentioned Google data centers as a place where it might make sense to be more rigorous. Google's guide says to use "#!/bin/bash". [1] If that doesn't work in your environment, fine, but that doesn't make them wrong.
Shell is a surprisingly and unnecessarily difficult language to write correctly. To the extent there is a best practice on it, I think it's "use a better language for anything that might become large or important". The Google style guide I linked says more or less the same thing near the beginning. The subtleties discussed in the rest give you a taste of why...
I was a sysadmin on SunOS, HP/UX and BSD systems back in the day and have been maintaining various Linux-based systems in the past few decades as well.
We're not talking about Google data center here, it's just someone's shell script.
On my laptop (admittedly, it's running Mac OS, not a linux distro), /bin/bash is version 3.2.57 released in 2007, while '/usr/bin/env bash' invokes bash 5.1.8, released in 2020.
rsnapshot[1] is what I used on FreeBSD with a snapshot. It is like a well-tested version of the author's rsync and ssh blog post. I have a blog post here[2] describing my setup. It saved my bacon multiple times. Also, rsnapshot works in pull mode, so no client is needed on my Linux/macOS desktop or server except ssh and rsync.
Tangential question: I've been using rsnapshot to backup a remote machine to my local NAS for a while now. My impression is that even for an incremental backup, everything that I want to have backed up is transferred over the wire, and it is determined locally what actually needs to be written to disk, and what can be hardlinked from a previous backup.
Is there a way to configure rsnapshot so that it only transfers the data that's actually changed?
That is the default. rsnapshot uses rsync between local and remote, and rsync uses delta encoding algo so it is speedy. You may need to tune your ssh, networking stack and rsync.
AFAIK the crucial part of this, is rsync's `--link-dest=DIR` parameter, which hard-links an unmodified file to the respective place in "DIR", rather than making a copy.
The docs for my version of rsnapshot say not to use '--link-dest' if GNU cp is available, because the script will use 'cp -l' instead when copying "daily-1" to "daily-0" (for example). This is before it will overwrite "daily-0" with new files via rsync.
Too complicated to even try to read. Rsync is great but I've switched to Borg for backups. Borg isn't perfect but it is a higher level approach to backups, as it were. Hetzner recently dropped the price of their Storage Box backup product to about 2 euro per TB per month, and Borg works nicely with it. Borg encrypts all the backup contents and conceals the metadata on the backup server, and yet you can (with the encryption passphrase) mount the backup archive as a read-only file system through FUSE and access it through normal file navigation. It is impressive.
Make sure your borg repos are copying properly and in full!
I had a horrible realisation that my borg backups were timing out on the offsite copy, meaning the resulting offsite backup I had was non-existent. The heartbreaking error message: 'Inconsistency detected. Please run "borg check [repository]"' - although likely this is "beyond repair".
Course following the 3-2-1 rule you're probably good, but aye I'm treating borg repos as delicately as I would a striped raid now. I am also monitoring that `borg check` actually comes back successfully before considering the backup complete too :)
Also if you're in a small company, start up, etc.. Set yourself a 3-2-1+ backup system. + is the miracle backup that makes you look good.
> I had a horrible realisation that my borg backups were timing out ... meaning the ... backup I had was ...
A backup that isn't tested is not a true backup, it is a disappointment waiting to be found! This can happen with any backup tool, my hand-crafted¹ rsync based scripts included.
Testing isn't hard to setup if you don't mind the final step being manual. Snapshots have a checksum file, and daily one is picked and rechecked, any difference is an indication of bit-rot on the storage medium or something accidentally getting deleted/modified otherwise. After the newest copy is created by the main backup script a list of files not touched since the backup started is made and pushed up, the backup site checks those files and sends the result back so it can be compared. Any difference in these checksums results in an email to an account that makes my phone shout in a distressed manner. The manual part here is occasionally checking the results manually because not getting an email could either mean all is well or that something has broken to the point that the checks aren't running at all².
For specific systems like my mail server I have a replica VM, not visible to the Internet at large, that wipes itself and restores from the latest off-site backup. I look at that occasionally to see that it is running and has the messages I've recently sent and received. As a bonus this VM could quickly be made to be publicly available and take over with a few firewall DNS changes, should the main mail server or its host physically die, and even if it doesn't take over its existence proves that the restore method is reliable should I need to restore the one in the primary location. Some extra automated checks could be added to this too, but again there comes a point where writing the checks takes more time than just doing that manually³ and I'd still do it manually out of paranoia anyway.
[1] If I'm honest, “string together” would be much more accurate than “crafted”
[2] I could automate that a bit too, but then that automation still needs to be verified occasionally, it quickly gets to the point where it is double-checks all the way down and making sure you check manually occasionally is far far more maintainable a system.
[3] If these were a business thing rather than personal services, then the automated procedure vs manual checks desirability balance might change somewhat.
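The recheck step described above can be as simple as a checksum file per snapshot that gets re-verified later; a rough sketch (file names are illustrative, assumes GNU sha256sum):

```shell
# At backup time: record checksums inside the snapshot
snap=$(mktemp -d)
printf 'family photos\n' > "$snap/photo.raw"
( cd "$snap" && sha256sum photo.raw > checksums.sha256 )

# Later: any mismatch indicates bit-rot or accidental modification
if ( cd "$snap" && sha256sum --check --quiet checksums.sha256 ); then
    echo "snapshot OK"
else
    echo "checksum mismatch, alert!" >&2
fi
```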
> Live and learn.. and learn again later when you let yourself slip :)
Definitely. I'm as careful as I am due to past issues, either my own or those of others that I've witnessed. Seeing the look on someone's face when they ask “You know about these things, you can do something to get it back right? Right?!”, and having to let them down…
Borg has a command ("borg check") to test backup metadata. I'm not sure exactly what it does, but it is pretty slow, about 80 minutes to check a 1.6TB backup depending on how busy the backup server is. Another approach might be to mount the backup as a FUSE filesystem and monitor it with something like tripwire. I just thought of that a minute ago and haven't looked into it, so idk if it is workable. At the end of the day you have to occasionally do a full restore to another system, and test everything.
Fully agree, Borg is awesome. I wrote up how I'm using it here[1]. In short, borg backup to a local machine, and that machine uses rclone to copy the backups to an S3 bucket off-site. I've had multiple occasions to restore stuff successfully, and I never have to think about whether my data is retrievable.
I used to do something similarly complex to the OP with ssh and --link-dest to make it share inodes so I could easily keep N days of backups with file level deduplication [1].
Then I moved to Borg and haven't looked back. It does the same end goal in a better way, is way faster, and easier to work with overall [2].
Yes, the bigger ones are around 2 euro per TB. They also have Storage Share at around 3 euro/TB in the bigger sizes. Those have more features (they run nextcloud) and are themselves backed up several times a day.
I used rsync 20 years ago, switched to Borg, then switched to Restic. It’s nice to send backups to destinations like B2 or S3 without any special requirements. But the Hetzner Storage Box is good deal.
Now all I need is better client-side software for these things, something I could use on my mom's computer. Vorta is not that good, although I commend the effort.
Until the UI side is sorted, my mom's laptop stays on Backblaze.
As a small film production we have approx. 120 TB to back up. That's a lot of money per month; if you check AWS, it is a fortune (half the value of a car) per month.
The script here really only makes sense for servers you physically own. You wouldn't accept SSH access from a key located on a plain VPS, right? Also this script doesn't seem to encrypt the data at all!! Very dangerous on a VPS.
> You wouldn't accept SSH access from a key located on a plain VPS, right?
With Borg, I use ssh -A from my laptop to start backups going. I can think of some other schemes like using multiple user accounts on a single Storage Box (Hetzner supports that and I believe offers an API) so that different VPS can't clobber each other's backups, that old backups become read-only, etc. It might be interesting to add some finer access controls on the server side. Borg supports an append-only mode but right now, that's only a config option rather than a security setting, I believe.
I've only recently started using Borg so I'm not really familiar with its intricacies yet. There are some things I would change but it is mostly well thought out, imho.
You can't run arbitrary shell commands on the storage box server. They have the Borg binary (1.17 last time I looked) installed on the server, and they officially support it. They seem to specially recognize the borg commands.
Using ZFS snapshots is the best way to go, but having a cloud destination isn't easy for these tools, unfortunately; there are, however, services that accept borg and zfs send.
I like restic, but doing rsync scripts like this always win out for me because I find myself wanting to stage the backups on the destination systems, and not the source/prod ones.
I really, really wanted to love restic, but it makes far, far too many tiny size files for my main remote backup use-case: backing up dozens of TB offsite to Glacier Deep Archive.
Eagerly awaiting a configurable chunk size, if they decide to do that.
Kopia looks great! I am currently using Duplicati for my personal machines, but Mono and its dependencies have been less than reliable. borg/restic/... look good but seem like only part of a solution, having to write my own scripts and crontabs with monitoring etc on top seems counter-productive.
There's a million blog posts out there with "the perfect <whatever> setup", which is this circle-jerking way of saying that it's the author's preferred configuration.
I don't entirely know what the motivation is, but as a grumpy old GenXer, I wish it would stop.
You missed the word "almost" just before the word "perfect": meaning there is space for improvements, and any code has, if you dig deeper, a never-ending context. So, if you see something "not so perfect, or not at all perfect", just name it so it can be improved. THX
It's nice that the script works both ways, but keep in mind that it's always best to pull backups from a remote system, so that an attacker in the production system can't log into the backup system and delete the backups.
If your prod system(s) is/are logging into the backup system, then you have a big problem because if any of them are compromised, they can wipe out / corrupt / etc all of their own backups, and possibly the other server's backups as well (if the same user account is used for all of them.) This problem goes away if your (isolated, hardened) backup server logs into the prod systems. Of course, the inverse problem is that if an attacker manages to get into your backup system, then they also have (at least read) access to all of your prod systems!
What seems like a long time ago, I was working at a place that used Linux, but not really any Linux experts. They had backup scripts that assumed (and did not check) that you would connect an external drive. If you didn't, the script would gladly copy files to /mnt/usbdrive, and if the server was full enough, backing up itself to itself would fill the drive and bad things would then happen.
There was no email notification if the backup completed or failed, or even started to run. There was no notification of how long it took, how much data was backed up, etc.
For rsync tasks that require scheduling and need to ensure only a single instance executes, I personally prefer creating a systemd service (Type=oneshot) and having it run under a systemd timer (set OnBootSec, OnUnitActiveSec).
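Sketched out, that's a pair of units along these lines (the unit names and script path are made up):

```ini
# /etc/systemd/system/rsync-backup.service
[Unit]
Description=Nightly rsync backup

[Service]
Type=oneshot
ExecStart=/usr/local/bin/rsync-backup.sh

# /etc/systemd/system/rsync-backup.timer
[Unit]
Description=Schedule the rsync backup

[Timer]
OnBootSec=15min
OnUnitActiveSec=24h

[Install]
WantedBy=timers.target
```

Enable it with "systemctl enable --now rsync-backup.timer"; because the service is oneshot, systemd won't start a second instance while one is still running.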
I didn't read the whole article but the reason I like rsync is that I get files on a normal filesystem. I don't need any special backup software to read or access the files.
I totally agree. Do not underestimate the obstacle a family member will face when he needs to open a borg or restic repository to access a backup of your family photos.
I use restic for most of my backups but hand-written rsync on an external drive with ntfs for the stuff that would be important to my family.
The backup script is much simpler; the repository is properly encrypted, takes snapshots, dedups, mounts the remote, integrates with backends, has clean output, and has various useful features for working with repositories.
My go-to backup solution, that can also manage rsync over ssh (among plenty other things), is synbak[0]. Short wrapper script to automatically mount the backup medium and it's really simple to (automatically) run a backup job. I use encrypted (LUKS) USB drives for that at home. Highly recommend it.
For encrypted backups to e.g. a NAS, Duplicity[1] is my go-to choice (full backup every month or so, with incremental backup every day inbetween).
Can't help but throw a couple cents/pennies around whenever an "almost perfect" or "advanced" script makes my knee jerk so wildly. So here goes. For this part:
pro_on=$(ps aux | grep -c rsync)
# if someone is using rsync
# grep is also producing one entry so -gt 1
`grep` can be excluded from the output by putting square brackets around one character of the search pattern, like so:
pro_on=$(ps aux | grep -c '[r]sync')
Does anyone have experience with Duplicati [1] or recommend it? I am tentatively looking to make the switch to something a little more sophisticated than manually tar-balling + rsyncing backups.
I've had that in use on my personal PC (Ubuntu) for my personal files, bringing in the pictures via Dropbox. It has worked flawlessly for about 1.5 years and 150GB to Backblaze. What put me off initially is that the most important part of backups, testing the restore process, was / is less documented. Got it tested by just using common sense and a fresh temporary install. (Any pointers on how to do automatic integrity tests greatly appreciated.)
About 10 years ago, I was earning my CS degree, and my Intro to Unix teacher was showing us how to use rsync. He somehow made an error that synced an empty directory to his home directory, deleting everything from his home directory.
The lesson I learned in that class was to not use rsync, sadly.
> The lesson I learned in that class was to not use rsync, sadly.
This is a poor lesson! If your TA tripped on the stairs and broke their nose, would you swear off stairs forever?
rsync is an essential tool for anyone who works with files in the UNIX world. There are other options, sometimes, but rsync is almost-always present, and does almost-everything you usually want.
Any tool can be used incorrectly, but rsync is easy and follows the same pattern as most other UNIX file copy tools. If you can use cp, you can use rsync safely.
I think there's a bug in this; the pro_on calculation doesn't account for multiple lines, it just checks whether the output of grep is greater than 1. grep -c may not be available on all systems, so a more portable way to do this is piping to 'wc -l'.
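For reference, the wc -l variant looks like this (still racy; flock or a lock directory, as suggested elsewhere in the thread, is more robust):

```shell
# grep's own process usually matches too, hence the -gt 1 comparison
pro_on=$(ps aux | grep rsync | wc -l)
if [ "$pro_on" -gt 1 ]; then
    echo "rsync already running" >&2
fi
```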
Honestly, I just use duplicity. It is easy to test backups and has a nice GUI if you're that way inclined. I'm too old for bash scripts when the stakes are a lifetime of memories or critical data at work.
I wish more of these backup solutions (even simple scripts like this) would include zero trust features.
I'd like to have a backup system that I didn't just kludge together myself over a weekend that accomplishes both delta change detection and rotation as well as allows me to keep my data encrypted with a key only I control.
Any SATA connector powering the drive has little pins, and one that is not well made can cause a fire.
Yes, on the Molex side a large amount of current can flow. But you will hardly find a PSU with 24 direct outputs for HDDs.
tl;dw: many of these adapters are so poorly made they might catch on fire. They put the (insulated) wires in and then pour in something kind of like hot-melt glue around them (to hold them in place, and/or for extra insulation, not sure). Sometimes the wires are well separated within that medium, but sometimes they're touching, and possibly the wire's own insulation is melted by the glue. There also seem to be some contaminants in the glue, and possibly corrosion from the way it was soldered.
At the end, he shows a better kind where crimped wires are put into hard plastic channels, which reliably keeps them in place without the same risk of compromising their insulation.
The script could use --rsync-path='sudo rsync', so that your rsync backup script can access root-owned files on the remote system. It looks like you can add that to the RSYNCCONF part of "subpart of part 2".
For security reasons, the user "backup" on the Linux system acting as backup is not part of the sudo users. Having sudo rights will, to my knowledge, not help to access root files on the remote system?