Duplicity: Encrypted bandwidth-efficient backup (duplicity.us)
143 points by GTP on Jan 24, 2024 | 101 comments


I've moved to using backup tools that use content-based IDs with rolling-window hashes, which allows deduplicating content even between different hosts—and crucially handles moving content from one host to another efficiently—even though in other scenarios I'm guessing the rdiff algorithm can produce smaller backups.

The problem I have with duplicity and backup tools of its kind is that you still need to create a full backup again periodically, unless you want an ever-growing sequence of increments from the day you started doing backups.

Content-addressed backups avoid that, because all snapshots are complete (even if the backup process itself is incremental), but their content blobs are shared and eventually garbage collected when no references exist to them.

My tool of choice is kopia. borgbackup also does similar things (though borgbackup is still unable to back up to the same repo from multiple hosts at the same time—though I haven't checked this for a while). Both do encryption, but it's symmetric, so the client also holds the keys needed to open the backups. If you require asymmetric encryption then these tools are not for you—though I guess this is not a technical requirement of the approach, so maybe one day a content-addressed backup tool with asymmetric encryption will appear?
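
To make the kopia workflow concrete, a minimal sketch (the repo path and source directory below are just placeholders):

  # create an encrypted, deduplicating repository on a filesystem path
  kopia repository create filesystem --path /mnt/backup/kopia-repo
  # take a snapshot; only new or changed content blobs get uploaded
  kopia snapshot create /home/alice
  # periodic maintenance garbage-collects blobs no longer referenced by any snapshot
  kopia maintenance run --full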


Restic also works like this, and has the following benefits over Borg: multiple hosts can back up to the same repo, and it supports "dumb" remote file hosts that aren't running Borg, like S3 or plain SFTP servers.
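
A rough illustration of pointing restic at such backends (the host and bucket names are made up):

  # plain SFTP server, nothing special installed on the remote side
  restic -r sftp:bob@backuphost:/srv/restic-repo init
  restic -r sftp:bob@backuphost:/srv/restic-repo backup ~/Documents

  # S3-compatible object storage
  export AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=...
  restic -r s3:s3.amazonaws.com/my-backup-bucket init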


I really like restic, and am personally happy to use it via the command line. It's very fast and efficient! However, I do wish there were better tooling / wrappers around it. I'd love to be able to set something simple up on my partner's MacBook.

For example, Pika Backup and Vorta are popular UIs for Borg with no equivalent for Restic, while Borgmatic seems to be the de facto standard for profile configuration.

For my own purposes, I've been using a script I found on GitHub[0] for a while, but it only really supports Backblaze B2 AFAIK.[1] I've been meaning to try autorestic[2] and resticprofile[3], as they are potentially more flexible than the script I'm currently using, but the fact that there are so many competing tools - many of which are no longer maintained - makes it difficult to choose a specific one.

Prestic[4] looks intriguing for my partner's use, although it seems to have very few users. :\ A fork of Vorta[5] seems to have fizzled out six years ago.

[0] https://github.com/erikw/restic-automatic-backup-scheduler

[1] https://github.com/erikw/restic-automatic-backup-scheduler/i...

[2] https://github.com/cupcakearmy/autorestic

[3] https://github.com/creativeprojects/resticprofile

[4] https://github.com/ducalex/prestic

[5] https://github.com/Mebus/restatic


To add to this list:

For scheduling and browsing: https://forum.restic.net/t/backrest-a-cross-platform-backup-... (a Golang web UI supporting Linux and macOS).

A number of others are findable through the community section of the forum.

Bit of a self plug: I author Backrest. The most significant challenge historically has been that restic has poor programmatic interfaces, but in recent revisions the JSON API (over stdout) has largely stabilized for most commands.
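
For anyone who hasn't seen it, most restic commands now take a --json flag and emit machine-readable output on stdout, which is what wrappers like this build on. A quick sketch (the repo path is a placeholder):

  # newline-delimited JSON progress events during a backup
  restic -r /srv/restic-repo backup --json ~/Documents
  # snapshot listing as a JSON array, easy to parse from a wrapper
  restic -r /srv/restic-repo snapshots --json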


> For example, Pika Backup, and Vorta are popular UIs for Borg of which no equivalent exists for Restic

Have you considered these?

https://github.com/netinvent/npbackup

Or (not FOSS, but restore-compatible):

https://relicabackup.com/features


Thanks - last time I looked at npbackup it didn't support macOS! I'll have her give it a go.

Relica looks neat, but at that point I'd either suggest she uses one of the Borg tools or write a simple wrapper for her to trigger backups instead.

edit: Still looks a bit hairy for an average user to install currently, and the maintainer writes "I'm not planning on full macos support since I don't own any mac" - https://github.com/netinvent/npbackup/issues/28


I would recommend kopia, which AFAIK has a similar feature set and approach, and a very easy-to-use GUI. I recently used it to consolidate my 10-year-old backups into one store. Dedup worked great.


I'm a huge fan of restic as well. My only complaint is performance and memory usage. I'm looking forward to being able to use Rustic: https://rustic.cli.rs/


> The operations are robustly designed and can be safely aborted and efficiently resumed.

This is great. When you do your first restic backup on a machine, it uploads all the data, which takes a long time, and if there is the tiniest interruption (like the computer going to sleep) you have to start from zero again; at least that's the experience I had. Instead I worked around it by excluding the biggest directories and then removing them from the exclusion list one by one, doing backup runs in between.
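
For anyone replicating the staged approach, it was roughly this (directory names made up):

  # first run: skip the heavy stuff so the initial upload finishes quickly
  restic -r /srv/restic-repo backup --exclude ~/Videos --exclude ~/Photos ~
  # later runs: drop one exclusion at a time until everything is covered
  restic -r /srv/restic-repo backup --exclude ~/Photos ~
  restic -r /srv/restic-repo backup ~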


Can someone clarify if Restic dedups and compresses encrypted repos like Borg does? I feel like at some point it couldn't but maybe read that it now can?

I ask because my Borg repo is an order of magnitude smaller because of dedup, so it's essential for me.


Yes, it does. You can even convert old repos (the docs mention how to do this), though you'd better set aside some time for that.
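
From memory the conversion is roughly the two steps below; check the restic docs for the current incantation before running it:

  # upgrade the repository format to v2, which supports compression
  restic -r /srv/restic-repo migrate upgrade_repo_v2
  # rewrite the existing uncompressed packs; this is the part that takes a while
  restic -r /srv/restic-repo prune --repack-uncompressed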

I use it weekly for system as well as media backups (yt-dlp for some YouTube content, as a hedge in case the channel ever becomes unavailable in the future).


> their content blobs are shared

Doesn't this increase the chance of data loss? If a blob gets corrupted, then all the backups referencing that blob will have the same corrupted file(s). This is similar to having a corrupted index in an incremental backup chain (or maybe in this case you would lose everything?), but in the case of incremental backups the risk is mitigated by periodically performing full backups. Also, my gut feeling is that you will save space with content-addressed backups only if you're backing up multiple machines that share files, but in the typical average-user scenario where one is backing up a single PC you get similar space usage. Keep in mind that you typically delete backups older than a certain threshold. Could you maybe comment on my points?


Sure, you could have multiple level-0 backups to increase the odds that whatever blob got corrupted can be found in another copy, but that's inefficient.

It's much more efficient to deduplicate, then add redundancy. Like, say, storing said blobs on a RAIDZ3. Or use Backblaze's approach: split the blob into 17 pieces, add 3 pieces of redundancy, and distribute the 20 chunks across 20 racks.

If you are serious of course you'd have an onsite backup, deduplicated, with added redundancy AND the same offsite.
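
A cheap way to get that kind of redundancy for blob files on a single box is an erasure-coding tool like par2 (not what Backblaze uses; just an illustration of "dedup first, then add parity", with made-up paths):

  # add ~10% recovery data covering the repo's blob files
  par2 create -r10 /mnt/backup/repo-parity.par2 /mnt/backup/repo/data/*
  # later: verify, and repair corrupted blobs if needed
  par2 verify /mnt/backup/repo-parity.par2
  par2 repair /mnt/backup/repo-parity.par2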


Most backup tools allow you to verify a random subset, say 0.1%, of the backup. If you do that together with every backup, you would eventually notice.
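
restic and Borg both support something along these lines; flags from memory, so double-check against the docs:

  # restic: read and verify a random 1% of the pack data on each check
  restic -r /srv/restic-repo check --read-data-subset=1%
  # borg: verify all data (slower, but thorough)
  borg check --verify-data /srv/borg-repo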


For those of us who prefer not to ship to the cloud, have you used Kopia Repository Server and is it any good? Does it run on Windows?

The documentation refers to files and directories. Does the software let you take a consistent, point-in-time snapshot of a whole drive (or even multiple volumes), e.g. using something like VSS? Or if you want that have you got to use other software (like Macrium Reflect) to produce a file?

Where does the client cache your encryption password/key? (or do you have to enter it each session)


I wrote this part of the documentation: https://kopia.io/docs/advanced/actions/#windows-shadow-copy

If you have problems using it, please let me know.


Neat, thanks!


> For those of us who prefer not to ship to the cloud, have you used Kopia Repository Server and is it any good? Does it run on Windows?

I haven't. I use local Ceph S3 for backups, and then use kopia to mirror that to a local RAID, just in case my Ceph dies ;-).

It stores the password, base64-encoded, to ~/.config/kopia/repository.config.kopia-password. I suppose it would be nice, at least for workstations, if it supported keyrings—and it might, I haven't looked into it.


> Content-addressed backups sound something like how git stores data, is that the best way to think about them?

I think it is a valid way to consider them. Another option is to think of the backup as a special kind of file system snapshot that manifests itself as real files as opposed to data on a block device.

> And if so, what would be the main differences between just committing to a git repo for example?

The main difference is that good backup tools allow you to delete backups and free up the space, whereas git is not really designed for this.
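
Concretely, retention plus space reclamation is a first-class operation. For example with Borg (the repo path is a placeholder):

  # keep 7 daily, 4 weekly and 6 monthly archives; delete the rest
  borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /srv/borg-repo
  # actually free the space on disk (borg >= 1.2)
  borg compact /srv/borg-repo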


Also, git is not good with binary files and sort of bolts on git-lfs as a workaround.


>borgbackup is still unable to back up to the same repo from multiple hosts at the same time,

Basically still an issue. The machine takes an exclusive lock, and it also adds overhead since each machine has to update its local data cache (or whatever it's called), because the caches are constantly getting out of sync when another machine backs up.

bupstash looks promising as a close-to-but-more-performant Borg alternative, but it's still basically alpha quality.

It's unfortunate that peer-to-peer CrashPlan died.


What is alpha quality about bupstash?

It has fewer features than Kopia, but what's there looks high-quality to me.

(I'm also using it to back up 150 TB (300 million files), on which all other dedup programs run out of memory.)


I guess the docs call it "beta", so maybe that's more accurate. Last I looked, a couple of months ago, there were still a fair number of corruption issues cropping up.


> though borgbackup is still unable to back up to the same repo from multiple hosts at the same time

Wouldn't that mean that, when using encrypted backups, secrets would have to be shared across multiple clients?

If I'm understanding it correctly, it sounds like an anti-feature. Do other backup tools do that?


Yes, it seems to be the case; only the data in the server is encrypted, while the key is shared between clients sharing the same repository.

I'm not sure if content-addressed storage is feasible to implement otherwise. Maybe using the hash of the unencrypted (or shared-key-encrypted) block as the key, and then encrypting the per-block keys with the keys of the clients who have the contents, would do it. In any case, I'm not aware of such backup tools (I imagine most just don't encrypt anything).


CAS-based backup tools leak metadata like a sieve, so they're generally not the best choice for the most paranoid people, who should probably stick to uncompressed tar archives (or zips, which avoid compressing unrelated files together, which leaks data) padded to a full 100 megs or so and then encrypted en bloc.


Content-addressed backups sound something like how git stores data, is that the best way to think about them?

And if so, what would be the main differences between just committing to a git repo for example?


The "rolling window hashes" from the comment suggests sub-file matching at any offset. (See Bently-McIlroy diff algo/how rsync efficiently finds matches, for example.) I'm not aware that git performs this sort of deduplication.

Better yet would be to use a rolling hash to decide where to cut the blocks, and then use a locality-aware hash (SimHash, etc.) to find similar blocks. Perform a topological sort to decide which blocks to store as diffs of others.

Microsoft had some enterprise product that performed distribution somewhat like this, but it also recursively used similarity hashes to see if the diffs were similar to existing files on the far machine.


Git does delta compression on packfiles, and if the same data occurs in different files, it actually can deduplicate it, even if it's not at the same offset.

Here's a demo.

First, create two test files. The files both contain the same two 1-megabyte chunks of random bytes, but in the opposite order:

    $ openssl rand 1000000 > a
    $ openssl rand 1000000 > b
    $ cat a b > ab
    $ cat b a > ba
    $ du -sh ab ba
    2.0M        ab
    2.0M        ba
Commit them to Git and see that it requires 4 MB to store these two 2 MB files:

    $ git init
    Initialized empty Git repository in /tmp/x/.git/
    $ git add ab ba
    $ git commit -m 'add two files'
    [master (root-commit) 7f75af0] add two files
     2 files changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 ab
     create mode 100644 ba
    $ du -sh .git
    4.0M        .git
Run garbage collection, which creates a packfile. Note "delta 1" and note that disk usage dropped to a bit over the size of one of the files.

    $ git gc
    Enumerating objects: 4, done.
    Counting objects: 100% (4/4), done.
    Delta compression using up to 6 threads
    Compressing objects: 100% (4/4), done.
    Writing objects: 100% (4/4), done.
    Total 4 (delta 1), reused 0 (delta 0), pack-reused 0
    $ du -sh .git
    2.1M        .git
I'm not sure if it's as sophisticated as some backup tools, though.
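
If you want to see what git actually did there, git verify-pack shows the delta chain (exact hashes will differ, of course):

    $ git verify-pack -v .git/objects/pack/pack-*.idx
    # look for one blob stored at roughly full size and another stored as a much
    # smaller delta (those lines carry a depth and a base-object SHA at the end)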


Microsoft's implementation is called Remote Differential Compression: https://learn.microsoft.com/en-us/previous-versions/windows/...

It's available as a built-in component of Windows, it's just a library with an API.

Essentially the MS RDC protocol is just rsync run twice in a row, with the rsync metadata copied via rsync to compress it further.


> Essentially the MS RDC protocol is just rsync run twice in a row, with the rsync metadata copied via rsync to compress it further.

There's an important difference: RDC uses a locality-sensitive hash algorithm (MinHash, IIRC) to find files likely to have matching sections, whereas rsync only considers the version of the same file sitting on the far host. rsync encodes differences for each file in isolation, whereas RDC looks at the entire corpus of files on the volume.

For example, if you do the Windows equivalent of cat local/b.txt >> local/a.txt, rsync is going to miss the opportunity to encode local/a.txt -> remote/a.txt using matching runs from a common local/b.txt and remote/b.txt. However, RDC has the opportunity to notice that the local/a.txt -> remote/a.txt diff is very similar to remote/b.txt and further delta-encode the diff as a diff against remote/b.txt.


Could this help dedupe 20 years of ad-hoc drive dumps from changing systems…


I used this many, many years ago but switched to Borg[0] about five years ago. Duplicity required full backups with incremental deltas, which meant my backups ended up taking too long and using too much disk space. Borg lets you prune older backups at will; because of chunk tracking and deduplication, there is no such thing as an incremental backup.

[0] https://www.borgbackup.org/


I did the same. I had some weird path issues with Duplicity.

Borg is now my holy backup grail. I wish I could back up incrementally to AWS Glacier storage, but that's just me sounding like an ungrateful beggar. I'm incredibly grateful and happy with Borg!


Agree completely... used Duplicity many years ago, but switched to Borg and never looked back. Currently doing Borg backups of quite a lot of systems, many every 6 hours, and some, like my main shell host, every 2 hours.

It's quick, tiny and easy... and restores are the easiest: just mount the backup, browse the snapshot, and copy files where needed.


After Borg, I switched to Restic:

https://restic.net/

AFAIK, the only difference is that Restic doesn't require Restic to be installed on the remote server, so you can efficiently back up to things like S3 or FTP. Other than that, both are fantastic.


Technically Borg doesn't require it either, you can backup to a local directory and then use `rclone` to upload the repo wherever.

Not practical for huge backups, but it works for me as I'm backing up only my machines' configuration and code directories. ~60MB, and that includes a lot of code and some data (SQL, JSON et al.)
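
In other words, something along these lines (the repo path and remote name are placeholders):

  # back up to a local repository
  borg create /backups/repo::{hostname}-{now} ~/code ~/.config
  # then push the whole repo directory to any rclone-supported remote
  rclone sync /backups/repo b2:my-bucket/borg-repo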


Sure, but rsync requires the server to support rsync.


Does rclone require rsync? Haven't checked.


Oh sorry, brain fart, I thought you said rsync. I think rclone uploads everything if you don't have rclone on the server, but I'm not sure.


Pretty sure rclone uploads just fine without server dependencies, yeah. I never installed anything special on my home NAS and it happily accepts uploads with rclone.


It will upload fine, but it can't upload only the changed parts of the file without server support.


That I can't really speak to. I know it does not re-upload the same files at least (it uses timestamps), but I never really checked whether it only uploads file diffs.

Do you have a direct link I can look at?


Nothing offhand, but basically it can't know what's on the server without reading it all, and if it can't do that locally, it'll have to do it remotely. At that point, might as well re-upload the whole thing.

Its front page hints at this, but there must be details somewhere.


I think you’re misunderstanding something. There’s no need, and even no possibility, to have “rclone support” on the server, and also no need to “read it all”. rclone uses the features of whatever storage backend you’re using; if you back up to S3, it uses the content hashes, tags, and timestamps that it gets from bucket List requests, which is the same way that Restic works.

Borg does have the option to run both a client-side and a server-side process if you’re backing up to a remote server over SSH, but it’s entirely optional.


Ah, you're right, I got confused between rsync and rclone's server-side transfers.


Not to make this an endless thread, but I have been wondering about what's the most rsync-friendly backup on-disk layout. I have found Borg to have fewer files and directories, which I would naively think translates to fewer checks (and the files are not huge, either). I have tried Kopia and Bupstash as well, but they both produce a lot of files and directories, much more than Borg. So I think Borg wins at this, but I haven't checked Restic and the various Duplic[ati|icity|whatever-else] programs in a while (last I did was at least a year ago).


I think the advantage of restic is that you don't need to rsync afterwards, it handles all that for you. Combined with its FUSE backup decryption (it mounts the remote backup as a local filesystem you can restore files from), it's very set-and-forget.
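
That's the restic mount subcommand; a quick sketch with a placeholder repo, mountpoint, and file path:

  mkdir -p /mnt/restic
  restic -r s3:s3.amazonaws.com/my-backup-bucket mount /mnt/restic
  # snapshots show up as browsable directories; copy files back with plain cp
  cp /mnt/restic/snapshots/latest/home/alice/important.txt ~/restored/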


My problem with Restic was that it did not recognize sub-second timestamps of files. I made test scripts that exercised it (creating files and directories in a hypothetical backup source, and also changing the files), but then Restic insisted nothing had changed because the changes were happening too fast.

I modified the scripts to do `sleep 1` between each change but it left a sour taste and I never gave Restic a fair chance. I see a good amount of praise in this thread, I'll definitely revisit it when I get a little free time and energy.

Because yeah, it's not expected that you'll make a second backup snapshot <1s after the first one. :D


I'm going to say that was a bit of a niche usage :P


I tried Restic again, but its repo size is 2x that of Borg, which lets you fine-tune compression while Restic doesn't.

So I'll keep an eye on Rustic instead (it is much faster on some hot paths, and it allows you to specify the base path of the backup; long story, but I need that feature a lot because I also copy my stuff to network disks, and when you back up from there you want to rewrite the path inside the backup snapshot).

Rustic compresses equivalently to Borg, which is not a surprise because both use zstd at the same compression level.


For the "replicating borg repos" use case this doesn't matter, because files are only written once and never modified afterwards.


Works a treat with borgmatic https://torsion.org/borgmatic


I have an overnight cron job that flattens my duplicity backups from the many incremental backups made over the course of one day into a single full backup, which becomes the new baseline. Then subsequent backups over the course of the day do incrementals on top of that. So I always have a full backup for each individual day, with only a dozen or so incremental backups tacked onto it.
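
Roughly this shape, with placeholder paths and URLs (the exact retention commands may depend on your duplicity version):

  # overnight: force a fresh full backup, then drop the older chains
  duplicity full /home/alice sftp://backup@host//backups/alice
  duplicity remove-all-but-n-full 1 --force sftp://backup@host//backups/alice
  # during the day: incrementals on top of today's full
  duplicity incremental /home/alice sftp://backup@host//backups/alice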

That said, I will give Borg a look.


Same for me. Also, on macOS Duplicity was consuming much more CPU than Borg and was causing my fan to spin loudly. Eventually I moved to Time Machine, but I still consider Borg a very good option.


Also, duplicity lets you automatically delete backups older than a certain amount of time, so what is the difference?


Not to be confused with Duplicati [1] or Duplicacy [2]. There are too many backup programs whose names start with 'Duplic'.

[1] https://www.duplicati.com/

[2] https://duplicacy.com/


While we're on the topic of Duplicati, I feel the need to share my personal experience; one that's echoed by lots of folks online.

Duplicati restores can take what seems like the heat death of the universe to restore a repo as small as 500 GB. I've lost a laptop's worth of files to it. You can find tonnes of posts on the Duplicati forums that retell the same story [0].

I've moved to Borg and backing up to a Hetzner Storage Box. I've restored many times with no issue.

Remember folks, test your backups.

[0] https://forum.duplicati.com/t/several-days-and-no-restore-fe...


Same story, Hetzner storage box, local backups and at times a backup to my Android phone. Automated backup testing and a notification on Telegram if anything is amiss.


> Remember folks, test your backups.

Since you mention it, I am seizing the opportunity to ask: how should Borg backups be tested? Can it be automated?


It's actually pretty simple using the check command [0]!

  borg check --verify-data REPOSITORY_OR_ARCHIVE
You can add that to a cron job.

Alternatively, I think the Vorta GUI also has a way to easily schedule it[1].

I'll add that one thing I like to do once in a blue moon is to spin up a VM and try to recover a few random files. While the check command verifies that the data is there and theoretically recoverable, nothing really beats proving to yourself that you can, in a clean environment, recover your files.

[0] https://borgbackup.readthedocs.io/en/stable/usage/check.html

[1] https://vorta.borgbase.com/


> It's actually pretty simple using the check command [0]!

> borg check --verify-data REPOSITORY_OR_ARCHIVE

Thanks! I thought there was some more convoluted process, but I couldn't picture anything except extracting the whole archives and checking them by hand.


Yes, I have a bunch of scripts that let me pick a Borg snapshot and then do stuff with it. One such action is `borg export-tar`, which just creates a TAR file containing a full, self-sufficient snapshot of your stuff.

Then just listing the files in the archive is a not-bad way to find obvious problems. Or straight up unpacking it.
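
Something like the following, with placeholder repo and archive names:

  # export a snapshot as a plain tarball
  borg export-tar /srv/borg-repo::myhost-2024-01-24 /tmp/check.tar
  # cheap sanity check: list the contents (or unpack into a scratch dir)
  tar -tf /tmp/check.tar | head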

But if you're asking about a separate parity file that can be used to check and correct errors -- I haven't done that.


Duplicacy for me has been amazing. I use it to back up all of my machines nightly, all consolidated into one repo that is copied to B2, and it works amazingly. I've restored plenty and have not had any issues.


I'm curious as to why you took that approach. Why not just straight to B2 from each machine? Is it for a redundant local copy of all the backups? If so that sounds like a good idea since restoring from B2 takes ages just because listing the revisions is hella slow for me...


I like the one-repository approach as it works well: it resides on my NAS, which is always on, and it is a local backup for multiple machines. From there, I replicate to B2 as an emergency copy in the event my NAS (RAID is not backup!) dies or something horrific happens. I tried to make it as future-proof as possible too: with Duplicacy, you can easily clone/copy repositories, so migrating to new hardware will be extremely simple. Not only simple with local hardware, but easy to migrate to a different cloud storage as well. Just don't lose your repository encryption key!


I've found restic + rclone to be extremely stable and reliable for this same sort of differential backup. I backup to Backblaze B2 and have also used Google Drive with success, even for 1TB+ of data.


I agree. restic, with its simple ability to mount (using FUSE) a backup so you can just copy out the file(s) you need, is so wonderful. A single binary that you can just download and SCP around the place to any device, etc.

It's fantastic to have so many great open source backup solutions. I investigated many and settled on restic. It still brings me joy to actually use it, it's so simple and hassle free.


+1 for restic. I tried various solutions and restic is the best by far. So fast, so reliable.

https://restic.net/


I've been using Restic since 2017 without issue. Tried Kopia for a while, but its backup size ballooned on me, maybe it wasn't quite ready.


Kopia supports zstd compression, I found it to be within +10% to -5% of the size of my Borg repo.

It also has extensive support for ignoring stuff and it works very well.

I still use Borg because its policy of expiring older snapshots is more useful for me, but Kopia is extremely solid and I would use it any day if I didn't care that it doesn't actually keep one monthly backup for the last 3 months as Borg does (it decides which older snapshots to keep with another algorithm; it's documented on their website).


+1 for rclone. What a great piece of software.


Excellent piece of software, and relatively simple to use with gpg encryption. I've been using it for many years.

My only complaint is that, like a lot of software written in Python, it has no regard for traditional UNIX behavior (keep quiet unless you have something meaningful to say), so I have to live with cron reporting stuff like:

"/usr/lib/python2.7/dist-packages/paramiko/rsakey.py:99: DeprecationWarning: signer and verifier have been deprecated. Please use sign and verify instead. algorithm=hashes.SHA1()"

along with stuff I actually do (or might) care about.

Oh well.


> /usr/lib/python2.7/

You are using an old version of Duplicity. It dropped all support for Python 2 in 2022: https://git.launchpad.net/duplicity/commit/setup.py?id=5505f...


You can try setting `export PYTHONWARNINGS="ignore"` to suppress warnings.


https://apps.gnome.org/DejaDup/ uses this as its backend. It also has an experimental option to use https://github.com/restic/restic instead of duplicity.


PSA for anyone else as stupid as me: when doing selective restores be very careful about how you set target_directory. "duplicity restore --path-to-restore some_file source_url ~" does not mean "restore some_file to my home directory", it means "replace my home directory with some_file".
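
In other words, give it an explicit, dedicated target path. A sketch with placeholder paths and URL:

  mkdir -p ~/restores
  # restores just some_file into ~/restores/some_file, leaving $HOME alone
  duplicity restore --path-to-restore path/to/some_file \
      sftp://backup@host//backups/home ~/restores/some_file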


I usually make a folder specifically for restores and target everything there; this avoids the issue.


I've had issues with Duplicity not performing backups when a single backend is down (despite it being set to continue on error). In my case an SFTP server lost its IPv6 routing so Duplicity couldn't connect to it; rather than give up on that backend and only use the other server it just gave up entirely.


I have been having great luck with incremental backups using the very similarly named Duplicacy: https://duplicacy.com/


If you're using S3 to back up your files, it's easier to write a shell script with the AWS CLI. For example, here's a script I wrote that I run automatically to back up my computer to S3. I have an exclude array to exclude certain folders. It's simpler than downloading software and more customizable.

  # $1 # local folder
  # $2 # bucket

  declare -a exclude=(
    "node_modules"
    "Applications"
    "Public"
  )

  args=""
  for item in "${exclude[@]}"; do
    args+=" --exclude '*/$item/*' --exclude '$item/*'";
  done

  cmd="aws s3 sync '$1' 's3://$2$1' --include '*' $args"
  eval "$cmd"


Your script doesn't do the same thing as duplicity. Your script mirrors the local directory to your bucket; it loses all history. Duplicity does backups (i.e., with history), and not just that: it does differential backups so it doesn't upload everything all the time.


S3 has bucket versioning if you want to keep multiple versions. The S3 sync command is also incremental: if you, for example, run the script over and over, it will only upload new/changed files.
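
Enabling versioning is a one-liner with the CLI (the bucket name is a placeholder):

  aws s3api put-bucket-versioning --bucket my-backup-bucket \
      --versioning-configuration Status=Enabled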


The major issue out of the box, versus any deduping backup software, is that S3 doesn't support any deduplication. If you move or rename a 15GB file, you're going to have to completely upload it again, and also store and pay for a second copy until your S3 bucket policy purges the previously uploaded file you've deleted. Also, aws s3 sync is much slower since it has to iterate over all of the files to see if their size/timestamp has changed. Something like borgbackup is much faster as it uses smarter caching to skip unchanged directories etc.


It's possible to find probable duplicate files with the S3 CLI based on size and tags; I was working on a script to do just that, but I haven't finished it yet. Alternatively, if you want exact backups of your computer, you can use the --delete flag, which will delete files in the bucket that aren't in the source.

I agree this is not the absolute most optimized solution but it does work quite well for me and is easily extendible with other scripts and S3 CLI commands. Theoretically if Borgbackup or Duplicity are backing up to S3 they're using all the same commands as the S3 CLI/SDK.

Besides, shell scripting is fun!


> Theoretically if Borgbackup or Duplicity are backing up to S3 they're using all the same commands as the S3 CLI/SDK.

They are not. Both Borg and Duplicity pack files into compressed, encrypted archives before uploading them to S3; "s3 sync" literally just uploads each file as an object with no additional processing.


If I have to choose between hacking together a bunch of shell scripts to do my deduplicated, end-to-end encrypted backups, vs using a popular open source well-tested off the shelf solution, I know which one I'm picking!


That's not differential backups, that's optimizing the mirroring algorithm.

Differential backup here means that if a file has changed, you only send the change delta, not the whole file. This is what makes it possible to run that kind of thing every hour if needed, even on large folders.

If S3 supported that, then together with the already existing versioning you'd have a pretty kickass solution; that's basically what you can do with rsync and a ZFS filesystem, for example.
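
That is, the classic pattern (host and dataset names are placeholders):

  # send only the changed parts of changed files to the backup host
  rsync -a --delete /data/ backuphost:/tank/data/
  # then take a cheap point-in-time snapshot on the ZFS side
  ssh backuphost zfs snapshot tank/data@$(date +%F-%H%M)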


no encryption though


You can encrypt S3 buckets/files inside your buckets. By default buckets are encrypted.


Not end-to-end encryption.


Brilliant name, if you think about it. If they ever decided to start doing shady shit, they'd have a perfect legal shield. No one would be able to convincingly argue in court that they were being duplicitous.


If no one can argue they're duplicitous, then it's a case of false advertising...


Duply is a good frontend for Duplicity.

https://duply.net/Main_Page


Try kopia.io, it's very good.


Seconding this, it's saved me several times.


If you don't need incremental backups (thus saving space for the signatures) and want to store to S3 Deep Glacier, take a look at https://github.com/mrichtarsky/glacier_deep_archive_backup


Years ago I used a very simple bash script front-end for Duplicity called Duply. It worked very well for the half-dozen or so years I used it.


Whatever you do, don't use rdiff-backup.


Why? I've had no issues.


For starters it has a tendency to paint itself into a corner on ENOSPC situations. You won't even be able to perform a restore if a backup was started but unfinished because it ran out of space. There's this process of "regressing" the repo [0] which must occur before you can do practically anything after an interrupted/failed backup. What this actually must do is undo the partial forward progress, by performing what's effectively a restore of the files that got pushed into the future relative to the rest of the repository, which requires more space for any actually modified files. Unless you have/can create free space to do these things, it can become wedged... and if it's a dedicated backup system where you've intentionally filled disks up with restore points, you can find yourself having to throw out backups just to make things functional again.

That's the most obvious glaring problem. Beyond that, it's just kind of garbage in terms of the amount of space and time it requires to perform restores, especially restores of files with many reverse-differential increments leading back to the desired restore point. It can require ~2X a given file's size in spare space to assemble the desired version, while it iteratively reconstructs all the intermediate versions on the way to the desired one. Unless someone improved this since I last had to deal with it, which is possible; it's been years.

Source: Ages ago I worked for a startup[1] that shipped a backup appliance originally implemented by contractors using rdiff-backup behind the scenes. Writing a replacement that didn't suck but was compatible with rdiff-backup's repos while adding newfangled stuff like transactional backups with no need for "regress", direct read-only FUSE access of restore points without needing space, and synthetic virtual-NTFS style access for booting VMs off restore points consumed several years of my life...

There are far better options in 2024.

[0] https://github.com/rdiff-backup/rdiff-backup/blob/master/src...

[1] https://www.crunchbase.com/organization/axcient



