One of the pghoard developers here. We developed pghoard for our use case (https://aiven.io):
* Optimizing for roll-forward upgrades in a fully automated cloud environment
* Streaming: encryption and compression on the fly for the backup streams, without creating temp files on disk
* Solid object storage support (AWS/GCP/Azure); see the config sketch below
* Surviving various glitches such as faulty networks, processes getting restarted, etc.
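To give a concrete idea of the setup, everything is driven by a single JSON config that points pghoard at the database nodes and an object store; roughly along these lines (key names here are from memory and only a sketch, the README has the authoritative example):

    {
        "backup_location": "/var/lib/pghoard",
        "backup_sites": {
            "default": {
                "nodes": [{"host": "127.0.0.1", "port": 5432, "user": "postgres"}],
                "basebackup_interval_hours": 24,
                "object_storage": {
                    "storage_type": "s3",
                    "region": "eu-west-1",
                    "bucket_name": "my-backups"
                }
            }
        },
        "compression": {"algorithm": "snappy"}
    }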
Restore speed is very important for us and pghoard is pretty nice in that respect, e.g. 2.5 terabytes restored from an S3 bucket to an AWS i3.8xlarge in half an hour (1.5 gigabytes per second on average). That means hitting CPU, disk and network very hard, but at restore time there's typically not much else for them to do.
It would be good for the PG community to consolidate some of these projects (WAL-E, WAL-G, pghoard, pgBackRest, barman, etc.) into core functionality at this point.
How does a periodic backup using pg_basebackup compare to barman's rsync with reuse_backup = link, in terms of backup speed and size of the resulting files?
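For concreteness, the barman side of that comparison is just a couple of lines in barman.conf; something like the following, where the server name, host and connection details are placeholders:

    [dbserver]
    description = "Main PostgreSQL server"
    ssh_command = ssh postgres@dbserver.internal
    conninfo = host=dbserver.internal user=barman dbname=postgres
    backup_method = rsync
    reuse_backup = link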
I want to look at this (and WAL-G), but am not sure how much load I would be putting on our db servers when they do the periodic backups. Our database is pretty heavy on 'history' tables that don't change much once they are written to.
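One thing worth noting on the load question: if the periodic backup ends up being pg_basebackup, it can at least be rate-limited so it doesn't saturate I/O on the server; for example (host, user, path and rate are purely illustrative):

    pg_basebackup -h db1.internal -U backup -D /backups/base -Ft -z -X stream --max-rate=50M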
CPU usage varies based on the selected compression algorithm and level used. Snappy and LZMA are available now. Compression is done in native code. There are some newer interesting algorithms (zstd/lz4) that we are looking into adding.
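To make the "compression on the fly" part concrete: the idea is to feed the backup stream through the compressor chunk by chunk as it is read, so nothing gets spooled to a temp file. A minimal sketch with Python's stdlib lzma module (not pghoard's actual code, just the shape of it):

    import lzma

    def compress_stream(src, dst, preset=1, chunk_size=1024 * 1024):
        # Feed the compressor one chunk at a time and write whatever it emits,
        # so an uncompressed copy never has to land on disk.
        compressor = lzma.LZMACompressor(preset=preset)
        for chunk in iter(lambda: src.read(chunk_size), b""):
            out = compressor.compress(chunk)
            if out:
                dst.write(out)
        dst.write(compressor.flush())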
In my experience, WAL-E is very brittle and fails with weird errors at times, such as when your database restarts or when it fails to upload a segment.
I have gone back to using pg_dump even though it's not a real-time snapshot. In the end, replication + pg_dump gets the job done.
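In practice that boils down to a nightly cron entry pointed at a replica, something like the following (host, user and paths are made up):

    # custom-format dump from a replica; restore selectively with pg_restore
    pg_dump -Fc -h replica.internal -U backup -f /backups/appdb_$(date +%F).dump appdb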
Feature sets of all the recent backup and restore systems are becoming more and more alike, but when we started working on PGHoard there were no good options built to efficiently utilize the different cloud object stores (S3, GCS, Azure, Swift, etc.).
Nowadays there are many good options for handling basebackups and WAL, and one of the largest remaining issues is the lack of parallel WAL apply in PostgreSQL itself, which limits restore throughput quite severely.