PGHoard – PostgreSQL backup and restore service for cloud object storages (github.com/aiven)
122 points by bjoko on March 1, 2019 | 15 comments



One of the pghoard developers here. We developed pghoard for our use case (https://aiven.io):

* Optimizing for roll-forward upgrades in a fully automated cloud environment
* Streaming: encryption and compression on the fly for the backup streams, without creating temp files on disk (see the sketch below)
* Solid object storage support (AWS/GCP/Azure)
* Surviving various glitches: faulty networks, processes getting restarted, etc.
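To make the streaming point concrete, here is a minimal sketch (not PGHoard's actual code) of compressing and encrypting a backup stream chunk by chunk so nothing is spooled to disk. It assumes the `python-snappy` and `cryptography` packages; the chunk size, record framing, and key handling are illustrative only.

```python
import os

import snappy  # python-snappy: C bindings to Google's Snappy library
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; chunk size chosen for illustration only

def compress_encrypt_stream(reader, writer, key: bytes) -> None:
    """Compress and encrypt a backup stream chunk by chunk, no temp files."""
    aesgcm = AESGCM(key)  # key must be 16/24/32 bytes
    while True:
        chunk = reader.read(CHUNK_SIZE)
        if not chunk:
            break
        compressed = snappy.compress(chunk)
        nonce = os.urandom(12)  # fresh nonce per chunk
        record = nonce + aesgcm.encrypt(nonce, compressed, None)
        # Length-prefix each record so the restore side can re-split the stream.
        writer.write(len(record).to_bytes(4, "big"))
        writer.write(record)
```

In practice `reader` would be something like the stdout pipe of `pg_basebackup` and `writer` an upload stream to the object store.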

Restore speed is very important for us and pghoard is pretty nice in that respect, e.g. 2.5 terabytes restored from an S3 bucket to an AWS i3.8xlarge in half an hour (1.5 gigabytes per second avg). This means hitting all of cpu/disk/network very hard, but at restore time there's typically not much else to do with them.
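Throughput in that range comes from downloading and processing many objects concurrently. A rough sketch of the idea, assuming `boto3`; the bucket name, key layout, and worker count are hypothetical, and this is not how PGHoard is actually structured:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")  # boto3 clients are safe to share across threads
BUCKET = "my-backup-bucket"  # hypothetical bucket name

def fetch(key: str) -> None:
    # Each worker drives its own share of network and disk bandwidth.
    s3.download_file(BUCKET, key, "/restore/" + key.rsplit("/", 1)[-1])

def parallel_restore(keys: list[str], workers: int = 16) -> None:
    # Gigabytes-per-second restores need cpu, disk and network all busy
    # at once, as the parent comment notes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fetch, keys))
```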


It would be good for the PG community to consolidate some of these projects (WAL-E, WAL-G, pghoard, pgBackRest, Barman, etc.) into core functionality at this point.


One of the PGHoard authors will be talking about PostgreSQL backups in the cloud at PostgresConf NYC in a couple of weeks: https://postgresconf.org/conferences/2019/program/proposals/...


How does a periodic backup using pg_basebackup compare to Barman's rsync with reuse_backup = link, in terms of backup speed and file sizes?

I want to look at this (and WAL-G) but am not sure how much load I would be putting on our db servers when they do the periodic backups. Our database is pretty heavy on 'history' tables that don't change much once they are written to.


How does this compare to pgBackRest? https://pgbackrest.org/


How's the CPU usage for this when compressing? Are the libs C libraries (i.e. native performance)?


CPU usage varies based on the selected compression algorithm and level. Snappy and LZMA are available now. Compression is native code. There are some newer interesting algorithms (zstd/lz4) that we are looking into adding.
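For a quick feel of the CPU/ratio trade-off between the two currently supported codecs, something like the following works; it assumes the `python-snappy` package (the `lzma` module is in the standard library) and a hypothetical sample file:

```python
import lzma
import time

import snappy  # python-snappy, a C extension around Google's Snappy

def measure(name: str, compress, data: bytes) -> None:
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(out) / len(data):.1%} of original, {elapsed:.3f}s")

data = open("/path/to/sample.wal", "rb").read()  # hypothetical sample file
measure("snappy", snappy.compress, data)                     # fast, lighter ratio
measure("lzma", lambda d: lzma.compress(d, preset=0), data)  # slower, denser
```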


I'm going to look at this more later, but my first thought is: how does this compare to WAL-E? https://github.com/wal-e/wal-e


[WAL-G](https://github.com/wal-g/wal-g) is another alternative, and is sold as a successor to WAL-E, just in case you haven't heard of it :-)


There was a discussion about WAL-G here on HN just a few days ago: https://news.ycombinator.com/item?id=19259099


In my experience, WAL-E is very brittle and fails with weird errors at times, such as when your database restarts or when it fails to upload a segment.

I have gone back to using pg_dump even though it's not a real-time snapshot. In the end, replication + pg_dump gets the job done.
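As a minimal sketch of that fallback, a scheduled job can drive `pg_dump` against a replica so the primary stays unloaded; the host, database name, and paths below are hypothetical:

```python
import datetime
import subprocess

def nightly_dump() -> None:
    # -Fc produces a compressed custom-format dump restorable with pg_restore.
    stamp = datetime.date.today().isoformat()
    subprocess.run(
        ["pg_dump", "-Fc",
         "-h", "replica.internal",             # hypothetical replica host
         "-f", f"/backups/mydb-{stamp}.dump",  # hypothetical output path
         "mydb"],
        check=True,
    )
```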


I added a PagerDuty alert to my WAL-E setup for when backups stop. Happens once a month or so. Not ideal, but I just poke it with a stick and it comes back.
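A staleness check along those lines can be a few lines of Python. The PagerDuty Events v2 endpoint below is the real one, but how you determine the latest backup time and the routing key are placeholders:

```python
import time

import requests

MAX_AGE = 24 * 3600  # alert if no backup completed in the last day

def check_backups(latest_backup_epoch: float, routing_key: str) -> None:
    if time.time() - latest_backup_epoch <= MAX_AGE:
        return
    # PagerDuty Events API v2: https://developer.pagerduty.com/docs/events-api-v2
    requests.post(
        "https://events.pagerduty.com/v2/enqueue",
        json={
            "routing_key": routing_key,  # placeholder integration key
            "event_action": "trigger",
            "payload": {
                "summary": "WAL-E backups have stopped",
                "source": "backup-monitor",
                "severity": "critical",
            },
        },
        timeout=10,
    )
```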


I switched from wal-e to pgbackrest a couple years ago. Totally worth it.


With WAL-E, WAL-G, and PGHoard, it's getting really difficult to choose a solution; they are so comparable.


Feature sets of all the recent backup and restore systems are becoming more and more alike, but when we started working on PGHoard there were no good options built to efficiently utilize the different cloud object stores (S3, GCS, Azure, Swift, etc.).

Our original announcement of PGHoard at https://aiven.io/blog/postgresql-cloud-backups-with-pghoard/ lists some of the reasons we had for building a new system from scratch.

Nowadays there are many good options for handling basebackups and WAL, and one of the largest remaining issues is the lack of parallel WAL apply in PostgreSQL itself, which limits restore throughput quite severely.



