Where I am, we have a similar setup for leader election and failover (using etcd and haproxy), but we add one extra piece: a standby instance that does not participate in master election and always follows the elected master.
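Roughly, the seed just watches etcd and re-points itself at whatever the leader key says. A minimal sketch of that loop (the key name, the etcd v2 API usage, and the repoint-standby helper are just for illustration, not literally what we run):

    import subprocess
    import requests

    ETCD = "http://127.0.0.1:2379"
    LEADER_KEY = "/v2/keys/service/pg/leader"   # hypothetical key layout

    def current_leader():
        node = requests.get(ETCD + LEADER_KEY).json()["node"]
        return node["value"], node["modifiedIndex"]   # value e.g. "10.0.0.5:5432"

    def repoint_seed(leader):
        host, port = leader.split(":")
        conninfo = f"host={host} port={port} application_name=seed"
        # Hypothetical helper that rewrites the standby's primary_conninfo
        # and restarts it so it streams from the new master.
        subprocess.run(["/usr/local/bin/repoint-standby", conninfo], check=True)

    leader, index = current_leader()
    repoint_seed(leader)
    while True:
        # Long-poll etcd for the next change to the leader key.
        resp = requests.get(ETCD + LEADER_KEY,
                            params={"wait": "true", "waitIndex": index + 1},
                            timeout=None).json()
        leader, index = resp["node"]["value"], resp["node"]["modifiedIndex"]
        repoint_seed(leader)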
Then we turn on confirmed writes (i.e. synchronous replication) on the master, so that the non-participating standby (called the "seed") has to receive and confirm each write before the transaction can commit.
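Concretely that's the standard Postgres synchronous replication settings. A minimal sketch of turning it on from a script (the connection string and the 'seed' application_name are assumptions; the same two settings can just as well live in postgresql.conf):

    import psycopg2

    # Point this at the current master; the DSN is an assumption.
    conn = psycopg2.connect("dbname=postgres host=master.example")
    conn.autocommit = True   # ALTER SYSTEM can't run inside a transaction block
    cur = conn.cursor()

    # Require the standby connecting with application_name 'seed' to
    # acknowledge every commit before it returns to the client.
    cur.execute("ALTER SYSTEM SET synchronous_standby_names = 'seed'")
    cur.execute("ALTER SYSTEM SET synchronous_commit = 'on'")
    cur.execute("SELECT pg_reload_conf()")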
This has the bonus of preventing split brain: if the wrong instance thinks it's master, writes block indefinitely because the seed isn't confirming them. If the seed is following the wrong machine, same thing. And if the clients, the seed, and the master are all "wrong", that's still OK, because at least they all "consistently" disagree with etcd.
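That blocking behavior is also easy to exploit as a health check: a node is only safe to treat as master if it isn't in recovery and actually has a synchronous standby attached, otherwise any commit sent to it will just hang. A rough sketch of such a check with psycopg2 (how you wire it into haproxy or clients is left out):

    import psycopg2

    def usable_master(dsn):
        """True only if the node is a primary and some synchronous standby
        (the seed) is attached; otherwise commits against it would block."""
        conn = psycopg2.connect(dsn)
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT pg_is_in_recovery()")
                if cur.fetchone()[0]:
                    return False   # it's a standby, not a master at all
                cur.execute("SELECT count(*) FROM pg_stat_replication"
                            " WHERE sync_state = 'sync'")
                return cur.fetchone()[0] > 0
        finally:
            conn.close()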
The seed instance can run anywhere, and is responsible for receiving WAL segments from the master and archiving them (to shared storage), so the seed itself can crash, be brought up elsewhere, and catch up fine. Writes just block until this converges.
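One way to do the archiving half (just a sketch, and the paths are assumptions): on the seed, set archive_mode = always (9.5+, which lets a standby archive the WAL it receives) and point archive_command at a small script that copies each finished segment onto the shared mount:

    #!/usr/bin/env python3
    # Invoked by Postgres as: archive_command = '/usr/local/bin/archive_wal.py %p %f'
    # where %p is the path to the finished WAL segment and %f is its file name.
    import os
    import shutil
    import sys

    ARCHIVE_DIR = "/mnt/wal-archive"   # shared storage mount point (assumption)

    def main(wal_path, wal_name):
        dest = os.path.join(ARCHIVE_DIR, wal_name)
        if os.path.exists(dest):
            # Postgres retries failed archivals, so treat an already-archived
            # segment as success; a stricter version would compare contents first.
            return 0
        tmp = dest + ".part"
        shutil.copy2(wal_path, tmp)   # copy then rename so a restore never
        os.rename(tmp, dest)          # sees a half-written segment
        return 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1], sys.argv[2]))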
It's worked quite well for us for a few months on a hundred or so Postgres clusters; we haven't seen an issue yet. I'd love for somebody knowledgeable about this stuff to point out any flaws.
That's interesting. We do something pretty similar in the Manatee component that I mentioned elsewhere in this thread, except that the designated synchronous standby can take over if the primary goes away. But it can only do so when another peer is around to become the new synchronous standby, so we maintain the write-blocking behavior that avoids split-brain.
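In other words, the rule is roughly "the sync standby may promote only if it can immediately hand the sync role to someone else". A toy sketch of that decision (a simplified paraphrase, not the actual Manatee code, and what counts as "caught up enough" is glossed over):

    def should_take_over(primary_alive, i_am_sync_standby, candidate_peers):
        """candidate_peers: other peers caught up enough to become the new
        synchronous standby (the exact criterion is an assumption here)."""
        if primary_alive or not i_am_sync_standby:
            return False
        # With no candidate for the sync role, promoting would create a
        # primary that nothing confirms, so keep blocking instead.
        return len(candidate_peers) > 0

    def take_over(candidate_peers, set_sync_standby, promote):
        # Designate the new sync standby (e.g. set synchronous_standby_names)
        # before promoting, so the very first write after promotion already
        # requires confirmation from a peer.
        set_sync_standby(candidate_peers[0])
        promote()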