


Cool, thanks! It looks like that came out in 2012? Neat, I'll take a look!


Incidentally, if anybody has experience with this I'd love to know:

1) What happens if "mysqlfailover" itself dies? For example, if I have 6 servers split equally between 2 datacenters, and the datacenter running mysqlfailover loses power -- how does the other datacenter get reconfigured?

2) If you run two copies of mysqlfailover (one in each datacenter), how does it solve the "split brain" problem? If you have 6 servers split equally between two datacenters, and a clean network severance down the middle, what prevents both sides from configuring and operating a master? (This is the nightmare scenario.)

3) If mysqlfailover dies, it sounds like it will prevent any other copy from running without manual intervention: "At startup, the console will attempt to register itself with the master. If another console is already registered, and the failover mode is auto or elect, the console will be blocked from running failover. When a console quits, it unregisters itself from the master. If this process is broken, the user may override the registration check by using the --force option."
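
For reference, launching the console and overriding a stale registration looks roughly like the sketch below. This is only an illustration based on the docs quoted above; the hosts, credentials and log file are placeholders, not a recommendation:

  # Start the failover console against the current master; replicas are
  # discovered via the given login (all values below are placeholders).
  mysqlfailover --master=root:secret@db1.dc1:3306 \
      --discover-slaves-login=root:secret \
      --failover-mode=auto --log=failover.log

  # If a previous console died without unregistering itself, the docs say
  # the registration check can be overridden manually:
  mysqlfailover --master=root:secret@db1.dc1:3306 \
      --discover-slaves-login=root:secret \
      --failover-mode=auto --force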

Overall, this sounds like a great improvement over what existed before (e.g., nothing), but still a pretty brittle, manual, and (in the case of split-brain) very dangerous approach. Have you had good experience with it in practice?


We are using Galera clustering with MariaDB on 3 nodes for that.

We have been using it since 2014. First in line is an HAProxy instance which directs the traffic to one of the 3 nodes. If one of the nodes goes down, HAProxy directs the traffic to the other two nodes.
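
The HAProxy part is roughly the sketch below; a minimal, assumed configuration with placeholder addresses and a plain TCP health check, not our literal config:

  listen galera
      bind *:3306
      mode tcp
      # All traffic goes to node1; the other two are hot standbys
      # that HAProxy fails over to if node1's check fails.
      server node1 10.0.0.1:3306 check
      server node2 10.0.0.2:3306 check backup
      server node3 10.0.0.3:3306 check backup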

This works quite well, as long as not all 3 nodes go down or lose network connectivity at the same time. After that you have to initialise the cluster again: one node becomes the new master, any later changes in the other databases are overwritten, and then we reconnect each node to the new cluster.
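
For anyone who hits the everything-down case, the re-initialisation is roughly the following; a sketch assuming MariaDB with systemd and default paths, where the node with the most recent data is chosen as the bootstrap node:

  # On the chosen node, mark it as safe to bootstrap in
  # /var/lib/mysql/grastate.dat (set "safe_to_bootstrap: 1"),
  # then start it as a new cluster:
  galera_new_cluster

  # On the remaining nodes a normal start makes them rejoin and do a
  # full state transfer (SST) from the bootstrapped node, which is why
  # any later local changes on them get overwritten:
  systemctl start mariadb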

The biggest problems are write-intensive operations. We had more luck using two nodes as read nodes and the third node as the write node. When data on two nodes was updated simultaneously, performance suffered heavily. After some smarter scheduling it works pretty well (as long as not all the nodes go down or lose the network connection between them). Keep in mind that the slowest node can block write operations for all the others.
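
The read/write split can be expressed as two HAProxy listeners, again only a sketch with placeholder addresses and ports:

  # Writes go to a single node so Galera's cluster-wide certification
  # doesn't have to resolve conflicting writes from several nodes.
  listen galera_write
      bind *:3306
      mode tcp
      server node3 10.0.0.3:3306 check
      server node1 10.0.0.1:3306 check backup

  # Reads are balanced across the other two nodes.
  listen galera_read
      bind *:3307
      mode tcp
      balance leastconn
      server node1 10.0.0.1:3306 check
      server node2 10.0.0.2:3306 check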



