
So even if one is "P"-protected in the "CAP" theorem sense, you might not be able to handle a full split brain partition.

I don't think so. Split-brain (assuming you're talking about totally isolated network components, and not "disagreement about authoritative state between nodes") is a special case of a partition: any partition-tolerant system is tolerant of totally isolating partitions by definition. It might do that either by entering split-brain and becoming non-linearizable, or by refusing requests.




"by refusing requests"

So any system that can handle a split brain by crash-failing successfully handles the situation?

The definitions here are so broad, and they don't really match my experience, which is that detecting a full split brain is difficult for humans, harder for systems, and generally causes major problems.

Over-provisioning your system is one solution (e.g. Dynamo with N=5), but that isn't acceptable to everyone...


So any system that can handle a split brain by crash-failing successfully handles the situation?

We seem to disagree about what "availability" means in the context of CAP. Nothing has to crash. All that a serializable system has to do under partition is refuse to respond to at least one possible request. Exactly which requests is algorithm-dependent.

For example, a system with e3PC quorum reads and writes will handle partitions by refusing to allow reads or writes in any minority component. If it writes to all nodes and can read from any node, it will refuse to accept any writes during partition, but all nodes can service read requests consistently. Conversely, if it writes to any node and reads from all, it can always handle write requests, but reads will fail.
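The trade-offs in that paragraph all fall out of one condition: a read quorum R and a write quorum W over N replicas are consistent when they must overlap, i.e. R + W > N. A minimal sketch (the function name is mine, not from any particular system):

```python
# Quorum-intersection rule: every read quorum sees at least one node
# that participated in the latest write whenever R + W > N.
def quorums_intersect(n: int, w: int, r: int) -> bool:
    """True if every read quorum overlaps every write quorum."""
    return r + w > n

N = 5
# write-all / read-any: reads always available, writes refused under partition
assert quorums_intersect(N, w=N, r=1)
# write-any / read-all: writes always available, reads refused under partition
assert quorums_intersect(N, w=1, r=N)
# majority / majority: both paths survive any minority partition
assert quorums_intersect(N, w=3, r=3)
# too small: quorums can miss each other, so stale reads become possible
assert not quorums_intersect(N, w=2, r=2)
```

Each configuration trades which operation is refused during a partition, but the intersection condition itself is what preserves consistency.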

Pragmatically, the desire for single-node-failure tolerance leads most CP systems to accept operations only when a majority of nodes are accessible. Systems like Paxos, Viewstamped Replication, ZAB, and Raft provide CP semantics in exactly this way: a partition which divides a 5-node cluster into a 3-node component and a 2-node component will continue to work correctly in the 3-node component, and the 2-node component will refuse requests until the partition is resolved.
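The majority rule described above is simple enough to state in a few lines. This is an illustrative sketch of the decision each component makes, not any particular system's implementation:

```python
# A component may keep serving requests only if it holds a strict
# majority of the full cluster; a minority component refuses requests.
def can_serve(component_size: int, cluster_size: int) -> bool:
    return component_size > cluster_size // 2

assert can_serve(3, 5)        # the 3-node component keeps working
assert not can_serve(2, 5)    # the 2-node component refuses requests
assert not can_serve(2, 4)    # an even split halts both sides --
                              # one reason clusters use odd node counts
```

This is also why a majority-quorum system tolerates any single-node failure in a 3-node cluster, and any two failures in a 5-node cluster.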

Not all systems will shut down a component completely. Most sharded CP databases will run, say, Paxos quorums between groups of three or five replicas, but overlap those replicas around the hash ring. This is the design described by Eric Brewer for Stasis, and is the planned design for Cassandra's CaS and Riak's CP write support. These systems offer serializability over any particular key (or shard, e.g. a single quorum), but not between keys. During failure, some keys will be consistently accessible in one component, and a disjoint set of keys will be accessible in another.
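The overlapping-replica-group idea can be sketched as follows. All names here are invented for illustration; real systems use virtual nodes and more careful placement, but the shape is the same: each key maps to a small group of successors on a hash ring, and each group runs its own quorum independently.

```python
# Sketch: per-key replica groups taken as successors on a hash ring.
# Different keys get different (overlapping) groups, so a partition can
# leave one key's quorum intact while another key's quorum is broken --
# serializability holds per key, not across keys.
import hashlib

NODES = ["n0", "n1", "n2", "n3", "n4"]
REPLICAS = 3

def ring_position(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

ring = sorted(NODES, key=ring_position)

def replica_group(key: str) -> list[str]:
    """The REPLICAS distinct successors of the key's ring position."""
    pos = ring_position(key)
    start = next((i for i, n in enumerate(ring)
                  if ring_position(n) >= pos), 0)
    return [ring[(start + i) % len(ring)] for i in range(REPLICAS)]
```

During a partition, each three-node group that retains a majority on one side keeps its keys available there, while keys whose groups lost their majority become unavailable — exactly the "disjoint sets of keys in different components" behavior described above.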

The definitions here are so broad, and they don't really match my experience, which is that detecting a full split brain is difficult for humans, harder for systems, and generally causes major problems.

You are correct: designing CP algorithms is difficult. Luckily, you don't have to, because we have several already. You might start with

https://bitbucket.org/cometpeak/ringmaster/src/1e9fc082d1e5/...

http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F...

http://pmg.csail.mit.edu/papers/vr-revisited.pdf

https://ramcloud.stanford.edu/wiki/download/attachments/1137...



