Hacker News new | past | comments | ask | show | jobs | submit login

So if I'm reading this right, the master process can lose writes (if there's permanent failure). So taking your 2PC scheme above with the 'settled' flag, you can respond to the client that the txn has been committed once you've written the flag, but this commit marker can also just be entirely lost? (again, permanent failure)

If this 'settled' flag exists on each 'shard' instead of just the coordinator, any random subset of those too can be lost? I don't understand what's going on here, or what guarantees this 2PC implementation provides.




Actually, I may be mistaken about my previous commment. I'm not completely sure if this loss of recent data would happen as I've described. It depends on client implementation. For example, a client could wait for a write to propagate to at least 1 replica before telling the caller that the data was inserted successfully. This is an implementation detail I'm not sure about.

Also the settled flag exists on each record, not each shard. A shard is typically made up of multiple unsettled records. Each worker is assigned to a shard using a hash function so it's deterministic and the worker only processes unsettled transactions from their own shard.

Also I said something else misleading in one of my previous comments. In my case, the shard key of each record (which determines which shard a record belongs to) was not based on its own record ID but on the account ID of the user who owns that record. So effectively the sharding was happening based on user accounts and it was designed so that the records created by an account could be processed independently of records created by a different account.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: