So if I'm reading this right, the master process can lose writes (if there's per...

cryptica · on Nov 9, 2019

Actually, I may have been mistaken in my previous comment. I'm not completely sure that recent data would be lost the way I described; it depends on the client implementation. For example, a client could wait for a write to propagate to at least one replica before telling the caller that the data was inserted successfully. This is an implementation detail I'm not sure about.

Also, the settled flag exists on each record, not on each shard. A shard is typically made up of multiple unsettled records. Each worker is assigned to a shard using a hash function, so the assignment is deterministic, and each worker only processes unsettled transactions from its own shard.
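
A rough sketch of that worker/shard relationship, in Python. This is not the actual code: the names (Record, shard_for_worker, pending_for_worker), the shard count, and the choice of SHA-256 as the hash are all assumptions made up for illustration. The only points it is meant to show are that the settled flag lives on the record and that a worker's shard is derived deterministically from a hash.

```python
import hashlib
from dataclasses import dataclass

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example


def stable_hash(value: str) -> int:
    # Python's built-in hash() is randomized per process for strings,
    # so use a digest to get the same result on every run and machine.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class Record:
    record_id: str
    shard: int             # which shard the record belongs to
    settled: bool = False  # the flag is per record, not per shard


def shard_for_worker(worker_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Deterministic assignment: the same worker ID always maps to the same shard.
    return stable_hash(worker_id) % num_shards


def pending_for_worker(worker_id: str, records: list[Record],
                       num_shards: int = NUM_SHARDS) -> list[Record]:
    # A worker only looks at unsettled records in its own shard.
    own_shard = shard_for_worker(worker_id, num_shards)
    return [r for r in records if r.shard == own_shard and not r.settled]
```

In a real system the filter would presumably be a database query on (shard, settled) rather than a list comprehension.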

I also said something else misleading in one of my previous comments. In my case, the shard key of each record (which determines which shard the record belongs to) was based not on the record's own ID but on the account ID of the user who owns that record. So effectively the sharding was done by user account, and it was designed so that the records created by one account could be processed independently of the records created by any other account.
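
To make the account-based shard key concrete, here is a minimal sketch in Python. Again, this is illustrative and not the real implementation: the function names, the dict-shaped records, the shard count, and the use of SHA-256 are assumptions. The point is only that the shard is computed from account_id, so every record owned by the same account lands in the same shard.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example


def shard_for_account(account_id: str, num_shards: int = NUM_SHARDS) -> int:
    # The shard key is the owning account's ID, not the record's own ID.
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(records: list[dict], num_shards: int = NUM_SHARDS) -> dict[int, list[str]]:
    # Map shard index -> record IDs; records from one account stay together.
    shards: dict[int, list[str]] = defaultdict(list)
    for record in records:
        shard = shard_for_account(record["account_id"], num_shards)
        shards[shard].append(record["record_id"])
    return dict(shards)


records = [
    {"record_id": "r1", "account_id": "acct-1"},
    {"record_id": "r2", "account_id": "acct-1"},
    {"record_id": "r3", "account_id": "acct-2"},
]
grouped = group_by_shard(records)
```

Here r1 and r2 always end up in the same shard because they share an account. Two different accounts may still hash to the same shard; that is fine, since the guarantee is only that one account's records are never split across shards.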