Hacker News

If the promotion logic is wrong how would persistence have helped? Say the primary disk fails. If it's still considered primary when it's brought back up with a fresh disk, wouldn't you get the same empty-replication problem? (I know nothing about redis, just wondering.)


In the case described, there is no promotion logic.

The replicas will try to reconnect to their original master forever unless something else (like Sentinel) redirects them in an actual failover/promotion setup.

So, the master had data, it died, it restarted with no data, then the replicas immediately reconnected. If the master had persistence enabled, it would have reloaded the old dataset on startup and the replicas would have re-downloaded everything. Since they are replicas of the master, they will always prefer the master's data over their own, even if the master is empty.
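For reference, these are the standard redis.conf directives that determine whether the master has anything to reload after a restart (the exact schedule values below are just illustrative defaults):

```
# Disable RDB snapshots entirely -- the dangerous configuration
# discussed here: nothing is saved, so a restarted master comes
# back empty and the replicas resync that empty dataset.
save ""
appendonly no

# With either mechanism enabled instead, the master reloads its
# dataset on startup:
# save 900 1        -> RDB snapshot after 900s if >= 1 key changed
# appendonly yes    -> append-only file replayed at startup
```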

If you were in a strange case where the disk failed and you replaced it with an empty disk (is that what you mean by "fresh disk?"), then it's the same as starting with an empty dataset. That's not quite the scenario here, though: in that case the server is intentionally started empty after a maintenance action, rather than an already-populated process restarting empty because there's no saved dataset to load on startup.

The "all replicas resync an empty dataset" behavior is a logical consequence of the configuration they enabled, but one without obvious repercussions unless you either experience it directly or work through a longer multi-step thought experiment. (But fixes for such things are already on the way, soon!)


Just to add some more info:

Funny enough, what triggers this problem when master persistence is turned off is the lack of failover: if the reboot happens fast enough, Sentinel (in case you are using it) never gets the chance to fail over to a replica. So no failure was sensed at all; the master just magically wiped its data set.

So from the point of view of distributed systems, if you analyze the combination of Redis replicated nodes + Sentinel as a whole, the problem is that the system is not designed to cope with nodes losing state on restarts.
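The failure sequence described in this thread can be illustrated with a toy simulation. This is not Redis code, just a sketch of the semantics: a master that either reloads from disk or starts empty, and a replica whose full resync always replaces its own dataset with the master's.

```python
# Toy model of the failure mode discussed above (illustrative only).

class Master:
    def __init__(self, persistence=False):
        self.persistence = persistence
        self.data = {}   # in-memory dataset
        self.disk = {}   # stand-in for an RDB/AOF file

    def write(self, key, value):
        self.data[key] = value
        if self.persistence:
            self.disk[key] = value  # persisted copy survives restarts

    def crash_and_restart(self):
        # On restart the master reloads from disk, or starts empty.
        self.data = dict(self.disk) if self.persistence else {}

class Replica:
    def __init__(self, master):
        self.master = master
        self.data = {}

    def full_resync(self):
        # A replica always replaces its dataset with the master's,
        # even when the master's dataset is empty.
        self.data = dict(self.master.data)

# Without persistence: the replica's good copy is wiped after the restart.
m = Master(persistence=False)
r = Replica(m)
m.write("user:1", "alice")
r.full_resync()
m.crash_and_restart()  # fast reboot, no failover triggered
r.full_resync()        # replica resyncs the now-empty master
print(r.data)          # {}

# With persistence: the master reloads from disk, the replica recovers.
m2 = Master(persistence=True)
r2 = Replica(m2)
m2.write("user:1", "alice")
r2.full_resync()
m2.crash_and_restart()
r2.full_resync()
print(r2.data)         # {'user:1': 'alice'}
```

The point of the sketch is the asymmetry: the replica holds valid data right up until the resync, but the replication model trusts the master unconditionally.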

However it is possible to improve it, and I'm working on it. But before diskless replication it was IMHO pretty useless to support persistence-less operation in conjunction with replication, since for the slaves to synchronize, the master had to save an RDB file on disk anyway.
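For context, diskless replication is enabled with standard redis.conf directives on the master side (the delay value below is illustrative):

```
# Serve the RDB payload to replicas directly over the socket,
# instead of first writing it to the master's disk.
repl-diskless-sync yes

# Wait a few seconds before starting the transfer, so that
# multiple replicas arriving together can share one child process.
repl-diskless-sync-delay 5
```

This removes the on-disk RDB step that previously made persistence-less replication setups pointless.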



