
> They have excellent intuition around making things redundant to single pieces of hardware failing but don’t really grok making stuff resilient to wider failures.

I always feel like making single components redundant is a fairly well-defined process: generally speaking, the mechanisms are the same (1+ redundant components, failover, STONITH, etc.), whereas making things resilient at a higher level is not as well-defined and often requires a bespoke solution for each situation.



Hmm?

BFT state machine replication is well-understood and well-defined: use N-of-M agreement on inputs and run them through a deterministic state machine. Optionally, require N-of-M signatures on outputs.
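
To make that concrete, here is a toy sketch of the shape of it, not a real BFT protocol: there is no networking, no signatures and no view changes, just vote counting. All names (`agreed_input`, `apply_command`, `run_replica`) and the add/sub commands are made up for illustration. It assumes M = 4 replicas and a threshold of N = 3, which matches the usual 2f+1 of 3f+1 sizing for f = 1.

```python
from collections import Counter

def agreed_input(proposals, n):
    """Return the value that at least n replicas proposed, else None."""
    if not proposals:
        return None
    value, votes = Counter(proposals).most_common(1)[0]
    return value if votes >= n else None

def apply_command(state, command):
    """Deterministic transition: same state + same command -> same result."""
    op, amount = command
    if op == "add":
        return state + amount
    if op == "sub":
        return state - amount
    return state  # unknown commands are ignored

def run_replica(log, n, initial_state=0):
    """Apply, in order, every log slot whose input reached n-of-m agreement."""
    state = initial_state
    for proposals in log:
        command = agreed_input(proposals, n)
        if command is not None:
            state = apply_command(state, command)
    return state

# Each slot holds what the 4 replicas claim the input was (hypothetical data).
log = [
    [("add", 5), ("add", 5), ("add", 5), ("add", 500)],  # one faulty replica
    [("sub", 2), ("sub", 2), ("sub", 2), ("sub", 2)],    # unanimous
    [("add", 1), ("add", 1), ("sub", 9), ("sub", 9)],    # no 3-of-4, skipped
]
final_state = run_replica(log, n=3)

# Optional output check: accept a result only if n replicas report the same one.
reported_outputs = [final_state, final_state, final_state, 99]
accepted_output = agreed_input(reported_outputs, n=3)
```

The point is that the guarantees come from two explicit assumptions: at most M - N replicas misbehave, and the transition function is deterministic. A real system still has to get the proposals to every replica and agree on slot order, which is where the actual protocol work is.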

OTOH, what are the properties of failover? "Failover" seems like an attempt to cheat on the Byzantine generals problem: the generals send mail and then confirm the results in a Zoom call. But what if Zoom doesn't work? What are the assumptions behind 1+ redundant components/failover/STONITH?



