one more difference I would add is how a leader is elected in Raft (randomized) ...

jorangreef · 2025-08-17T07:42:09 1755416529

Yes, exactly. That difference came in with '12 VSR's view change (deterministic, round-robin), which upgraded '88 VSR's view change.

'14 Raft missed this upgrade, unfortunately (despite citing the '12 paper), thus preserving Oki's '88 "pick the candidate with longest log".

I find the history of consensus fascinating, and was fortunate to interview both Brian Oki and James Cowling (so many anecdotes): https://www.youtube.com/watch?v=ps106zjmjhw

ingloriousB · 2025-08-17T12:36:33 1755434193

The Raft authors didn't "miss" it, they just decided that randomized was simpler--and simplicity was a deliberate design goal.

They found randomized was easier to reason about compared to trying to figure out how to explain and prove correctness when the next "determined" leader has died or been partitioned.
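
To make the contrast concrete, here is a minimal Python sketch of the two leader-selection styles. Everything named here (the function names, the 150-300 ms range, the `is_reachable` callback) is an illustrative assumption, not code from either paper. The round-robin rule (primary = view number mod replica count) follows the '12 VSR description, and the randomized timeout follows Raft; a real view change also needs a quorum, which this sketch leaves out.

```python
import random


def vsr_primary(view: int, replica_count: int) -> int:
    """Round-robin: the primary is fully determined by the view number."""
    return view % replica_count


def next_live_view(view: int, replica_count: int, is_reachable) -> int:
    """Advance views until the deterministically chosen primary is reachable.

    `is_reachable` is a hypothetical failure detector, used only to show what
    happens when the next "determined" leader is dead or partitioned.
    """
    for candidate_view in range(view + 1, view + 1 + replica_count):
        if is_reachable(vsr_primary(candidate_view, replica_count)):
            return candidate_view
    raise RuntimeError("no reachable primary")


def raft_election_timeout_ms(rng: random.Random, low: int = 150, high: int = 300) -> int:
    """Randomized: each follower draws its own timeout from a fixed interval."""
    return rng.randint(low, high)
```

With three replicas and replica 1 unreachable, `next_live_view(0, 3, lambda r: r != 1)` skips view 1 and settles on view 2. In the randomized scheme there is no "next" leader to skip; whichever follower times out first simply starts an election.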

Lacking formal methods, the original VSR work seems less rigorous/proven. Maybe rigorous proof was just not possible at the time; after all Lamport had to work really hard to prove Paxos correct.

Consider that on page 31 of Brian Oki's dissertation, http://www.pmg.csail.mit.edu/papers/MIT-LCS-TR-423.pdf , he completely misses the problem that is discussed in Figure 3.7 of the Raft dissertation.

That is, the "new view" may be missing information from a leader who had a higher term/view number, but is temporarily offline. Does the original VSR provide the same fix that Raft does for that scenario? Does it address other possible scenarios? Mechanical proofs seem conspicuously absent, and even human proofs aren't in ready evidence.
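
For reference, and assuming the scenario meant here is the one where a leader cannot safely commit an entry from an older term just by counting replicas, Raft's fix can be sketched roughly like this in Python. The function and parameter names are hypothetical, and the leader is assumed to include itself in `match_index`.

```python
def advance_commit_index(log_terms, match_index, current_term, commit_index):
    """Return the leader's new commit index under Raft's commit restriction.

    log_terms[i] is the term of the entry at index i + 1 (indices are 1-based).
    match_index lists, per server (leader included, by assumption), the highest
    log index known to be replicated there.
    """
    majority = len(match_index) // 2 + 1
    # Scan from the newest entry down to just above the current commit index.
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        # Only an entry from the leader's own term may be committed by counting;
        # older entries become committed indirectly once such an entry is.
        if replicated >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index
```

So a term-4 leader whose term-2 entry at index 2 sits on three of five servers still reports commit index 1, and only moves to 3 once a term-4 entry at index 3 reaches a majority.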

jorangreef · 2025-08-17T14:32:20 1755441140

> “seems” / “seem”

Are you intending to suggest there are no formal proofs since?

For example, you compare ‘14 Raft against ‘88 VSR, ignoring the interim progress made by MIT with ‘12 VSR?

It would seem fair to compare the latest version of both, no?

ingloriousB · 2025-08-17T15:15:03 1755443703

:) I'm only using "seem" to indicate the limits of my knowledge.

I was just hoping someone would chime in with a link to stronger/formal proof for VSR. Are you aware of any?

So, yes, to my limited knowledge, I've not found any existing formal proofs for VSR.

The 2012 VSR revisited clearly labels their arguments informal; in section 8: "In this section we provide an informal discussion of the correctness of the protocol."

I'd be delighted to learn of any formal / machine-checked proofs of VSR, equivalent to the Verdi project for Raft.

These are elaborate and subtle protocols, easy to get wrong.

In particular, when it comes to things like reconfiguration, even Raft had the famous 2015 bug in the simpler of its two reconfiguration protocols.

Note that Verdi did not attempt to verify the reconfiguration protocol; apparently it was too difficult.

There was an attempt earlier this year to give a proven reconfiguration protocol for Raft called ReCraft[1], based on an earlier paper called Adore[2]. They discuss why reconfiguration is so difficult to prove. It has to do with circularity.

"ReCraft: Self-Contained Split, Merge, and Membership Change of Raft Protocol" by Kezhi Xiong, Soonwon Moon, Joshua Kang, Bryant Curto, Jieung Kim, Ji-Yong Shin. last revised 28 Apr 2025, v2.

[1] https://arxiv.org/abs/2504.14802

[2] https://dl.acm.org/doi/pdf/10.1145/3519939.3523444

I'm not completely convinced yet that ReCraft works; at one point I thought they assumed away certain scenarios -- but I need to revisit it with a close reading.

At a minimum, reading the Adore paper's discussion of how much subtlety is involved is pretty compelling.

My conclusion is that formal proof is an absolute necessity to have a fighting chance at a correct implementation--especially when it comes to reconfiguration.

jorangreef · 2025-08-17T15:24:06 1755444246

There are at least two formal proofs.

Have you tried Googling for them, instead of creating a throwaway account to comment anonymously here? :)

ingloriousB · 2025-08-17T15:56:58 1755446218

Per the site guidelines[1], please avoid gratuitous negativity.

[1] https://www.ycombinator.com/blog/new-hacker-news-guideline

Certainly I've googled. I have found no proofs.

Surely by now you would actually exhibit the formal proof if you had one, right? (He asks, for the 3rd time).

Note that a TLA+ spec is not a formal proof. Also note that a model checking run is not a formal proof.

I'm still hoping to learn something new about how proven, say, the reconfiguration or recovery protocols in VSR are.

So far I can only conclude that my research has turned up nothing as far as a formal proof for VSR goes. Please show me I'm wrong with a link to one. :)

jorangreef · 2025-08-17T18:12:37 1755454357

I don't mean to trap you, my anonymous friend :) but does the formal proof for Raft in Coq via Verdi not apply—at least in spirit—to the essential view change and SMR protocol for Viewstamped Replication? And, similarly, would you say that Viewstamped Replication's core view change and SMR protocol is really that different from Multi-Paxos as a superset—such that that proof also wouldn’t carry over?

I agree that proofs are sensitive to modeling choices. But the reason I ask is that the literature generally treats the core of these protocols (view change + SMR—reconfiguration aside for now) as essentially equivalent.

For example, I'm sure you're aware of Heidi Howard's work here, which unifies consensus under one framework, the main differences being election style (i.e. Raft does random, VSR does round-robin) and terminology, not fundamental mechanics. The upside being that optimizations and sub-protocols, such as reconfiguration, can then be shared across protocols.

To your point about reconfiguration, reconfiguration sub-protocols are a field in themselves, and here it’s common to mix and match. To be clear, I'm not aware of a proof for the reconfiguration sub-protocol in '12 VRR (and I've found a bug in its client crash recovery sub-protocol—with Dan Ports finding another in its recovery sub-protocol), but again, as Howard notes, since the SMR cores are equivalent, you can adopt a reconfiguration sub-protocol or session sub-protocol that has been proven—at least this is common practice in production systems.

I hope the spirit of the argument is clear. And trust that none of this changes the OP point: that VSR pioneered the field and that Raft (in the authors' own words) is "most notably" similar to Viewstamped Replication.

(Let's not get into the subject of actual implementation correctness, which is orders of magnitude harder than formal design proofs, or the fact that the formal proofs in question still lack for a storage fault model—for example, many Raft implementations violate the findings of “Protocol-Aware Recovery for Consensus-Based Storage”)