
This is fascinating. I feel like the most straightforward (but hardly efficient) solution would be to give kernels a way to ask CPUs to "mirror" pairs of cores, and have the CPUs internally check that the two cores behave identically. Seems like a good way to avoid large-scale data corruption until we develop better techniques...



Tandem used to do this. The technology eventually wound up at HPE through successive acquisitions.

Their Tech Reports are worth sampling, and fortunately they're online: https://www.hpl.hp.com/hplabs/index/Tandem

Probably the best one to start with: https://www.hpl.hp.com/techreports/tandem/TR-90.5.pdf


Thanks for the reference.


That’s called dual-core lockstep, and it’s very common in automotive and other applications where reliability is paramount.


Yeah, I didn't know! And I just realized this is mentioned in the paper, just a little further below where I paused. It seems like it would significantly affect anything shared (like the L3 cache)... would Intel and AMD have the appetite for adding this kind of thing to x86?


The lockstep pair is tightly scoped: it includes only the two cores and their deterministic private resources, such as core-private caches. Shared resources like an L3 cache sit outside the pair and are accessed by the pair as a single unit. All output comes from the pair and is checked for consistency (it must be identical for both cores) before it leaves the pair.

Not directly related, but some platforms that support lockstep are flexible: you can use a pair either as two independent cores (for performance) or as a single logical core (in lockstep).
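
For anyone who wants the checking idea spelled out, here is a toy software sketch of the lockstep comparison described above. It is only an illustration, not how any real CPU implements it: `LockstepPair`, `healthy_core`, `faulty_core`, and `run` are hypothetical names, each "core" is modeled as a pure function over private state, and the `shared` list stands in for a shared resource outside the pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model: a "core" maps (private_state, input) -> (new_state, output).
Core = Callable[[int, int], Tuple[int, int]]


class LockstepMismatch(Exception):
    """Raised when the two cores of a pair disagree on an output."""


@dataclass
class LockstepPair:
    core_a: Core
    core_b: Core
    state_a: int = 0  # private state of core A (think: private cache)
    state_b: int = 0  # private state of core B

    def step(self, value: int) -> int:
        # Both cores receive the same input and update only their own state.
        self.state_a, out_a = self.core_a(self.state_a, value)
        self.state_b, out_b = self.core_b(self.state_b, value)
        # The output leaves the pair only if both cores agree on it.
        if out_a != out_b:
            raise LockstepMismatch(f"core A produced {out_a}, core B produced {out_b}")
        return out_a


def healthy_core(state: int, value: int) -> Tuple[int, int]:
    new_state = state + value  # running sum
    return new_state, new_state


def faulty_core(state: int, value: int) -> Tuple[int, int]:
    # Assumed fault model for illustration: one output bit flips when value == 3.
    new_state = state + value
    out = new_state ^ 1 if value == 3 else new_state
    return new_state, out


def run(pair: LockstepPair, inputs: List[int], shared: List[int]) -> None:
    # `shared` models a resource outside the pair; only checked outputs reach it.
    for value in inputs:
        shared.append(pair.step(value))
```

With two healthy cores, every output is committed to `shared`. With one faulty core, `run` raises `LockstepMismatch` at the first disagreement, so the corrupted value never reaches the shared list. Note that the check only detects the fault; it cannot tell which of the two cores was wrong.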


Mainframes do this. They'll also disable the failing CPUs and place a service call to IBM to get someone to swap out the part.


Wow that's cool. It'd be quite interesting if the conclusion ends up being that we should go back to mainframes...



