This is fascinating. I feel like the most straightforward (but hardly efficient) solution is to give kernels a way to ask CPUs to "mirror" pairs of cores, and have the CPUs internally check that the two cores behave identically? Seems like a good way to avoid large-scale data corruption until we develop better techniques...
Yeah I didn't know! And I just realized this is mentioned in the paper just a little further below where I paused. It seems like it would significantly affect anything shared (like L3 cache)... would Intel and AMD have appetite for adding this kind of thing to x86?
The pair in lockstep is narrowly scoped, in that it only includes the cores and their deterministic private resources, like core-private caches. Shared resources like an L3 cache sit outside the pair and can be seen as being accessed by the pair as a whole. All output comes from the pair and is checked for consistency (identical for both cores in lockstep) before going out.
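To make that concrete, here is a toy sketch of the idea in Python. It is only an illustration, not how any real CPU implements lockstep: `Core`, `LockstepPair`, `fault_at`, and `run` are all hypothetical names, the "core" is just an accumulator, and the fault is a simulated bit flip. The point it tries to show is that the shared side (a dict standing in for something like L3 or memory) only ever sees output that both cores agreed on.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple

Store = Tuple[int, int]  # (address, value) leaving a core


class LockstepMismatch(Exception):
    """Raised when the two cores of a pair disagree on an output."""


@dataclass
class Core:
    """Toy core with private state only; it never touches shared memory."""
    acc: int = 0
    steps: int = 0
    fault_at: Optional[int] = None  # hypothetical: step at which a bit flips

    def step(self, operand: int) -> Store:
        self.acc += operand
        if self.fault_at == self.steps:
            self.acc ^= 1 << 3  # simulated silent corruption of private state
        addr = self.steps
        self.steps += 1
        return (addr, self.acc)


@dataclass
class LockstepPair:
    """Two cores fed the same input; output is compared before release."""
    primary: Core
    shadow: Core
    shared_memory: Dict[int, int] = field(default_factory=dict)

    def step(self, operand: int) -> None:
        out_a = self.primary.step(operand)
        out_b = self.shadow.step(operand)
        if out_a != out_b:
            # Nothing is written: the bad value never leaves the pair.
            raise LockstepMismatch(f"primary={out_a} shadow={out_b}")
        addr, value = out_a
        self.shared_memory[addr] = value


def run(operands: Iterable[int], fault_at: Optional[int] = None) -> Dict[int, int]:
    """Run a sequence of operands through a pair, optionally faulting the shadow."""
    pair = LockstepPair(Core(), Core(fault_at=fault_at))
    for x in operands:
        pair.step(x)
    return pair.shared_memory
```

Calling `run([1, 2, 3])` completes normally, while `run([1, 2, 3], fault_at=1)` raises `LockstepMismatch` on the second step, before the corrupted value reaches `shared_memory`. What the system does after a mismatch (retry, reset, halt) is a separate design choice that this sketch leaves out.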
Not directly related, but some platforms that support lockstep are flexible: you can use a pair either as 2 independent cores (perf) or as a single logical core (lockstep).