I think "100% uptime" is pretty much shorthand for "absolutely no reason to turn the machine off, or stop the services running on it, ever".
As the above link mentions, everything is redundant: CPUs running in (near) lockstep with bad CPUs voted out, redundant storage, redundant power, and so on.
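To make the "voting bad CPUs out" idea a bit more concrete, here's a toy sketch of majority voting across redundant units. To be clear, this is only an illustration of the general technique, not how NonStop hardware actually does it; the `vote` function and the `cpu0`/`cpu1`/`cpu2` names are made up for the example.

```python
from collections import Counter


def vote(outputs):
    """Majority-vote over the results reported by redundant units.

    `outputs` maps a (hypothetical) unit name to the value it produced
    for the same step. Returns the majority value plus the sorted list
    of units that disagreed with it. Raises if there is no strict majority.
    """
    counts = Counter(outputs.values())
    value, agreeing = counts.most_common(1)[0]
    # A strict majority is required; a tie means no result can be trusted.
    if agreeing * 2 <= len(outputs):
        raise RuntimeError("no majority among redundant units")
    dissenters = sorted(name for name, out in outputs.items() if out != value)
    return value, dissenters


# Three units compute the same step; one of them returns a bad result.
result, voted_out = vote({"cpu0": 42, "cpu1": 42, "cpu2": 41})
```

Here `result` is the value two of the three units agreed on, and `voted_out` names the unit you'd take offline and replace while the rest keep running.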
I've not seen a better write-up on IBM's equivalent systems, but I've not had a reason to play with such big iron. NSKs, however, were much more affordable and were used in the late 90s/early 2000s by various telcos etc.
I mean, a meteor could hit the DC. But I guess that would count as a decommissioning event, wouldn’t it? So you’re still right, with a proviso: there’s never any reason to turn the machine off with the expectation of ever turning it back on.
A company I worked for (in São Paulo) had a couple of failover setups for its Unisys A-series machine: one with a bank across the street, and another with a near-twin machine running in its factory in Manaus.
If, for some reason, an event decommissioned all three machines, having our computers back online would be a lesser problem - we would probably be better off learning to hunt and make fire.
https://en.wikipedia.org/wiki/Tandem_Computers#Tandem_NonSto...