In safety-critical systems, failure risk is usually scored as (severity) × (probability), sometimes with a 'detectability' factor included as well.

So the resulting 'acceptable' threshold could tolerate less severe failures even if they occur with higher probability. Scores outside the acceptable range would then trigger a redesign to bring them back within bounds.
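To make the arithmetic concrete, here is a minimal Python sketch of that kind of scoring. The 1-10 scales, the `ACCEPTABLE_MAX` threshold, and the two failure modes are all made up for illustration; real values would come from whatever standard and consensus process applies.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for this example.
ACCEPTABLE_MAX = 20


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int           # assumed scale: 1 (negligible) .. 10 (catastrophic)
    probability: int        # assumed scale: 1 (remote) .. 10 (frequent)
    detectability: int = 1  # optional factor; 1 means it is almost always caught


def risk_score(fm: FailureMode) -> int:
    """Risk as severity x probability (x detectability when it is used)."""
    return fm.severity * fm.probability * fm.detectability


def needs_redesign(fm: FailureMode, limit: int = ACCEPTABLE_MAX) -> bool:
    """A score above the agreed limit triggers a redesign."""
    return risk_score(fm) > limit


# Illustrative failure modes, not real data.
minor_but_common = FailureMode("minor_but_common", severity=2, probability=8)
severe_but_rare = FailureMode("severe_but_rare", severity=10, probability=3)

for fm in (minor_but_common, severe_but_rare):
    print(fm.name, risk_score(fm), needs_redesign(fm))
```

With these made-up numbers, the minor but frequent failure stays under the limit while the severe but rarer one does not, which is the trade-off described above.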

I think the difficulty will be in 1) getting consensus on what the acceptable score should be and 2) getting enough data to estimate it with any statistical significance.