I think they are higher than you expect, because usually what causes the bug to ...

skunkworker · on Nov 23, 2021

This. The more the state worsens, the higher the chance that someone watching it goes "huh, that looks off" and looks into it, all before your system has toppled over. No notice or write-up is needed at that point, but you now know exactly what the problem is. Then, while you are still working on a patch, the system finally topples over and causes an incident/outage.

mattlondon · on Nov 23, 2021

There was likely monitoring for various "problems" in production: error rates, validation failures, etc., or even just good old crash counts.

An alert may have fired that led someone to debug the issue in detail.

I can totally imagine a Metric Of Death that has been slowly, slowly, slowly creeping up for ages, then suddenly breaches some threshold and becomes a problem.
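To make the "slow creep, sudden breach" pattern concrete, here is a minimal sketch of a static-threshold check. Everything in it is hypothetical: the metric (errors per 10,000 requests), its starting value, the drift of one unit per day, and the alert threshold are made-up numbers for illustration, not anything from the incident being discussed. Integer units are used so the crossing day is exact.

```python
from typing import Optional, Sequence


def first_breach(samples: Sequence[int], threshold: int) -> Optional[int]:
    """Return the index of the first sample strictly above `threshold`, or None if none is."""
    for i, value in enumerate(samples):
        if value > threshold:
            return i
    return None


# Hypothetical metric: errors per 10,000 requests, starting at 10
# and creeping up by 1 every day for a year.
baseline = 10
daily_drift = 1
error_rate = [baseline + daily_drift * day for day in range(365)]

# Hypothetical static alert threshold: fire once the rate exceeds 100.
breach_day = first_breach(error_rate, threshold=100)
print(breach_day)  # → 91
```

Every day-over-day change is the same small step, so nothing looks alarming from one day to the next. The static alert stays silent for three months and then "suddenly" fires on day 91.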