Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

i think they are higher than you expect, because usually what causes the bug to be known is a worsening state of the system that makes the bug more likely to be hit.

i would ask how the engineer found the race condition, and whether that doesn’t imply a much greater risk.



This, as the state continues to worsen, the higher the chance that someone observing will go "huh that looks off" and then look into it, all while your system hasn't toppled over yet, no notice or write up would be necessary, but you definitely know now what the problem is. And then following that while you are working on a patch the system finally topples over and causes an incident/outage.


There likely was monitoring for various "problems" in production - error rates, validation failures etc, or even just good old crash counts.

An alert may have fired that lead to someone debugging the issue in detail.

I can totally imagine a slow creeping Metric Of Death that has slowly slowly slowly been creeping up for ages and then suddenly breaches some threshold and then becomes a problem.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: