Because the things we monitor are heavily interrelated, and an outage or degradation in one service will often affect others, it's not uncommon for us to see multiple alerts at once.
The typical scenario goes like this:
1. Pile of alerts goes out.
2. I acknowledge all of them in the course of acknowledging the first.
3. Some of the alerts will decide, despite having state-changed to yellow in the app, that they somehow haven't been acknowledged, so the app will issue the notification tone — for, again, previously acknowledged alerts.
4. Some more alerts will decide they still haven't been acknowledged, and will play the notification tone again, to remind me that I have outstanding, "unacknowledged" alerts.
All of this while I'm trying to fix the broken thing and am being interrupted to attend to the feels my alerting tool has about having alerts.
That is to say: PagerDuty is getting in the way of my fixing the outage. No, I can't just ignore the alerts. We have a global team, and whoever's secondary is probably asleep, because we've taken care in crafting our on-call rotations such that people shouldn't get alerted at 3am. If the secondary sleeps through the alerts, they escalate up my management chain. Then I'm having to explain to my director or VP why there are "unacknowledged" alerts, when I should be fixing the fucking fire.
It's an abhorrently crap user experience, in a way that is antithetical to the tool's very purpose, and I'm not the only person on my team who has either wanted to throw, or actually has thrown, their phone through the nearest wall because of it.