This post continues our series taking a deeper look at how OpsGenie can be used to alleviate alert fatigue. The mute, acknowledge all, and close all actions were designed specifically for situations where excessive alerting can hinder operations.
When there is a significant problem in our systems, monitoring tools typically generate many alerts. If you have ever tried to figure out what the problem is while your phone keeps buzzing with alerts that you have to acknowledge to keep them from escalating, you know what a pain this can be. You already know that there is a problem, and you are working on it. The last thing you want is to keep acknowledging alerts. So what can you do?
OpsGenie provides several “bulk actions”: mute, acknowledge all, and close all. Unlike regular actions, which are executed against a single alert, these actions apply to multiple alerts. In the mobile apps, they are available from the alert list by swiping from left to right.
The mute action stops all notifications to the user for the next 5 minutes, which makes it especially useful during alert storms. If you’re getting a barrage of alerts, muting stops the notifications that keep interrupting you and gives you the chance to do what you need to: figure out what’s wrong, (hopefully) resolve the problem, or escalate it to the right person. The acknowledge all action is also quite useful, because it stops escalation policies from triggering.
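To make the difference between these two actions concrete, here is a toy Python model. It is not OpsGenie’s implementation or API: the names `User`, `Alert`, `acknowledge_all`, and `alerts_to_escalate` are hypothetical, and treating mute and acknowledgement as independent (mute only silences notifications, while acknowledgement is what halts escalation) is an assumption drawn from the description above.

```python
from dataclasses import dataclass
from typing import List

# Length of the mute window described in the post (5 minutes), in seconds.
MUTE_SECONDS = 5 * 60


@dataclass
class Alert:
    """A single alert; in this model, escalation continues only while it is unacknowledged."""
    alert_id: str
    acknowledged: bool = False


@dataclass
class User:
    """Tracks when this user's notifications resume after a mute (hypothetical model)."""
    muted_until: float = 0.0

    def mute(self, now: float) -> None:
        # Suppress notifications for the next MUTE_SECONDS, starting at `now`.
        # Assumption: muting does not acknowledge alerts or pause escalation.
        self.muted_until = now + MUTE_SECONDS

    def should_notify(self, now: float) -> bool:
        # Notifications are delivered again once the mute window has elapsed.
        return now >= self.muted_until


def acknowledge_all(alerts: List[Alert]) -> None:
    """Bulk action: acknowledge every alert in the list in one step."""
    for alert in alerts:
        alert.acknowledged = True


def alerts_to_escalate(alerts: List[Alert]) -> List[str]:
    """IDs of alerts that an escalation policy would still act on in this model."""
    return [a.alert_id for a in alerts if not a.acknowledged]


if __name__ == "__main__":
    user = User()
    user.mute(now=0)
    print(user.should_notify(now=60))   # False: still inside the 5-minute window
    print(user.should_notify(now=300))  # True: the window has elapsed

    storm = [Alert("db-latency"), Alert("api-5xx"), Alert("disk-full")]
    print(alerts_to_escalate(storm))    # ['db-latency', 'api-5xx', 'disk-full']
    acknowledge_all(storm)
    print(alerts_to_escalate(storm))    # []
```

In this sketch, muting buys quiet time without changing alert state, while acknowledging all alerts removes them from escalation in a single step, which is why the two actions complement each other during an alert storm.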
Alerts are meant to get your attention when something important happens. Once users signal that they are working on the problem, the alerting system should get out of the way and let them do their work. OpsGenie provides mute and the other bulk actions for exactly this purpose.