As an alert notification solution, our first priority is to ensure that the right person is notified when there is a problem. OpsGenie sends multiple notifications through different channels, escalates when needed, and so on, to ensure that critical alerts don’t get missed. As crucial as that is, if an alert notification system stops at “waking you up”, it becomes part of the problem rather than the solution.
The truth is, for every alert that is critical and urgent, we receive many others that are not. Sure, it would be great if monitoring systems sent alerts only for urgent or critical issues, but that’s easier said than done. We believe that alert management solutions can and should play a part in alleviating the pain. The first step in this process is to think about what happens after the alert is received. You’re awake, now what?
Providing the tools for good alert design, one that optimizes the experience of alert recipients, has been a core design factor for OpsGenie and has driven the implementation of many of its features. OpsGenie alerts have a flexible schema that supports custom fields, tags, etc., and allows the user to add notes/comments to the alerts. Files (log files, images, docs, etc.) can also be attached to OpsGenie alerts. These alerts support custom actions that enable alert recipients to take action directly from their mobile devices. In short, OpsGenie provides the tools to design alerts that truly inform the recipients and empower them to make quick assessments and determine the right course of action.
Troubleshooting network-related problems is a good example. Engineers often run commands like ping, traceroute, and other similar utilities. Wouldn’t it be great if alert recipients could run these commands easily from their mobile devices to assess what’s going on? Being interrupted by an alert is a lot less disruptive when you can perform the initial investigation in a few minutes, without having to find a computer or connect to internal networks.
Let’s look at how you can do this with OpsGenie. First, we need the action. When creating the alert we can define “custom actions” such as a “Ping” action, enabling alert recipients to execute this action from OpsGenie apps when they get notified.
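To make the idea concrete, here is a minimal sketch of what an alert carrying a “Ping” action might look like when it is created. The function name `build_alert_payload` and the field names `message`, `actions`, and `details` are assumptions made for illustration; consult the OpsGenie API documentation for the actual request schema.

```python
import json


def build_alert_payload(message, host, actions=("Ping",)):
    """Assemble a hypothetical alert-creation request body.

    The keys used here ("message", "actions", "details") are assumed for
    this sketch and are not taken from the OpsGenie documentation.
    """
    return {
        "message": message,
        # Custom actions that recipients can trigger from the mobile app.
        "actions": list(actions),
        # Extra context the action handler will need later, e.g. the host.
        "details": {"host": host},
    }


if __name__ == "__main__":
    payload = build_alert_payload("Web server unreachable", "web01.example.com")
    print(json.dumps(payload, indent=2))
```

Carrying the target host in the alert itself means whatever handles the action later does not have to guess what to ping.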
The next step is to pass this information to your systems. OpsGenie supports callbacks for alert activity: when an alert is created, acknowledged, commented on, etc., OpsGenie can call an HTTP endpoint (webhook) and pass along the information that a user executed an action on an alert. OpsGenie also provides the Marid utility, which subscribes to OpsGenie alerts and can execute a Groovy script for alert actions. So by combining custom alert actions with Marid, we can enable the user to execute a Ping action from a mobile device:
- User executes Ping action from OpsGenie app
- Marid gets the user and the alert data, then executes the Groovy script associated with the ping action (by default ping.groovy)
- Groovy script executes the ping command, captures the output, and adds it to the alert as a note
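The steps above can be sketched roughly as follows. This is not the ping.groovy script that ships with Marid; it is an illustration in Python of the same flow. The alert layout (`alert["id"]`, `alert["details"]["host"]`) and the `add_note` callback are hypothetical stand-ins for whatever the real integration provides.

```python
import subprocess


def run_ping(host, count=4, runner=subprocess.run):
    """Run ping against host and return its combined output as text."""
    # "-c" sets the packet count on Linux/macOS; Windows uses "-n" instead.
    # Passing a list (no shell) keeps the host value from being interpreted
    # as shell syntax.
    result = runner(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout + result.stderr


def handle_ping_action(alert, add_note, ping=run_ping):
    """Ping the host named in the alert and record the output as a note.

    `alert` is assumed to be a dict with "id" and "details"/"host" keys;
    `add_note(alert_id, text)` is a hypothetical callback that writes the
    note back to the alert.
    """
    host = alert["details"]["host"]
    output = ping(host)
    add_note(alert["id"], "Ping output for %s:\n%s" % (host, output))
    return output
```

Taking `ping` and `add_note` as parameters keeps the sketch easy to try without a network or a live account: pass in stand-in functions and inspect what would have been written to the alert.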
As stated above, this example script adds the output of the ping command to the alert as a note. Alternatively, we could have written the output to a file and then attached the file to the alert. Updating the alert with the output of the command provides information not only to the alert recipient who executed the command, but also to anyone who may later work on the problem.
Attention is a scarce commodity. We cannot just send alerts to people and then wash our hands of the issue. If we want them to be productive and not miss a critical alert, we need to take it a step further: empower them! We can help by focusing on what they need to do next when they receive that “wake up in the middle of the night” alert.