Channel: OpsGenie Blog

You woke me up. Now what?


As an alert notification solution, our first priority is to ensure that the right person is notified when there is a problem. OpsGenie sends notifications through multiple channels and escalates alerts to ensure that critical alerts don’t get missed. As crucial as that is, an alert notification system that stops at “waking you up” becomes part of the problem rather than the solution.

The truth is that for every alert that is critical and urgent, we receive many others that are not. Sure, it would be great if monitoring systems sent alerts only for truly urgent or critical problems, but that is easier said than done. We believe that alert management solutions can and should play a part in at least alleviating the pain. The first step on this path is to think about what happens after the alert has been received and has grabbed your attention. You’re awake, now what?

Providing the tools for good alert design, which optimizes the experience of alert recipients, has been a core design principle for OpsGenie and has driven the implementation of many features. OpsGenie alerts have a flexible schema with support for custom fields and tags, and recipients can add notes/comments to alerts. Files (log files, images, docs, etc.) can be attached to OpsGenie alerts, and alerts support custom actions that let recipients take action directly from their mobile devices. In short, OpsGenie provides the tools to design alerts that truly inform their recipients and empower them to make quick assessments and determine the right course of action.

Let’s go through this with an example. When troubleshooting network-related problems, engineers often run commands like ping and traceroute. Wouldn’t it be great if alert recipients could run these commands from their mobile devices to assess what’s going on? Being interrupted by an alert is a lot less disruptive if you can perform the initial investigation in a few minutes, without having to find a computer, connect to internal networks, etc.


Let’s look at how you can do this with OpsGenie. First, we need the action. When creating the alert, we can define “custom actions”; in this case, we’ll define a “Ping” action. This enables alert recipients to execute the action from the OpsGenie apps when they are notified.
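As a rough illustration of defining custom actions at creation time, the Python sketch below builds a create-alert request. It assumes the OpsGenie Alert API v2 (`POST https://api.opsgenie.com/v2/alerts` with a `GenieKey` authorization header) and its `actions`, `details`, and `tags` fields; the helper names and the `host` detail are hypothetical, so check the field names against the current API documentation. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: OpsGenie Alert API v2 endpoint and "GenieKey" auth scheme;
# verify against the current API documentation before use.
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_alert_payload(message, host, actions=("Ping", "Traceroute")):
    """Build a create-alert body that offers custom actions to recipients."""
    return {
        "message": message,
        # Custom actions appear as buttons recipients can tap in the apps.
        "actions": list(actions),
        # Custom field (hypothetical name): the host the Ping action will target.
        "details": {"host": host},
        "tags": ["network"],
    }


def build_create_request(payload, api_key):
    """Wrap the payload in a POST request; nothing is sent here."""
    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    payload = build_alert_payload("db-01 unreachable", "db-01.example.com")
    req = build_create_request(payload, "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(json.dumps(payload, indent=2))
    # To actually create the alert: urllib.request.urlopen(req)
```

Putting the target host in a custom field means whatever later handles the Ping action can read it from the alert instead of asking the recipient to type it on a phone.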

The next step is to pass this information on to your systems. OpsGenie supports callbacks for alert activity: when an alert is created, acknowledged, commented on, etc., OpsGenie can call an HTTP endpoint (webhook) and pass along what happened, including the fact that a user executed an action on an alert. OpsGenie also provides the Marid utility, which subscribes to OpsGenie alerts and can execute a Groovy script for alert actions. By combining custom alert actions with Marid, we can enable users to execute the Ping action from their mobile devices:
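For the webhook route, a receiver has to recognize custom-action callbacks and decide what to run. The sketch below is a Python stand-in for that step, not Marid (which runs Groovy scripts). It assumes a callback body shaped like `{"action": "Ping", "alert": {"alertId": ..., "details": {"host": ...}}}`; that payload shape, the `COMMANDS` whitelist, and `plan_command` are illustrative assumptions. Only whitelisted actions map to commands, and the host is validated and passed as a list argument, so no shell is involved.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from custom action names to diagnostic commands.
# Only these actions ever run; arguments are lists, never shell strings.
COMMANDS = {
    "Ping": lambda host: ["ping", "-c", "4", host],
    "Traceroute": lambda host: ["traceroute", host],
}

# Must start with a letter/digit so a value like "-c" cannot become a flag.
HOSTNAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{0,252}")


def plan_command(body):
    """Turn a callback body into (alert_id, argv), or None if nothing to run.

    Assumed payload shape (verify against the webhook documentation):
    {"action": "Ping", "alert": {"alertId": "...", "details": {"host": "..."}}}
    """
    event = json.loads(body)
    builder = COMMANDS.get(event.get("action"))
    if builder is None:
        return None  # created/acknowledged/etc., or an unknown action
    alert = event.get("alert", {})
    host = alert.get("details", {}).get("host", "")
    if not HOSTNAME_RE.fullmatch(host):
        return None  # refuse anything that does not look like a hostname
    return alert.get("alertId"), builder(host)


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        plan = plan_command(self.rfile.read(length))
        # A real handler would pass `plan` to a worker that runs the command
        # and posts the output back to the alert; this one only acknowledges.
        self.send_response(200 if plan else 204)
        self.end_headers()


# To run locally:
# HTTPServer(("0.0.0.0", 8080), CallbackHandler).serve_forever()
```

Keeping the action-to-command mapping as a fixed whitelist means a tap on a phone can only ever trigger commands you chose in advance, with a host value that has been checked first.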

  1. User executes Ping action from OpsGenie app
  2. Marid receives the user and alert data and executes the Groovy script associated with the Ping action (by default, ping.groovy)
  3. The Groovy script executes the ping command, captures the output, and adds it to the alert as a note
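The three steps above can also be sketched in Python. This is not the `ping.groovy` script that Marid uses; it is a hypothetical equivalent that runs `ping` (a Unix-style `-c` count flag is assumed), captures the output, and builds a request that adds it to the alert as a note. The notes endpoint (`POST /v2/alerts/{id}/notes` with `note`, `user`, and `source` fields) is assumed from the OpsGenie Alert API v2 and should be checked against the current documentation.

```python
import json
import subprocess
import urllib.parse
import urllib.request

# Assumption: OpsGenie Alert API v2 "add note" endpoint; verify before use.
NOTES_URL = "https://api.opsgenie.com/v2/alerts/{alert_id}/notes"


def run_ping(host, count=4, timeout=30):
    """Run ping without a shell and return (exit code, combined output)."""
    proc = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


def format_note(host, returncode, output):
    """Summarize the result for anyone who reads the alert later."""
    status = "succeeded" if returncode == 0 else f"failed (exit code {returncode})"
    return f"ping {host} {status}:\n{output.strip()}"


def build_note_request(alert_id, note, user, api_key):
    """Build the POST that adds `note` to the alert; nothing is sent here."""
    url = NOTES_URL.format(alert_id=urllib.parse.quote(alert_id, safe=""))
    body = {"note": note, "user": user, "source": "ping-action"}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",
        },
        method="POST",
    )
```

In use, `code, out = run_ping(host)` followed by `urllib.request.urlopen(build_note_request(alert_id, format_note(host, code, out), user, api_key))` would post the result, with the alert ID, host, and user taken from the callback data.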

As stated above, this example script adds the output of the ping command to the alert as a note. We could instead have written the output to a file and attached the file to the alert. Updating the alert with the output of the command makes the information available not only to the alert recipient who executed the command, but also to anyone who works on the problem later.
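The file-based alternative might be preferable when the output is long. The sketch below writes the output to a text file and builds a `multipart/form-data` upload for it. The attachments endpoint (`POST /v2/alerts/{id}/attachments` with a `file` form field) is an assumption based on the OpsGenie Alert API v2, and the file naming scheme and helper names are hypothetical.

```python
import pathlib
import tempfile
import urllib.parse
import urllib.request
import uuid

# Assumption: OpsGenie Alert API v2 attachment endpoint; verify before use.
ATTACH_URL = "https://api.opsgenie.com/v2/alerts/{alert_id}/attachments"


def write_output(directory, alert_id, action, output):
    """Save command output as <alert_id>-<action>.txt (hypothetical naming)."""
    path = pathlib.Path(directory) / f"{alert_id}-{action.lower()}.txt"
    path.write_text(output, encoding="utf-8")
    return path


def build_attachment_request(alert_id, path, api_key):
    """Build a multipart/form-data POST carrying the file; nothing is sent."""
    path = pathlib.Path(path)
    boundary = uuid.uuid4().hex  # random, so very unlikely to appear in the file
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    url = ATTACH_URL.format(alert_id=urllib.parse.quote(alert_id, safe=""))
    return urllib.request.Request(
        url,
        data=head + path.read_bytes() + tail,
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"GenieKey {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_output(tmp, "ALERT_ID", "Ping", "ping output goes here\n")
        req = build_attachment_request("ALERT_ID", path, "YOUR_API_KEY")
        print(req.get_method(), req.full_url, len(req.data), "bytes")
        # To actually upload: urllib.request.urlopen(req)
```

A note keeps the output visible right on the alert, while an attachment keeps long output out of the timeline; either way it stays with the alert for whoever picks it up next.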


Attention is a scarce commodity. We cannot just send alerts to people and wash our hands of them. If we want recipients to not miss alerts and to stay productive, we need to go a step further and empower them. We can start by focusing on what they need to do next when they receive an alert.

