Channel: OpsGenie Blog

DevOpsDC, alerts and notices

<p>At the last <a href="http://www.meetup.com/DevOpsDC/events/134372152/">DevOpsDC meetup</a>, the speaker was Robert Treat (<a href="https://twitter.com/robtreat2/">@robtreat2</a>), COO of OmniTI, and the subject was &#8220;Less alarming alerts&#8221;. OmniTI is an interesting company, as they both implement large-scale solutions and operate <a href="http://www.circonus.com/">Circonus</a>, a monitoring-as-a-service solution, so the presentation was bound to be interesting, and it did not disappoint. </p><p>Robert&#8217;s presentation made a strong case against generating too many alerts, which causes &#8220;pager fatigue&#8221;, and recommended that alerts should focus on the business rather than on technical metrics that may not be meaningful or actionable. <span>One of the interesting points Robert made was the distinction between &#8220;alerts&#8221; and &#8220;notices&#8221;: alerts should be generated for real, verified problems, and notices should be sent via email, etc. for information that may indicate a potential problem, aka leading indicators. </span></p><p>This distinction resonated with us as well. At OpsGenie, we use different terminology, but we subscribe to the same philosophy. In OpsGenie lingo, &#8220;alerts&#8221; are what&#8217;s generated by monitoring tools, applications, etc., and &#8220;notifications&#8221; are what&#8217;s sent to the users via email, SMS, phone, mobile push, chat, etc. What Robert referred to as &#8220;notices&#8221; are also alerts in OpsGenie, but they do not necessarily wake anyone up.</p><p><img alt="image" src="http://media.tumblr.com/974d495ddd96b305571a801731c20ea5/tumblr_inline_mus4mhUfvo1soq1dj.png"/></p><p>OpsGenie allows users to control how they are notified for different alerts at different times, so users can define rules to get SMS and phone notifications for urgent/critical problems, but only receive email or nothing at all for non-critical alerts (notices). 
In our internal systems, we tag non-critical problems with a &#8220;low_priority&#8221; tag, and each of us has different rules that govern how to handle low-priority alerts. I personally have rules to receive low-priority alerts via email with a 1-hour delay and only during the day. If the alert is closed automatically or by a member of the team within the hour, I don&#8217;t even get an email. If I do get any emails, I always have the option to use the OpsGenie mobile apps or web UI to review any alerts that have not yet been processed by anyone. </p><p><img alt="image" src="http://media.tumblr.com/e73e897331a9acdcebb3dea27c085403/tumblr_inline_mus4tgzMvA1soq1dj.png"/></p><p>In short, not all alerts are created equal, nor do they need to be treated as such. We can drive our colleagues crazy if we notify them for every leading indicator, but it&#8217;s also very hard to determine on behalf of the users which alerts are critical and urgent. This is why we chose to implement a solution that enables the admins to tag the alerts and lets the users control how and when they are notified. </p><p><a href="https://twitter.com/berkay">@berkay</a></p>
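<p>To make the low-priority rule described above concrete, here is a minimal sketch in Python. It is not OpsGenie&#8217;s API or rule format: the <code>Alert</code> and <code>LowPriorityEmailRule</code> names are hypothetical, and treating &#8220;during the day&#8221; as 9:00 to 18:00 is an assumption made purely for illustration.</p>

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert record; not an OpsGenie data structure."""
    created_at: datetime
    tags: frozenset
    closed_at: Optional[datetime] = None


@dataclass
class LowPriorityEmailRule:
    """Email low-priority alerts after a delay, and only during the day."""
    tag: str = "low_priority"
    delay: timedelta = timedelta(hours=1)
    day_start: time = time(9, 0)   # assumed start of "the day"
    day_end: time = time(18, 0)    # assumed end of "the day"

    def should_email(self, alert: Alert, now: datetime) -> bool:
        if self.tag not in alert.tags:
            return False  # not a low-priority alert; other rules apply
        if alert.closed_at is not None and alert.closed_at <= now:
            return False  # already closed, so no email is needed
        if now - alert.created_at < self.delay:
            return False  # still inside the delay window
        # Past the delay and still open: email only during the day.
        return self.day_start <= now.time() < self.day_end


rule = LowPriorityEmailRule()
created = datetime(2013, 10, 17, 10, 0)
alert = Alert(created_at=created, tags=frozenset({"low_priority"}))
print(rule.should_email(alert, created + timedelta(minutes=30)))  # False
print(rule.should_email(alert, created + timedelta(hours=1)))     # True
```

<p>An alert that is closed before the delay expires never produces an email in this sketch, and an alert raised at night is held until the assumed daytime window opens.</p>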
