<p>At the last <a href="http://www.meetup.com/DevOpsDC/events/134372152/">DevOpsDC meetup</a>, the speaker was Robert Treat (<a href="https://twitter.com/robtreat2/">@robtreat2</a>), COO of OmniTI, and the subject was “Less alarming alerts”. OmniTI is an interesting company: they both implement large-scale solutions and operate <a href="http://www.circonus.com/">Circonus</a>, a monitoring-as-a-service solution, so the presentation was bound to be interesting, and it did not disappoint. </p><p>Robert’s presentation made a strong case against generating too many alerts, which causes “pager fatigue”, and recommended that alerts focus on business metrics rather than technical metrics that may not be meaningful or actionable. <span>One of the interesting points Robert made was the distinction between “alerts” and “notices”: alerts should be generated for real, verified problems, while notices should be sent via email, etc. for information that may indicate a potential problem, aka leading indicators. </span></p><p>This distinction resonated with us as well. At OpsGenie, we use different terminology but subscribe to the same philosophy. In OpsGenie lingo, “alerts” are what’s generated by monitoring tools, applications, etc. and “notifications” are what’s sent to the users via email, SMS, phone, mobile push, chat, etc. What Robert referred to as “notices” are also alerts in OpsGenie, but they do not necessarily wake anyone up.</p><p><img alt="image" src="http://media.tumblr.com/974d495ddd96b305571a801731c20ea5/tumblr_inline_mus4mhUfvo1soq1dj.png"/></p><p>OpsGenie allows users to control how they are notified for different alerts at different times, so users can define rules to get SMS and phone notifications for urgent/critical problems, but only receive email or nothing at all for non-critical alerts (notices). 
In our internal systems, we tag non-critical problems with the “low_priority” tag, and each of us has different rules that govern how to handle low priority alerts. I personally have rules to receive low priority alerts via email with a one-hour delay and only during the day. If the alert is closed automatically or by a member of the team within the hour, I don’t even get an email. If I do get any emails, I always have the option to use the OpsGenie mobile apps or web UI to review any alerts that have not yet been processed by anyone. </p><p><img alt="image" src="http://media.tumblr.com/e73e897331a9acdcebb3dea27c085403/tumblr_inline_mus4tgzMvA1soq1dj.png"/></p><p>In short, not all alerts are created equal, nor do they need to be treated as such. We can drive our colleagues crazy if we notify them about every leading indicator, but it’s also very hard to decide on the users’ behalf which alerts are critical and urgent. This is why we chose to implement a solution that enables the admins to tag the alerts and lets the users control how and when they are notified. </p><p><a href="https://twitter.com/berkay">@berkay</a></p>
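<p>To make the low priority rule described above concrete, here is a minimal sketch of its logic in Python. It is not OpsGenie’s API or configuration format: the <code>Alert</code> class, the <code>low_priority_email_time</code> function, and the 8:00 to 18:00 definition of “the day” are all assumptions made for illustration. It also assumes that an email falling outside the day is dropped rather than deferred.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical values for illustration; these are not OpsGenie settings.
EMAIL_DELAY = timedelta(hours=1)
DAY_START_HOUR = 8   # assumed start of "the day"
DAY_END_HOUR = 18    # assumed end of "the day"


@dataclass
class Alert:
    """Hypothetical alert record, not an OpsGenie data type."""
    created_at: datetime
    tags: frozenset = frozenset()
    closed_at: Optional[datetime] = None


def low_priority_email_time(alert: Alert) -> Optional[datetime]:
    """Return when to email about a low priority alert, or None for no email."""
    if "low_priority" not in alert.tags:
        return None  # other alerts are governed by other rules
    due = alert.created_at + EMAIL_DELAY
    if alert.closed_at is not None and alert.closed_at <= due:
        return None  # closed within the hour, so nobody is emailed
    if not (DAY_START_HOUR <= due.hour < DAY_END_HOUR):
        return None  # assumption: outside the day the email is dropped
    return due


if __name__ == "__main__":
    created = datetime(2013, 10, 16, 10, 0)
    still_open = Alert(created_at=created, tags=frozenset({"low_priority"}))
    print(low_priority_email_time(still_open))
```

<p>An alert that is still open an hour after it was created during the day yields an email time, while one closed 30 minutes in yields <code>None</code>. A real rule engine would also have to decide what happens to alerts raised at night, which this sketch simply ignores.</p>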