Channel: OpsGenie Blog

Routing phone calls using on-call schedules - OpsGenie

Feb 8, 2016 by Berkay Mollamustafaoglu (Original Post: Apr 7, 2014)

Since we launched the OpsGenie phone call routing feature last year, we’ve had a great response from customers. So much so, in fact, that we’re dusting off this blog post from last year and updating it for everyone who is not yet familiar with the feature. Is it easy to use? Yes, it is! OpsGenie routes alerts to the appropriate on-call individual using policies, on-call schedules, and so on. Before we launched the feature last year, a number of our OpsGenie customers asked us similar questions, such as “Can we route phone calls to the right person like we route the alerts?” This turned out to be a great question, one that resonated with many of our customers. For a product team, customer feedback like this is priceless!

These customers already had their on-call schedules defined in OpsGenie and were receiving phone calls from their users for support as well as other purposes. It just makes sense to use the on-call schedules to route each call to the right person. This request resonated with us at OpsGenie because we needed a similar solution internally: we thought it was important to provide phone support for all OpsGenie customers, in addition to our support site, email, and live chat. However, after a careful review, the solutions readily available in the market did not meet our customers’ requirements, and they fell short of our needs as well. So we developed it ourselves. :) We’re happy to announce it has been a huge success for our customers! It’s by far the most used feature on our platform. The ease of use and implementation is what separates OpsGenie from our competitors.

Over the last year, our customers have been successfully using the “incoming calls” feature (great name, I know; I just hired a marketing team, so that may change), and we’ve been using it ourselves to route a majority of our support calls to the on-call engineer(s). Win-win! We’re also happy to re-announce that, based on our own experiences and the valued feedback from our users, we enhanced the call routing solution to meet a wider range of requirements. Here’s how it works:

  1. Provision one or more phone numbers (available for many countries) from OpsGenie.
  2. Configure how OpsGenie should route the call (Forward Calls To). For example, an on-call schedule can be used to determine who should receive the call.
  3. When someone calls the phone number assigned to your account, OpsGenie uses your configuration and forwards the call to the right person.
  4. OpsGenie also creates an alert and notifies the specified people that there was an incoming phone call.
  5. If no one answers the phone, the caller is instructed to leave a message. The message is recorded and attached to the alert that was created, and can be listened to by the alert recipients.
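The steps above can be sketched roughly as follows. This is a hypothetical illustration, not OpsGenie's implementation: `route_call`, the `Alert` class, and the `dial` and `record_message` callbacks are names invented here, and a real system would sit on top of a telephony provider.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Alert:
    """Hypothetical alert created for every incoming call."""
    message: str
    answered_by: Optional[str] = None
    voicemail: Optional[str] = None
    log: List[str] = field(default_factory=list)


def route_call(caller: str,
               on_call: List[str],
               dial: Callable[[str], bool],
               record_message: Callable[[], str]) -> Alert:
    """Try each on-call user in order; fall back to a recorded message."""
    alert = Alert(message=f"Incoming call from {caller}")
    for user in on_call:
        if dial(user):  # True means the user picked up
            alert.answered_by = user
            alert.log.append(f"answered by {user}")
            return alert
        alert.log.append(f"no answer from {user}")
    # Nobody answered: record a message and attach it to the alert.
    alert.voicemail = record_message()
    alert.log.append("voicemail recorded")
    return alert
```

Here `on_call` stands in for whatever the on-call schedule resolves to at the time of the call; trying the next user when the first does not answer is what provides the redundancy.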

That’s all it takes: a few clicks and you’re all set with a phone number that aligns with your workflow. We have also noticed a number of benefits of routing incoming phone calls this way:

  • Calls can be forwarded to anyone in the world, enabling geographically dispersed organizations to provide support seamlessly.
  • OpsGenie can try to reach additional users if the first person does not answer, providing some redundancy.
  • The alert generated by OpsGenie for the call can be handled like any alert in OpsGenie; rules and policies can be used to route the alert to the right people based on the caller’s phone number.
  • Alert activity logs indicate whether the call was answered, how long it lasted, etc.
  • Information about the call can be forwarded to team chat rooms using OpsGenie integrations with Slack, HipChat, Campfire, etc. and to other systems via OpsGenie webhook or Marid integration.

Incoming phone call routing has become an essential part of our tool set. We hope that you’ll continue to find it as useful as we do. :) Please contact us if you have any questions about this critical feature set.

For more information, please refer to the OpsGenie Incoming Call Routing document.


Fighting Alert Fatigue - Alert Deduplication -- Part 1


The concept of “Alert Fatigue” is well known in industries such as healthcare, and awareness is increasing in IT operations as well. Fighting alert fatigue has been a key design objective for OpsGenie since our inception. An earlier post summarized some of the key capabilities that OpsGenie provides to alleviate alert fatigue. In this two-part series, I go into more detail on how these features can improve the alert signal-to-noise ratio.

Alert deduplication is a key feature that OpsGenie provides to reduce noise. Monitoring systems can be quite chatty, sending multiple alerts for the same problem. OpsGenie alerts can be deduplicated using the “alias” field. The alias field is a user-defined, unique identifier for “open” alerts, meaning there can be only one open alert with a given alias.

The alias field can be set when the alert is created. When the alert is created via the UI or API, the value for the alias field can be specified explicitly like any other alert field.

When alerts are created via email, string processing methods can be used to extract the data for alert fields from the email’s subject or body.

When OpsGenie receives a “create alert” request with an alias that already exists and is open, OpsGenie will increase the count field value, and will update the alert activity log, which indicates that the alert is deduplicated. If the alert is closed, a new alert with the same alias will be created.
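A minimal sketch of this behavior, assuming an in-memory store. The `AlertStore` and `Alert` classes are hypothetical names used only for illustration; they are not part of the OpsGenie API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    alias: str
    message: str
    count: int = 1
    status: str = "open"
    activity: List[str] = field(default_factory=list)


class AlertStore:
    """Hypothetical store that keeps at most one open alert per alias."""

    def __init__(self) -> None:
        self._open: Dict[str, Alert] = {}
        self.closed: List[Alert] = []

    def create(self, alias: str, message: str) -> Alert:
        existing = self._open.get(alias)
        if existing is not None:
            # Same alias and still open: count it instead of creating a new alert.
            existing.count += 1
            existing.activity.append("deduplicated")
            return existing
        alert = Alert(alias=alias, message=message)
        self._open[alias] = alert
        return alert

    def close(self, alias: str) -> None:
        alert = self._open.pop(alias, None)
        if alert is not None:
            alert.status = "closed"
            self.closed.append(alert)
```

Two create requests with the alias `db1-down` yield one alert with a count of 2; once that alert is closed, the next request with the same alias starts a fresh alert.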

The alias field provides a simple yet fully flexible mechanism to deduplicate alerts. It reduces unnecessary alert notifications and helps fight alert fatigue.

Fighting alert fatigue - mute and bulk actions


Continuing the discussion of how OpsGenie can help alleviate alert fatigue, we will examine the bulk actions that on-call employees can take to reduce the excessive alerts that often hinder operations.

Monitoring tools typically generate multiple alerts when there is a significant problem with our system. If you have found yourself in a position where you’re trying to understand an issue - and your phone keeps buzzing with alerts that you’re required to acknowledge, then you know what an inconvenience this can be. You recognize that there’s a problem, and you’re trying to work on it. How frustrating! The last thing you want is to keep acknowledging alerts. (Side note: Writing this whole scenario has me alert fatigued already.) So, what do you do?

OpsGenie provides several “bulk actions”: Mute, Acknowledge All, and Close All. Unlike regular actions that are executed against a single alert, these actions apply to multiple alerts. These actions are available on the dashboard in the mobile apps. [Editor note: We just added a new “Snooze” function as we were writing and editing this blog post. You can now snooze alerts from either the alert list or the details page. Snoozing rolls the alert back to its initial state and prevents recipients from being notified until the snooze ends. You can snooze an alert for up to a full week. The new Snooze function is available now on the web and in our mobile apps.]

Let’s examine all three bulk actions that OpsGenie offers our customers. The “Mute” action stops all notifications to the user for the next 5 minutes. The Mute action can be rather useful in dealing with alert storms. If you’re getting a barrage of alerts, executing the mute action stops the notifications that keep interrupting you while you try to deal with the situation. This quiet time gives you the opportunity to understand what’s wrong and (hopefully) resolve the problem or escalate the issue to the appropriate individual. The “Acknowledge All” action can also be quite useful, as it stops escalation policies from triggering. This action notifies everyone involved in the escalation that you’re on the case.
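As a rough sketch of the Mute behavior described above, assuming a per-user mute window: the `Notifier` class is a hypothetical name, and only the 5-minute duration comes from this post.

```python
from datetime import datetime, timedelta
from typing import Dict

MUTE_DURATION = timedelta(minutes=5)  # duration taken from the post


class Notifier:
    """Hypothetical notifier that honors a per-user mute window."""

    def __init__(self) -> None:
        self._muted_until: Dict[str, datetime] = {}

    def mute(self, user: str, now: datetime) -> None:
        self._muted_until[user] = now + MUTE_DURATION

    def should_notify(self, user: str, now: datetime) -> bool:
        until = self._muted_until.get(user)
        return until is None or now >= until
```

Muting affects only the user who asked for quiet; other recipients keep receiving notifications.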

The “Close All” action was developed to easily close out all the alerts once you’ve been able to rectify the issue or problem. Instead of notifying each and every on-call employee within the chain that you’ve resolved it, this action helps you easily communicate and put closure to it. Alerts are meant to get your attention when there is an important event. Once the alert is sent out and users signal that they are working on the problem, the alerting systems should take a step back and let the users do their work. OpsGenie provides our users these bulk actions for this purpose - and to help Fight Alert Fatigue.

What’s great about the NEW OpsGenie Apple Watch App?


As if you needed another reason to love OpsGenie and all its capabilities: we released an OpsGenie app for the sleek, high-tech Apple Watch, so you can now get the most out of your weekends and look stylish doing it.

The Apple Watch and its mottos, “to wear it is to love it” and “just a raise of the wrist away,” inspired us to innovate our app. With “just a raise of the wrist” you can: view and manage alerts faster; ack, close, and mute alerts without opening the app; see the on-call status along with the schedule name; see the number of open and unacked alerts; and more! Of course, this isn’t that different from the OpsGenie mobile app or the website, so what makes the Apple Watch the next best thing? What makes an app on a watch so much more convenient than pulling out your phone?

Well, the Apple Watch keeps you engaged with your environment, and it is more immediate and more discreet. Smartphones form walls that divide you from your environment; wearable technology eliminates that wall. Compared with glancing down at your phone, glancing down at your watch is much less involved. Wearable technology can provide you with information immediately by responding to your environment and keeping you updated with whatever you might want to know. No need to waste time going through each and every app. Wearables are also more discreet, which benefits the people around you. No more pulling out your phone or being interrupted at dinner with your friends and family: you can take a quick peek at your watch and respond accordingly.

While wearable technology has gained traction as an activity tracker or a phone call and text notification system, we decided that allowing OpsGenie to send you notifications and communicate through something so simple will inspire you to keep moving and give you the relaxing weekends that you deserve. Check it out now on the App Store.

Which superhero would be best at answering on-call alerts and real-time incidents?



The OpsGenie team recently had a thorough and heated discussion (KAPOW!!) on who would be better with on-call alerts and incident management: Superman or Batman? Who would come out as the winner when the two are pitted against each other in a war of on-call alerts and response time? So, we thought we would hash it out here on our blog in a completely fictional format. We’ll try to examine each area of alerting and incident management to see who we would want on our on-call team.

Here’s some backstory on on-call alerts using an alerting and incident management solution: We strive to make the lives of dev&ops teams better through actionable alerts, on-call scheduling, and escalations. With customizable notification processes and a simple application, such a solution helps dev and ops teams use their time more efficiently.

Now, how does that relate to these two well known heroes? Well, in most cases they need to be alerted to real-time incidents, right? What if there’s a crime and they both show up and only one is needed? Or, what if Batman is responding to all the alerts and Robin isn’t doing anything? With alerting and incident management tools, they usually have reports and logs to help keep track of that, and when Clark is out with Lois Lane, the alert can be escalated to Batman and then Robin, who can pick up the slack when necessary!

Who’s better?

Most alerting and incident management solutions are designed so that on-call dev&ops teams don’t have to be glued to their computers while on-call. Our question now is: in a battle between Superman and Batman, who’s best at responding to on-call alerts and real-time incidents?

You cannot deny both heroes are capable in their own right. Both are strong and intelligent, and both are available and on-call at all times.

Superman: Superman has superhuman strength, speed, flight, and heat vision, and is invulnerable. How does all this translate into answering alerts and managing incidents while on-call? Well, Superman can answer calls extremely quickly and can even fly to the affected area and resolve the issue in person. He also never sleeps and is never tired (or so we presume) when answering calls in any time zone, so he always handles calls and incidents with careful precision. He can also do the job of 10 people: he can monitor, ticket, and answer the ticket in seconds. He can easily hear when an alert goes off, giving him the ability to answer at a moment’s notice. Now, Superman works as a reporter at the Daily Planet, a newspaper, which we think could make him good at being on-call, but Lois seems to be the one who gets the most urgent stories and the one they call in times of need. And she doesn’t seem to escalate a lot of those stories to Clark. So he probably would not work well in a devops team. He’s more of an individual, which works for him, but our team needs a leader and a team player. Although OpsGenie has a free plan, we don’t think it would work for a team of 1. POW!!!

Batman: Batman is at peak human physical and mental condition, is a master of martial arts and hand-to-hand combat, and knows how to utilize high-tech equipment. So, how does Batman stand up to being on-call? Well, Batman works only at night, so he has the ability to answer calls at a moment’s notice when needed. Because Batman has experience with technology, he can utilize most alerting tools, and we believe he would adopt the DevOps ways and look to integrate most monitoring and ticketing solutions for the best response results. Batman also has a great escalation team with Robin, Alfred, and Commissioner Gordon. Batman is used to being alerted and answering the call via the Batphone, which we presume is now mobile, and he is also alerted by the Bat Signal. Batman is very analytical, so we assume that he’d be great at deciphering logs and reports to help him and his team improve their mean-time-to-respond/repair (MTTR). BAM!!

Both heroes have so much to offer and we would probably hire both in a second, but in our humble opinion, we at OpsGenie believe that Batman would be better at answering on-call alerts and handling incidents when called upon. He seems ready-made for these situations and would be a great on-call engineer or system admin. Although Superman would be a good and cost-effective engineer, we think he would have a longer training period and would easily get distracted by outside interference. Now, if Batman and Superman were to work together, we think an alerting and incident management tool would be perfect to fit them both into a weekly schedule with an escalation policy, which would produce nothing but peace and harmony throughout the world.

BOOM! Winner: Batman - Which one would you want to have on your team? #favoriteoncallhero

With on-call alerts, sometimes it can wait..


For most of us in ops, it is vital to get notified ASAP about problems that impact the services we provide. It’s often a race against time to restore the service or to prevent an outage. But not all alerts require an immediate response; some can wait. Enabling users to deal efficiently with alerts that don’t require an immediate response is just as important in preventing alert fatigue and ensuring we can stay fresh.

At OpsGenie our mission is to empower our users to handle critical incidents as well as non-critical, non-urgent ones efficiently.

Snooze that alert

Sometimes an alert does not require an immediate response, but it still requires an action. Sure, you can acknowledge the alert and come back to it later, but acknowledging an alert stops the notifications and the escalation process. What if you forget? An alert that was not urgent at the time can become a real problem. Snoozing the alert is a great solution for this situation. Snooze the alert for some time and if you don’t get back to it and resolve the problem, alerting would begin again after the specified time frame.
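The difference between acknowledging and snoozing can be sketched as follows. This is a simplified model under our own assumptions; the `Alert` class and its method names are hypothetical and do not mirror the OpsGenie API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    message: str
    acknowledged: bool = False
    closed: bool = False
    snoozed_until: Optional[datetime] = None

    def acknowledge(self) -> None:
        # Acknowledging stops notifications for good (until someone acts).
        self.acknowledged = True

    def snooze(self, now: datetime, duration: timedelta) -> None:
        # Snoozing only pauses notifications until the given time.
        self.snoozed_until = now + duration

    def should_notify(self, now: datetime) -> bool:
        if self.closed or self.acknowledged:
            return False
        if self.snoozed_until is not None and now < self.snoozed_until:
            return False
        return True
```

An acknowledged alert stays quiet indefinitely, while a snoozed alert that is still open starts notifying again once the snooze period has passed.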

Delay those notifications

For non-urgent alerts, you can wait for some time before notifying the users. Using a notification policy, alert notifications can be delayed for a few minutes, for hours, or until a certain time. Delaying non-urgent alert notifications can significantly improve the lives of ops people, which is our primary mission.

  • When alert notifications are delayed, alerts are still visible to the users; hence, a team member who may be better suited to respond can still look into the alert and take care of it.
  • Some problems are transient: services may be restarted, and the alert may be closed automatically before anyone needs to be notified.
  • Some problems can wait until business hours and do not require waking up an on-call engineer. For these types of non-urgent problems, alert notifications can be delayed until the morning. If the alerts are still open, the appropriate people can be notified to ensure they don’t fall through the cracks.
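Delaying a notification until business hours could look roughly like this. The `notify_at` function and the 09:00 to 18:00 window are assumptions made for the example, and weekends are ignored for brevity.

```python
from datetime import datetime, time, timedelta


def notify_at(created: datetime,
              start: time = time(9, 0),
              end: time = time(18, 0)) -> datetime:
    """Return when a non-urgent alert should first notify someone."""
    if start <= created.time() < end:
        return created  # already within business hours: notify right away
    day = created.date()
    if created.time() >= end:
        day += timedelta(days=1)  # after hours: wait for the next morning
    return datetime.combine(day, start)
```

An alert created at 03:00 would notify at 09:00 the same day, provided it is still open by then.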

Set your own rules!

With OpsGenie, notification rules not only allow users to control how they are notified for different alerts; they can also be used to specify a time delay for each notification method. For example, using notification rules, a user can configure OpsGenie to notify them of non-critical, non-urgent alerts via email immediately and via push/SMS after 10 minutes, while configuring push/SMS/phone notifications for critical alerts to be much more aggressive.

The simple fact is, not all alerts are created equal. It makes no sense to scream that the sky is falling when it’s not. If the alerting system cannot empower you to handle your non-urgent, non-critical alerts with low overhead, it cannot handle critical ones all that well either.
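To make the example concrete, here is a minimal sketch of per-method delays. The `RULES` table and `due_notifications` function are hypothetical, and the delays for critical alerts are illustrative values rather than OpsGenie defaults.

```python
from typing import Dict, List, Tuple

# (delay in minutes, notification method); the values are illustrative only
RULES: Dict[str, List[Tuple[int, str]]] = {
    "non-critical": [(0, "email"), (10, "push"), (10, "sms")],
    "critical": [(0, "push"), (0, "sms"), (1, "phone")],
}


def due_notifications(priority: str, minutes_open: int) -> List[str]:
    """Methods that should have fired for an alert open this many minutes."""
    return [method for delay, method in RULES[priority]
            if minutes_open >= delay]
```

With these rules, a non-critical alert produces only an email at first, and push and SMS follow if the alert is still open after 10 minutes.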

Email Integration: Alerting and Incident Management Solutions


In the coming weeks OpsGenie will help buyers looking for a reliable, scalable, and customizable alerting and incident management solution by assessing features, toolsets, and functionality in a comprehensive comparison between OpsGenie, Pagerduty, and VictorOps through a series of detailed blog posts. It is our goal to shed light on who does what, and the stark realities between the three popular technologies. OpsGenie will concentrate on areas within our platform that we believe are extremely important when looking for an alerting and incident management solution for the dev&ops and IT community in general. This week we will focus on Email Integration.

PART 1: Email Integration -- A comparative assessment of the direct contrasts between OpsGenie, Pagerduty, and VictorOps. OpsGenie’s email integration enables customers to integrate OpsGenie with any system that can send alerts via email. Email integration is the most commonly used integration method by our customers since it is easy to use and almost any system out there can send emails.

Although our competitors have it as well, our email integration is representative of what differentiates OpsGenie from our competitors; hence, it’s worth doing a comparative analysis with a couple of different use cases.

Case 1: Creating Alerts for any Received Email

The most basic use case is the ability to create an alert and notify users with as little configuration as possible.

OpsGenie

OpsGenie email integration screen

This use case is extremely simple to configure in OpsGenie. It only requires you to specify the email address and the team(s) that the alert will be routed to. With this configuration, any email sent to the specified email address becomes an alert in OpsGenie, and the appropriate people are notified according to the specified team’s policies.

VictorOps

It is also fairly straightforward to support this use case in VictorOps. An email address is automatically generated.

OpsGenie email integration screen

http://victorops.force.com/knowledgebase/articles/Integration/Generic-Email-Integration/

Users can control its routing by adding a routing key to the email address, although it seems “intelligent routing” is only available on the VictorOps Enterprise plan. https://victorops.com/pricing/

Pagerduty

Pagerduty supports email integration as well. Similar to OpsGenie, users can specify the email address and an escalation policy to route the alert.

https://www.pagerduty.com/docs/guides/email-integration-guide/

Case 2: Automatically Closing Alerts via Email

Monitoring tools can often send an email when the problem is resolved (Host is up, etc.). The service should be able to recognize the resolution emails and close the alerts automatically using these emails, instead of creating a separate alert for the resolution email.

OpsGenie

OpsGenie supports executing different actions when an email is received. Based on the content of the email, OpsGenie can create an alert, close it, acknowledge the alert, or add a note to it.

To accomplish executing these actions on existing alerts, OpsGenie provides an alert field (alias) to uniquely identify the “open alerts.” When the resolution email comes in, OpsGenie can identify the existing alert using the alias field, and close the alert.

To configure this behavior, an “advanced” configuration mode is used. This configuration mode exposes all configuration options, enabling customers to control how the email is processed. Email fields (subject, message, etc.) are provided as variables that can be used in filter conditions and set alert field values. Integration also supports using string processing methods to extract strings from email content.

OpsGenie close alert via email
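The idea of choosing an action and an alias from the email content can be sketched as below. The subject format (“Host web-01 is DOWN”) and the `process_email` function are assumptions made for this example; in OpsGenie this is configured in the integration settings rather than written as code.

```python
import re
from typing import Optional, Tuple


def process_email(subject: str) -> Tuple[str, Optional[str]]:
    """Return (action, alias) for a monitoring email subject."""
    match = re.search(r"Host (\S+) is (UP|DOWN)", subject)
    if match is None:
        return ("ignore", None)
    host, state = match.groups()
    # Problem and recovery emails share one alias, so the recovery
    # email closes the alert that the problem email opened.
    action = "close" if state == "UP" else "create"
    return (action, f"host-{host}")
```

Because both emails map to the same alias, the resolution email closes the existing alert instead of creating a second one.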

VictorOps

VictorOps supports an automatic resolution of incidents, but only if the email subject contains predefined keywords. It creates an incident if the email subject has CRITICAL or PROBLEM and resolves the incident if the subject contains RESOLVED or OK.

In addition, automatic resolution works only if the subjects of the emails are exactly the same except for the predefined words. If the wording of the subjects differs, if the subject contains a date, etc., then the solution will not work (as indicated in the “important note” section).

http://victorops.force.com/knowledgebase/articles/Integration/Generic-Email-Integration

Pagerduty

Pagerduty also supports triggering and automatically resolving incidents from emails using custom rules.

Pagerduty close alert via email

https://support.pagerduty.com/hc/en-us/articles/203232630-Getting-Started-with-Email-Management-How-to-Automatically-Resolve-Incidents-Triggered-via-Email

Case 3: Deduplication

A number of monitoring tools periodically send emails for as long as a problem persists. Although the intention is good, creating an alert for each email gets messy very fast. The service should be able to reduce the noise and create only one alert for these emails, aka deduplication.

OpsGenie

OpsGenie provides a highly flexible mechanism to deduplicate alerts. The alert alias field is used as a primary identifier for open alerts. Customers can set the value for the alias field in the integration configuration. When processing emails, the alias can be set to the subject, body, from, or to fields of the email. In addition, integrations support using string processing methods such as substringBefore and substringBetween, as well as regular expressions, to extract any part of the email (from the subject or body) and use it as the alias field to deduplicate alerts.

OpsGenie string processing methods

In addition, the integration strips the common prefixes that are added when an email is replied to, forwarded, etc., and provides the “Conversation Subject” field to make it easier to deduplicate these emails.
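To give a feel for this kind of extraction, here are rough Python stand-ins for the string processing methods named above. `substring_before`, `substring_between`, and `alias_from_subject` are our own illustrative functions; the exact semantics of OpsGenie's substringBefore and substringBetween may differ, and the sample subjects are made up.

```python
import re
from typing import Optional


def substring_before(text: str, sep: str) -> str:
    """Text before the first occurrence of sep (whole text if sep is absent)."""
    return text.split(sep, 1)[0]


def substring_between(text: str, start: str, end: str) -> Optional[str]:
    """Text between the first start marker and the next end marker."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    return None if j == -1 else text[i:j]


def alias_from_subject(subject: str) -> Optional[str]:
    """Regular-expression variant: pick the host name out of the subject."""
    match = re.search(r"host (\S+)", subject, re.IGNORECASE)
    return match.group(1) if match else None
```

Using the extracted host name as the alias means repeated emails about the same host collapse into one open alert, even when the rest of the subject changes.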

VictorOps

VictorOps does not seem to have a configurable deduplication solution. It only deduplicates if the subjects of the emails are exactly the same, except for the predefined keywords described above. This approach is limited, at best, if you don’t have control over the content of the email (which is often the case).

Pagerduty

Pagerduty also supports deduplication using the incident key field. As is the case for OpsGenie, Pagerduty allows parsing the email content and extracting a subset of the data to use as the incident key.

Case 4: Filtering

Customers often need to control which emails are allowed to create alerts. They may need to filter emails based on the sender (for example, only allowing emails from certain addresses) or on the email content.

OpsGenie

OpsGenie’s email integration supports filtering out unwanted emails. Using the “ignore” action, customers can define rules to ignore emails based on the sender’s address, name, email subject, or body.

OpsGenie filtering out unwanted emails
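A simplified model of such ignore rules is shown below. `Email`, `IgnoreRule`, and `should_ignore` are hypothetical names, and matching with case-insensitive regular expressions is our assumption for the sketch.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Email:
    sender: str
    subject: str
    body: str = ""


@dataclass
class IgnoreRule:
    target: str   # which email attribute to test: "sender", "subject" or "body"
    pattern: str  # regular expression, matched case-insensitively


def should_ignore(email: Email, rules: List[IgnoreRule]) -> bool:
    """True if any rule matches, in which case no alert is created."""
    return any(
        re.search(rule.pattern, getattr(email, rule.target), re.IGNORECASE)
        for rule in rules
    )
```

For instance, one rule on the sender can drop newsletters, and another on the subject can drop out-of-office replies before they ever become alerts.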

VictorOps

VictorOps does not seem to have any means to filter out emails.

Pagerduty

Pagerduty supports discarding unwanted emails.
https://support.pagerduty.com/hc/en-us/articles/202830380-Setting-up-regular-expression-filters-in-your-PagerDuty-service


As you can imagine, we believe the capabilities OpsGenie offers for email integration are essential. You may or may not need them day one, but they are there if you do. The flexibility ensures that the solution can meet your current as well as your future needs.

Did we get anything wrong? Help us correct any mistakes in this analysis and we’ll send you a gift card or donate to your fav charity on your behalf!

Troubleshooting problems from a chat room with Slack and OpsGenie


OpsGenie Slack together

With OpsGenie you have the option of executing actions directly through our OpsGenie app. We have described how this capability can be used to gather additional information and enable alert recipients to assess problems efficiently in the post “You woke me up, now what?”

With the Slack integration you can (1) forward alert activity to Slack channels and (2) allow users to interact with alerts (acknowledge, comment, close, etc.) directly from Slack. Refer to our blog post titled “Bi-Directional Integration with Slack.”

When we combine these capabilities, you can execute custom commands from the chat room.

(For example, one can execute commands like ping or traceroute.)

OpsGenie Slack Diagram

Here’s how you execute custom actions:

  1. When you create alerts in OpsGenie, you can specify actions relevant to that alert. (A network-related alert may have actions like ping and traceroute, an application problem may have an action that gathers info from a log file, etc.)
  2. OpsGenie forwards all alert activity to Slack. So, when there is a new alert or when a note is added to an alert, users will be able to see it on their Slack channels.
  3. Thanks to Slack’s support for command execution, users can execute the custom action from Slack (/Genie commands). One of the commands the OpsGenie integration supports is the “exec” command, which executes the custom action.
  4. OpsGenie passes the alert and the action executed by the user to the customer’s systems. The Marid utility subscribes to these actions and can execute the relevant script.
  5. Marid collects the output and adds it to the alert as a note, which gets passed to Slack; hence, the user can see the output of the command.
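The flow above can be modeled in a few lines. This is a toy sketch under our own assumptions: the `SCRIPTS` registry stands in for the scripts Marid would run, and `execute_action` is a hypothetical function, not part of OpsGenie or Marid.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    alias: str
    actions: List[str]  # custom actions defined when the alert was created
    notes: List[str] = field(default_factory=list)


# Hypothetical registry standing in for the scripts Marid would execute.
SCRIPTS: Dict[str, Callable[[Alert], str]] = {
    "ping": lambda alert: f"ping {alert.alias}: ok",
}


def execute_action(alert: Alert, action: str, user: str) -> str:
    """Run a custom action and attach its output to the alert as a note."""
    if action not in alert.actions:
        raise ValueError(f"{action!r} is not defined for this alert")
    output = SCRIPTS[action](alert)
    note = f"{user} executed {action}: {output}"
    alert.notes.append(note)  # the note is what would be forwarded to the chat room
    return note
```

Because the output lands on the alert as a note, everyone following the alert sees the same result, whether they are in the chat room or in the app.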

There are numerous advantages to executing actions directly from the chat room: the problems and the actions taken are visible to users in the chat room as well as to anyone who gets alert notifications from OpsGenie and uses OpsGenie’s app or web UI. If the alert is escalated, the next person has full visibility of what has been done. The ability to make relevant actions available for alerts also helps guide users to follow a common procedure.

Take a look at this short screencast to see how it all comes together.


IT Operations Management Startup OpsGenie Raises $10 Million in Financing Led by Battery Ventures to Fuel Growth


Falls Church, VA—June 29, 2016—OpsGenie, an emerging player in the critical area of IT alerting and on-call management, has raised $10 million in Series A financing from Battery Ventures, a global investment firm. OpsGenie will use the funds to continue to tackle the biggest challenges faced by customers in providing “always-on” services, and specifically to continue investing in its product and building out its go-to-market capabilities. As part of the financing, Battery General Partner Neeraj Agrawal and Battery Vice President Paul Drews will join OpsGenie’s board.

The company’s products operate against the backdrop of sophisticated, modern datacenters in which a web of “always on” servers, applications and other technology (some on-premise, and some housed in the cloud) continuously throw off high-stakes alerts that must be managed by various IT teams. New software development trends like the move to “microservices” and agile development also mean software is being developed faster today, which creates more opportunities for mishaps and a need to alert the right people to fix software problems. OpsGenie’s technology integrates with other key monitoring and ticketing tools to serve as a central repository for this data, and then routes alerts to the appropriate teams and systems.

Teams can even access these alerts through new collaboration tools like Slack and HipChat, then send them to other members who can take action to quickly fix IT problems. OpsGenie’s tools, including a Web interface and a mobile app, can also manage on-call schedules and escalations.

“OpsGenie integrates with many operations tools and services, and provides flexible, easy-to-use tools to help DevOps and other stakeholders identify that critical applications or services might be down, and figuring out the right people to notify at the right time to prevent costly problems,” said Berkay Mollamustafaoglu, OpsGenie’s CEO and co-founder. “Today’s companies have invested billions of dollars in monitoring tools to detect potential IT problems, but they haven’t paid enough attention to how to smartly react to, and address, the flood of alerts they’re receiving. OpsGenie is about what happens next.”

OpsGenie was founded in 2012 and already has over 1,400 customers around the globe, including Unbounce, SUBWAY® Restaurants, Looker, Bleacher Report, Politico and HubSpot.

“OpsGenie gave my team the work-life balance we were so desperately seeking,” said Michael Irwin, manager of client services at Politico. “Before, if you were on-call, you were tied to your email. Now OpsGenie allows us to be on-call, but makes it less invasive for our team.”

Added Mike Thorpe, infrastructure squad manager at Unbounce, a leading marketing-tech company: “OpsGenie has a robust alerting and on-call management platform, which allowed us to expand and design our decentralized product-support operation the way we wanted.”

Neeraj Agrawal, of Battery, said IT-monitoring has turned into a critical, growing industry with many sub-sectors, including monitoring technologies for servers, applications, websites and databases. “We are excited to partner with Berkay and the OpsGenie team, whose service cuts across all these types of notifications and alerts to deliver critical information to teams quickly,” Agrawal said. “The company has made significant progress to date despite only modest spending on marketing—we think there is a real, pent-up demand for OpsGenie’s product, and look forward to fueling the company’s growth.”

OpsGenie has offices in the Washington D.C. area, Boston and Ankara, Turkey.

About OpsGenie
OpsGenie is an alerting and on-call management solution for development and operations teams. We provide the tools needed to design actionable alerts, manage on-call schedules and escalations, and ensure the right people are notified of IT incidents at the right time, using multiple notification methods.

About Battery Ventures
Battery strives to invest in cutting-edge, category-defining businesses in markets including software and services, Web infrastructure, consumer Internet, mobile and industrial technologies. Founded in 1983, the firm backs companies at stages ranging from seed to private equity and invests globally from offices in Boston, the San Francisco Bay Area and Israel. Follow the firm on Twitter @BatteryVentures, visit our website at www.battery.com and find a full list of Battery's portfolio companies here.


Media Contact
Megan Maxwell
GMK Communications
megan@gmkcommunications.com
650-810-6658

On-Call Toolkit

What tools do you need to be an on-call warrior or hero? In this blog post we’ll examine the instruments needed to be on-call. Our goal is to provide you with on-call gear and tool ideas that help make your on-call experience successful, whether you’re an on-call novice or a pro. We welcome any additional tips from our readers; respond to this post through our social media channels: LinkedIn, Twitter, Facebook, or Google+.

Attitude and Aptitude: Let’s face it… being on-call has never been pleasant, but you wouldn’t be on-call if your organization didn’t need your knowledge and expertise during outages. Going into an on-call rotation with the right attitude helps determine how pleasant the experience will be. Letting those around you know upfront that being “called” is always a possibility also helps you start the rotation in a positive frame of mind.

Smart Mobile Device: This seems like a no-brainer, but a smartphone that lets you check email, browse the internet, and receive SMS messages can make or break an on-call rotation. Some companies use alerting and incident management tools that combine mobile apps, SMS, email, and call routing to schedule and escalate alerts to the individual or team needed for the specific outage.

Wifi Hotspot: A good private or personal wifi hotspot is essential for those on-call warriors who may have to travel or be on the road while on-call. An on-call road warrior, if you will. A personal wifi hotspot can be your best friend and will give you the freedom to review and respond to incidents from practically anywhere. Most mobile devices let you use your mobile plan to set up or turn on a personal wifi hotspot, which can be quite convenient. There are also standalone wifi hotspot devices that you can purchase if your phone does not provide this service. These hotspots are relatively cheap and, along with your mobile plan, can save you from having to search for the nearest Starbucks, library, or restaurant that has a free or paid wifi connection. Plus, a private wifi hotspot device usually has a much stronger signal than most public wifi locations. Trust me, you’ll know the difference between a private and a public wifi connection once it takes you 5 minutes to load a webpage, only to be told it’s not found.

Pager: Is this still being used?

Chat Applications: A chat app, such as Slack, is great for quickly assessing and responding to a problem or issue. Slack integrates with various ticketing and monitoring systems, so you get notified quickly and can respond sooner. Using an alerting and on-call management tool that integrates with your chat application can help keep these notifications in order.

Alternate Power Source(s): Having an alternative power source for your mobile devices, as well as your laptop, can come in handy when on the road. There are many affordable rechargeable battery packs for mobile devices in the market...or just attend a conference, and they’ll probably give you one for free! Another great tip is to purchase a second laptop battery and keep it in your laptop bag for those moments when you need extra juice. It happens more than you think and you’ll be thankful that you planned ahead.

On-Call Team Lists: Keeping an on-call phone list of other teams’ members and managers will help ease your anxiety when you're contacted to address an issue or problem. You may think you have this information stored somewhere on your computer or in your email, but trust me when I say that searching for it can be a trying experience when you’re in the middle of resolving an issue.

Headphones or Earbuds: These come in handy when you’re in a public area and you need to sit in on a live bridge or conference call. Freeing up your hands as much as possible while not missing critical information can make all the difference in the world.

Lots of caffeine: Umm, no explanation necessary. Red Bull and coffee can be your best friends at times. For a healthier option, try peanut butter… as long as you’re not allergic, of course. You’ll be surprised by the extra energy that little nut can bring.

In most cases you may think to yourself, “Well, yeah. Of course I need this stuff.” Or you may say, “Yeah, that’s a great idea.” The objective of this blog post is to open up a dialogue on the tips, tools, and tricks of the trade you use to keep yourself engaged while on-call. Please feel free to respond to this post on Twitter, LinkedIn, Facebook, or Google+.

Alerts: Gotta Catch 'em All

If you haven’t noticed yet, Nintendo's new game, Pokémon GO, is literally taking over the world. Pokémon GO reinvents a classic game with an augmented reality twist. The game is not available worldwide yet; however, that has not prevented users from downloading and catching 'em all. According to a Forbes article by Jason Evangelho, Pokémon GO is about to surpass Twitter in daily active users on Android and is #1 on Google Play Store. Incredible success!

The OpsGenie team has been playing Pokémon GO, too! We just found a Chansey in our office!

Pokemon GO Pinsir and Chansey

As people spend their days traveling to Pokéstops, searching for Pokémon, completing their Pokédex, and battling at the gym, it is important for users to know if the Pokémon servers are up and running. Recently, due to the incredible number of downloads, the servers are failing and users are being redirected to screens that confirm that the application servers are overloaded.

Currently, there is a service that reports the Pokémon GO server status. As of now, the service only sends emails to users once the servers are back online after an outage longer than 20 minutes. If you use OpsGenie’s alerting and incident management system, you can opt to receive notifications through email, text messages (SMS), phone calls, or iPhone & Android push notifications when the Pokémon GO servers go down.

Create a Pokémon GO Server Status Integration

You can easily create a Pokémon GO Server Status integration by using an OpsGenie Email Integration as illustrated below.

Basically:

  1. Please create an OpsGenie account if you haven't done so already
  2. Go to the OpsGenie Email Integration page
  3. Create a new Email Integration
  4. Choose an email address, for example "pokemon-go"
  5. Write "pokemon-go@yourdomain.opsgenie.net" as your email address on the Pokémon GO Server Status service
  6. Enjoy being the first one to be alerted!

Smarter Incident Workflow with New Alert & Notification Policies

For us, incident responders and managers, incident management is a complicated beast that requires an active effort to streamline an effective workflow to identify, analyze, and solve incidents. Failing to notice a problem is intolerable for us. On the other hand, we don't want so many alerts and notifications that they cause alert fatigue, which may lead to longer response times or to missing significant incidents. To keep these two crucial needs from becoming a dilemma, we are excited to introduce a set of new features: Auto Restart and Alert Count Based Notification Policies.

The Story Behind the Features

Before expanding on these two wonderful features, let's start with the "why" part:

  • Alert fatigue sucks! At the end of the day, it results in longer response times. Besides, desensitization is the foregone conclusion of alert fatigue, so you can easily miss important incidents.
  • Ignorance is bliss. Thomas Gray may not have meant it, but it's true. A low-priority alert should not disturb you in the middle of the night...
  • Ignorance is not always bliss, either. Frequency matters. If a low-priority alert occurs more frequently than an acceptable threshold, it may be a sign of a critical problem.
  • We should never miss a critical alert. And they shouldn't be forgotten, either. If a critical alert is not resolved within an acceptable period, it may mean that recipients have somehow missed their notifications. An alert may even be forgotten after someone has acknowledged it. Wouldn't it be better to restart the notification flow in these cases?

We at OpsGenie are also incident responders, and we always ask for more and better tools to address the concerns above. These concerns have been our guiding principles: we gathered suggestions from many of our customers and analyzed different approaches. In the end, we decided to add these new members to the family of Alert & Notification Policies, which are the key to a sophisticated incident workflow and smarter alerts.

Hey! The Problem Is Not Resolved, Yet!

It's true that OpsGenie's escalations are powerful and can even be repeated after a while. However, you may want an alternative form of incident resolution insurance, so that you can restart the notification flow (even conditionally). From now on, Auto Restart notification policies step in at that point!

As you may remember from the Alert Notifications Flow, when an alert recipient becomes aware of the alert by viewing the alert content via any of our apps, OpsGenie stops notifying him/her for the sake of not spamming. Furthermore, if someone acknowledges the alert, the notification flow is stopped for all users in most cases. However, when an auto restart notification policy hits, the notification flow starts from scratch, and recipients start getting notifications even if the alert is acknowledged or recipients had already become aware of the alert.

A Still Tongue Makes a Wise Head

Do you have some events (or tickets) that are not urgent, but worth an analysis? Do you think that these should notify your responders if they occur more often than an acceptable threshold? Count Based Notification Policies are your savior this time.

Alert deduplication and delayed notifications are two essential tools in preventing alert fatigue. However, low-priority alerts may still occur more frequently than an acceptable threshold, and neither tool by itself notifies anyone based on count or frequency.

Using an alert count based notification policy, you can delay notifications for the matched alerts until the specified deduplication condition is met. You have two alternatives to define a count based condition:

  • Delay until alert count is X. Notifications can be delayed until the alert has been received X times, that is, until the alert count reaches the specified value.
  • Delay until alert is received X times in Y. Notifications can be delayed until the count increases by the specified value within the specified time frame (sliding window).

To put it another way, when an alert is created, OpsGenie will not start sending notifications until the condition specified in the policy becomes true. For example, let’s say we have a policy to notify only after the alert count reaches 3. When the alert is created, the count is 1, so the condition is not met and no one is notified for this alert. Once alerts with the same alias are received 2 more times and the count reaches 3, the system starts the notification flow. From that point on, notifications behave the same as for a newly created alert.

Similarly, if the policy is configured to notify only when the alert is received 3 times within 5 minutes, the system starts notifications only if the alert is received 3 times within 5 minutes. The system uses a sliding window of 5 minutes when determining the number of alerts within that time frame.
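
The two count-based conditions can be sketched in a few lines of Python. This is only an illustration of the logic described above, not OpsGenie's implementation: the class name, the method name, and the choice to key occurrences by alert alias are assumptions made for the example.

```python
from collections import deque


class CountBasedPolicy:
    """Hypothetical model of a count based notification policy.

    With window_seconds=None it models "delay until alert count is X".
    With a window it models "delay until alert is received X times in Y".
    """

    def __init__(self, threshold, window_seconds=None):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = {}  # alias -> timestamps of the occurrences still counted

    def should_notify(self, alias, timestamp):
        """Record one occurrence of `alias` and report whether the condition holds."""
        times = self._times.setdefault(alias, deque())
        times.append(timestamp)
        if self.window_seconds is not None:
            # Sliding window: drop occurrences older than the window.
            while times and timestamp - times[0] > self.window_seconds:
                times.popleft()
        return len(times) >= self.threshold


# "Notify only after the alert count reaches 3."
by_count = CountBasedPolicy(threshold=3)
count_results = [by_count.should_notify("disk-usage", t) for t in (0, 1000, 2000)]

# "Notify when the alert is received 3 times within 5 minutes" (300 seconds).
by_window = CountBasedPolicy(threshold=3, window_seconds=300)
window_results = [by_window.should_notify("disk-usage", t) for t in (0, 200, 400, 450)]
```

In the windowed run, the occurrence at second 400 does not trigger notifications because the one at second 0 has already slid out of the 5-minute window; the occurrence at second 450 does, since seconds 200, 400, and 450 all fall inside it. A caller would start the notification flow the first time `should_notify` returns True.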

Short and Sweet: Everyone Deserves Smart Alerts

We are thrilled to introduce these new capabilities to reduce alert noise and help improve your incident response process. We will continue to strive to improve in this area, and we’re looking forward to hearing from you! Please continue to share your use cases, feedback, questions, and feature/improvement requests! You can ping us via chat on our web app or at support@opsgenie.com.

These features are available for the Enterprise plan only. If you haven’t yet, you can sign up for a free trial now.

The Road To Better ChatOps With The OpsGenie Add-on For HipChat

We at OpsGenie work to make integrating our product into the user’s existing toolset as smooth as possible, which may seem like an unmanageable task since each customer has a unique model for conducting their daily operations.

ChatOps: Productive Chatting

A particular model of operation that’s on the rise is ChatOps. This model revolves around team communication and aims to unify all of the user’s tools, workflows, and processes into a single chatroom. An example might include a bot that pulls your code from GitHub and deploys it straight into one of your Amazon EC2 machines. All you need to do is type the command that triggers the bot to follow the action and voilà! Everyone on your team will also be able to follow the procedure live from the chatroom. Among other benefits, this ChatOps unified model makes operations more transparent and increases team productivity.

How does OpsGenie fit within the ChatOps model?

By providing integrations with major chat services, we allow our users to monitor and manage their alerts without ever needing to leave their team’s chat room.
Also, we recently rolled out our brand new HipChat Add-on which keeps our promise to include everything you need in your chatroom. The new add-on fully harnesses the power enabled by HipChat’s environment, which provides an easy to use API and freedom of development.

The Glance

One of HipChat’s novel features is glances. Glances appear in the room’s sidebar and are used to aggregate and present relevant information about the add-on. The OpsGenie add-on uses the glance to display the customer’s total open alert count. The count is updated in real time as alerts are created, closed, acknowledged, or deleted, so users can continuously monitor the alert count from their chat room while managing other issues.

The Sidebar Views

Another distinguishing feature of HipChat add-ons is the sidebar panel views. We use these panels to display the list of user alerts, similar to the Web and mobile UIs. Users can filter the listed alerts and select an alert to view more detailed information, as well as execute operations such as Acknowledge and Close.

The Bot

Of course, in the spirit of ChatOps, the /genie bot commands cannot be missing from this integration. The /genie bot provides powerful operations that can be executed simply by typing in the related command.

The Cards

Last but not least, users receive notifications of any OpsGenie activity through cards posted in real time in the HipChat room. These cards include the name of the user that executed the operation, the alert message, tags, a link to the alert on OpsGenie, and a link that opens the alert details in the HipChat sidebar panel.

Check out the documents page for more details, then go give the OpsGenie HipChat Add-on a try!

From our ChatOps to yours

At OpsGenie, we do our best to be prompt, early adopters of new features released by the companies we integrate with. We want to stay flexible while providing a complete workflow and feature set.

Not so long ago, Slack introduced “Interactive Buttons,” for which OpsGenie had a great, tested use case! So, we started working on it. As early adopters, we already had a Slack Application (App) in the directory, which led us to build a brand new Slack App.

Add to Slack Button

First, we implemented an “Add to Slack” button for easy installation of our app. Now, our users can easily create an integration and are no longer stuck copying and pasting API Keys!

Slash Commands

We have always supported most actions through Slash Commands. Now, we’re supporting a new command, /genie connect, as a part of our new Slack App.

The /genie connect command is used to execute alert actions as your OpsGenie user in Slack. Admins/team admins can select this option as mandatory by using the “require matching user” option on the OpsGenie Integration page. The Chat User Mapping docs page has more information regarding this feature.

Incoming Webhooks

We use Slack’s incoming webhooks feature to send messages to Slack in real time. When an OpsGenie alert action executes, a brief message is sent to Slack through the incoming webhooks.

Interactive Buttons!

Interactive Buttons is a killer feature of our new Slack app. Now, OpsGenie users can use interactive Slack buttons to acknowledge, unacknowledge, or close alerts. Results return to the users in near real time. We also mention the Slack users who execute actions so that others can easily see who did what. In addition to updating the message whose button was clicked, we send a new message to summarize the action and inform users of it.

New interactive Slack buttons are particularly ideal for non-technical users. However, for more technical users, we provided an option to not use Slack buttons.

Detailed information is available on our Slack App Integration docs page. We would love to hear your feedback and we are open to new ideas. Feel free to chat with our Customer Success Team about the new Slack App on OpsGenie.com!

Build Your Teams and Fight Alert Fatigue!

In today’s world, most organizations use a team-based structure. With this, organizations strive to define responsibilities, build the right skill sets, distribute workload, and eventually maximize productivity and success.

At OpsGenie, we care about our customers’ flexibility to adapt our software to their organizational needs.

How do you build and manage multi-team collaboration in OpsGenie?

OpsGenie Teams provides customers a unique “team organization” feature that helps establish and manage company organizational structures from within. It gives different departments, such as Operations, Development, Database, QA, or Customer Success, the opportunity to collaborate effectively to assure quality and fast incident resolution processes. Similar to the real world, with OpsGenie it is possible to define diverse teams while at the same time assigning an individual to multiple teams for quick and effective issue resolution.

Roles matter

You can break down your teams’ and team members’ responsibilities based on their skills by configuring their rights and roles. It is essential that issues are escalated to the correct person through proper means to help them focus on mission-critical tasks. So, don’t forget to respect each team member's alert notification preferences, whether it is SMS, mobile push, voice call, or e-mail.

Track the issues across the globe

Are your teams geographically distributed, and should you worry about the time zone differences? Set your team members’ schedule according to their locations and time preferences. Don’t leave any issue unresolved due to time-zone miscalculations.

Schedule your team shifts to sleep through the night

Tired of waking up in the middle of the night? Was the call not meant for you? With OpsGenie you will be able to escalate issues based on team rotation schedules.

Don’t lose track of issues and improve your team’s performance with OpsGenie’s incident management and team collaboration platform! OpsGenie is equipped with all the required tools and capabilities to reflect your plans and decisions by addressing all questions and concerns.

Focus on the right roles

If you have a big team, you may choose to notify only some members of your team at certain times and expect them to resolve the incidents at hand. That way, other team members are not notified of alerts that are not relevant to them.

Find and fix current bottlenecks

If your whole team, or say your department manager, still needs to view each issue, they can see the history of incidents as well as their resolution process, which in turn helps develop the procedures and policies needed to assure higher quality and efficiency in alert notifications.

Gain full visibility of your team's activities

OpsGenie provides full visibility to all team members, while giving you the option to notify only a subset of them, based on the schedules, escalations, and notification preferences in your configuration. Similarly, with OpsGenie, you can define which team(s) should be notified of incidents arising in a certain service, thus, allowing each team to focus on their tasks and minimizing the noise people would otherwise experience.

Start your free trial now! Do you have any questions? Check out our documentation page and do not hesitate to contact our Support team!

You can also find our latest “teams” update at: https://news.opsgenie.com/#221


New OpsGenie Configuration Management Tool

Backup is key in information technology. Data loss may be irreversible, causing huge inconvenience to people and companies. When you have a large number of administrators who constantly make changes to the system, the risk of making incorrect changes is high.

Have you ever lost data with an accidental click? Was your hardware corrupted, or your PC hit by viruses or cyber attacks that led to data loss? Research suggests that almost 50% of people do not back up their data and then try in vain to recover it.

Backup utilities are crucial not only to avoid data loss but also to be able to replicate configurations for different usages. For example, if your organizational structures change quite often, you may want to apply a saved configuration of a system or service to a new one. This will allow you to easily create new accounts for changing user roles, new team memberships, changing timezones, etc.

To save you time and effort, OpsGenie introduces a new Configuration Backup Tool that saves your OpsGenie configuration data, which can be used after an unintended configuration change or a future organizational structure change. It’s cool, right? There is also an extra capability: with OpsGenie’s new backup feature you can restore a specific configuration area rather than the whole account data. From now on, all OpsGenie customers can back up and restore their accounts and configuration, such as:

  • Username
  • Timezone(s)
  • User roles
  • Teams
  • Schedule
  • Escalation information
  • Heartbeat data

Why do you need this?

Free your time

Configuration backup can free up your time by not forcing you to set up all your account information and configuration from scratch. Now, if you accidentally delete any data or you just need to replicate or mimic an old configuration you will save time by not building out another configuration from scratch.

Make a partial recovery

If you make a change in a specific area of your OpsGenie configuration (e.g. just change the schedule or team membership), and later you decide to go back to your previous configuration, you do not need to worry about how to revert to your old system. You can use the OpsGenie Configuration Backup tool to restore only that part of the previous configuration and keep the rest up to date.

Find out more!

For source code and examples, please visit our GitHub repository. You can download the executable from here.

Sign up for OpsGenie’s 14-day free trial and learn, hands on, about OpsGenie’s configuration backup tool!

Developing Custom Solutions Using OpsGenie APIs

At OpsGenie, we do our best to enhance our customers’ experience using our platform. We consider their needs and develop features that will promote efficiency and growth.

We believe that our product is already feature-rich, yet we’re continuing to develop it to promote customization and flexibility. Our team works daily on product enhancements, and they’re always ready to extend it with customizations as you need.

We have a Web API and Software Development Kits for the Java, Python, and Go programming languages as well as the Node.js environment. Isn’t it great that you can develop your own tools as an extension of OpsGenie’s out-of-the-box capabilities?

In this blog post you’ll learn how to use our Web API and add to our product capabilities.

OpsGenie Web API

Our Web API lets you interact with alerts from anything that can send an HTTP request. You can accomplish almost everything you do via our Web UI with our Web API. For example, you can create, acknowledge, close, delete, and list alerts; create, update, delete, and list teams; and do even more programmatically through our Web API.
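
As a rough illustration of "anything that can send an HTTP request", here is a minimal Python sketch that builds a create-alert request using only the standard library. The endpoint URL, the "GenieKey" authorization header, and the "message" and "details" field names are assumptions based on the publicly documented Alert API and may not match the API version this post was written against; check the Web API documentation before relying on them. The helper name is made up for this example.

```python
import json
import urllib.request

# Assumed endpoint; confirm it against the Web API documentation.
ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_create_alert_request(api_key, message, details=None):
    """Build (but do not send) an HTTP request that would create an alert."""
    payload = {"message": message}
    if details:
        # Assumed field name for custom key/value properties on the alert.
        payload["details"] = details
    return urllib.request.Request(
        ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed authorization scheme for an integration API key.
            "Authorization": "GenieKey " + api_key,
        },
        method="POST",
    )


request = build_create_alert_request(
    "your-api-key", "Disk usage above 90%", {"host": "db-1"}
)
```

Passing the request to `urllib.request.urlopen(request)` would actually send it; the sketch stops short of that so it can be read and run without an account.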

Developing Your Custom Solutions

We love new customization requests from our customers! It's an exciting challenge and allows us to utilize our product’s flexibility while developing these new solutions.

Once, a customer of ours explained that they needed to have multiple teams working on their time-sensitive issues and asked if it was possible to create an alert per team for such issues. Challenge accepted! We examined the request and developed a script, written in Python, that runs on the AWS Lambda service. Let us explain…

What we did

Step 1. Our goal was to first specify whether an alert should be replicated for multiple teams. If so, we considered the initial alert the “root alert” and created a “sub-alert” for each team. To do this, we created an extra alert property named “teamsToNotify”. As the value of “teamsToNotify”, you assign the names of the teams that should each get a sub-alert, separated by commas. Sub-alerts should also have a "rootAlertID" field in the extra properties when they are created. This "rootAlertID" field is the ID of the root alert, and is used to build a parent-child relationship between the root alert and its sub-alerts.

Step 2. We created a Webhook integration to invoke the AWS Lambda script. To forward the root alert and teams to notify information to the script, we then added the conditions listed below as a filter to the integration:

  • Extra Properties |Contains Key| teamsToNotify.
  • Extra Properties |Contains Key| rootAlertId.

Once you complete the integration, if you create a new alert with a "teamsToNotify" key and team names as its value (e.g. "team1, team2, team3"), the integration will send this to the script. The script then creates a sub-alert for each team specified in the "teamsToNotify" field. The script also adds the Alert ID of each sub-alert to the root alert as a tag.
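
The fan-out step can be sketched as follows. This is not the actual Lambda script, only a minimal Python illustration of the idea. The shape of the `root_alert` dictionary, the payload keys, and both function names are assumptions made for the example. The text spells the root-ID property both "rootAlertID" and "rootAlertId"; the sketch uses the spelling from the filter condition.

```python
def parse_teams(raw):
    """Split a comma-separated "teamsToNotify" value into clean team names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def build_sub_alerts(root_alert):
    """Return one sub-alert payload per team listed on the root alert.

    `root_alert` is a hypothetical dict with "id", "message" and "details"
    keys; a real webhook payload may be shaped differently.
    """
    details = root_alert.get("details", {})
    # An alert that already carries rootAlertId is itself a sub-alert,
    # so it is never fanned out again.
    if "rootAlertId" in details or "teamsToNotify" not in details:
        return []
    return [
        {
            "message": root_alert["message"],
            "teams": [team],
            "details": {"rootAlertId": root_alert["id"]},
        }
        for team in parse_teams(details["teamsToNotify"])
    ]


root = {
    "id": "root-1",
    "message": "Database is down",
    "details": {"teamsToNotify": "team1, team2, team3"},
}
sub_alerts = build_sub_alerts(root)
```

Each returned payload could then be sent to the Web API to create the sub-alert, and the IDs that come back added to the root alert as tags.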

How does it sound so far? Cool, huh?

How about the interactions between a root alert and its sub-alerts? Did you question what would happen to the sub-alerts once the root alert is acknowledged? Or, what would happen to the remaining alerts if one of the sub-alerts is closed? Great news! All can be configured in many different ways depending on the logic you need.

Here is an example of what we did for a specific customer…

When a sub-alert is acknowledged, the script adds a comment to the root alert such as "User X acknowledged the alert for team Y" to indicate that a team has acknowledged the alert created for them. When a root alert or a sub-alert is closed, we consider the issue fixed and close all sub-alerts and the root alert.
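
The interaction rules in this example can be written down as a small decision function. Again, this is a hedged sketch rather than the customer's script: the action names, the alert dictionaries, and the (operation, alert ID, text) tuples are hypothetical stand-ins for whatever the webhook delivers and whatever API calls the script makes.

```python
def follow_up_operations(action, alert, user, root, subs):
    """Return the operations to perform after `action` happened on `alert`.

    `root` is the root alert and `subs` its sub-alerts, all hypothetical
    dicts with an "id" key; sub-alerts also carry "teams" and "details".
    Each operation is an (operation, alert_id, text) tuple.
    """
    is_sub_alert = "rootAlertId" in alert.get("details", {})
    if action == "Acknowledge" and is_sub_alert:
        team = alert["teams"][0]
        comment = f"{user} acknowledged the alert for team {team}"
        return [("add_comment", root["id"], comment)]
    if action == "Close":
        # Closing any alert in the family closes all the others.
        family = [root] + subs
        return [("close", a["id"], None) for a in family if a["id"] != alert["id"]]
    return []


root = {"id": "root-1", "details": {}}
subs = [
    {"id": "sub-1", "teams": ["team1"], "details": {"rootAlertId": "root-1"}},
    {"id": "sub-2", "teams": ["team2"], "details": {"rootAlertId": "root-1"}},
]
ack_ops = follow_up_operations("Acknowledge", subs[0], "alice", root, subs)
close_ops = follow_up_operations("Close", subs[1], "bob", root, subs)
```

Keeping the decision separate from the API calls makes it easy to swap in different rules, which is the flexibility the paragraph above describes.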

Efficient, right? Please let us know what you think. Give us your feedback and continue to challenge us with your feature requests!

Sign up for OpsGenie’s 14-day free trial and see how OpsGenie is armed with all the tools and flexible configuration options to fulfill all of your alerting and incident management requirements!

AWS re:Invent 2016 Customer Appreciation Event

Hello, Readers!

Berkay here! Early this week, OpsGenie had the opportunity to host a very exciting Customer Appreciation event. During the week of the AWS re:Invent conference, we opted to foster relationships outside the conference. The OpsGenie team flew into Las Vegas on Monday night, and boy are our arms tired. Ok. Bad joke. :)

Anyways, the next morning we made the most of our day by beginning it with a company breakfast at the MGM Grand Hotel, where we discussed business and logistics for the event. But first, we took a walk along the strip for some fresh air and adventure. It was a beautiful day in Vegas!

At around 2 PM, we headed to the KÀ theater at MGM Grand to set up and explore before our guests arrived. The theater was breathtaking, amazing, spectacular... I could literally (figuratively) go on. About an hour later we opened up the OpsGenie-branded theater and were able to meet with some of our loyal customers. Once we had a chance to say hi, they were escorted down to the theater area. The OpsGenie customer appreciation experience began with an interesting Q&A tailored toward the engineering aspects of Cirque Du Soleil as well as a tour of the KÀ theater. The OpsGenie team, alongside our loyal customers, was immersed in the ingenuity and history of Cirque Du Soleil. We got an up-close and personal, once-in-a-lifetime experience.

These are pictures outside and inside the theater before the amazing backstage tour. Once the tour began, we were all mesmerized by the precise engineering of each addition to the show.

We were especially impressed by the Sand Cliff Deck, a movable platform that the Cirque performers dance on. It’s supported and controlled by a gantry crane, and it weighs 50 tons! The deck is touch sensitive and equipped with rods that allow performers to climb it while it is tilted vertically. To show us the immensity of this deck, they invited us up onto the stage, which descended about 50 feet into the underbelly of the theater.

Here’s a picture of us looking down at the stage being lowered.

And here’s a picture of us being lowered.

And then… it lifted and started moving.

After such a “WOW!” experience, we went across to Wolfgang Puck for a happy hour and dinner. During the happy hour, I was able to meet face-to-face with customers that I’ve worked so closely with throughout the years. After much wine and food at Puck’s, we wandered over to the show, which was about 50 paces to the left. The official Cirque Du Soleil show started at 7 PM, and once we took our seats, we were enthralled by a beautiful story about a young man and woman.

This OpsGenie event is the highlight for us at AWS!

OpsGenie Customer

The Cirque Du Soleil show and the Wolfgang Puck happy hour made up OpsGenie's first “Customer Appreciation” event, but it certainly won’t be the last. I am humbled by the outcome of the event and everything that went into it. I want to thank the OpsGenie team, Cirque Du Soleil, and Wolfgang Puck for all the work they put in to make the show and happy hour as successful as they were. I also want to thank you all for taking the time to come out and experience such a legendary event with us.

Here are a few more pictures from the night:

Incident Resolution via Virtual War Rooms

As incident responders, we all know how complicated and stressful it is to work on time-critical incidents. Every extra minute your team spends resolving an incident is costly and may have a devastating impact on your business and customers. So what can you do to minimize the time, effort, and stress related to major incidents?

First, establish a clear process for incident resolution. Share it with your team to make sure everyone is on the same page and knows how to execute it.

Second, make sure you have a reliable, targeted incident notification system that works out of the box, *cough* OpsGenie *cough*. Also confirm that your team gets notified with the relevant details as soon as an incident arises. Now it's your team’s turn to follow the published process, use the data at hand, and resolve the issue promptly. The key to quick incident resolution is effective communication. To support these golden rules and help you communicate effectively during the incident resolution process, we are proud to announce our new Conference Bridge feature!

What is the new OpsGenie Conference Bridge Feature?

It is a single “location” where key individuals collaborate to resolve an issue quickly and see it through to completion. With the OpsGenie Conference Bridge feature, you can quickly set up Virtual War Rooms and hold an all-hands-on-deck call to resolve your critical incidents promptly. This is especially beneficial when your team members are remote (which is a probable situation in the middle of the night, right?). After an incident arises, your IT Operations Managers won’t have the vexing experience of setting up conference bridges while trying to organize team members across multiple locations. Instead of being confused about which bridge to use (we’ve all been there), your team can now access the preset conference bridge details within the incident notification and join the call with just one click. It’s that easy.


How does the OpsGenie Conference Bridge work?

Your OpsGenie account owners or admins can set up as many Conference Bridge Rooms as they want in advance. Then, they can specify which room to use under which conditions using Conference Bridge policies. As alerts are generated, if they match any of the Conference Bridge policies, the related Conference Bridge Room details are automatically attached to the alert notification. There you go! You are all set! It is up to you to join the conference bridge from your preferred device, such as your mobile phone or laptop! Oh, and one of the best parts is that you are not restricted to a specific web conferencing tool. We support any conference bridge tool you choose. Exciting!
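To make the policy idea concrete, here is a minimal sketch of how matching an alert against Conference Bridge policies could work. It is not OpsGenie's implementation or API: the `BridgeRoom` and `BridgePolicy` classes, the `attach_bridge` function, the tag and priority conditions, and the example URL are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BridgeRoom:
    """A preset conference room (hypothetical model)."""
    name: str
    join_url: str  # placeholder link; any conferencing tool could be used


@dataclass
class BridgePolicy:
    """Says which room to use under which conditions (hypothetical model)."""
    room: BridgeRoom
    required_tags: set = field(default_factory=set)
    priority_at_most: int = 5  # assumed scale: 1 is the most urgent

    def matches(self, alert: dict) -> bool:
        has_tags = self.required_tags <= set(alert.get("tags", []))
        urgent_enough = alert.get("priority", 5) <= self.priority_at_most
        return has_tags and urgent_enough


def attach_bridge(alert: dict, policies: list) -> dict:
    """Return a copy of the alert with the first matching room attached."""
    enriched = dict(alert)
    for policy in policies:
        if policy.matches(alert):
            enriched["conference_bridge"] = {
                "room": policy.room.name,
                "join_url": policy.room.join_url,
            }
            break  # first matching policy wins in this sketch
    return enriched
```

In this sketch, an alert tagged `database` with priority 1 would pick up the room from a policy that requires the `database` tag, while an alert that matches no policy passes through unchanged.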

Benefits

  • Real-time collaboration within the teams
  • Effective communication for fast incident resolution
  • Easy setup and access to incident data
  • A single view of the incident
  • Incident resolution on the go from any device, anywhere, anytime

Start your free trial of OpsGenie and the Conference Bridge feature now! Questions? Check out our documentation page, and do not hesitate to contact our Support team!



6 New Year’s Resolutions for Incident Management

It is that time of the year again! New hopes and motivations to change our lives for the better! Yes, we are talking about New Year’s resolutions. :) Aside from your personal resolutions, we would like to focus on the positive changes you can make in your professional life.

For those of you who already benefit from an incident management solution (like OpsGenie!), below are a few suggestions to welcome in the new year with a higher quality of life (and incident management solution)! And yes, you can do it!!!

#1: Build your solution around teams: You can reflect your organizational structure in OpsGenie by routing notifications based on team rotation schedules (instead of waking everyone up in the middle of the night!). This will help keep your team from suffering alert fatigue, sleep disorders, and burnout! Finally, don’t forget to take advantage of our cross-team alert routing capability to boost collaboration!

#2: Enrich your alerts with meaningful information: Here are a few suggestions:

  • Attach server logs to your alerts
  • Include metric graphs from your performance monitoring tools
  • Attach runbooks to your alerts

These changes will help centralize relevant information and will eventually decrease your MTTR.

#3: Use alert de-duplication: By using alert de-duplication, you won’t receive multiple notifications for the same issue, and you can choose the de-duplication level using the alias field.
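As a rough illustration of alias-based de-duplication, the sketch below keeps one open alert per alias and counts repeats instead of notifying again. It is a simplified in-memory model, not OpsGenie's API; the `AlertStore` class, its fields, and the example aliases are assumptions made for this example.

```python
class AlertStore:
    """Keeps at most one open alert per alias (hypothetical model)."""

    def __init__(self) -> None:
        self._open = {}              # alias -> open alert
        self.notifications_sent = 0  # how many times someone was notified

    def create(self, message: str, alias: str) -> dict:
        existing = self._open.get(alias)
        if existing is not None:
            # Same alias as an open alert: count it, do not notify again.
            existing["count"] += 1
            return existing
        alert = {"alias": alias, "message": message, "count": 1}
        self._open[alias] = alert
        self.notifications_sent += 1
        return alert

    def close(self, alias: str) -> None:
        # Once closed, the next alert with this alias notifies again.
        self._open.pop(alias, None)
```

The alias you choose sets the de-duplication level. In this model, an alias like `db01-disk` groups only disk alerts from one host, while a broader alias like `db01` would fold every alert from that host into a single open alert.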

#4: Build smarter workflows with alert policies: You can use OpsGenie’s policies, such as Modification Policies, Notification Policies, and Auto Restart Notifications Policies, to suppress certain notification types. Distinguish between low and high urgency alerts while never missing a critical alert!

#5: Monitor your monitoring tools using OpsGenie Heartbeats: What if you're unaware of a problem within your monitoring tool(s)? Don't risk it! Take the necessary precautions and monitor your monitoring tools. Send periodic heartbeat messages from your monitoring tools to OpsGenie, and if we don't get them, we'll let you know!
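The heartbeat idea boils down to "remember when each tool last checked in, and flag the ones that have gone quiet." Here is a minimal sketch of that logic. It does not call OpsGenie; the `HeartbeatMonitor` class, the tool names, and the interval are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class HeartbeatMonitor:
    """Flags tools whose last heartbeat is older than the allowed interval."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval_seconds = interval_seconds
        self._last_seen = {}  # tool name -> timestamp of last heartbeat

    def ping(self, name: str, now: float) -> None:
        """Record a heartbeat from a monitoring tool."""
        self._last_seen[name] = now

    def expired(self, now: float) -> list:
        """Return the names of tools that have missed their heartbeat."""
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now - seen > self.interval_seconds
        )
```

In practice, `now` would come from something like `time.time()`, and each name returned by `expired` would become an alert to the on-call person.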

#6: Analyze, Analyze, Analyze: Visualize and analyze your incident management operations and track trends, so you can take preventative action and improve both your team's and your systems' effectiveness.

What’s your incident management plan in 2017? Tell us how OpsGenie can help you achieve these resolutions!

We wish you all a Happy New Year! See you all in 2017!
