Unify Your Incident Management Process With the Fundamentals

xMatters


In a perfect world, technology stays on and runs flawlessly. But we all know this isn’t the case. Like any organization, xMatters sometimes experiences unplanned incidents. What we can control is how we respond to them. To resolve incidents quickly, it’s important to coordinate an organized response.

Your Incident Management Process Needs to Evolve

xMatters decided long ago that a well-developed incident management process was critical to supporting customers. Over the years, our architecture has evolved from on-premises deployments to hosted data centers and then to the cloud. This evolution meant our incident processes and workflows needed to evolve as well.

To streamline our business, we started reviewing our incident management process to align it with our architecture. The exercise was designed to build on what we already had, integrate new concepts, and make any necessary adjustments. Here are the steps we took and the lessons we learned in developing the latest version of our incident management workflow.

Unify Your Incident Management Approach with Five Fundamentals

Five fundamentals of incident management.

Invest Time Upfront

Initially, we found it challenging to know where to start: incident management isn’t something that can be done in a single meeting. Preparing an incident management plan requires thought, willing partners, collaboration, and the time to build it properly. Fortunately, xMatters understands the importance of a thoughtful, cross-functional plan. Each organization has unique requirements, so it’s important to revisit the fundamentals before getting started. We began by reviewing goals, stakeholders, structure, severity classification, and tools. This high-level audit gave us a clear picture of the work needed to update our incident management plan.

Fundamental 1 – Define Your Goals

It may seem obvious, but defining what you want out of your incident management process is the first step:

Speed – Minimize customer impact by quickly identifying potential problems, bottlenecks, and opportunities to improve response times.
Efficiency – Consider whether there’s anything in the process preventing the team from hitting the ground running. Does something have to be added to or removed from the process?
Ease – The last thing anyone wants is a convoluted process. Review past incidents to determine what information is commonly needed and what playbooks, documentation, or references are used to help make the process as easy as possible.
Clarity – Define roles, playbooks, and timelines. The less a team has to organize on the fly, the more likely they are to succeed.
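
One way to ground the Speed goal is to measure it from past incidents. The following is a minimal Python sketch, not a description of xMatters’ own tooling. It assumes each past incident has an opened, acknowledged, and resolved timestamp; the `IncidentRecord` fields, the `response_metrics` helper, and the sample data are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class IncidentRecord:
    """Hypothetical record of one past incident (field names are assumptions)."""
    opened: datetime
    acknowledged: datetime
    resolved: datetime


def _mean_minutes(deltas):
    """Average a collection of timedeltas and return the result in minutes."""
    return mean([d.total_seconds() for d in deltas]) / 60


def response_metrics(records):
    """Return mean time to acknowledge and mean time to resolve, in minutes.

    Raises statistics.StatisticsError if `records` is empty.
    """
    return {
        "mean_time_to_acknowledge_min": _mean_minutes(
            [r.acknowledged - r.opened for r in records]
        ),
        "mean_time_to_resolve_min": _mean_minutes(
            [r.resolved - r.opened for r in records]
        ),
    }


# Illustrative data only; these are not real incidents.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    IncidentRecord(start, start + timedelta(minutes=5), start + timedelta(minutes=60)),
    IncidentRecord(start, start + timedelta(minutes=15), start + timedelta(minutes=120)),
]
print(response_metrics(sample))
```

Tracking these two numbers before and after a process change gives a rough indication of whether the change actually improved response times.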

Fundamental 2 – Identify Stakeholders

We looked at the various groups that play a role in incident management and resolution, and polled each of them on what worked well in the process and what could be improved.

Incident Initiators – Identify what knowledge and tools teams initiating an incident need. What challenges have they faced in the past?
Incident Lead – What does the team organizing the incident need to know? Do they feel comfortable in this role? Are there any blockers?
Subject Matter Experts – Is there any specific information SMEs need when they become involved in an incident? What environment is best for them?
Leaders and Third Parties – When do organizational leaders need to be informed? What type of information do they need? Who should be part of these groups? How do we keep the organization informed without impacting the resolution process?

Fundamental 3 – Determine Incident Structure

We use Incident Command System (ICS) concepts to organize an incident. We quickly found we didn’t need to adopt all of ICS to be successful. Instead, we took the components that applied to our goals and used them as a guide for our incident management process.

Roles – The ICS concepts allowed us to define the roles needed for our incident management process, determine who could take them on, and decide whether we needed to incorporate any other functions into the plan.
Training – This review helped us identify training opportunities for our incident teams.
Post-Mortem – We also determined which details of an incident were best to capture to complete an effective post-mortem and root cause analysis.
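
To make the roles and post-mortem points concrete, here is a minimal Python sketch of an incident record that tracks role assignments and a timeline of notes. It is an illustration under stated assumptions, not the structure xMatters uses: the role names are borrowed from the stakeholder groups described earlier, and the `Incident` and `TimelineEntry` classes and their methods are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Role names are assumptions, taken from the stakeholder groups in this article.
ROLES = ("incident_initiator", "incident_lead", "subject_matter_expert")


@dataclass
class TimelineEntry:
    at: datetime
    author: str
    note: str


@dataclass
class Incident:
    title: str
    roles: dict = field(default_factory=dict)      # role name -> list of people
    timeline: list = field(default_factory=list)   # TimelineEntry items, in order logged

    def assign(self, role, person):
        """Assign a person to one of the predefined roles."""
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.roles.setdefault(role, []).append(person)

    def log(self, author, note, at=None):
        """Record a timestamped note for later use in the post-mortem."""
        when = at if at is not None else datetime.now(timezone.utc)
        self.timeline.append(TimelineEntry(when, author, note))

    def unfilled_roles(self):
        """Return the predefined roles that nobody has been assigned to yet."""
        return [role for role in ROLES if not self.roles.get(role)]


incident = Incident("Example: elevated API error rate")
incident.assign("incident_lead", "alex")
incident.log("alex", "Opened incident channel and paged the on-call engineer")
print(incident.unfilled_roles())
```

Capturing notes as they happen, rather than reconstructing them afterward, is the design choice here: the timeline becomes the raw material for the post-mortem and root cause analysis.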

Fundamental 4 – Clarify Severity Terminology

Structure – Part of building our incident management process was clearly defining the criteria for each incident type. We reviewed our existing severity structure and adapted it to take SLAs, scope of response, and our customers’ needs into account.
Classification – It’s important to classify severity early in the incident management planning process, especially when incidents impact customers, because it guides the design of the broader plan.
Teams – Match severity to the required response. It may make more sense to gather a small team with the needed skill set than to incorporate every team member. Large teams can be noisy and reduce efficiency.

Low severity issue

Major incident
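
The idea of matching severity to the required response can be sketched as a small lookup. This is a hedged Python example, not xMatters’ actual severity scheme: the three levels, the classification thresholds, the responder names, and the `classify` and `plan_for` helpers are all assumptions made for illustration. The only inputs drawn from the text are customer impact, SLAs, and scope.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity levels; lower numbers are more severe."""
    SEV1 = 1  # major incident
    SEV2 = 2
    SEV3 = 3  # low severity issue


@dataclass(frozen=True)
class ResponsePlan:
    responders: tuple          # groups to engage (illustrative names)
    notify_leadership: bool    # whether organizational leaders are informed


# Assumed mapping: smaller teams for lower severities to limit noise.
RESPONSE_PLANS = {
    Severity.SEV1: ResponsePlan(("incident_lead", "on_call_sme", "support"), True),
    Severity.SEV2: ResponsePlan(("incident_lead", "on_call_sme"), False),
    Severity.SEV3: ResponsePlan(("on_call_sme",), False),
}


def classify(customer_facing, sla_at_risk, services_affected):
    """Assign a severity from customer impact, SLA risk, and scope.

    The thresholds below are illustrative assumptions only.
    """
    if customer_facing and (sla_at_risk or services_affected > 1):
        return Severity.SEV1
    if customer_facing or sla_at_risk:
        return Severity.SEV2
    return Severity.SEV3


def plan_for(customer_facing, sla_at_risk, services_affected):
    """Look up the response plan for the classified severity."""
    return RESPONSE_PLANS[classify(customer_facing, sla_at_risk, services_affected)]


print(plan_for(customer_facing=True, sla_at_risk=True, services_affected=2))
```

Keeping the criteria in one function and the response in one table means the severity definitions can be reviewed and adjusted without touching the rest of the workflow.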

Fundamental 5 – Identify Tools

Tools – xMatters, Slack, Jira, Zendesk, and our monitoring tools are all part of our incident process. The intention of the review was to understand how each tool fit into an incident and see whether there was an opportunity to improve its use. For example, xMatters is central to connecting our tools and notifying our teams. We reviewed our use of xMatters and identified areas where we could improve.
Process – All tools went through this process before we reworked our plan, allowing us to see how teams used tools and identify areas for change, improvement, or workflow automation.

