Uptime Blog


What Is Ultra Modern Incident Management?

Travis DePuy on Jun 25, 2019

Spreadsheets, whiteboards, and desk phones are artifacts of ancient incident management. Many organizations have since moved on to modern incident management, marked by automated on-call scheduling. What’s next? In this blog, Product Advocate Travis DePuy describes ultra-modern incident management: automating any process to save time, money, and your sanity.

The Ancient History
I started my career at Peregrine Systems, providing second-level technical support for Service Center, an application for standard ITIL processes such as Incident, Change, and Request Management.

As support engineers, we were on call for priority issues, and we tracked who was on call when using “The Sheet.”

The dreaded on-call sheet


Anyone who did on-call work in the early 2000s is probably familiar with “The Sheet.” It was a single Excel file on a shared drive, and it was the source of record for the on-call schedules by hour, day, and week. And man, was it a nightmare. People would leave work without saving their changes and leave “The Sheet” locked, so we had to work off an old copy and hope for the best. Invariably, someone would call in sick or go on vacation, and “The Sheet” would still have them in the rotation. Of course, we only discovered this when they didn’t answer the phone after three tries and someone mumbled something about them being in Hawaii.

In the timeline of an incident, every moment is critical, and organizations can’t afford to burn through 20 minutes before triage even starts.

Often the incident involved more than one team, so we’d then have to track down someone from the DBA or Network team. These teams had their own versions of “The Sheet,” and the hunt for an on-call resource would begin again. Eventually, the teams would be assembled.

Ancient incident management


As the timeline progressed, the teams would resolve the issue and work to restore service. In hindsight, I would call this Ancient Incident Management.

The Modern Era
Later, when I was introduced to xMatters, I found a much more resilient tool for scheduling on-call resources. As pagers died off and the world moved to smartphones, more efficient ways of tracking someone down and delivering information shortened the incident timeline even further.

Cutting out much of the manual effort brings incident management into the modern era, so we call it Modern Incident Management. With modern tools, teams can drastically cut the time it takes to engage the right people and to assemble representatives from different teams, both of which affect the impact and duration of an incident.
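To show what an on-call tool does that “The Sheet” could not, here is a minimal sketch, assuming a simple rotation where members take fixed-length shifts in turn and overrides cover vacations or sick days. The `Rotation` class, its fields, and the names in it are hypothetical illustrations, not xMatters’ actual scheduling model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Rotation:
    """A hypothetical on-call rotation: members take fixed-length shifts in turn."""

    members: list[str]
    start: datetime  # when members[0] begins the first shift
    shift_length: timedelta = timedelta(weeks=1)
    # (begin, end, replacement) windows for vacations, sick days, swaps
    overrides: list[tuple[datetime, datetime, str]] = field(default_factory=list)

    def on_call(self, at: datetime) -> str:
        # An override wins over the regular rotation during its window.
        for begin, end, person in self.overrides:
            if begin <= at < end:
                return person
        # Otherwise count the whole shifts elapsed since the rotation started.
        shifts_elapsed = (at - self.start) // self.shift_length
        return self.members[shifts_elapsed % len(self.members)]


dba = Rotation(["alice", "bob", "carol"], start=datetime(2019, 6, 3))
# bob is in Hawaii that week, so carol covers for him
dba.overrides.append((datetime(2019, 6, 10), datetime(2019, 6, 17), "carol"))

print(dba.on_call(datetime(2019, 6, 5)))   # alice
print(dba.on_call(datetime(2019, 6, 12)))  # carol (override instead of bob)
print(dba.on_call(datetime(2019, 6, 26)))  # alice (rotation wrapped around)
```

Because the override is checked before the rotation, a vacation recorded once is honored automatically, instead of waiting for someone to mention Hawaii after three unanswered calls.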

Modern incident management


This is all well and good, but times have changed. Digital transformation is accelerating every aspect of the business, and incident management is one of the most critical processes. It needs to keep modernizing as tools mature and more tasks become automated, because that is what maximizes uptime for the lines of business.

Ultra Modern
On-call schedule tools aside, how many other tools does your incident process touch?

Once the monitoring tool triggers an alert, what happens then? Do you paraphrase the alert to open a Jira issue? How about a ServiceNow incident? Does your process require both?

Then if it is severe enough, do you click the friendly + icon to create a channel in Slack, followed by an @ mention to the relevant people? Do you know who is currently on-call?

Do you open your Statuspage application and enter the relevant details? Do you write an email informing the stakeholders who have the authority to change your employment status?

With so many teams to keep informed or collaborate with, the process might look something like this: people manually shuffle data from one place to the next, spending time that could go to more productive steps.

Manual Touch Points — people pushing clipboards.


How much time does updating all these tools take? 5 minutes? 10 minutes? More? Again, this is time that could be better spent supporting the troubleshooting effort or preparing the postmortem.

Automating manual steps will cut even more time out of the process, helping smooth the rough edges and ensuring consistency. So instead of all the human touch points, you now have automated touch points, indicated by the (x) icons in the diagram below.

Automated touch points


I call this Ultra Modern Incident Management. It is a further melding of human process with automated steps, and it cuts the incident lifecycle down even more.
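As a rough illustration of automated touch points, here is a minimal sketch, assuming a single monitoring alert is fanned out to several actions according to its severity. The severity scale, the thresholds, and every function name are hypothetical; the actions only build messages and stand in for the ticket (Jira or ServiceNow), chat channel (Slack), status page (Statuspage), and stakeholder email steps described above. No real product API is called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alert:
    """A monitoring alert, reduced to the fields the touch points need."""

    source: str
    summary: str
    severity: int  # hypothetical scale: 1 is the most severe


# Hypothetical stand-ins: a real integration would call each tool's API here.
def ticket_action(alert: Alert) -> str:
    return f"ticket: [{alert.source}] {alert.summary}"


def chat_action(alert: Alert) -> str:
    return f"chat: open incident channel for '{alert.summary}'"


def status_action(alert: Alert) -> str:
    return f"status page: investigating '{alert.summary}'"


def stakeholder_action(alert: Alert) -> str:
    return f"email: stakeholders notified of sev {alert.severity} incident"


# (least severe level that still triggers the action, action); assumed values
TOUCH_POINTS: list[tuple[int, Callable[[Alert], str]]] = [
    (3, ticket_action),       # sev 1-3 get a ticket
    (2, chat_action),         # sev 1-2 also get a chat channel
    (1, status_action),       # only sev 1 updates the status page
    (1, stakeholder_action),  # and emails stakeholders
]


def fan_out(alert: Alert) -> list[str]:
    """Run every touch point whose threshold the alert meets, in order."""
    return [action(alert) for max_sev, action in TOUCH_POINTS if alert.severity <= max_sev]


for message in fan_out(Alert("db-monitor", "Replication lag over 5 minutes", severity=2)):
    print(message)
```

Here a severity-2 alert opens a ticket and a chat channel but leaves the status page and stakeholder email alone. Keeping the routing rules in one table is one way to get the consistency mentioned above: every alert of a given severity gets the same touch points, with no one copying data between tools by hand.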

Ultra-modern incident management.


Evolving out of the ancient world of spreadsheets brings the process into modern times with modern tools. Evolving further means keeping those tools up to date with the processes they support, and in an era of many diverse tools spread across many teams, a new paradigm is needed: Ultra Modern Incident Management.

Automate the things that can be automated so you can get back to doing something creative. Learn more about keeping your digital services available. Try xMatters Free today!