What Makes a Perfect Incident Management Checklist? We Asked the Experts!

Hollie Whitehead

The perfect incident management checklist doesn’t need to be a fantasy. In fact, it shouldn’t be! It should cover the basics, roles and responsibilities, action items, and incident specifics; be broken down into bite-size sections; and help team members quickly identify the tasks that fall under their responsibility.

We asked our experts what should be included in the perfect incident management checklist. Here are their answers.

Don’t Overlook the Basics

An incident management checklist should always start at square one, and for most users, that’s an internet connection. According to our Senior Frontend Developer, you should always have these basics ready to go:

A good headset
A decent network connection
The right authentication and access permissions
Awareness of what to check and where assets are located (e.g., links to logs, dashboards, and playbooks)
Easily accessible scripts or dashboards that can quickly show high-level status
Playbooks that define what to do in certain circumstances

Clear Roles and Responsibilities

During an urgent incident or crisis, you don’t want to spend time deciding who should be responsible for certain tasks. An incident management checklist that clearly outlines roles and responsibilities can be a huge time saver, but what else should be in this section? Our Engineering Team Lead suggests:

The inclusion of an on-call stakeholder with the authority to make necessary choices to ensure resolution is possible
The inclusion of communication leads for the creation of internal and external messaging
The inclusion of on-call subject matter experts for every incident
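
As a rough illustration, the roles above could be tracked in a small script that flags any role nobody has picked up yet. This is only a sketch: the role names and the `missing_roles` helper are hypothetical and not part of any particular tool.

```python
# Hypothetical role names based on the list above; rename to match your team.
REQUIRED_ROLES = (
    "decision_maker",         # on-call stakeholder with authority to decide
    "communications_lead",    # owns internal and external messaging
    "subject_matter_expert",  # on-call expert for the affected system
)


def missing_roles(assignments: dict[str, list[str]]) -> list[str]:
    """Return the required roles that currently have nobody assigned."""
    return [role for role in REQUIRED_ROLES if not assignments.get(role)]


if __name__ == "__main__":
    current = {"decision_maker": ["Ana"], "communications_lead": []}
    # Lists the roles that still need an owner before work starts.
    print(missing_roles(current))
```

Running a check like this at the start of an incident makes gaps visible before anyone starts debugging.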

Actionable To-Do Items

Once the administrative work is covered and the right people are in the right seats, it’s time to begin the to-dos. Specific actions may be dependent on the incident itself, but almost every incident management checklist requires the following actions, outlined by our Team Lead and Senior UI Developer:

Role assignment
Inclusion of needed experts and stakeholders
Communication of the incident status to the customer
Inclusion of standard failover questions at certain timeframes (e.g., at 15 minutes, should we fail over?)
Summarization of the incident after the issue has been mitigated
Postmortem scheduling
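
The timed failover question can be sketched as a small schedule of prompts that become due as the incident ages. Only the 15-minute failover question comes from the list above; the second prompt, the `PROMPTS` table, and the `due_prompts` helper are assumptions made for illustration.

```python
from datetime import timedelta

# Hypothetical prompt schedule: (time since the incident started, question).
# Only the 15-minute failover question comes from the article.
PROMPTS = [
    (timedelta(minutes=15), "Should we fail over?"),
    (timedelta(minutes=30), "Do we need more experts or stakeholders?"),
]


def due_prompts(elapsed: timedelta, already_asked: set[str]) -> list[str]:
    """Return the questions whose time has come and that nobody has asked yet."""
    return [
        question
        for threshold, question in PROMPTS
        if elapsed >= threshold and question not in already_asked
    ]


if __name__ == "__main__":
    # Twenty minutes in, only the failover question is due.
    print(due_prompts(timedelta(minutes=20), already_asked=set()))
```

Keeping the questions in one table means the checklist and any reminder bot can share the same source.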

The Specific Specifics

Whether they live in an attachment or on page two of your incident management checklist, the during-incident specifics need consideration, including note-taking and root cause identification. Our experts suggest identifying and recording:

The date and time of the incident
Applicable first responders
Systems at fault
Incident severity level
Incident blast radius
Estimated resolution date and time
Planned communication rollout
Whether the incident is a repeat (if so, refer to previous incidents)
Whether multiple incident reports were filed (if so, group them)
A current snapshot of health check systems, relevant monitoring, and logging systems
Possible playbooks that can revert systems
Code and deployment issues
The timeline of events starting from the discovery of the incident to the resolution
Relevant postmortem information (e.g., whether a postmortem is planned, and whether its meeting minutes and action items are attached to guide preventative steps)
An updated service track record with the incident attached, and the days-without-incident counter reset to zero
Confirmation that stakeholders were updated according to the communication rollout plan
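
To make the list above concrete, here is a minimal sketch of how some of these fields might be captured in a structured record. The `IncidentRecord` class, its field names, and the severity label are hypothetical; the sketch covers only a subset of the items and is not tied to any specific incident management product.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional


@dataclass
class IncidentRecord:
    """Hypothetical record holding a subset of the during-incident specifics."""

    started_at: datetime
    first_responders: list[str]
    systems_at_fault: list[str]
    severity: str  # e.g. "SEV2"; the severity scale is an assumption
    blast_radius: str
    estimated_resolution: Optional[datetime] = None
    related_incident_ids: list[str] = field(default_factory=list)
    timeline: list[tuple[datetime, str]] = field(default_factory=list)

    def log(self, when: datetime, note: str) -> None:
        """Add a timeline entry and keep the timeline in chronological order."""
        self.timeline.append((when, note))
        self.timeline.sort(key=lambda entry: entry[0])

    @property
    def is_repeat(self) -> bool:
        """True when earlier incidents have been linked to this one."""
        return bool(self.related_incident_ids)


def days_without_incident(last_incident: date, today: date) -> int:
    """Days since the last incident; zero on the day an incident occurs."""
    return max((today - last_incident).days, 0)
```

Entries logged out of order still end up in chronological order, which helps when notes are added after the fact, and `days_without_incident` returns 0 on the day of the incident, matching the counter reset described in the list.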

Incident management checklists are always a work in progress. However, something that should always be part of your incident management process is a service reliability platform capable of helping you automate your response, integrate your tool stack, and accelerate your entire incident management process. Try xMatters for free and learn how xMatters can help.
