Uptime Blog

Who Should Be On Your Incident Response Team?

When an incident strikes, an organization’s reputation, revenue, and customer trust are at stake. Assembling an effective incident response team is critical to minimizing the incident’s impact. But what exactly is an incident response team? Who should be on it, and what are their responsibilities?

Successful incident responses require a team with a diverse set of problem-solving and communication skills. A comprehensive team is crucial because its members must coordinate among themselves, explain complex topics to stakeholders, and document the incident for postmortems that mitigate future risks, all under stress.

Let’s explore the fundamental roles necessary to create a high-performing incident response team, their responsibilities, and the challenges they could face.

The Incident Response Team

Incident response processes differ from organization to organization, depending on the tools involved and the nature of the services. For instance, here at xMatters, our incident response process combines automated steps, manual troubleshooting, and documentation requirements to deliver a complete incident response.

But a company in a different industry could have an entirely different process. Financial companies, for example, may need to immediately notify regulators of any incident that compromises customer information, no matter how small. Whatever the process, the roles required to detect, respond to, and resolve an incident are typically the same.

To complement an organization’s response processes, a well-rounded incident response team with the following vital roles is necessary:

  • Team Lead
  • Engineering Lead
  • Subject matter experts
  • Communications Manager
  • Scribe
  • Legal and Human Resources representatives

Each of these roles has a crucial part to play. However, sometimes one person fills multiple roles or multiple people share a role. You should clearly define and fill each position based on the incident’s size, scope, and severity.

Team Lead

The incident response Team Lead, also known as the Incident Commander, is the overall manager of an incident. It’s one of the first roles you should define, well before an incident occurs. The Team Lead is responsible for:

  • Coordinating all incident response activities
  • Forming the incident response team to be ready when an incident occurs
  • Communicating roles and requirements with the necessary team members
  • Defining the incident’s overall response strategy

The Team Lead is similar to a project manager, ensuring the incident response is organized, planned, and executed quickly. The ideal candidate should have specific skills, including:

  • Persuasive communication and negotiation skills to ensure the incident response team works towards a unified goal
  • Effective time management and planning skills, so responses are complete and timely
  • Strong leadership skills to effectively guide and motivate the team, especially under high-pressure, stressful situations
  • Critical thinking skills to navigate tricky, ambiguous situations and make crucial decisions
  • Relevant technical skills to understand and oversee an incident

The Team Lead needs to set the tone for the response and empower team members to resolve incidents as efficiently as possible.

Note that the Team Lead may change from incident to incident, depending on the incident type and each leader’s area of expertise. Define in advance which leader (and team) should respond to each incident type your organization is likely to encounter.
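One way to make that advance planning concrete is a simple lookup table from incident type to the agreed Team Lead and responding team. The minimal Python sketch below assumes hypothetical incident types and role names (`security_breach`, `sre-team-lead`, and so on) that are illustrative only, not part of any xMatters product or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseAssignment:
    team_lead: str
    team: tuple[str, ...]


# Hypothetical incident types and role names, agreed on before any incident occurs.
RESPONSE_PLAN: dict[str, ResponseAssignment] = {
    "security_breach": ResponseAssignment(
        "security-team-lead", ("security-engineering", "legal", "communications")
    ),
    "service_outage": ResponseAssignment(
        "sre-team-lead", ("sre", "platform-engineering", "communications")
    ),
    "data_loss": ResponseAssignment(
        "data-team-lead", ("database-engineering", "legal", "communications")
    ),
}

# Hypothetical fallback so an unplanned incident type still gets an owner.
DEFAULT_ASSIGNMENT = ResponseAssignment(
    "on-call-incident-commander", ("on-call-engineering", "communications")
)


def assign_response(incident_type: str) -> ResponseAssignment:
    """Return the pre-agreed Team Lead and team for an incident type."""
    return RESPONSE_PLAN.get(incident_type, DEFAULT_ASSIGNMENT)
```

The fallback entry reflects the idea that no incident should start without a clear owner, even when it doesn't match any type you planned for.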

Engineering Lead

The Engineering Lead is the incident’s technical owner and the key resource responsible for diagnosing the problem and for proposing and deploying the solutions that resolve the incident. The ideal candidate should have specific skills, including:

  • Extensive technical skills on the system or systems affected by the incident
  • Broad knowledge of other systems within the organization, particularly those that directly rely on or integrate with the affected system
  • Strong communication skills to work with and coordinate other technical team members required to find and resolve the incident

As with the Team Lead, define the Engineering Lead role before an incident occurs, then choose the appropriate lead (and team) based on the affected systems.

The Engineering Lead role is typically filled by a single senior technical employee with deep knowledge of the system or systems affected by the incident. For large incidents that involve multiple systems, a group of senior team members, each an expert in a different system, may share the role. For smaller incidents, however, a single lead with knowledge of multiple systems will often suffice, because this lead can lean on subject matter experts for more complex troubleshooting.

Subject Matter Experts

While the Engineering Lead is responsible for diagnosing and resolving an incident, they don’t work single-handedly. They call in subject matter experts (SMEs), either technical or functional, to help diagnose the problem and implement fixes.

These specialists have deep working knowledge of the specific product or service the incident affects. This group can include technical staff who know how the product or service is delivered, and functional staff who understand how end users consume it.

Having subject matter experts on the incident response team helps return systems to their previous working state. SMEs also help ensure that the implemented solution doesn’t introduce new issues.

Communications Manager

While the Engineering Lead and subject matter experts focus on finding, diagnosing, and resolving the issue behind an incident, the Communications Manager handles internal and external communications. This role relieves technical staff of the burden of communicating information to a wide variety of people.

The Communications Manager uses their communication and negotiation skills to keep key stakeholders informed, for example by:

  • Communicating outage and incident information to PR teams
  • Responding to and interacting with users of the product or service
  • Talking to key stakeholders about the impact and estimated time to resolve the incident

These stakeholders may not be part of the incident response team, but the team may still need their input while working on the incident, so it’s critical to keep them updated on its current status. The Communications Manager keeps interested parties informed during the response and, after resolution, shares postmortem activities and information about the underlying cause.

Scribe

The scribe records information during the incident, including the actions taken and when they were taken. Postmortem teams can analyze this record of events to find opportunities to improve the incident response process or avoid future issues. The scribe should be detail-oriented and work as independently of the other roles as possible. In particular, the scribe should not also hold any technical role. No matter how small the incident, the person under pressure to mitigate and resolve it should not also have to record events. This separation helps ensure an accurate and impartial log.

The scribe role is also a good way to introduce intermediate or junior technical employees to your incident response process. A project manager, the project’s communications manager, or a technical writer may also take on this function.
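A scribe’s record can be as simple as a list of timestamped entries. The minimal Python sketch below is a hypothetical structure, not the format of any particular tool: it captures who did what and when, and derives the elapsed time between the first and last entries for the postmortem.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    timestamp: datetime
    actor: str   # role or person who took the action
    action: str  # what was done or decided


@dataclass
class IncidentLog:
    entries: list[LogEntry] = field(default_factory=list)

    def record(self, timestamp: datetime, actor: str, action: str) -> None:
        """Append one event; the scribe records while responders keep working."""
        self.entries.append(LogEntry(timestamp, actor, action))

    def time_to_resolve(self) -> timedelta:
        """Elapsed time between the earliest and latest recorded entries."""
        if not self.entries:
            return timedelta(0)
        times = [entry.timestamp for entry in self.entries]
        return max(times) - min(times)
```

Using the earliest and latest timestamps, rather than list order, keeps the calculation correct even if the scribe adds a missed event after the fact.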

Legal and Human Resources

Depending on the nature of the incident, legal or human resources staff may need to provide input and guidance. For instance, data breaches or exposure of sensitive information may require legal expertise to craft or edit communications. Incidents involving misuse by staff or failure to follow policies may require human resources intervention. Without legal or human resources representation on the incident response team, these issues can cause further harm later.

Conclusion

Incident response tools and processes are critical for IT operations. Without a cross-functional incident response team comprising a diverse set of skills, your organization may not handle incidents as effectively as possible.

Strong Team and Engineering Leads, supported by subject matter experts, drive quick and effective incident resolution. A Communications Manager and a scribe, along with legal and human resources involvement, round out the team. Assembling these individuals can significantly reduce the risk of repeat incidents and help prevent new issues from arising after resolution.

When your incident response team is in place, ensure they have the right tools to respond efficiently. To learn more about incident management and automation tools to streamline your IT operations, check out our product page!
