Create Better UX with Incident Response and Service Intelligence

Incidents that impact user experience are some of the most common challenges that IT, security, and operations teams must face. Users have high expectations for application uptime, and organizations are responsible for ensuring applications are available for them.

From application performance to user interface design, many factors can affect a customer’s experience—and resulting confidence—in your product’s capabilities. Therefore, your product team needs to be able to minimize the impact of failure. Once you know how much incident response impacts the user experience, you can appreciate the criticality of effective incident management and take the required steps to limit the impact of incidents.

How Incident Response Impacts User Experience

Incident response is about delivering optimal services despite disruption. If you have a network breach, a poor incident response might force you to shut down the server for an extended period, resulting in loss of time, productivity, and data. A better incident response might mean a shorter shutdown or no downtime at all.

Impact on User Experience Based on Incident Type

Different incident types demand different response plans, and each has its own workflow you should follow to minimize the impact.

Let’s look at how various types of incidents impact user experience:

Network outages

Any network outage has a significant impact on user experience. An outage can cause a loss of productivity, poor customer service, and lost revenue.

A poorly managed network outage can negatively impact the company’s reputation, leading to a more widespread decline in customer trust. Moreover, network outages affect internal productivity as employees wait for incident teams to resolve the issue. Any substantial delays in resolving the outage can lead to a lack of employee trust, lowered morale, and consequent drops in productivity—even during uptime.

For example, on April 5, 2022, Atlassian’s server went down when an “internal communications gap” resulted in the improper removal of a standalone legacy application. According to Atlassian, while no customer lost more than five minutes’ worth of data, many of those affected had to wait two weeks for complete service restoration. Atlassian reacted to this incident with a detailed remediation plan: adding customers to an automatic restoration program, implementing universal “soft deletes” to prevent unintended service interruptions, and creating an incident communications playbook.

Incidents like these underscore the significance of integrating incident management tools and observability tools. With the proper combination, you can monitor and receive warnings regarding:

  • Latency spikes: Indicate network unresponsiveness or delayed commands caused by network congestion or high-bandwidth background processes
  • High utilization: Indicates congestion caused by high traffic volume
  • Dropped packets: Indicate a device failure or congestion between the client and server
  • Unfamiliar activity: Indicates a potential breach or unplanned changes within the network
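
As a rough illustration of the warnings listed above, the following Python sketch checks one sample of network metrics against fixed thresholds. The `NetworkSample` fields, the threshold values, and the `check_sample` function are all hypothetical and chosen only for this example. Real observability tools expose their own metrics and alerting rules.

```python
from dataclasses import dataclass


@dataclass
class NetworkSample:
    """One hypothetical measurement window for a network link."""
    latency_ms: float        # round-trip latency in milliseconds
    utilization_pct: float   # share of link capacity in use (0-100)
    packets_sent: int
    packets_dropped: int
    unknown_devices: int     # devices not found in the known inventory


# Illustrative thresholds only (assumptions); tune them for your environment.
LATENCY_MS_MAX = 250.0
UTILIZATION_PCT_MAX = 85.0
DROP_RATE_MAX = 0.01


def check_sample(sample: NetworkSample) -> list[str]:
    """Return one warning name for each threshold the sample exceeds."""
    warnings = []
    if sample.latency_ms > LATENCY_MS_MAX:
        warnings.append("latency_spike")
    if sample.utilization_pct > UTILIZATION_PCT_MAX:
        warnings.append("high_utilization")
    # Guard against division by zero when no packets were sent.
    if sample.packets_sent:
        drop_rate = sample.packets_dropped / sample.packets_sent
    else:
        drop_rate = 0.0
    if drop_rate > DROP_RATE_MAX:
        warnings.append("dropped_packets")
    if sample.unknown_devices > 0:
        warnings.append("unfamiliar_activity")
    return warnings
```

In practice, the returned warning names would be handed to an incident management tool that decides who to notify. Fixed thresholds are the simplest option, and a production setup might compare against a rolling baseline instead.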

Infrastructure flaws

Errors in code or infrastructure can cause unexpected behaviors, break features, and degrade performance. That’s why you need a robust incident response plan, whether via automated systems or manual bug reports, to detect and repair bugs quickly.

According to the 2022 State of User Experience Report, 23% of users experience an error at least once a day that prevents them from completing a task. Moreover, users are growing increasingly impatient—one in five say they won’t wait for an issue resolution and 20% of users have stopped working with a brand altogether due to infrastructure problems.

For example, a misconfigured firewall can block critical functionality, preventing employees from completing their tasks. Alternatively, consider an e-commerce page with a poorly designed back-end database. This flaw might cause slow page load time, leading to abandoned shopping carts and lost revenue.

Efficiently built and managed infrastructure is a non-negotiable component of a successful web app—and creates a pleasant and rewarding user interaction.

How Incident Response Impacts User Experience

Imagine a massive outage on your network, and you have yet to learn what caused it. Your incident management application detects an unknown device on your network.

It’s serious—a hacker has compromised one of your servers through a vulnerability in your network. They’ve gained access to a server and blocked access to one of its critical components, causing outages across all applications that rely on this server: email, file-sharing services, customer support portals, and more.

The attacker launches a Distributed Denial of Service (DDoS) attack against one or more applications. Nobody within your organization or your customer base can access critical applications.

The outage exposes multiple vulnerabilities across the entire service delivery process. Customers cannot perform business-critical functions, employees are unable to access servers, credentials are compromised, and there’s a significant loss of customer data.

This incident will seriously impact your brand’s reputation and your customers’ ability to trust your product and company. You may also experience a substantial loss in revenue.

Use Service Intelligence to Minimize Poor User Experience

Unless you know what’s wrong, you can’t fix it, and root cause analysis is the most challenging aspect of the incident management process. It can take hours or days to gather data from multiple sources and filter and analyze it—time that could be better spent on remediation.

Critical incidents happen in seconds, so it’s essential to accelerate the incident management process with automation. Site reliability engineers already use automated tools to improve the incident management process, and their teams leverage automation to help them evaluate performance and availability alerts.

Service Intelligence is an xMatters feature that helps you reduce resolution times and speed up root cause analysis. Service Intelligence consolidates data, allowing you to focus on resolution rather than data management. It automatically identifies the required data types for root cause analysis, compiles them on a dashboard, and starts an automated alerting system. Depending on the incident type, relevant stakeholders are notified so they can make quick, data-informed decisions. xMatters Service Intelligence also uses change intelligence telemetry to provide additional data on potential root causes. The sooner you can take control of an issue, the sooner you can get services back up and running, minimizing the impact on user experience.


It may seem that they have little to do with each other, but user experience and incident response are closely intertwined. If you fail to manage an incident in a reasonable time, it could disrupt your whole service delivery workflow and point to issues in the architecture of the server and application.

Implementing automated workflows is the best way to ensure you have the capacity to handle incidents effectively. Automated tools identify and trigger incident management workflows according to your configurations, allowing you to resolve issues quickly.
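
To make the idea of configuration-driven workflows more concrete, here is a minimal Python sketch that maps incident types to an ordered list of response steps and runs them. This is not the xMatters API. The step functions, the `WORKFLOWS` table, and the incident fields are hypothetical placeholders that only return strings describing what a real tool would do.

```python
from typing import Callable

# Hypothetical workflow steps. A real tool would page people or call APIs;
# these stand-ins just describe the action they represent.
def notify_network_team(incident: dict) -> str:
    return f"paged network on-call for {incident['id']}"


def open_security_bridge(incident: dict) -> str:
    return f"opened security bridge for {incident['id']}"


def create_ticket(incident: dict) -> str:
    return f"created ticket for {incident['id']}"


# Assumed configuration: incident type -> ordered response steps.
WORKFLOWS: dict[str, list[Callable[[dict], str]]] = {
    "network_outage": [notify_network_team, create_ticket],
    "security_breach": [open_security_bridge, notify_network_team, create_ticket],
}

# Fallback for incident types that have no configured workflow.
DEFAULT_WORKFLOW: list[Callable[[dict], str]] = [create_ticket]


def trigger_workflow(incident: dict) -> list[str]:
    """Run the steps configured for the incident's type, in order."""
    steps = WORKFLOWS.get(incident["type"], DEFAULT_WORKFLOW)
    return [step(incident) for step in steps]
```

For example, `trigger_workflow({"id": "INC-1", "type": "network_outage"})` runs the network notification step and then the ticket step. Keeping the mapping in data rather than in branching code means a new incident type only needs a new table entry.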

To try an incident management solution that can automate your workflows, request a demo of xMatters today.
