
Uptime Blog

What Your System Outage Notifications Need To Say

System outages happen to the best of us. Communicating with your customers and other stakeholders effectively during downtimes is vital to maintaining a solid relationship with them.

When a system outage occurs, technical teams are tasked with swiftly locating the cause and resolving the issue, while communications teams are tasked with notifying stakeholders and customers about the outage to maintain transparency. These communications need to carry relevant and actionable data, be concise, and avoid pointing fingers or deflecting blame. Because these communications are so important, it’s critical that you plan your strategy for system outage notifications ahead of time.

Creating Effective System Outage Notifications

Although there’s no one-size-fits-all method for creating system outage notifications, the following practical tips can help as you shape your system outage alerting strategy.

Keep Notifications Concise

During an outage, speed matters, and concise messages are faster to write and faster to read. Your outage notifications should get straight to the point. Detailed guidelines and templates will help you condense your messages down to the essentials and accelerate communication. Prepare these ahead of time to be ready when an incident occurs.

Communicate the Severity

Outage communication needs to include details about the incident’s impact and severity. Most IT teams maintain a set of incident severity levels, enabling them to prioritize issues and respond accordingly. Make sure to include the incident severity level in the notifications that you’re sending to internal stakeholders so that the response protocol is clear from the first step. For example, an SMS message could start by stating “A Sev 1 incident has occurred” so readers only need to skim the first few words to know a response is necessary.
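As a rough illustration of leading with the severity, here is a minimal Python sketch. The `Incident` fields and the `sms_text` helper are hypothetical names chosen for this example, and the 160-character cap assumes a single-segment SMS using the standard GSM 7-bit alphabet.

```python
from dataclasses import dataclass

# Assumption: one SMS segment in the GSM 7-bit alphabet holds 160 characters.
SMS_LIMIT = 160


@dataclass
class Incident:
    """Hypothetical incident record used only for this illustration."""
    severity: int  # 1 is the most severe level in this example scale
    summary: str
    impact: str


def sms_text(incident: Incident) -> str:
    """Build a short message whose first words state the severity level."""
    text = (
        f"A Sev {incident.severity} incident has occurred: "
        f"{incident.summary}. Impact: {incident.impact}."
    )
    # Trim overly long messages so they still fit in a single segment.
    if len(text) > SMS_LIMIT:
        text = text[: SMS_LIMIT - 3] + "..."
    return text


if __name__ == "__main__":
    print(sms_text(Incident(1, "Checkout API is returning errors",
                            "customers cannot complete purchases")))
```

Because the severity comes first, it survives even when the rest of the message has to be truncated.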

Enrich with Other Useful Information

While the incident’s severity conveys the type of response needed, consider adding other relevant data in follow-up notifications to give your response teams a head start. Adding contextual information like stack traces, logs, and routine checklists helps paint a clearer picture of the situation so responders can find the fix faster. The right information ensures that incidents are easy to understand and can be intelligently correlated.

For example, an initial notification is likely to be short and concise, including the severity level and response options. Once a resolver has confirmed they’ll be helping to resolve the issue, follow-up notifications may include which traces and logs they should be looking into, or where they should start their search for details in impacted tools.
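The two-stage flow described above could be sketched as follows. This is only an illustration: the `Incident` fields, the function names, and the ACCEPT and ESCALATE reply keywords are assumptions made for the example, not features of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical incident record used only for this illustration."""
    severity: int
    summary: str
    log_locations: list[str] = field(default_factory=list)


def initial_notification(incident: Incident) -> str:
    """First message: severity, a one-line summary, and response options only."""
    return (
        f"Sev {incident.severity}: {incident.summary}. "
        "Reply ACCEPT to respond or ESCALATE to pass it on."
    )


def follow_up_notification(incident: Incident, resolver: str) -> str:
    """Second message, sent once a resolver has accepted: where to start looking."""
    lines = [
        f"{resolver}, you are assigned to Sev {incident.severity}: {incident.summary}.",
        "Start with:",
    ]
    lines += [f"- {location}" for location in incident.log_locations]
    return "\n".join(lines)


if __name__ == "__main__":
    incident = Incident(1, "Checkout API is returning errors",
                        ["payments service error log", "API gateway traces"])
    print(initial_notification(incident))
    print(follow_up_notification(incident, "Dana"))
```

Keeping the log pointers out of the first message keeps it short, and the detail still reaches the person who actually takes the incident.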

Use Familiar Language

Keeping messages concise is essential, but this doesn’t mean you should resort to jargon and abbreviations that not everyone will understand. System outage notifications must leave no room for miscommunication. One of the best ways to achieve this is to write notification messages the way colleagues speak to one another: avoid unnecessarily complex words and uncommon acronyms, and rely instead on the common terms and phrases teams use every day.

Tailor It to the Recipient

Your information should focus on the people receiving your messages. For example, the notifications for SREs and IT managers should dive into the technical specifics, including where the issue is located, the time and place it was discovered, and where they can find logs and reports. In contrast, executive stakeholders are going to be more concerned with how many users and customers are impacted, and with the estimated time to resolution.

Having a custom-tailored template ready for each stakeholder segment is vital to having everybody on the same page. Using an automated solution also helps ensure each group gets the information they need while your team focuses on delving into the incident.
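One lightweight way to keep a template per stakeholder segment is sketched below with Python’s standard `string.Template`. The audience names, placeholder fields, and sample values are all hypothetical; a real setup would use whatever fields your incident tooling provides.

```python
from string import Template

# Hypothetical templates: one per stakeholder segment, all leading with severity.
TEMPLATES = {
    "responder": Template(
        "Sev $severity: $summary. Located in $component, "
        "detected at $detected_at. Logs: $logs_location"
    ),
    "executive": Template(
        "Sev $severity: $summary. About $users_affected users affected. "
        "Estimated resolution: $eta"
    ),
}


def render(audience: str, incident: dict) -> str:
    """Fill the audience's template; raises KeyError if a field is missing."""
    return TEMPLATES[audience].substitute(incident)


if __name__ == "__main__":
    incident = {
        "severity": 1,
        "summary": "Checkout is failing",
        "component": "payments service",
        "detected_at": "09:42 UTC",
        "logs_location": "payments service error log",
        "users_affected": 1200,
        "eta": "30 minutes",
    }
    print(render("responder", incident))
    print(render("executive", incident))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing field fails loudly before sending, instead of delivering a message with a blank placeholder in it.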

Be Consistent

Keeping system outage notifications consistent enables your incident response teams to act quickly without spending precious time working out what each message means. This allows responders to dive right into resolving the incident, keeping downtime to a minimum. Varied messaging formats and language can confuse both those trying to remediate the issue and those who are impacted.

Conclusion

An effective outage notification strategy is key to having a sound and healthy incident resolution process. It’s essential to keep your team on the same page and let your business stakeholders know what’s happening. To communicate effectively, you need to be concise while providing exact information to all recipients.

A service reliability platform like xMatters can be of great use when tackling issues, from automatically sending out clear system outage notifications to assembling your teams to resolve the incidents. Check out xMatters today to learn how we can help organize your on-call resources before, during, and after incidents.
