Use Google Cloud Run to Roll Back to a Previous Version

Serverless options like Cloud Functions and Cloud Run provide excellent platforms for getting a cloud service up and running with minimal hassle. These Google Cloud Platform (GCP) products allow developers to focus on solving the business problem at hand, rather than getting bogged down in the details of operating systems and servers. But any digital service can run into issues that hurt the users or applications consuming it, and that can affect the bottom line… or at least your boss’s temperament. Fortunately, with the tools provided by Google and xMatters, getting your service (and your boss) back in good shape can be as easy as the touch of a button. In this article, we’ll dive into a microservice running on Google Cloud Run and see how we can roll back to a previous revision after the Cloud Operations suite triggers an alert through xMatters.

After the most recent deployment, an image is missing, which leads to an uptick in errors. Users visiting our Emu Simulator page are missing out on our fun mascot!

The mascot in the Emu Simulator

Fortunately, the Cloud Operations suite collects the errors and triggers an alert to xMatters based on the alert policy settings.

The Cloud Operations suite triggers an alert to xMatters.

Enrich messages with Flow Designer

Using the current on-call schedule for the ops team responsible for maintaining our favorite Emu Simulator, xMatters determines that I’m on call and delivers the notification across my assorted devices. Here’s the mobile app showing me the issue details captured by Cloud Monitoring.

xMatters Flow Designer also allows for enrichment of notifications by retrieving relevant details of the service before notifications are sent out. I don’t have any enrichment steps configured here, but if I did, I could retrieve the last commit and build information, or even some service mesh details, to have greater context for deciding what to do next.

After reviewing the information, I open the response drawer to take action.

Here, I see the standard on-call lifecycle options, Acknowledge and Escalate. Acknowledge stops the device and group escalation, while Escalate passes the alert off to the next person in the schedule.

The last response option, Rollback, is where alerting meets incident response, and it is a powerful way to trigger a workflow to get the service up and running. I’ve reviewed the necessary details and decided we should attempt a rollback. After all, our users won’t be happy if they can’t run their emu simulations!

Standard on-call options

While that’s chugging away, let’s check out how this is configured in Flow Designer. Below, I see the Flow Designer canvas with two flows, one for inbound into xMatters and the other for outbound from xMatters, with the palette of apps to the right of the canvas.

xMatters Flow Designer with an inbound flow and an outbound flow

The inbound flow starts with the HTTP trigger, which parses the incoming request and populates the values that get passed downstream. The next step posts a message to the #monitoring-alerts channel to make a broad audience aware that an issue is occurring. Microsoft Teams is another option that Flow Designer supports for organizations that prefer it, and we’re also investigating adding Google Hangouts to the mix. The final step creates an event to target the on-call resource and get the notifications delivered.
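
To make the parsing step concrete, here is a minimal Python sketch of pulling a few fields out of a Cloud Monitoring webhook body. It is not the Flow Designer trigger itself, and it is not taken from the flow shown here. The field names follow the `incident` object that Cloud Monitoring webhook notifications send, but the sample values, the `Alert` class, and the `parse_alert` function are hypothetical and exist only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Alert:
    """The handful of incident fields a downstream step might use."""
    incident_id: str
    policy_name: str
    state: str
    summary: str
    url: str


def parse_alert(body: str) -> Alert:
    """Extract incident fields from a webhook body, defaulting to ''."""
    incident = json.loads(body).get("incident", {})
    return Alert(
        incident_id=incident.get("incident_id", ""),
        policy_name=incident.get("policy_name", ""),
        state=incident.get("state", ""),
        summary=incident.get("summary", ""),
        url=incident.get("url", ""),
    )


# Hypothetical payload; all values are made up for this example.
sample_body = json.dumps({
    "version": "1.2",
    "incident": {
        "incident_id": "0.abc123",
        "policy_name": "Emu Simulator error rate",
        "state": "open",
        "summary": "Error count is above the threshold.",
        "url": "https://console.cloud.google.com/monitoring/alerting",
    },
})

alert = parse_alert(sample_body)
print(alert.policy_name, alert.state)
```

Defaulting missing fields to empty strings keeps the sketch from failing on a partial payload; a real flow might prefer to reject such a request instead.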

Create an event to target the on-call resource and deliver the notifications

After the on-call resource receives the notification, they can select one of the response options to take action. As mentioned, Acknowledge and Escalate deal with the on-call lifecycle, but they also post into the #monitoring-alerts channel to inform the team someone has accepted (or escalated) the issue.

Choosing Rollback will create a ticket in Jira, and then trigger the Cloud Function to initiate the rollback in Cloud Run.
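
The Cloud Function used here isn’t shown, so the following Python sketch illustrates only one way the rollback decision could work: pick the revision created just before the one currently serving, then send 100% of traffic to it. The helper names, the revision names, and the region are hypothetical. The command built at the end is the `gcloud run services update-traffic` CLI equivalent; a real Cloud Function would more likely call the Cloud Run Admin API than shell out to `gcloud`.

```python
from typing import Dict, List


def pick_previous_revision(revisions: List[Dict[str, str]], serving: str) -> str:
    """Return the revision created just before the one currently serving.

    Each entry looks like {"name": ..., "created": ...}, where "created" is
    an ISO-8601 UTC timestamp, so plain string sorting is chronological.
    """
    ordered = sorted(revisions, key=lambda r: r["created"])
    names = [r["name"] for r in ordered]
    index = names.index(serving)  # raises ValueError if the name is unknown
    if index == 0:
        raise ValueError(f"{serving} has no earlier revision to roll back to")
    return names[index - 1]


def build_rollback_command(service: str, revision: str, region: str) -> List[str]:
    """Build the gcloud arguments that route all traffic to one revision."""
    return [
        "gcloud", "run", "services", "update-traffic", service,
        f"--to-revisions={revision}=100",
        f"--region={region}",
    ]


# Hypothetical revisions for a service named "emu-simulator".
revisions = [
    {"name": "emu-simulator-00003", "created": "2020-05-03T09:30:00Z"},
    {"name": "emu-simulator-00001", "created": "2020-05-01T10:00:00Z"},
    {"name": "emu-simulator-00002", "created": "2020-05-02T14:15:00Z"},
]

target = pick_previous_revision(revisions, "emu-simulator-00003")
command = build_rollback_command("emu-simulator", target, "us-central1")
print(" ".join(command))
```

Choosing “the revision before the current one” is an assumption of this sketch. A team could just as reasonably pin a known-good revision by name.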

Create a ticket in Jira, and trigger the rollback in Google Cloud Run

Checking out the Cloud Run side, we see the traffic has been routed to the previous revision:

In Google Cloud Run, the traffic has been routed to the previous revision
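
The same check can be scripted rather than read off the console. The sketch below is an illustration, not part of the workflow described here. It assumes the service description has been exported as JSON (for example with `gcloud run services describe SERVICE --format=json`) and reads the `status.traffic` entries from it. The function names and the sample document are hypothetical.

```python
import json
from typing import Dict


def traffic_by_revision(service_json: str) -> Dict[str, int]:
    """Sum the traffic percentage assigned to each revision."""
    service = json.loads(service_json)
    shares: Dict[str, int] = {}
    for entry in service.get("status", {}).get("traffic", []):
        name = entry.get("revisionName", "")
        shares[name] = shares.get(name, 0) + entry.get("percent", 0)
    return shares


def rollback_took_effect(service_json: str, expected_revision: str) -> bool:
    """True when the expected revision receives all of the traffic."""
    return traffic_by_revision(service_json).get(expected_revision, 0) == 100


# Hypothetical, heavily trimmed service description.
sample_service = json.dumps({
    "status": {
        "traffic": [
            {"revisionName": "emu-simulator-00002", "percent": 100},
        ]
    }
})

print(rollback_took_effect(sample_service, "emu-simulator-00002"))
```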

And our beloved mascot is pretty happy about it too:

There’s our mascot!

This means our users can go back to simulating emus, and we can get the issue resolved permanently and redeploy when ready.

Whether your microservice runs on Google Cloud Functions, Google Cloud Run, or full-blown Kubernetes Engine, there can and likely will be issues that impact your users. Having an incident process in place can help reduce the cost and impact of these events, and going one step further to an incident response process can reduce the impact even more. So, come check out the xMatters Flow Designer, and rest assured you can get your services back into a happy place!

xMatters is free to try for as long as you want for teams of up to 10 people. You get unlimited integrations like Cloud Run that you can use with Flow Designer for all your incident response and management needs.
