

Sailing Toward Continuous Deployment on Google Cloud with Spinnaker


One motivation for moving to Google Cloud Platform was using continuous deployment to meet demand spikes. Introducing Spinnaker. We looked for a product which could provide us key out-of-the-box functionality, and Spinnaker checked most of the boxes. Here’s what happened.

For a year at xMatters, we worked on a large (ok, massive) migration to Google Cloud Platform (GCP). One of the motivations for moving to GCP was having the capability to easily and instantly scale up xMatters On-Demand capacity with continuous deployment to meet demand spikes.

That meant our tooling had to let anyone with appropriate access scale up quickly and easily. We had only a year before we would have to renew a three-year lease on six expensive data centers, so time was definitely not on our side. We didn’t have the luxury of building an in-house tool that could handle multi-region deployments and single-click scaling, all protected behind access controls. Instead, we looked for a product that provided the key functionality we needed out of the box.

Naturally, we also wanted that product to be in active development. With our wishlist in hand, we carried out a research spike to evaluate several solutions available at the time.

After some further investigation, Spinnaker clearly checked off our most important requirements, so we decided to move ahead with it.

Spinnaker introduced quirky UI bugs, sluggish performance, and management woes.


The Sea Was Angry
Up until this point only a handful of our developers and operations members had used Spinnaker, so they created various pipelines to ensure that the basic building blocks were in place. So far, so good.

But when we opened Spinnaker up to the rest of our development crew, we started to feel some pain: quirky UI bugs, sluggish performance, and management woes. In hindsight, this wasn’t too surprising, because when we first deployed Spinnaker there were few guides to follow. We had to decipher multiple levels of over-engineered config files, trace through verbose logs, and restart unhappy services.

Here are a few items that will give you the flavor of what we ran into…

UI/UX Problems

    • It didn’t take long to realize the Spinnaker UI is quite… well, fragile. Compared to Jenkins, where you can do almost anything, Spinnaker really dials back the amount of flexibility you have. We spent a lot of time painstakingly editing applications to avoid inadvertently breaking something else.
    • JavaScript is a blessing and a curse: because of aggressive caching, what you see in the UI isn’t necessarily the same data your colleagues see. That meant constantly asking people to clear their cache, especially after an upgrade.

Performance Issues

    • Out of the box, all microservices communicate with the same Redis instance, which does not scale at all:
      • Orca needs its own Redis
      • Clouddriver also needs its own Redis
    • Once you reach a certain scale, you will need to start introducing read-only Redis nodes. That might not sound like much, but when you are already managing 10+ microservices it can become daunting.

Well, Spinnaker wasn’t even 1.0 at the time so perhaps our expectations weren’t realistic. And even with all its quirks, we successfully deployed our tech stack to multiple regions, so at least we had that going for us.

It became more challenging to maintain Spinnaker.


Changing Tack
As time went on, it became more and more challenging to maintain Spinnaker. Our initial Spinnaker deployment strategy was a hot mess of manifest files, bash scripts, and other snowflake configurations. In our defense, it was the only way to deploy Spinnaker at the time, and many other organizations were facing similar frustrations. That forced the Spinnaker team to rethink things, which led to halyard, a tool for configuring, installing, and updating Spinnaker.

While that was good news, it created an obvious challenge for us: how do we migrate from our old setup to the new one? Naturally there were no official migration guides, so it was up to us to piece it all together.

We started by spinning up a new ‘production’ Spinnaker alongside our original deployment and began to methodically migrate changes over. This essentially meant taking each of our manifest files and, through trial and error, finding and running the equivalent halyard command. The whole process took a few weeks (!), but after the migration we could upgrade Spinnaker reliably and repeatably. Praise be to the sea gods.

While we were at it, we also fixed some of the performance bottlenecks, especially those related to Redis. We deployed standalone Redis instances with persistent storage for Clouddriver and Orca, and we scaled out Clouddriver and Orca to 6 and 4 instances, respectively.
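To make the Redis split concrete, here is a minimal sketch of the idea in Python. It is not Spinnaker configuration: the hostnames and the `redis_url_for` helper are hypothetical, and the only point is that Orca and Clouddriver each resolve to a dedicated instance while everything else keeps sharing one.

```python
# Hypothetical endpoints for illustration only; real hostnames will differ.
SHARED_REDIS = "redis://redis-shared:6379"

# Services that get a dedicated Redis instead of the shared one.
DEDICATED_REDIS = {
    "orca": "redis://redis-orca:6379",
    "clouddriver": "redis://redis-clouddriver:6379",
}


def redis_url_for(service: str) -> str:
    """Return the dedicated Redis URL for a service, or the shared one."""
    return DEDICATED_REDIS.get(service.lower(), SHARED_REDIS)


if __name__ == "__main__":
    for name in ("orca", "clouddriver", "gate", "echo"):
        print(f"{name:12s} -> {redis_url_for(name)}")
```

Keeping the mapping in one place makes it easy to see which services have been moved off the shared instance.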

We also tuned the number of CacheThreads, which helps considerably when your Kubernetes clusters have lots of namespaces. Together, these changes drastically improved overall performance.
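A rough way to picture why the thread count matters: if namespaces are divided among caching threads, each thread has fewer namespaces to walk per caching cycle. The toy model below assumes namespaces are sharded by a hash of their name. That sharding rule, the `shard_namespaces` helper, and the `team-N` namespace names are assumptions for illustration, not Spinnaker's actual implementation.

```python
import zlib


def shard_namespaces(namespaces, cache_threads):
    """Assign each namespace to exactly one of `cache_threads` shards."""
    if cache_threads < 1:
        raise ValueError("cache_threads must be at least 1")
    shards = [[] for _ in range(cache_threads)]
    for ns in namespaces:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(ns.encode("utf-8")) % cache_threads
        shards[index].append(ns)
    return shards


if __name__ == "__main__":
    namespaces = [f"team-{i}" for i in range(40)]  # hypothetical namespaces
    for threads in (1, 4, 8):
        largest = max(len(s) for s in shard_namespaces(namespaces, threads))
        print(f"{threads} thread(s): largest shard holds {largest} namespaces")
```

The largest shard is a crude proxy for how long one caching cycle takes, which is why adding threads helps most when there are many namespaces.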

Kubernetes Provider V2 deploys native Kubernetes manifest files.


Red Sky at Night, Sailors’ Delight
Around the time we migrated to halyard, Spinnaker also released its new deployment provider for Kubernetes, the cleverly named Kubernetes Provider V2. In a nutshell, it deploys native Kubernetes manifest files, which means you don’t have to spend all your time changing application settings. Instead, you write a native Kubernetes manifest file and Spinnaker ingests and deploys it for you. This is especially great because it means you can keep your manifests in source control.
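For readers who haven’t seen one, a native manifest is just a structured document describing the desired state. The sketch below builds a minimal `apps/v1` Deployment as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The application name, image, and the `deployment_manifest` helper are hypothetical placeholders, not anything from our stack.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 2) -> dict:
    """Build a minimal Kubernetes apps/v1 Deployment manifest."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, for illustration only.
    manifest = deployment_manifest(
        "example-api", "gcr.io/example-project/example-api:1.0.0"
    )
    print(json.dumps(manifest, indent=2))
```

Because the output is plain text, it can be committed, reviewed, and diffed like any other source file.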

Unfortunately, there was no easy way to migrate from the V1 to V2 provider. As Spinnaker continues to grow, my hope is that it will start to release tooling that makes these kinds of transitions easier to navigate.

New Destinations
In recent months there have been improvements to the Spinnaker CLI tool, which lets you interact with the Spinnaker API. We plan to use open-source tools like k8s-pipeliner together with the Spinnaker CLI to fully automate creating, updating, and deleting pipelines. This will give us a completely source-controlled deployment process.
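As a sketch of what a source-controlled pipeline could look like, the snippet below generates a one-stage pipeline definition as JSON that could be committed to a repository and later submitted through the CLI or API. The field names reflect Spinnaker pipeline JSON as we understand it and should be treated as assumptions to check against your Spinnaker version. The application and pipeline names are hypothetical, and this is not the k8s-pipeliner format.

```python
import json


def wait_pipeline(application: str, name: str, wait_seconds: int) -> dict:
    """Build a minimal pipeline definition with a single wait stage."""
    return {
        "application": application,
        "name": name,
        "limitConcurrent": True,
        "keepWaitingPipelines": False,
        "triggers": [],
        "stages": [
            {
                "refId": "1",
                "requisiteStageRefIds": [],  # no upstream stages
                "type": "wait",
                "name": "Wait",
                "waitTime": wait_seconds,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical application and pipeline names.
    pipeline = wait_pipeline("example-app", "example-pipeline", 30)
    # sort_keys keeps the output stable, so diffs in source control stay small.
    print(json.dumps(pipeline, indent=2, sort_keys=True))
```

Generating definitions from code rather than editing them in the UI means every pipeline change goes through the same review as any other change.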

Our journey to continuous deployment has not been easy, but we’ve been making great strides in the right direction. And despite all its challenges, I believe we wouldn’t be as close as we are today if it weren’t for Spinnaker.

Have you used Spinnaker or another tool to try to achieve continuous delivery? Let us know on our social channels. To try xMatters for yourself, race to xMatters Free, and you can use it free forever.
