DevOps is ubiquitous in engineering teams worldwide, helping teams work together in the cloud and improve over time. In part one of his blog series, xMatters Director of Cloud Operations Adam Serediuk discusses the much-disputed DevOps origin story and the evolution of DevOps and site reliability engineering.
Most of us did not start our careers in organizations where a ‘same team’ attitude was the day-to-day reality. However, I’ve been lucky to work on some amazing teams that practiced DevOps before there was a name for it. We somehow intuitively understood that we were all in it together, and strove to make each other’s work, lives, and our product better. We knew that we each had roles to fill and products to deliver, and that software development and its delivery is a team sport, not a blame game.
But you don’t have to rely on intuition or some mystical alchemy of team dynamics — you can actively foster DevOps culture and a service ownership mindset.
We don’t really give the term “DevOps” a second thought these days — both the name and the practice seem obvious enough. But only a decade ago, that wasn’t the case at all.
A little DevOps history
One of the more accepted DevOps origin stories has it that at the 2009 O’Reilly Velocity Conference, Flickr’s John Allspaw and Paul Hammond began talking about integrating development and operations into a single team. An engineer from Belgium named Patrick Debois couldn’t make the conference and decided to organize a meetup to discuss this novel hybrid team idea. That meetup needed a name, and Patrick came up with one that stuck: DevOps.
But not so fast! I was able to find a reference to DevOps on Twitter dating back to Nov 2, 2007 from YouTuber Stephanie Liu:
My preferred DevOps origin narrative is a meeting at San Francisco’s iconic Thirsty Bear Brewery, the birthplace of many great (and not-so-great) ideas, tweeted by yet another YouTuber, Chris Zacharias:
Whichever DevOps origin story you believe, adoption of the DevOps name gave credibility and heft to a cultural change seeking to improve how software practitioners work together. The fundamental character of DevOps culture provided a sounding board for discussion, ideas, and practices for how we can work together more effectively, and with more happiness. This ethos built on lessons learned from others to inspire empathy, to encourage seeing things from another’s perspective, and to drive home that Development and Operations are in this together.
Into the cloud
The term DevOps gave a community license to discuss improving the way it works, and the move to the cloud has accelerated the practice. With servers and other resources only an API call away, the ability to apply the principles of DevOps has become easier. There’s less finger-pointing (at least internally) and there’s no excuse not to rebuild something that’s not working for you—it’s not like you have to wait months for new hardware to arrive.
This has made service ownership even better. Because cloud providers are self-service platforms, service teams that build, deploy, and run their own services need rely only on themselves to meet their goals. On a common platform, multiple teams can be more collaborative by using similar tools, resources, and patterns.
I’ve found these common patterns to be crucial for enabling effective service ownership and delivery of software in the cloud. There’s no shortage of tooling in every category imaginable, which has led to a very broad spectrum of both generic and service-specific implementations. Having a common pattern and approach allows teams to speak the same language and makes it easier for people and practices to move between teams.
It’s important that common approaches don’t eliminate accountability for the implementation; rather, they serve to reduce duplicated effort and to allow those approaches to improve. After all, service ownership doesn’t mean doing it alone, it just means you have the responsibility. Teams or individuals who are experts in certain fields or subjects should be involved to provide guidance in their respective areas of expertise.
A brief monitoring example
Let’s look at an example. Monitoring is a hot topic in service ownership, and one of the major differentiators in both DevOps and service ownership is that developers are now on call, either for their own service or for more general on-call duties. Traditional monitoring has a lot of, well… noise, such as CPU, disk, and memory alerts that are generally not actionable and only serve to disrupt your sleep.
But what teams and on-callers really want to know is whether the service is up and meeting its objectives. People don’t need to be the arbiters of those states, but it’s difficult for many practitioners to do it differently, either due to limited tooling, lack of experience, or simply not knowing of a better way.
A new approach, led by subject matter experts, can provide better monitoring overall: Site Reliability Engineering (SRE) introduced the idea of Service Level Objectives and Service Level Indicators. Service Level Indicators such as Availability, Error Rate, and Request Latency are measured against their objectives. By using these indicators and objectives, you can craft your monitoring to alert only when metrics are outside of their targets.
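To make the idea concrete, here is a minimal sketch of checking indicators against objectives. The `Objective` class, the `breached` helper, and every target value are hypothetical and chosen only for illustration; they are not xMatters’ actual objectives or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """A hypothetical Service Level Objective for a single indicator."""
    indicator: str
    target: float
    higher_is_better: bool

    def is_met(self, measured: float) -> bool:
        # Availability should stay at or above its target;
        # error rate and latency should stay at or below theirs.
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def breached(objectives, measured):
    """Return the names of indicators whose measured value misses its target."""
    return [o.indicator for o in objectives if not o.is_met(measured[o.indicator])]


# Illustrative targets only (assumptions, not values from this post).
OBJECTIVES = [
    Objective("availability", 0.999, higher_is_better=True),
    Objective("error_rate", 0.01, higher_is_better=False),
    Objective("p95_latency_seconds", 0.5, higher_is_better=False),
]

healthy = {"availability": 0.9995, "error_rate": 0.002, "p95_latency_seconds": 0.31}
degraded = {"availability": 0.9995, "error_rate": 0.04, "p95_latency_seconds": 0.31}

print(breached(OBJECTIVES, healthy))   # []
print(breached(OBJECTIVES, degraded))  # ['error_rate']
```

An on-call page would fire only when `breached` returns a non-empty list, so a busy CPU on a service that is still meeting its objectives wakes nobody up.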
Once you have established these targets, you can begin to implement a system to measure and report on them. You’ll likely find that there are commonalities between services and how they’re measured that will lead to repeatable patterns. A monitoring system that readily emphasizes ranges of values rather than exit status checking will help to accelerate this change.
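As a rough sketch of the measuring side, the snippet below derives two indicators from a window of request records instead of relying on a pass/fail check. The `Request` record, the choice to count only 5xx responses as errors, and the nearest-rank percentile are all assumptions made for illustration; a real monitoring system would normally compute these for you.

```python
import math
from typing import NamedTuple


class Request(NamedTuple):
    """A hypothetical record of one served request."""
    status: int      # HTTP status code
    seconds: float   # time taken to respond


def error_rate(requests):
    """Fraction of requests that failed server-side (5xx), between 0.0 and 1.0."""
    if not requests:
        return 0.0
    failures = sum(1 for r in requests if r.status >= 500)
    return failures / len(requests)


def latency_percentile(requests, pct):
    """Nearest-rank percentile of response time, with pct as an integer 1-100."""
    if not requests:
        raise ValueError("no requests in the window")
    ordered = sorted(r.seconds for r in requests)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Ten requests: eight fast successes and two slow server errors.
window = [Request(200, 0.1)] * 8 + [Request(500, 0.9), Request(503, 1.2)]

print(error_rate(window))              # 0.2
print(latency_percentile(window, 50))  # 0.1
print(latency_percentile(window, 90))  # 0.9
```

An exit-status check would only report that the service answered. Values like these can be compared against a target range, which is what makes the pattern reusable from one service to the next.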
At xMatters we use Prometheus for our monitoring, and teams can clone Prometheus deployments on a per-service basis, which lets them easily build on a standard monitoring platform. Most applications are written in the same language, use the same load balancer/API gateway stacks, and can report on the same metrics. This allows for ready re-use of these common patterns, and when we learn something new, everyone can benefit from it. And we don’t have to worry about noisy CPU and disk alerts that don’t provide meaningful value at 3am.
I’m not going to pretend to be an authority on service ownership and DevOps. Truthfully, nobody is and that’s one thing that makes this great. By learning together and sharing our insights we are all iterating and improving together. Technology and the way we build it has changed dramatically in the past decade and everyone is trying to make the best of it.
In my next blog, I will discuss how to promote a service ownership culture and the benefits of team autonomy. Until then, drop a comment below or meet me at one of my meetups.