Uptime Blog


Embrace United DevOps Culture, Break Down the Great Wall of IT

Stephen Walters on May 12, 2020

When I think of the hopes and dreams that a united DevOps culture and site reliability engineering (SRE) were intended to bring us, such as speed to market and improved production stability, it saddens me a little that some think it’s all about automation. Automation is only one element of what is, or at least should be, an entire cultural shift in the way we do things.

In this article, I will identify three cultural challenges that should be faced before implementing any automation: roles, organization and failure.

Roles, organization, and failure

A long, long time ago

For decades, the software delivery lifecycle has operated as an incomplete version of the Deming Cycle, the simple four-step continuous improvement loop of Plan-Do-Check-Act. In practice, much of the software development lifecycle has been a one-shot Plan, a much-repeated Do-Check cycle, and then a single one-shot, big-bang Act. Typically, the planning and the Do-Check cycles have happened in the dev world, and when dev is ready, it has thrown the lot over the “Great Wall of IT” for ops to “Act.” We dreamed of a better, united world.

And then came Agile

Agile has implemented the Deming Cycle more effectively, though mostly in the dev domain. Developers have reduced the risk each change poses to production systems by making smaller, incremental changes, but that Great Wall of IT still exists. I’ve often seen development squads run iterative Deming Cycles and produce releasable content at the end of a series of sprints, only for those many releases to be accumulated into a single one-shot, big-bang Act for operations to support. Sound familiar?

The Deming Cycle

DevOps: the great wall buster

DevOps was supposed to be the “Great Wall Buster,” crashing through that wall to deliver much faster. Speed has increased, but there has also been a problem: rather than breaking down the wall, DevOps transformations have created a one-way, fast-track gateway through it, and in some cases the push to change fast has sadly led to entrenched cultural resistance.

Divided we fall

Automated DevOps toolchains have done a fantastic job of fast-tracking those risk-reduced iterations, but the speed, communication and collaboration paths still run one way, toward delivery. Those paths are still “pushing” the Act requirement onto operations instead of developing a united DevOps culture. Not surprisingly, much of the early sponsorship for DevOps has come from development teams already reaping the benefits of Agile transformation, while operations staff have been handed the task and responsibility of acting.

Operations has responded to the demand and cadence of change in DevOps, while still ensuring production reliability, through SRE. As with DevOps, the pillars this discipline rests on map easily onto those of a united DevOps culture. However, SRE implementations are typically motivated and driven by operations.

The Great Wall

The result is something all too familiar: development pushes a DevOps agenda (continuous rapid change), operations pushes an SRE agenda (stability and maintenance), and despite both having the same founding principles, that Great Wall of IT is still standing.

The Great Wall of IT is not a technical barrier, a business barrier, or even a process barrier. It’s a cultural barrier: possibly the hardest one for us to break down, but fundamentally the most important.

United we conquer

  1. Remove siloed job roles
    The first challenge is that organizations must remove the concept of siloed job roles while ensuring that people retain their identity through their skills and specializations. Let’s replace “not my job” preconceptions with “we can do anything.” At the same time, we must respect that each individual has their own role to play and their own unique skills and strengths to bring to an organization. Whether in development or operations, we are all engineers aligned around a common purpose.
  2. One team
    That common purpose is the basic premise for the second challenge: building one collaborative team, not two teams labelled development and operations and divided by a cultural barrier. In one team, we can ensure that operational support is considered and baked in from day one, and that everybody owns responsibility for quality development. Collaboration is not just about working together, but about working together toward a common goal. It requires respect, compassion and professionalism, and doing what is right for all, not just for one silo or individual.
  3. The acceptance of failure
    What remains once we have conquered our cultural adversity? The acceptance of failure as part of our software delivery lifecycle, and the inclusion of feedback in our flow automation. A failure typically starts out as an issue, which is then labelled as either a “defect” or an “incident.” With the acceptance of failure comes learning, development and growth. It enables higher-quality systems and enhances our knowledge. This final challenge deserves more time and detail, which I will cover in a later post.

To realize the true potential of any digital transformation using DevOps and SRE, traditional development and operations teams must start thinking, acting and working in a truly collaborative manner: together AND in the same direction. It is not enough for operations to “shift left” in the lifecycle; development must also “shift right,” until there is no development or operations, only engineers working as one team.