Why We Don’t Need a DevOps Team

Andrew Hatch
SEEK blog

Recently we retired the word DevOps, both as a team name and a job title at SEEK. We realised that having a DevOps team made no sense to us anymore.

In this post I’ll discuss the reasons why we renamed the team to Platform Engineering; the process we followed to make this decision; and how our team will continue to take part in our evolving technology landscape.

Why did we even have a DevOps Team?

Many years ago we recognised the value of bringing Operations into the Delivery cycle to gain the benefits of developing continuous integration pipelines and automated environment configuration, all focused on speeding up delivery and, hopefully, avoiding the crippling issues we faced with existing Production Deployments. DevOps seemed like a good name for a team at the time: it spoke to the ideals that we wanted and the profile and importance of what it stood for.

Up until 2015 the DevOps team at SEEK assumed full responsibility for all builds, deployments and operational support for our Hirer and Candidate sites.

It was not an easy job.

We had made rapid advances in our delivery processes but we still faced torrid nights on-call with software systems straining under the sheer volume of product being deployed to them. As we shifted more workloads into the cloud and created autonomous teams that did their own support, this burden lifted and we began to focus on other valuable tasks. Many centred on improving the platforms we run our software on; adopting a software engineering focus to how we did our work; and vastly improving the automation and durability of the systems we supported.

DevOps 101

The term is not new, but its definition has grown significantly over the last 10 years. Numerous books have been written on it. Conferences around the world base themselves on it. Vendors use it to describe their tech products. And recruiters regularly add it to job descriptions to attract any candidate ranging from Systems Administrators to “full stack” Developers.

In simple terms — it means a lot of things. But at its core DevOps is about adopting a delivery culture and breaking down traditional IT silos (such as Development and Operations) to blend skills, work practices and processes.

The ultimate goals are:

  • growing people skills and capabilities
  • promoting small, autonomous teams that own what they build from end-to-end
  • delivering product to market faster
  • being more adaptable and flexible to change
  • producing outcomes of higher quality

In other words… DevOps is not about a specific team of individuals, a job title, or a product. DevOps is all about your culture. And your culture is unique; remember that.

Adopting a DevOps culture is valuable

Here is what we observed as skills and capabilities flowed between Development and Operations.

Operations into Development

Development teams now support what they build, define their own operational metrics and handle their own incident management activities. The development of continuous delivery pipelines and the ability to deploy in real-time during the day has been a huge win in this area. Also it was the development teams that drove the adoption of tools like PagerDuty — not Operations — as they looked to use tools that made more sense to them.

Development into Operations

Implementing software engineering practices in our Platform Engineering team over the last two years has improved our systems automation, coded our infrastructure and improved legacy system reliability. Importantly, the distinction between what were once two very separate roles (Development and Operations) has evaporated in many of the roles the team now performs.

Or to put it another way: 4 years ago we deployed to production fewer than 5 times a month; now we average 800 times. The approach has clearly delivered a lot of value.
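
To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the 800 figure is also a monthly average (the period is not stated above) and treats “fewer than 5” as an upper bound of 5, so the result is only a lower bound on the increase. The function name is purely illustrative.

```python
def min_deploy_increase(before_per_month: float, after_per_month: float) -> float:
    """Return how many times more often deployments happen now than before."""
    if before_per_month <= 0:
        raise ValueError("before_per_month must be positive")
    return after_per_month / before_per_month


# Assumption: both figures are monthly. 5 is an upper bound on the old rate,
# so the real increase is at least this large.
factor = min_deploy_increase(before_per_month=5, after_per_month=800)
print(f"At least a {factor:.0f}x increase in production deployments")
```

Under those assumptions, the ratio is simply the new rate divided by the old one.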

The change process

During the last 12 months, we’ve repeatedly asked ourselves: “If all of our development teams adopt and follow the practices generally considered ‘DevOps’, where does that leave a team called DevOps in name? What value is it actually providing? And do we even need one at all?”

Changing teams, especially teams with culture, history and purpose, is complex, time-consuming and important to get right. It requires understanding the value that your team already provides, the improvements your stakeholders want, and how to continue to provide value to your organisation.

Most importantly, transparency should be employed throughout the entire process so everyone is acutely aware of what is happening.

No-one likes surprises.

It starts with analysis

We analysed what we needed to do:

  1. Understand exactly what our team does now i.e. how do we see ourselves?
  2. Interview a broad range of stakeholders to determine the value we provide to them and where we could improve i.e. how do others see us?
  3. Draw conclusions and use this as input for a strategy to develop objectives, guiding principles and actions to implement, measure and continuously refine i.e. know what we are and what we want to be

Understanding what we do

  • Provide on-demand (and, in a few areas, dedicated) support to product-focused software delivery teams
  • Provide on-demand support for operational support and shared service teams such as Security, Infrastructure and Network/Systems engineering
  • Provide incident response and first level support for systems still in our Data Centre and AWS
  • Build systems and services that can be used by all delivery teams in AWS, e.g. routing solutions, new account provisioning, cost management and infrastructure APIs
  • Continuous improvement and automation of existing infrastructure and the support/enhancement of non-production systems such as build and deployment tooling

But learning how others see our value was an interesting exercise.

Identifying stakeholders

For our team, these were many, ranging from developers, architects, delivery managers, senior managers and finance teams to infrastructure and security teams. In total we invited 20 people to participate in mixed round-table discussions.

There were two broad topics discussed at each meeting:

  • how reliant are you on the team now?
  • in your own view, how self-sufficient are you in regards to operational stability and reliability practices for the products and services you build and support?

To keep the conversation flowing and avoid group-think, we made sure we never had more than a few people in each meeting and that each person was from a different part of the business.

What we learned

We gathered lots of data from all the respondents and grouped it into three broad categories. These formed the Guiding Principles for our strategy moving forward:

Domain — The team has significant domain knowledge. We’ve gained lots of experience providing support to a huge number of stakeholders.

Technology — Over time we’ve obtained plenty of specialist data centre and AWS experience. We’ve evolved significantly, along with the rest of our Technology Department during the last few years. Adopting software engineering principles has been a large part of it.

Operations — At our core we are still a hardened operations team with decades of experience working on small and enterprise scale systems. This experience is valued by SEEK, and we’ll continue to assume on-call responsibility for a number of business critical systems.

And the conclusion?

We still have a vital role to play.

This was reassuring.

Roles focused on how our legacy systems hang together, along with the delicate balance required to support services like DNS and networking, were rated critical. We also discovered:

  • Some teams want a visible and permanent presence from our people to help clear their technical debt and speed up delivery
  • Others would like more proactive and on-demand access as and when they need it
  • Operations teams that provide supporting and shared services still need our help; in fact, we’re critical to them achieving certain objectives
  • System ownership, and who should support what, is a broader problem for us, but maybe we can help solve it or at least provide better clarity on what part we have to play in this

Moving forward

We defined our objectives. The final step was deciding on our first step. Plus we needed to know what success would look like.

Objectives

We started with 4 high level objectives:

  • Provide greater clarity to stakeholders to assist with prioritisation of work tasks
  • Continuously improve site reliability practices — any change we make should focus on how we can help with that process
  • Arrange on-going care and attention for our Legacy systems. They’re not as fun to deal with due to their outdated technology stacks, but they are critical to our business and need to be treated as first-class citizens
  • Establish ownership of shared tooling and systems for technology teams

Actions

We came up with two high-level actions to start with. And we’ll measure and assess both in 3 months’ time:

  1. Rename our DevOps Team to Platform Engineering

If there is one thing that came through clearly in our analysis, it was that calling a team “DevOps” does nothing to explain what the team actually does. Changing the name to Platform Engineering is clearer as it combines what we focus on (our hosted Data Centre and AWS, i.e. our Platforms) and what we do (build systems and services to continuously help and improve our ability to deliver, support and maintain our products).

  2. Be clear about the areas Platform Engineering focuses on and how this impacts our stakeholders

We came up with two: Operations and Site Reliability.

Operations is all about which systems we support under our remit today. We will continuously improve, support, maintain or decommission them moving forward, and we may take ownership of them in the future.

Site Reliability is about how we can help teams build new systems and services and get them through roadblocks or hurdles. We want to help them take as much ownership of their solutions as possible. This area is designed around shorter-term engagements and projects that impact most or all technology teams.

Did we get it right?

Time will tell, but with the new structure in place we are set up to evolve in multiple directions to support growth and demands over time. We’ll continually assess and refine our engagement model as we go along.

A key takeaway from this process was that our learnings were continually surfaced and communicated to the team. Ultimately the change simply became a reflection of our current state and made logical sense, which lessened its impact.

In future posts we will look at how our focus areas have developed and evolved. More on this later …

Andrew Hatch
SEEK blog

Father, Santa Cruz Surfer, fiddler of old Datsuns. Engineering resilience as best I can