Boost up your DevOps Maturity

Published in

ELMO Software

6 min read · Nov 17, 2022

DevOps Maturity is an approach that we use to progressively increase maturity across teams in areas of agility, reliability, and security.

At ELMO, the DevOps team’s functions fall under key pillars that include cloud engineering, developer experience, and SRE. The team supports more than 15 product teams/squads, each of which owns one or more application groups or services.

The intent is to get to a point where teams are self-sufficient: they have the ability and the tools to innovate faster, ship features themselves within certain guardrails, and support the services they’re accountable for.

At a high level, there are guiding questions that we ask, such as:

Are we managing and provisioning infrastructure through code as opposed to manually?
Do our teams have a way to merge code changes where automated builds and tests run to give us feedback?
Are we in a position to know the health of our services before our end users?
and others …

These guiding questions eventually define what the framework looks like, which tools we use to answer them, and how we gauge teams’ success in adopting and leveraging those tools.

Why did we bother with this?

1. We can’t improve what we can’t measure

As cliché as it may sound, we can’t improve what we can’t measure.
It’s hard to lose weight if we don’t step on the scale now and then.
It’s hard to tell if our training routine is working if we cannot compare where we were before with where we are now.
So we use DevOps maturity to measure progress and see whether, and by how much, we’re improving over time.

2. Allows us to prioritize effectively

We support several product teams that use several tools, so we don’t expect everything to be adopted and matured overnight.
We use DevOps maturity to identify which areas need attention first, and for the most part we align those areas with business priorities.
For example, last quarter, with our ISO 27001 recertification, we had to prioritize security activities and get our security posture in better shape.
The quarter before that, we were deploying new modules like Hybrid Work, so we focused on agility and on ensuring the new module had the right foundations and best practices.
This quarter, we saw an increase in incidents, so we’re prioritizing monitoring and logging and making sure that teams have a way to measure their application performance and improve the availability of their services.

3. Gives us focus and direction

With the right priorities defined at a given point in time, we get focus and direction.
We limit work in progress (WIP) because we don’t have to worry about everything under the sun, and we accept that this is a marathon in which we gradually move the needle over time.

Where are we today?

It’s been over a year since the first iteration of DevOps maturity was introduced. Since then, we’ve evolved the approach to align with the business landscape and improved it based on feedback along the way.

>90% of our stacks follow IaC principles
IaC is foundational. An implementation that is repeatable and reviewable, gives us some immutability, and lets us spin things up and down with minimal work simply doesn’t compare with doing things manually and tediously.
Operating efficiency unlocked. We make sure that any new deployment follows the same process, is automated where possible, and stays maintainable throughout the lifecycle of the system.
Terraform is our default tool, though we leverage CloudFormation in cases where it makes more sense.
>90% of our stacks follow CI practices
Similar to IaC, more than 90% of our stacks have a CI implementation, and we leverage Jenkins heavily for this function.
7 out of 9 ELMO key applications/stacks are now migrated to Kubernetes (our default container orchestration platform)
The remaining 2 are in progress. Once they are completed, 80% of our workloads will be running in Kubernetes (the rest are EC2-based workloads).
ELK APM
We’re aiming for 100% adoption because this is critical to the quality and performance of our applications. We used to have NewRelic and Splunk, and we consolidated on ELK as part of a simplification effort to reduce cognitive load.
>90% Trivy coverage
For our containerized workloads, we leverage Trivy as part of our CI to scan our images for security vulnerabilities. It has been an essential tool in improving our security posture, and we have reached over 90% coverage across our apps since implementing it. We’ve even extended this to our “maintenance-mode” apps, which may not have regular build runs, by running scheduled jobs that scan production images on a weekly basis.
>90% SonarQube coverage
Similar to Trivy and as part of our CI, >90% of our stacks are now continuously inspected (code analysis) via SonarQube for code smells and overall code health.
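
The weekly scan of production images mentioned under Trivy coverage could be sketched roughly as follows. This is only an illustration under assumptions, not our actual job: the image names are hypothetical, the choice of HIGH and CRITICAL severities is assumed, and the scheduling itself (for example a weekly CI job or cron entry) is left out. The helper builds a `trivy image` command and counts findings in Trivy’s JSON report.

```python
import json
import subprocess

# Hypothetical production images; replace with your own registry paths.
PRODUCTION_IMAGES = [
    "registry.example.com/app-a:1.4.2",
    "registry.example.com/app-b:2.0.0",
]


def trivy_command(image, severities=("HIGH", "CRITICAL")):
    """Build the argv for a JSON-formatted Trivy image scan."""
    return [
        "trivy", "image",
        "--quiet",
        "--format", "json",
        "--severity", ",".join(severities),
        image,
    ]


def count_vulnerabilities(report):
    """Count findings across all targets in a parsed Trivy JSON report."""
    return sum(
        len(result.get("Vulnerabilities") or [])
        for result in report.get("Results") or []
    )


def scan(image):
    """Run Trivy on one image (requires the trivy CLI to be installed)."""
    completed = subprocess.run(
        trivy_command(image), capture_output=True, text=True, check=True
    )
    return count_vulnerabilities(json.loads(completed.stdout))


# Show the commands a scheduled job would run, without invoking Trivy here.
for image in PRODUCTION_IMAGES:
    print(" ".join(trivy_command(image)))
```

A scheduled job would call `scan(image)` for each production image and report or alert on any non-zero count.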

How may this approach apply to you?

This approach can be useful if:

You have too many tools in your toolbox and are unable to leverage them effectively
You have 3 different tools doing the same thing. For example, you may have NewRelic, Splunk, and ELK all used for monitoring
You want to increase the team’s operating efficiency and help reduce the cognitive load
You want a way to track progress over time, e.g. quarterly
And most importantly, you want your teams to be in the best position possible to provide value to your customers

Sample templates

If this approach seems useful and you need a reference to start with, you can use the steps below.

1 — Surface guiding questions on what you and your teams care about
2 — Identify potential tooling that can answer these questions
3 — Baseline your critical apps/services on whether they’re enabled to leverage these tools
4 — Calculate rating based on overall coverage
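
Steps 3 and 4 can be sketched in a few lines of Python. Treat this as a minimal illustration under assumptions: the service names, the tool keys, and the rating thresholds (90% and 50%) are placeholders invented for the example, so pick ones that match the guiding questions you care about.

```python
from dataclasses import dataclass, field

# Hypothetical tool categories, one per guiding question.
TOOLS = ["iac", "ci", "apm", "image_scan", "code_analysis"]


@dataclass
class Service:
    name: str
    enabled: set = field(default_factory=set)  # tools this service has adopted


def coverage_by_tool(services, tools):
    """Return, per tool, the fraction of services that have it enabled."""
    if not services:
        return {tool: 0.0 for tool in tools}
    return {
        tool: sum(tool in s.enabled for s in services) / len(services)
        for tool in tools
    }


def rating(coverage):
    """Map a coverage fraction to a label (thresholds are assumptions)."""
    if coverage >= 0.9:
        return "high"
    if coverage >= 0.5:
        return "medium"
    return "low"


# Step 3: baseline of (hypothetical) critical services.
services = [
    Service("app-a", {"iac", "ci", "apm", "image_scan"}),
    Service("app-b", {"iac", "ci"}),
    Service("app-c", {"iac", "ci", "apm", "image_scan", "code_analysis"}),
    Service("app-d", {"ci"}),
]

# Step 4: rating based on overall coverage.
report = coverage_by_tool(services, TOOLS)
for tool, cov in report.items():
    print(f"{tool:14} {cov:5.0%}  {rating(cov)}")
```

Keeping the baseline as plain data makes it easy to review with each team and to adjust the thresholds as priorities change.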

This provides awareness of where teams/services are at. When the data is used as a conversational tool to align on where you spend and prioritize effort, it becomes very valuable.

After surfacing the baselines, you can then use the ratings to guide conversations on how you can progressively increase maturity over time.

1 — Provide a high-level view of critical applications/services you care about
2 — Regularly track progress to see if you’re moving the needle over time
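
Tracking progress can be as simple as comparing coverage snapshots from one quarter to the next. The sketch below assumes a hypothetical data shape (quarter label mapped to per-tool coverage percentages) and made-up numbers. It reports the change in percentage points and flags tools that have stalled.

```python
def coverage_deltas(previous, current):
    """Per-tool change in coverage (percentage points) between two snapshots."""
    return {
        tool: round(current[tool] - previous.get(tool, 0.0), 1)
        for tool in current
    }


def stalled_tools(deltas):
    """Tools whose coverage did not improve since the previous snapshot."""
    return [tool for tool, delta in deltas.items() if delta <= 0]


# Hypothetical snapshots: quarter -> tool -> % of critical services covered.
snapshots = {
    "2022-Q2": {"iac": 70.0, "ci": 80.0, "apm": 40.0},
    "2022-Q3": {"iac": 85.0, "ci": 90.0, "apm": 40.0},
}

quarters = sorted(snapshots)
for prev, curr in zip(quarters, quarters[1:]):
    deltas = coverage_deltas(snapshots[prev], snapshots[curr])
    print(f"{prev} -> {curr}: {deltas}")
    print(f"  needs attention: {stalled_tools(deltas)}")
```

A stalled tool is not necessarily a problem. It may simply reflect that the quarter’s priorities were elsewhere, which is the kind of conversation the ratings are meant to prompt.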

In summary

We’ve come a long way since DevOps Maturity was first introduced. We’ve evolved the approach over time to align with the business landscape and this alignment proved to be very valuable.

By consistently raising awareness and making this part of our quarterly goals, we ensured that we had the bandwidth and capacity to progressively increase maturity across our services.

Most importantly, the success of this approach would not have been possible without the collective effort and continuous-improvement mindset of the individuals and teams who brought our services and tech stacks up to a high standard.

If you find this post useful or have thoughts on how it can be improved, we would love to hear from you :)