The Fifth DORA Metric: Reliability


This blog was originally published on the Typo blog.

The DORA (DevOps Research and Assessment) metrics have emerged as a north star for assessing software delivery performance. The fifth metric, Reliability, is often overlooked because it was added after the DORA research team's original announcement.

In this blog, let’s explore Reliability and its importance for software development teams.

What are DORA Metrics?

DevOps Research and Assessment (DORA) metrics are a compass for engineering teams striving to optimize their development and operations processes.

In 2015, the DORA (DevOps Research and Assessment) team was founded by Gene Kim, Jez Humble, and Dr. Nicole Forsgren to evaluate and improve software development practices. The aim was to better understand how development teams can deliver software faster, more reliably, and at higher quality.

The four key metrics are listed below, followed by a rough calculation sketch:

  • Deployment Frequency: Deployment frequency measures how often changes are deployed to production and highlights potential bottlenecks. It is a key indicator of agility and efficiency. Regular deployments signify a streamlined pipeline, allowing teams to deliver features and updates faster.
  • Lead Time for Changes: Lead Time for Changes measures the time it takes for code changes to move from commit to production deployment. It tracks the speed and efficiency of software delivery and offers valuable insights into the effectiveness of development processes, deployment pipelines, and release strategies.
  • Change Failure Rate: Change failure rate measures the frequency at which newly deployed changes lead to failures, glitches, or unexpected outcomes in the IT environment. It reflects the reliability and efficiency of the release process and is related to team capacity, code complexity, and process efficiency, impacting both speed and quality.
  • Mean Time to Recover: Mean Time to Recover measures the average duration taken by a system or application to recover from a failure or incident. It reflects the efficiency and effectiveness of an organization’s incident response and resolution procedures.
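
To make these definitions concrete, here is a minimal Python sketch of how the four metrics might be computed. The record shapes, sample values, and seven-day window are illustrative assumptions, not part of the DORA framework or any specific tool.

```python
from datetime import datetime, timedelta

# Hypothetical records: each deployment has a commit time, a deploy time,
# and a flag for whether it caused a failure; each incident has start and
# resolution times. All values are made up for illustration.
deployments = [
    {"committed_at": datetime(2024, 6, 1, 9), "deployed_at": datetime(2024, 6, 1, 15), "caused_failure": False},
    {"committed_at": datetime(2024, 6, 2, 10), "deployed_at": datetime(2024, 6, 3, 11), "caused_failure": True},
    {"committed_at": datetime(2024, 6, 4, 8), "deployed_at": datetime(2024, 6, 4, 16), "caused_failure": False},
]
incidents = [
    {"started_at": datetime(2024, 6, 3, 12), "resolved_at": datetime(2024, 6, 3, 14)},
]

period_days = 7  # assumed reporting window

# Deployment Frequency: deployments per day over the period.
deployment_frequency = len(deployments) / period_days

# Lead Time for Changes: average time from commit to production deployment.
lead_times = [d["deployed_at"] - d["committed_at"] for d in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Change Failure Rate: share of deployments that caused a failure.
change_failure_rate = sum(d["caused_failure"] for d in deployments) / len(deployments)

# Mean Time to Recover: average time from incident start to resolution.
recovery_times = [i["resolved_at"] - i["started_at"] for i in incidents]
mttr = sum(recovery_times, timedelta()) / len(recovery_times)

print(f"Deployment frequency: {deployment_frequency:.2f}/day")
print(f"Lead time for changes: {avg_lead_time}")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"MTTR: {mttr}")
```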

What is the Reliability Metric?

Reliability is the fifth metric, added by the DORA team in 2021. It is based on how well user expectations, such as availability and performance, are met, and it measures modern operational practices. It doesn’t have standard quantifiable performance targets; rather, it depends on service level indicators (SLIs) and service level objectives (SLOs).

While the first four DORA metrics (Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Mean Time to Recover) target speed and efficiency, reliability focuses on system health, production readiness, and stability for delivering software products.

Reliability comprises various metrics used to assess operational performance, including availability, latency, performance, and scalability, which measure user-facing behavior against software SLAs, performance targets, and error budgets. It has a substantial impact on customer retention and success.
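
As a rough illustration of how an SLI, an SLO, and an error budget fit together, consider the Python sketch below. The request counts and the 99.9% availability objective are assumed values for the example, not recommended targets.

```python
# Minimal sketch of an availability SLI measured against an SLO, plus the
# remaining error budget for the period. All numbers are assumptions.
total_requests = 1_000_000
failed_requests = 420

slo_target = 0.999  # 99.9% availability objective

# SLI: the fraction of requests served successfully.
availability_sli = (total_requests - failed_requests) / total_requests

# Error budget: the failures the SLO allows, minus those already spent.
allowed_failures = total_requests * (1 - slo_target)
error_budget_remaining = (allowed_failures - failed_requests) / allowed_failures

print(f"Availability SLI: {availability_sli:.4%}")   # 99.9580% with these numbers
print(f"SLO met: {availability_sli >= slo_target}")
print(f"Error budget remaining: {error_budget_remaining:.1%}")
```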

Indicators to Follow When Measuring Reliability

A few indicators include:

  • Availability: The proportion of time the software was available without incurring downtime.
  • Error Rates: Number of times software fails or produces incorrect results in a given period.
  • Mean Time Between Failures (MTBF): The average time that passes between software breakdowns or failures.
  • Mean Time to Recover (MTTR): The average time it takes for the software to recover from a failure.

These metrics provide a holistic view of software reliability by measuring different aspects such as failure frequency, downtime, and the ability to quickly restore service. Tracking these few indicators can help identify reliability issues, meet service level agreements, and enhance the software’s overall quality and stability.
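
The sketch below shows one way these four indicators could be derived from simple outage records. The 30-day window, outage times, and operation counts are assumptions made for illustration, not real data or a prescribed method.

```python
from datetime import datetime, timedelta

# Hypothetical outage windows and request counts over a 30-day period.
period = timedelta(days=30)
outages = [
    (datetime(2024, 6, 3, 12, 0), datetime(2024, 6, 3, 12, 45)),
    (datetime(2024, 6, 18, 2, 10), datetime(2024, 6, 18, 3, 40)),
]
failed_operations, total_operations = 1_250, 5_000_000

downtime = sum((end - start for start, end in outages), timedelta())

# Availability: share of the period the software was up.
availability = 1 - downtime / period

# Error rate: failures per operation in the period.
error_rate = failed_operations / total_operations

# MTBF: average uptime between failures.
mtbf = (period - downtime) / len(outages)

# MTTR: average time to recover from each failure.
mttr = downtime / len(outages)

print(f"Availability: {availability:.3%}, error rate: {error_rate:.4%}")
print(f"MTBF: {mtbf}, MTTR: {mttr}")
```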

Impact of Reliability on Overall DevOps Performance

The fifth DevOps metric, Reliability, significantly impacts overall performance. Here are a few ways:

Enhances Customer Experience

Tracking reliability metrics like uptime, error rates, and mean time to recovery allows DevOps teams to proactively identify and address issues, ensuring a positive customer experience and meeting customer expectations.

Increases Operational Efficiency

Automating monitoring, incident response, and recovery processes helps DevOps teams focus more on innovation and delivering new features rather than firefighting. This boosts overall operational efficiency.

Better Team Collaboration

Reliability metrics promote a culture of continuous learning and improvement. This breaks down silos between development and operations, fostering better collaboration across the entire DevOps organization.

Reduces Costs

Reliable systems experience fewer failures and less downtime, translating to lower costs for incident response, lost productivity, and customer churn. Investing in reliability metrics pays off through overall cost savings.

Fosters Continuous Improvement

Reliability metrics offer valuable insights into system performance and bottlenecks. Continuously monitoring these metrics can help identify patterns and root causes of failures, leading to more informed decision-making and continuous improvement efforts.

Role of Reliability in Distinguishing Elite Performers from Low Performers

Importance of Reliability for Elite Performers

  • Reliability provides a more holistic view of software delivery performance. Besides capturing velocity and stability, it also takes into consideration the ability to consistently deliver reliable services to users.
  • Elite-performing teams deploy quickly with high stability and also demonstrate strong operational reliability. They can quickly detect and resolve incidents, minimizing disruptions to the user experience.
  • Low-performing teams may struggle with reliability. This leads to more frequent incidents, longer recovery times, and overall less reliable service for customers.

Distinguishing Elite from Low Performers

  • Elite teams excel across all five DORA Metrics.
  • Low performers may have acceptable velocity metrics but struggle with stability and reliability. This results in more incidents, longer recovery times, and an overall less reliable service.
  • The reliability metric helps identify teams that have mastered both the development and operational aspects of software delivery.

Conclusion

The reliability metric, together with the other four DORA DevOps metrics, offers a more comprehensive evaluation of software delivery performance. By focusing on system health, stability, and the ability to meet user expectations, this metric provides valuable insights into operational practices and their impact on customer satisfaction.
