
What the DORA Metrics Can and Cannot Tell Us About Organizational Performance

Philip Rogers
Published in A Path Less Taken
Jul 19, 2023 · 7 min read


Like so many things in the world of Lean and Agile software development, the DevOps Research and Assessment (DORA) metrics can provide important insights. And yet we need to exercise caution when evaluating these numbers, because they represent only a subset of the factors we need to consider when assessing how well we’re performing as an organization. Thanks to Stefan Wolpers, I came across an article by LinearB CEO Ori Keren titled DORA Metrics: We’ve Been Using Them Wrong, in which he makes many important points, and I’d like to touch on some of those points here.

After publishing this post, I received helpful feedback from Bryan Finster, which I’ve done my best to incorporate via a few updates below.

Summary of the DORA Metrics

Let’s briefly revisit the DORA metrics. For many years, there were four DORA metrics:

  • Deployment Frequency — How often releases to Production occur
  • Lead Time for Changes — How much time elapses from initial code commit to code running in Production (+)
  • Change Failure Rate — How often a deployment to Production results in a failure
  • Time to Restore Service — How long it takes to recover from a failure in Production

(+) As Bryan pointed out to me via his comments on this post, it’s important to keep in mind that this measure is best understood as the cycle time associated with the delivery pipeline, a distinction worth emphasizing because it does not account for activities to the left of the build and deploy process itself. I would also add that there are other definitions of Lead Time discussed at length among Lean and Kanban practitioners, which I’m not going to cover in this blog post. For the purposes of this conversation, Lead Time means ONLY what it means within the context of the DORA metrics.
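To make the throughput-oriented measures a bit more concrete, here is a minimal sketch in Python of how Deployment Frequency and Lead Time for Changes could be computed. The deployment records are made up, and the field names are not tied to any particular tool; the point is simply that each metric boils down to straightforward arithmetic over deployment timestamps.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records: when each change was committed,
# and when it reached Production.
deployments = [
    {"committed": datetime(2023, 7, 3, 9, 0), "deployed": datetime(2023, 7, 3, 15, 0)},
    {"committed": datetime(2023, 7, 5, 11, 0), "deployed": datetime(2023, 7, 6, 10, 0)},
    {"committed": datetime(2023, 7, 10, 8, 0), "deployed": datetime(2023, 7, 10, 17, 0)},
]

def deployment_frequency(deploys, window_days=30):
    """Deployments to Production per day over the measurement window."""
    return len(deploys) / window_days

def lead_time_for_changes_hours(deploys):
    """Median hours from code commit to that code running in Production."""
    durations = [(d["deployed"] - d["committed"]).total_seconds() / 3600 for d in deploys]
    return median(durations)

print(f"Deployment Frequency: {deployment_frequency(deployments):.2f} deploys/day")
print(f"Lead Time for Changes: {lead_time_for_changes_hours(deployments):.1f} hours (median)")
```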

In 2021, a fifth DORA metric was introduced, which I’ll broadly define as follows:

  • Reliability — The extent to which an organization meets its users’ availability and performance expectations (++)

(++) A good place to look when assessing how well we’re doing with reliability is how we’re performing against our Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
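As a rough illustration of that SLI/SLO comparison, consider a hypothetical availability SLI measured against a 99.9% SLO. The numbers below are invented, and real SLIs would come from monitoring data, but the comparison itself is simple:

```python
# Hypothetical availability SLI: fraction of requests served successfully
# during the measurement window, compared against a 99.9% SLO.
total_requests = 1_250_000
good_requests = 1_248_900

slo_target = 0.999                    # Service Level Objective
sli = good_requests / total_requests  # Service Level Indicator (measured)

error_budget = 1 - slo_target               # allowed failure ratio
budget_consumed = (1 - sli) / error_budget  # share of the budget used so far

print(f"SLI: {sli:.5f} vs SLO: {slo_target}")
print(f"Error budget consumed: {budget_consumed:.0%}")
print("Meeting the objective" if sli >= slo_target else "Burning error budget too fast")
```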

In the most general of terms, when it comes to what the five DORA metrics can tell us, those signals fall into two categories:

  • Throughput — Deployment Frequency and Lead Time for Changes
  • Stability — Change Failure Rate, Time to Restore Service, and Reliability

To make one additional generalization, software feature teams tend to think a lot about, and be measured against, Throughput — the relative frequency with which they get features to Production. Operations-centric teams tend to think a lot about, and be measured against, Stability — the extent to which systems are secure, available, and reliable.

Going Above and Beyond the DORA Metrics

Returning to what Ori has to say about where the DORA metrics fall short, he identifies three areas for consideration:

  • While the DORA metrics are important data points, they are lagging indicators when it comes to Throughput and Stability (and other areas as well, such as quality and efficiency)
  • Meaningful change among technical practitioners needs to happen from the bottom-up, by optimizing the workflows where they spend most of their time
  • Business stakeholders often struggle to connect the dots between DORA metrics and things they are more likely to care about on a day-to-day basis, like value delivery and happy customers

Seeking Improvement via Leading Indicators

Since the DORA metrics are lagging indicators when it comes to efficiency and quality, what are some examples of leading indicators we could consider? Ori mentions quite a few:

  • Pull Request (PR) size
  • PR pickup time
  • PR review time
  • PR review depth
  • Code churn

Let’s take a look at each one of these:

PR size

When we think about PR size, it’s helpful to remember the importance of the Lean concept of small batch size, because in software development, PR size is an important early indicator of batch size. Ori points out that small PR size has numerous benefits, including:

  • Higher likelihood of completion without interruptions (let’s face it, frequent interruptions are a reality on many teams)
  • Higher likelihood of being picked up sooner by another developer
  • The potential to reduce the number of hand-offs, and thereby to reduce the idle time that accompanies hand-offs
  • Reduced risk when it comes time to merge and release

Given considerations such as those above, it’s not a big leap to see how small PR size can also improve Cycle Time and Deployment Frequency, for example.
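As one simple way to make PR size visible, a team could summarize lines changed per PR and flag anything above an agreed-upon threshold. The records and the 200-line threshold below are hypothetical examples, not a standard; in practice the counts might come from a Git hosting API or from git diff --shortstat.

```python
from statistics import median

# Hypothetical per-PR line counts.
pull_requests = [
    {"id": 101, "additions": 40, "deletions": 12},
    {"id": 102, "additions": 610, "deletions": 180},
    {"id": 103, "additions": 25, "deletions": 5},
]

SMALL_PR_THRESHOLD = 200  # lines changed; an example team norm

def pr_size(pr):
    """Total lines changed in a pull request."""
    return pr["additions"] + pr["deletions"]

sizes = [pr_size(pr) for pr in pull_requests]
large_prs = [pr["id"] for pr in pull_requests if pr_size(pr) > SMALL_PR_THRESHOLD]

print(f"Median PR size: {median(sizes)} lines changed")
print(f"PRs above the {SMALL_PR_THRESHOLD}-line threshold: {large_prs}")
```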

PR pickup time

To define the term, PR pickup time is the amount of time that elapses between when a PR is issued and when the code review of that PR begins. I’ve worked with plenty of teams where there was evidence that PRs were sitting for quite a while waiting to be picked up by another team member, and some of the most common root causes include:

  • High amounts of simultaneous Work In Progress (WIP)
  • Work items that are too large (which is related to PR size)
  • Lack of team clarity on how much time is acceptable for PRs to wait for pick-up

Note: For helpful advice related to reduction of PR pickup time, see also the LinearB blog post Cycle Time Breakdown: Tactics for Reducing Pull Request Pickup Time.

It may also be helpful to reference the advice that is available on Stack Overflow regarding strategies for understanding how long GitHub PRs have been open.
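For teams on GitHub, one way to approximate pickup time is to compare when a PR was opened with when its first review was submitted, using the REST API’s pull request and review endpoints. The sketch below is illustrative rather than production-ready: the owner, repo, and PR number are placeholders, and it omits authentication, pagination, and error handling.

```python
from datetime import datetime

import requests

OWNER, REPO = "example-org", "example-repo"  # placeholders
API = f"https://api.github.com/repos/{OWNER}/{REPO}"
HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for private repos

def parse(ts):
    """Parse GitHub's ISO 8601 timestamps, e.g. 2023-07-19T14:03:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def pickup_time_hours(pr_number):
    """Hours between PR creation and its first submitted review, if any."""
    pr = requests.get(f"{API}/pulls/{pr_number}", headers=HEADERS).json()
    reviews = requests.get(f"{API}/pulls/{pr_number}/reviews", headers=HEADERS).json()
    if not reviews:
        return None  # still waiting to be picked up
    opened = parse(pr["created_at"])
    first_review = min(parse(r["submitted_at"]) for r in reviews)
    return (first_review - opened).total_seconds() / 3600

print(pickup_time_hours(42))  # hypothetical PR number
```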

PR review time/PR review depth

How long PR reviews take is an important component of cycle time. The topic of code reviews is bigger than I’m going to try to tackle here, but based on my experience working with many teams, some general observations on this topic include the following (a rough measurement sketch appears after the list):

  • It’s important to account for out-of-office time; for instance, it may be necessary to be especially proactive when a primary reviewer is unavailable
  • Some teams choose to articulate code review expectations and norms in artifacts such as team working agreements
  • Voltaire’s advice to “not let the perfect be the enemy of the good” often comes into play with code reviews
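Review time and review depth can be derived from the same kind of review data as pickup time. Here is a rough sketch using hypothetical, already-fetched records rather than live API calls, treating review time as the elapsed time from first review to approval and review depth as review comments per hundred lines changed (one possible proxy among several):

```python
from datetime import datetime

# Hypothetical, already-fetched review data for a single PR.
pr = {
    "lines_changed": 180,
    "first_review_at": datetime(2023, 7, 10, 9, 30),
    "approved_at": datetime(2023, 7, 11, 14, 0),
    "review_comments": 12,
}

def review_time_hours(pr):
    """Elapsed time from first review to approval, in hours."""
    return (pr["approved_at"] - pr["first_review_at"]).total_seconds() / 3600

def review_depth(pr):
    """Review comments per 100 lines changed, as a rough proxy for depth."""
    return pr["review_comments"] / (pr["lines_changed"] / 100)

print(f"Review time: {review_time_hours(pr):.1f} hours")
print(f"Review depth: {review_depth(pr):.1f} comments per 100 lines changed")
```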

A differing perspective

An observation that Bryan makes in his comments on this post has to do with not giving too much weight to PR metrics. His suggestion is instead to focus on two measures of batch size: 1) how long it takes to complete a user story, and 2) how frequently code is integrated to the trunk. He goes on to say that “These [measures] should be guarded by tracking defect arrival rates. Measuring these encourage shrinking batch sizes. That has a positive impact on quality and feedback loops.”
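For the second of those measures, integration frequency can be read straight out of the repository history. The sketch below assumes a local clone whose trunk branch is named main, and simply counts commits landing on the trunk per day over the last two weeks:

```python
import subprocess
from collections import Counter

# Count commits landing on the trunk per day over the last 14 days.
# Assumes a local clone whose trunk branch is named "main".
dates = subprocess.run(
    ["git", "log", "main", "--since=14 days ago", "--pretty=%ad", "--date=short"],
    capture_output=True, text=True, check=True,
).stdout.split()

commits_per_day = Counter(dates)
for day, count in sorted(commits_per_day.items()):
    print(f"{day}: {count} integrations to trunk")
print(f"Average: {sum(commits_per_day.values()) / 14:.1f} integrations per day")
```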

Code churn

Ori concludes with code churn as a leading indicator worthy of attention. Bryan suggests instead that Continuous Integration (CI) is a better area of focus, because it has a stronger correlation with reducing risk. In Bryan’s view, “measuring ‘code churn’ will result in less frequent change which will drive up batch size, negatively impact quality processes, and slow down feedback loops.”

Focusing on Engineering Outcomes

As I’ve written about previously, organizations often suffer from myopia, spending a lot of time thinking about outputs but very little time thinking about outcomes.

Ori offers helpful advice about a couple of non-DORA metrics that are outcome-focused, where I’m changing his terminology to reflect my own experiences when working with teams:

  • Team Allocation
  • Ability to Deliver

Team Allocation

It may come as a surprise to leaders to learn that team allocations at any given point in time do not necessarily reflect business priorities. This metric focuses on what percentage of headcount is working on initiative one vs. initiative two vs. initiative three, and it can be helpful to look at it with some frequency to make sure team allocations really do reflect the areas the organization wishes to focus on.
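The arithmetic behind this check is deliberately simple. In the sketch below, the headcount figures and target percentages are made up; the point is just to compare where people are actually spending their time against where leadership intends the investment to go:

```python
# Hypothetical headcount actually working on each initiative...
actual_headcount = {"Initiative A": 14, "Initiative B": 4, "Initiative C": 7}

# ...versus the split leadership says it wants.
target_allocation = {"Initiative A": 0.40, "Initiative B": 0.35, "Initiative C": 0.25}

total = sum(actual_headcount.values())
for initiative, people in actual_headcount.items():
    actual = people / total
    target = target_allocation[initiative]
    flag = "  <-- drifting from stated priorities" if abs(actual - target) > 0.10 else ""
    print(f"{initiative}: {actual:.0%} actual vs {target:.0%} target{flag}")
```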

To put it in the context of the DORA metrics, it’s not helpful to deliver the wrong thing well (i.e., the DORA metrics might look great, but the company bottom line might not).

Ability to Deliver

Few topics get more attention in agile practitioner circles than the ability to deliver against expectations. Ori uses the term “Project Planning Accuracy,” and observes that in the aggregate, the numbers are not great across the industry when it comes to the extent to which teams finish what they said they would finish over the course of any given iteration (Sprint).

The advice I would offer here is to use probabilistic forecasting when setting expectations with stakeholders (a minimal simulation sketch appears after the list below), and also to lean heavily on the following triad of Lean metrics when working with teams:

  • Work In Progress (WIP)
  • Cycle Time
  • Throughput
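As one minimal example of probabilistic forecasting, a team can resample its own historical Throughput to answer a question like “how many weeks until these 30 items are likely done?” at a chosen confidence level. The weekly throughput samples below are invented, and this Monte Carlo sketch is not tied to any particular tool:

```python
import random

# Hypothetical historical Throughput: items finished in each of the last 12 weeks.
weekly_throughput = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4, 3, 5]

def forecast_weeks(backlog_size, history, trials=10_000, confidence=0.85):
    """Monte Carlo forecast: weeks needed to finish `backlog_size` items."""
    outcomes = []
    for _ in range(trials):
        remaining, weeks = backlog_size, 0
        while remaining > 0:
            remaining -= random.choice(history)  # resample a past week
            weeks += 1
        outcomes.append(weeks)
    outcomes.sort()
    return outcomes[int(confidence * trials) - 1]  # e.g., the 85th percentile

print(f"85% of simulated futures finish 30 items within {forecast_weeks(30, weekly_throughput)} weeks")
```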

I’ve also written a couple of posts related to forecasting, for those who would like to dig deeper.

Conclusion

As I stated at the beginning of this blog post, I’ve touched on only a subset of the points that Ori makes. He concludes with examples of how each DORA metric can be tied to business outcomes, and with a four-step process to improve organizational performance against the DORA metrics. I highly recommend that you read his full blog post, and I hope my summary of some of its key points has been helpful.
