Here’s Why You Need to Start Measuring Your R&D Department

Omer Meshar
CyberArk Engineering
6 min read · Aug 13, 2020

We work in an industry that evolves endlessly and demands continuous improvement. While you do your best to improve, how do you know where to focus your efforts? And how do you know whether the improvements you implemented are working?

Measuring the department can help understand what is going on

There is a lot of know-how on team metrics, but much less when it comes to the higher levels of the organization. For instance, how do you measure your last Program Increment?

Until recently, we too had difficulty measuring ourselves at the department level.

In this post I will explain how we started to measure ourselves, what we learned from it, and what plans lie ahead.

Growing Pains

Our R&D department has grown fast in the last few years, moving from a small cohesive family-like organization to a large, diverse and constantly changing one. As we continue to grow, we need additional processes and tools to help us track and improve. In the past our managers knew everything that was going on through a simple team meeting, but now — with so many of us — it can be difficult to reach the same understanding.

We know that continuous improvement is the only way to keep our heads above water, but our efforts have been impeded by not knowing exactly where we stand. So, we decided to start measuring ourselves at the department level.

Measurements are essential for finding where you need to improve

Measuring Concepts

First, we set our main goal: use measurements for learning and continuous improvement. Another important goal was to enable R&D leaders to use the measurements to show our achievements to upper management.

Second, we decided to use lagging measures, since they are less sensitive to the high diversity in the way our teams work.

We look at the epic/feature level of our backlog, rather than at lower-level team items. We also decided to measure at the end of every Program Increment (PI), which in our case consists of four two-week sprints.

We use Jira as our task management system, but its reporting capabilities are not good enough on their own, so we chose to use a BI tool on top of it. This enabled us to see department-level metrics and drill down to the group/team level.
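The exact BI setup is beyond the scope of this post, but to give a feel for the data behind the dashboards, here is a minimal sketch of pulling the finished epics for a PI from Jira's REST search API. The URL, credentials and the PI-label convention are placeholders, not our actual setup; adapt the JQL to however your backlog encodes PIs.

```python
import requests
from requests.auth import HTTPBasicAuth

# Hypothetical values: replace with your own Jira URL and credentials.
JIRA_URL = "https://your-company.atlassian.net"
AUTH = HTTPBasicAuth("reporting-bot@your-company.com", "<api-token>")

def fetch_done_epics(pi_label: str) -> list:
    """Return the epics resolved during the given Program Increment.

    Assumes epics are tagged with a PI label such as "PI-2020-3".
    """
    jql = f'issuetype = Epic AND status = Done AND labels = "{pi_label}"'
    issues, start = [], 0
    while True:
        resp = requests.get(
            f"{JIRA_URL}/rest/api/2/search",
            params={"jql": jql, "startAt": start, "maxResults": 100,
                    "fields": "summary,labels"},
            auth=AUTH,
        )
        resp.raise_for_status()
        page = resp.json()
        if not page["issues"]:
            break
        issues.extend(page["issues"])
        start += len(page["issues"])
        if start >= page["total"]:
            break
    return issues
```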

Now that we had the reporting mechanism in place, we had to choose our Key Performance Indicators (KPIs).

It is important to build your measurement system wisely

Choosing KPIs

We had a lot of KPIs in mind, but we knew we had to start with just a few. We wanted to measure both outcome and value, but soon realized these are more complex to measure and that our real priority must be to get things rolling.

On the other hand, metrics drive culture. As Eliyahu M. Goldratt (originator of the Theory of Constraints) said:

Tell me how you measure me, and I will tell you how I will behave

We knew we had to choose our KPIs carefully, as they can really shape the culture of our group.

We decided to start off with three traditional and balanced KPIs, so that we could quickly gain insightful information. The KPIs chosen were Throughput, Predictability and Quality. Each includes a few metrics, and each metric measures a specific perspective.

Throughput, Quality and Predictability
The initial KPIs we chose to measure

Throughput: Rate of production
The most important throughput metric counts the number of epics, per type, that we were able to finish within the PI. This KPI enables us to see either an improvement or a decline in our ability to get things done, as well as helping us plan better.
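As a rough illustration of the calculation (the Epic records and the type taxonomy below are simplified stand-ins for whatever your tracker actually holds), counting finished epics per type boils down to something like this:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Epic:
    key: str
    epic_type: str   # e.g. "feature", "tech-debt" -- placeholder taxonomy
    status: str      # e.g. "Done", "In Progress"

def throughput_per_type(epics: list) -> Counter:
    """Count the epics finished within the PI, broken down by type."""
    return Counter(e.epic_type for e in epics if e.status == "Done")

# Example: three feature epics and one tech-debt epic finished this PI
pi_epics = [
    Epic("CYB-101", "feature", "Done"),
    Epic("CYB-102", "feature", "Done"),
    Epic("CYB-103", "feature", "Done"),
    Epic("CYB-104", "tech-debt", "Done"),
    Epic("CYB-105", "feature", "In Progress"),
]
print(throughput_per_type(pi_epics))  # Counter({'feature': 3, 'tech-debt': 1})
```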

Predictability: Degree to which a correct prediction can be made
This includes the following metrics:

  • Planned vs. actual, measured as the number of Done items divided by the number of planned items, aiming for ~80% (see the sketch below)
  • Looking at how good our initial estimates were, compared to the lower-level estimates we gave as we learned more about the requirements
  • Looking at the different levels of commitment (commitments, planned and stretch), and measuring our predictability for each level

This enables us to see whether we are improving our ability to plan, give commitments and predict our accomplishments.
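A minimal sketch of the planned-vs-actual calculation, broken down by commitment level. The "level" and "status" fields are placeholders for however your backlog marks commitments, planned items and stretch goals:

```python
def planned_vs_actual(items: list) -> dict:
    """Done/planned ratio per commitment level.

    We aim for roughly 80% on planned items, and expect the ratio to
    differ between commitments, planned items and stretch goals.
    """
    ratios = {}
    for level in ("commitment", "planned", "stretch"):
        in_level = [i for i in items if i["level"] == level]
        if not in_level:
            continue
        done = [i for i in in_level if i["status"] == "Done"]
        ratios[level] = len(done) / len(in_level)
    return ratios

# Example: 8 of 10 planned items done -> 0.8, right on the ~80% target
items = [{"level": "planned", "status": "Done"}] * 8 + \
        [{"level": "planned", "status": "Open"}] * 2
print(planned_vs_actual(items))  # {'planned': 0.8}
```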

Quality: Being suitable for its intended purpose, while satisfying customer expectations
The most important quality metric is the Backlog Management Index (BMI), measured as the number of bugs fixed divided by the number of bugs opened. This enables us to make sure we are not accumulating a large quality debt.
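The numbers in this sketch are made up, but the calculation itself is just a ratio:

```python
def backlog_management_index(bugs_fixed: int, bugs_opened: int) -> float:
    """BMI = bugs fixed / bugs opened during the PI.

    A value below 1.0 means more bugs were opened than fixed, i.e. the bug
    backlog (and the quality debt) grew during the PI.
    """
    if bugs_opened == 0:
        return float("inf")  # nothing new opened; any fix shrinks the backlog
    return bugs_fixed / bugs_opened

# Example: 45 bugs fixed while 60 new bugs were opened
print(backlog_management_index(45, 60))  # 0.75 -- the backlog is growing
```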

Starting to Measure

To start things off, we focused on a single product. We created a dashboard per PI, iteratively adding metrics to the KPIs, getting stakeholders to review and use them, and asking for feedback. Once we had measured a few PIs, we started another dashboard to compare PIs and show trends.

KPI dashboards
Snapshot of one of our dashboards, showing the highlighted KPIs

Lessons Learned (So Far)

Trends and improvement areas

Having measured the KPIs for a single group over the last few PIs, we were able to notice trends. For example, an increase in the number of unplanned items tells us that our planning is getting less effective and we need to re-examine it.
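As a toy illustration of that kind of trend (the PI names and counts here are invented, not our data), we look at the share of unplanned work per PI and watch how it moves over time:

```python
# Share of unplanned items per PI; a rising trend signals weaker planning.
pi_history = {
    "PI-2020.1": {"planned": 40, "unplanned": 6},
    "PI-2020.2": {"planned": 42, "unplanned": 9},
    "PI-2020.3": {"planned": 38, "unplanned": 13},
}

for pi, counts in pi_history.items():
    total = counts["planned"] + counts["unplanned"]
    print(f"{pi}: {counts['unplanned'] / total:.0%} unplanned")
# PI-2020.1: 13% unplanned
# PI-2020.2: 18% unplanned
# PI-2020.3: 25% unplanned
```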

We also found some areas with potential for improvement, e.g. our estimates were off by more than we expected. Having seen this, the group decided to work on its planning abilities in order to improve predictability.

KPIs can show you trends and places to improve

Testing hypotheses and understanding consequences

Having a baseline also allows us to start testing hypotheses. For example, we would like to test whether more planning effort enables us to improve both predictability and throughput.

We can also understand the consequences of events that are less within our control. For example: how has Covid-19 affected us? Has throughput decreased? Is working from home hurting our quality?

Secondary gains

Since we started to measure, we realized that we need a standard. We were missing a common language and common methods of visualizing our work, which caused trouble when we had strong dependencies between groups and products. So, we created a unified standard at the epic level, including common T-shirt sizes, workflows and practices.
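The actual size buckets and workflow states we agreed on are internal, but a standard like this can be as simple as a shared mapping that every group's tooling reads from. The values below only illustrate the shape:

```python
# Illustrative only: hypothetical T-shirt size buckets and epic workflow.
TSHIRT_SIZES = {
    "S":  {"max_sprints": 1},
    "M":  {"max_sprints": 2},
    "L":  {"max_sprints": 4},   # a full PI
    "XL": {"max_sprints": 8},   # usually a candidate for splitting
}

EPIC_WORKFLOW = ["Funnel", "Analysis", "Ready", "In Progress", "Done"]
```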

Making your measurements continuous — to allow continuous improvement

How to Measure Continuously

We have measurements and we have a baseline, but how do we make sure that groups use this data and let it drive their continuous improvement?

This is the challenge we are working on right now, trying to figure out how to incorporate the dashboards into our Software Development Life Cycle.

We’ve already started to perform some operations-review sessions once the PI has ended, looking at the dashboards and drawing insights from them.

How would you do it?
