Tangible signals on code quality derived from Jira data

Mar 27, 2024

Today, we’ll take a look at a rather uncommon source of code quality signals. As part of this method, we’ll understand how Jira can provide us with meaningful insights, subsequently adding weight to other pieces of evidence you have at your disposal.

Unlike Story Mapping or the MoSCoW method, this article focuses on a holistic approach.

Work Stream Distribution

The average team undertakes a variety of tasks, including building something new, enhancing the existing codebase, fixing bugs, and assisting other teams with their deliveries, among others. While the idea of dedicating 100% of the team’s time to feature work might seem appealing to a business, an uncompromising prioritization of feature work can, over time, contribute to a gradual slowdown of your engineering team.

Most engineering work can be categorized into various work streams. The names and number of these streams may vary, and some tasks might straddle the boundaries between categories. My preference is to divide engineering work into the following work streams:

Roadmap (feature work): delivering features, or pre-planned commitments in the case of platform teams.
Enablement: work necessary to unblock other teams or help them ship faster or better.
Enhancement (improvement, service investment): paying down technical debt, contributing to code quality, minor product improvements, etc.
KTLO (keeping the lights on): efforts necessary to maintain the app’s stability and uptime, including bug fixing.

More often than not, you’ll be able to make a good guess about which work stream a specific task relates to. By observing the dynamics of work stream distribution, you can make more informed planning decisions.

Initially, I started collecting this data to gain better visibility into the estimates we commit to as a team. For instance, if we know the team’s average predictability is 85% and that it spends, on average, 60% of its time on roadmap work, we can do some rough math:

Target Weeks = Dev Estimate / (Predictability * Roadmap Share)

I believe many of you already employ a similar approach, adding a multiplier to engineering estimates to accommodate non-delivery work. What I propose in this article is a more predictable and intentional way of performing these calculations. Here’s an example with actual numbers:

30 weeks / (0.85 * 0.60) = 58.8 weeks

Almost twice as slow! To illustrate how nicely the numbers compound, let’s assume that you increase both your predictability and your roadmap share by 5 percentage points:

30 weeks / (0.90 * 0.65) = 51.3 weeks
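As a quick sanity check, the two calculations above can be reproduced with a small helper. This is only a sketch of the formula as stated; the function and parameter names are mine, chosen for illustration.

```python
def target_weeks(dev_estimate_weeks: float, predictability: float, roadmap_share: float) -> float:
    """Calendar weeks needed once predictability and roadmap share are factored in."""
    if not (0 < predictability <= 1 and 0 < roadmap_share <= 1):
        raise ValueError("predictability and roadmap_share must be in (0, 1]")
    return dev_estimate_weeks / (predictability * roadmap_share)


baseline = target_weeks(30, 0.85, 0.60)
improved = target_weeks(30, 0.90, 0.65)

print(f"baseline: {baseline:.1f} weeks")             # baseline: 58.8 weeks
print(f"improved: {improved:.1f} weeks")             # improved: 51.3 weeks
print(f"saved:    {baseline - improved:.1f} weeks")  # saved:    7.5 weeks
```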

Two 5-point improvements save about 7.5 weeks, which is close to two months. But what does this actually mean?

For instance, a significant focus on Keep The Lights On (KTLO) and enhancement activities may signal a call to action for companies intending to grow. It’s important to recognize that dynamics will vary between teams. Feature teams, for example, will have much more roadmap work compared to their platform counterparts. Therefore, applying this approach specifically to the engineering vertical (or whatever structure you have) within a business can be most beneficial.

While some businesses measure this at a high level during roadmapping, I prefer to assess it at an operational level. This serves as an indicator of changes occurring across the organization and within teams in particular.

Before proceeding, I want to call out that the categories are an implementation detail. For example, the Kano model (see Wikipedia), introduced by Noriaki Kano in 1984, takes a different approach to categorisation, with buckets based on customer impact: Basic Needs, Performance Needs, Excitement Needs, Indifferent Features, and Reverse Features. Splitting Roadmap work into Kano buckets adds granularity and can provide more meaningful insights into the Roadmap work itself.

What is good data

Understanding current data in isolation offers limited insights. Let’s say you’ve measured your work stream distribution in the current sprint and discovered that you have 24% of Enablement work. What next? Alone, this information might not be very impactful. However, the true value of this data emerges when you measure dynamics over extended periods.

Imagine you have historical data spanning the last year, approximately 25 sprints, for a team of 10 people. Assume that the team plans, on average, 50 story points’ worth of work every sprint. That adds up to 1,250 story points of data, which is enough for plotting graphs and observing trends.

The chart below illustrates why sample size matters:

Credit: Wikipedia, https://en.wikipedia.org/wiki/Margin_of_error

Another dimension to this data is the quality of the tickets, and accuracy of estimation. We would want to fulfil the following criteria:

Tasks are granular enough to be distributed between different work streams
All tasks are estimated
Estimates are more or less accurate (verifiable by measuring sprint predictability; let me know if you’d fancy an article about that too)
Tasks are labeled, and/or easy to categorize by engineering manager

Let’s focus on categorization. Ideally, you’d want to make those decisions early on. It will save the EM/PM from having to go back and retroactively categorize work that has already been done.

Assigning respective labels to tickets will help you automate the job, which comes in especially handy if you manage multiple teams. If your tickets are well labeled, you can operate on CSV exports and automate this to the extent deemed feasible.
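To make the labeling idea concrete, here is a minimal sketch of mapping ticket labels to work streams. The label names and the mapping are hypothetical; substitute whatever labels your team actually applies in Jira. Tickets with no known label, or with labels that point at different streams, are left for manual review rather than guessed.

```python
# Hypothetical label-to-work-stream mapping; adjust to your own Jira labels.
LABEL_TO_STREAM = {
    "roadmap": "Roadmap",
    "enablement": "Enablement",
    "enhancement": "Enhancement",
    "tech-debt": "Enhancement",
    "ktlo": "KTLO",
    "bug": "KTLO",
}


def classify(labels: list[str]) -> str:
    """Return the ticket's work stream, or 'Uncategorized' when the labels
    match no known stream or match more than one."""
    streams = {
        LABEL_TO_STREAM[label.lower()]
        for label in labels
        if label.lower() in LABEL_TO_STREAM
    }
    if len(streams) == 1:
        return streams.pop()
    # Ambiguous or unlabeled tickets need a human decision.
    return "Uncategorized"


print(classify(["Bug", "backend"]))        # KTLO
print(classify(["roadmap", "tech-debt"]))  # Uncategorized
print(classify([]))                        # Uncategorized
```

Counting how many tickets end up as Uncategorized is also a cheap way to see how well the labeling discipline is holding up.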

Another important aspect of data is normalization. Frequently, teams aren’t consistent with each other in terms of story point weight. To compensate for this, I suggest using a normalization formula to bring story points to a common scale. Assuming everyone is following the same increment system (e.g., Fibonacci numbers), the straightforward formula will be:

TWC = BTC * BTH / (TTC * TTH)

Where:

TWC — Team weight coefficient
BTC — Benchmark team capacity (no, not Bitcoin, sorry)
BTH — Benchmark team headcount
TTC — Target team capacity
TTH — Target team headcount

You can infinitely complicate it further to reflect differences in seniority, account for sick leaves, vacations, etc. It’s entirely up to you to determine when you’ve reached the point of diminishing returns. In principle, a rough approximation is sufficient for this exercise. I don’t bother diving deeper than the example provided above.

What are good numbers?

This heavily depends on your organisation, budget, and ambition.

Let’s think about a couple of different organizations:

Mature enterprise software, with an established niche in the market, limited competition, yet also limited scalability prospects. While new features are still being built, the focus is on maintaining the system and maximising profit by reducing its operational expenses. Such a company may have fairly high KTLO and Enhancement numbers, followed by Enablement, with Roadmap work in the tail.
On the other side of the spectrum, we have a smaller scale-up employing a hundred engineers. Most likely the competition is tough, and the way to survive as a business is to ship features actively to keep its solution attractive. This company will likely prioritise feature work from the backlog and sacrifice much of the Enhancement work stream.
There are all sorts of companies in between. A good example is Telegram, serving nearly a billion users with a handful of engineers.

Think of it as a gradient and apply common sense when considering what good numbers are. A consistent increase in the share of enablement or KTLO should be a signal. If unplanned, it indicates that the quality of the product or process is deteriorating, ultimately leading to a decreased velocity of feature delivery. Such a situation may signal the need for intervention and a shift in focus towards codebase maintenance, building better tooling, changing processes, etc.

Conversely, very high numbers in the Roadmap share (I’d say 80% and above for feature teams) could indicate that the team is neglecting code quality, potentially increasing technical debt. If something sounds too good to be true, it probably is.

That said, companies differ, and mileage can vary. Please benchmark against your own historical data, not the numbers proposed in this article.
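One way to benchmark against your own history is to look at the trend of a work stream’s share and at how far recent sprints drift from the earlier baseline. The sketch below is an illustration only: the KTLO numbers are made up, the helper names are mine, and a least-squares slope is just one reasonable choice of trend measure.

```python
from statistics import mean


def trend_slope(shares: list[float]) -> float:
    """Least-squares slope in percentage points per sprint; positive means growth."""
    n = len(shares)
    if n < 2:
        raise ValueError("need at least two sprints")
    xs = list(range(n))
    x_mean, y_mean = mean(xs), mean(shares)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, shares))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def drift_from_baseline(shares: list[float], recent: int = 3) -> float:
    """Mean of the last `recent` sprints minus the mean of all earlier sprints."""
    if len(shares) <= recent:
        raise ValueError("not enough history before the recent window")
    return mean(shares[-recent:]) - mean(shares[:-recent])


# Made-up KTLO shares (%) for eight consecutive sprints.
ktlo = [10, 12, 11, 14, 15, 17, 18, 21]

print(f"slope: {trend_slope(ktlo):+.2f} points per sprint")     # slope: +1.50 points per sprint
print(f"drift: {drift_from_baseline(ktlo):+.1f} points")        # drift: +6.3 points
```

What counts as a worrying slope or drift is for you to decide from your own data; the numbers here carry no recommendation.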

Collecting and visualizing labeled data

I find working with CSV exports the easiest, both manually and automatically. Regardless of the chosen approach, the workflow will look something like this:

Iterate through sprints and collect a data point for each category in each sprint
Calculate the % share of each work stream. This keeps the data resilient to story point inflation and team restructurings.
Draw the plot

I won’t provide specific advice on the process here. Personally, I export data points into a CSV and paste them into a Google Spreadsheet with the necessary formulas and the chart itself. It’s very convenient to embed those charts into slide decks. That said, everyone has different workflows, so consider what works best for you.
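For those who prefer to script the first two steps, here is a minimal sketch. It assumes a simplified export with one row per ticket and the hypothetical columns sprint, work_stream and story_points; a real Jira export uses different column names and would need mapping first.

```python
import csv
import io
from collections import defaultdict

# Assumed, simplified export layout; column names are hypothetical.
SAMPLE_CSV = """sprint,work_stream,story_points
Sprint 1,Roadmap,30
Sprint 1,KTLO,10
Sprint 1,Enablement,10
Sprint 2,Roadmap,20
Sprint 2,KTLO,15
Sprint 2,Enhancement,5
"""


def stream_shares(rows):
    """Return {sprint: {work_stream: share in %}} from per-ticket rows."""
    points = defaultdict(lambda: defaultdict(float))
    for row in rows:
        points[row["sprint"]][row["work_stream"]] += float(row["story_points"])

    shares = {}
    for sprint, by_stream in points.items():
        total = sum(by_stream.values())
        if total == 0:
            continue  # skip sprints with no estimated work
        shares[sprint] = {
            stream: round(100 * pts / total, 1) for stream, pts in by_stream.items()
        }
    return shares


shares = stream_shares(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for sprint, by_stream in shares.items():
    print(sprint, by_stream)
```

With the sample data, Sprint 1 comes out as 60% Roadmap, 20% KTLO and 20% Enablement. The resulting per-sprint percentages can be pasted into a spreadsheet for plotting.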

Example of trends for a team with KTLO, and Enablement taking over Roadmap work over the course of a year

Shortcomings

There are some distinct disadvantages of this method, which you need to keep in mind:

This method alone isn’t sufficient to make a judgement call. For example, an increase in KTLO may indicate a drop in quality, but it can also be a reasonable trend for a greenfield project that has just been released.
Small engineering teams may not produce enough data to make an educated guess.
The elephant in the room is that this data is based on estimates, not the actual time spent. Besides, it expects you to reflect all (or most) of your non-production activities in Jira.

Conclusions

The main takeaway is that you shouldn’t rush to conclusions. Work stream distribution can serve as a tool towards achieving your goals, but it shouldn’t be the sole basis for the decisions you make.

I find work stream distribution data useful, and it aids me in making better planning decisions. The setup requires minimal time, and maintenance is even simpler moving forward.

I would be interested to hear about your experiences and approaches in the comments.