Bugs backlog automation. RICE for bugs

glebsarkisov

Published in

Mayflower team

13 min read · Nov 3, 2023

How to make your bugs backlog nicely organized and always relevant

Hi there, it’s Gleb again.

Let’s talk about the bugs backlog. Imagine: it is your first day at a new job (you are a test lead, QA manager, maybe a test engineer or even a product/project manager). You meet your new colleagues, start learning the company processes, open JIRA and face a backlog of 300 bugs! Your reaction is nothing but confusion, despair, and pain.

“Ok, — your inner voice says, — that’s exactly why they hired me. I will fix this!”

You dive into the problem and see the actual state of things:

the oldest open bug ticket was reported 3 years ago;
the majority of the open bugs are medium priority, 100+ are high, and the rest are low;
nobody knows exactly what the bugs are about or what they affect, and product managers in particular are unaware of at least half of them;
a project manager says there is usually room for only 3–4 high bugs per sprint, so everything else sits in the backlog untouched;
a typical QA does not get why a high bug they reported 6 months ago is still not fixed.

You come up with questions like:

Is there any point in dealing with the backlog? Maybe it is better to close all existing tickets and create bugs from scratch when we catch them?
How many open bugs in the backlog are OK, and how many are too many?
How do we get product managers to notice the backlog’s current state? Should we keep them informed about it?

This is exactly what this article is about: how to bring order to your bugs backlog and make it a useful and well-organized space.

Disclaimer #1:

Even though the Zero Bug Policy is not the topic of the article, I will refer to its philosophy “we do not fix it now — we will not fix it in the future”.

Disclaimer #2:

In the article I am not talking about critical bugs, which are not a part of the backlog and have to be fixed ASAP.

A few words on Zero Bug Policy

“If we do not want to fix it right now, this problem is not important to us” is the main idea of the Zero Bug Policy. The approach can be extreme: if we are not going to fix the bug now, the bug should be closed. The most obvious advantage of the approach is that there is no backlog at all.

It is worth noting that a product manager would hardly agree to switch to this flow right away: they would still need to deal with the 300 existing bugs. They would also need to take real user data into account: how many users are affected by which problem, and how badly.

Keeping the idea of the Zero Bug Policy in mind, let’s talk about bugs backlog clarity: what it is and how to achieve it.

What is wrong with the 300 bugs backlog?

How bad is it to have 300 open bugs in your backlog? Here are my points.

Pros:

QAs and other team members found these bugs and reported them — this is great! The team looks for bugs and creates tickets in JIRA.

Cons:

A Wild Unknown Territory. No one actually knows what is in these 300 tickets (this is not an issue if you have some kind of bug review process). There is always a chance that serious problems are hiding in your backlog: bugs that affected only a few users when they were reported but have since grown huge, while still not being critical (and so not fixed ASAP).
Backlog’s Heterogeneity. Try to sort 100 high bugs by the impact of the problem: there is no easy way to do it. It is tough to decide what has to be fixed now and what can wait. Given that we are always tight on resources, we have to decide which bugs to spend them on.
Bugs Backlog Growth. Now we have 300 bugs in the backlog, and in 1–2 years, the number would be around 1000, maybe even more. What then? This sounds very concerning.
Additional efforts to re-validate bugs. You need to guarantee your bugs are still valid and relevant, so you need to check them from time to time. This, of course, costs additional time and resources to reproduce the bug or close the ticket.

Some might say we should do technical sprints twice a year when we try to fix everything we can. But just imagine how resource-consuming this is and how that shifts focus from the business goals. And even 2 tech sprints would not be enough to fix everything.

Automated bugs backlog: focus on important, ignore everything else

A static backlog did not work for our team. We started looking for an automated solution that would bring focus on the really important bugs and get rid of everything else (you still remember the ZBP idea?).

Our automated system contains 4 elements:

Prioritization. Approach to prioritize bugs, combining priority and severity and making it possible to compare bugs to each other.
Accounting. Collecting the users’ feedback.
Bug lifecycle automation. Automatic priority lowering after a certain period of time and, eventually, closure of the ticket.
Information. Automatically inform everyone involved in the process about the bugs’ status.

The first element. Prioritization

Priority and severity

If you want an automatic backlog cleaning system, you have to come up with an approach to lower priority. First, let’s talk about the meaning of priority.

In my project, when the QA / support team creates a bug ticket, they also choose an appropriate priority. That field combines the actual business priority of the problem (exactly what is called priority in the testing literature) and the way the problem affects the functionality of the system (this is called severity in the literature).

So, for us the priority field is a hybrid one, taking into account both parameters. To decide how important a bug is and to be able to compare it with others, we use the RICE framework. A product manager scores the bug with a RICE value, which reflects both priority and severity. This makes it possible to compare bugs to each other, as I will explain in the next section.

Of course, you might have your own process and work with priority and severity differently.

RICE for bugs

We had already been using the RICE framework for product and technical tasks, so we decided to apply it to the bugs backlog as well. Our variation of RICE has a few modifications, but the point is still the same: to have a benchmark for comparing the importance of two different tickets, which helps us prioritize the backlog.

The RICE for bugs in our reading is:

R stands for Reach — how many users are affected by the problem;
I for Impact — how seriously the problem affects the user experience from the functionality point of view;
C for Confidence — the level of confidence in the chosen Impact and Ease values;
E for Ease — how easy and quick it is to fix the problem.

Every parameter besides Ease has a range of 1 to 5, where 1 is the lowest value of a parameter (the smallest number of affected users, the lowest impact on the user experience, etc.), and 5 is the highest one. The Ease parameter is calculated differently, from the time estimate for the fix: the longer the fix takes, the lower the Ease.

Once all the parameters are multiplied, we have the final RICE for a bug.

Let’s look at these two bugs:

Bug 1: Reach (1) × Impact (5) × Confidence (5) × Ease (3) = 75

Bug 2: Reach (1) × Impact (4) × Confidence (5) × Ease (5) = 100

Bug 2 scores higher than bug 1 (100 vs. 75), so fixing it is more important. Both product and project managers know exactly which bug to plan for the next sprint.
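The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the `BugScore` class and the `BUG-1`/`BUG-2` keys are hypothetical names, and Ease is taken as an already computed number because its mapping from the time estimate is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugScore:
    """RICE parameters of one bug (hypothetical helper class)."""
    key: str
    reach: int
    impact: int
    confidence: int
    ease: int  # derived from the fix estimate elsewhere; passed in as-is

    def rice(self) -> int:
        # Reach, Impact and Confidence use the 1-5 scale described above.
        for name in ("reach", "impact", "confidence"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")
        if self.ease < 1:
            raise ValueError(f"ease must be positive, got {self.ease}")
        return self.reach * self.impact * self.confidence * self.ease


bug_1 = BugScore("BUG-1", reach=1, impact=5, confidence=5, ease=3)
bug_2 = BugScore("BUG-2", reach=1, impact=4, confidence=5, ease=5)

# Highest RICE first: this is the order in which bugs are planned.
ranked = sorted([bug_1, bug_2], key=BugScore.rice, reverse=True)
for bug in ranked:
    print(bug.key, bug.rice())
```

Sorting by the product keeps the planning rule explicit: whatever comes first in `ranked` is the first candidate for the next sprint.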

The scoring process

This is how the bugs scoring process works:

QA/support reports a bug and creates a ticket in JIRA with high/medium/low priority (we do have a classification and an agreement for what we call a high/medium/low based on functionality, the platform on which the bug is reproduced, etc.);
The product managers review the high-priority bugs and set RICE for these tickets;
The medium and low-priority bugs are not scored with RICE. For now, our main goal is to deal with the high bugs. As long as we have a stream of new high bugs, we will keep working on them specifically. Once there is an opportunity to manage other priorities, we will use RICE for them as well.

The second element. Accounting and collecting the users’ feedback

We are developing a high-load streaming service with a billion monthly visits and more than 100 million users. In order to better understand the scale of a problem, we need to keep track of the user feedback.

That is why we introduced the ‘number of reports’ field, which tracks user reports of this exact problem. Our support team updates it based on the data in Zendesk. The field helps our product managers set proper RICE values (especially Reach and Impact).

You might think — what if there was no report from users when a bug was created, but now there are some? Shouldn’t we reconsider our RICE score, given the current reports?

Of course, we should. This is how:

On our project, we agreed on two report thresholds: the first is reached at 10 reports (and covers 10 to 29), and the second at 30 reports or more;
As soon as a bug reaches a threshold, the product manager of the corresponding functionality is automatically informed about the number of reports and has to decide whether RICE needs to change.
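The threshold check can be sketched like this. It is only an illustration under my assumptions: the function name `crossed_threshold` is hypothetical, and it assumes the automation can see both the previous and the current value of the ‘number of reports’ field, so that a notification fires once per threshold rather than on every update.

```python
from typing import Optional

# Thresholds from the process described above (picked empirically).
REPORT_THRESHOLDS = (10, 30)


def crossed_threshold(previous: int, current: int) -> Optional[int]:
    """Return the highest threshold newly reached when the report count
    goes from `previous` to `current`, or None if none was reached."""
    newly_reached = [t for t in REPORT_THRESHOLDS if previous < t <= current]
    return max(newly_reached) if newly_reached else None


# A bug going from 9 to 10 reports triggers the first notification;
# further growth to 12 does not notify again.
for before, after in [(9, 10), (10, 12), (12, 31)]:
    print(before, after, crossed_threshold(before, after))
```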

I want to emphasize that thresholds at 10 and 30 reports were picked empirically. If, at some point, we find that these thresholds, in most cases, do not lead to priority changes, we will reconsider them.

But what should we do if there are fewer than 10 reports on a bug, or if a product manager simply forgot to update RICE after a threshold was reached? To cover this, we implemented an automated RICE reset for every high bug every 3 months. A product manager will notice the bug without a score and will set it again. Otherwise, a project manager or I, as the process owner, will ping the product manager.

That is how we keep track of real users’ feedback and how it affects the sprint workload: which bug is to be fixed in the upcoming sprint and which bug will go through the priority-lowering process, which I will explain now.

The third element. Automation of a bug life cycle

We agreed that if a bug is not fixed within a year, then either we have lost track of it or it is not important to product managers and our users, and there is no point in fixing it.

This is what the whole lowering process looks like:

If a high bug is not fixed within 6 months after it was created, it is not really a high bug, and it can be changed to medium priority (unless a product manager objects);
If a medium bug is not fixed within 3 months after it was created or transitioned to medium priority, it is not really a medium bug, and it can be changed to low priority (unless a product manager objects);
If a low bug is not fixed within 3 months after it was created or transitioned to low priority, it is not even a low bug, and it can be closed (unless a product manager objects).
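The three rules above fit into a small table plus one function. The sketch below is not Automaton’s code: `LOWERING_RULES`, `add_months` and `lower_if_due` are hypothetical names, and counting whole calendar months (rather than a fixed number of days) is my assumption.

```python
import calendar
from datetime import date

# Current priority -> (months to wait, next state), as listed above.
LOWERING_RULES = {
    "high": (6, "medium"),
    "medium": (3, "low"),
    "low": (3, "closed"),
}


def add_months(day: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the month."""
    month_index = day.month - 1 + months
    year = day.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def lower_if_due(priority: str, since: date, today: date,
                 pm_objects: bool = False) -> str:
    """Return the priority/state the bug should have on `today`.

    `since` is the creation date or the date of the last transition.
    """
    wait_months, next_state = LOWERING_RULES[priority]
    if pm_objects or today < add_months(since, wait_months):
        return priority
    return next_state


# Walk an untouched high bug through its whole life cycle.
state, since = "high", date(2022, 8, 1)
history = []
while state != "closed":
    due = add_months(since, LOWERING_RULES[state][0])
    state = lower_if_due(state, since, due)
    since = due
    history.append((state, due.isoformat()))
print(history)
```

Keeping the intervals in one table makes it easy to see that they add up to 12 months and to tune them later without touching the logic.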

The whole priority-lowering process is automated with Automaton, our self-written solution, which is integrated into our Slack. Automaton is our internal instrument for all sorts of automations: it holds all the automation logic and communicates with JIRA via API, acting like a bot in a ticket’s history.

Summing up all the intervals, we have a year-long bug life cycle. The set of rules mentioned above is our workflow for all bugs except critical ones. There are also exceptions to this workflow: a product manager might disagree with lowering the priority, and the life cycle can be extended. Let’s talk about these cases.

Case №1

A product manager reviews the list of planned-to-be-lowered medium bugs or planned-to-be-closed low bugs (a week before the lowering)

If a bug needs to stay at its current priority, a product manager simply sets RICE. I want to highlight that we also have a numeric RICE threshold that separates high priority from medium. I will explain this a bit later.

If the calculated RICE is at or above the threshold for high priority, the priority is set to high and the bug enters the lowering process again in 6 months.

If the calculated RICE is lower than the threshold, the bug is not promoted and follows the regular process: if it is medium, it is lowered to low, and if it is low, it is closed.
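Case №1 boils down to a single decision per bug. Here is a minimal sketch of it, assuming the threshold value of 90 given later in this article; the function name `review_outcome` is hypothetical, and treating a bug without a score the same as a low-scored one is my reading of the process.

```python
from typing import Optional

HIGH_RICE_THRESHOLD = 90  # value given later in the article


def review_outcome(priority: str, rice: Optional[int]) -> str:
    """Decide what happens to a medium or low bug after the weekly review.

    `rice` is None when the product manager did not score the bug.
    """
    if priority not in ("medium", "low"):
        raise ValueError(f"case 1 covers medium and low bugs, got {priority!r}")
    if rice is not None and rice >= HIGH_RICE_THRESHOLD:
        return "high"  # promoted; the 6-month timer for high bugs starts
    # Not scored or below the threshold: the regular lowering applies.
    return "low" if priority == "medium" else "closed"


print(review_outcome("medium", 100))
print(review_outcome("medium", 75))
print(review_outcome("low", None))
```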

Case №2

A product manager reviews the list of planned-to-be-lowered high bugs (a week before lowering)

If a bug needs to stay at its current priority, a product manager forbids the lowering for the next 6 months. We specifically have this option for our product managers, but we monitor how frequently they use it: so far, only one bug has been blocked from lowering. The block is enabled by marking a Jira checkbox, which has a 6-month timer on it. As soon as the 6 months have passed, the bug is added to the planned-to-be-lowered list again.

If you still feel uncomfortable with the flow of lowering and closing bugs, you have not yet absorbed the Zero Bug Policy philosophy: if we are not fixing it now, we are not fixing it later, so there is no point in keeping it open in the backlog.

The RICE threshold for priorities

As I mentioned earlier, we introduced a numeric RICE threshold to separate high-priority bugs from medium-priority ones.

Look at these two bugs:

Bug 1: Reach (1) × Impact (5) × Confidence (5) × Ease (3) = 75

Bug 2: Reach (5) × Impact (5) × Confidence (5) × Ease (4) = 500

Observation №1

When we asked our product managers to score the existing 100 high bugs from our backlog, a portion of the bugs got a significantly lower RICE score than the others. As you can see in the above example, the RICE of bug №1 is almost 7 times lower than the RICE of bug №2 (75 vs. 500).

Observation №2

Among those 100 high bugs, some had been waiting for a fix for a long time, so it is incorrect to call them high: they had not been taken into a sprint for a while, and they also had a significantly lower RICE score than the other high bugs.

The solution is to have a threshold that cuts off low-scored bugs. We picked the threshold intuitively, at RICE = 90: if a bug scores 90 or higher, it is high, and if it scores lower, it is not high.

You can approach this more technically: calculate the 70th percentile of the scores of all RICEd high bugs and compare it with the same percentile for the bugs that have been waiting to be fixed. However, you can also select the value intuitively and correct it later if you wish.
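The percentile comparison can be sketched with Python’s standard library. The scores below are made up purely for illustration (they are not our data), the helper name `percentile_70` is hypothetical, and the choice of the inclusive interpolation method is an assumption; how you turn the two numbers into a threshold is still your call.

```python
from statistics import quantiles


def percentile_70(scores):
    """70th percentile of a list of RICE scores (needs at least 2 values)."""
    # n=10 yields the 10th, 20th, ..., 90th percentiles; index 6 is the 70th.
    return quantiles(scores, n=10, method="inclusive")[6]


# Hypothetical RICE scores, for illustration only.
all_high_scores = [12, 30, 45, 60, 75, 90, 100, 120, 150, 200, 500]
long_waiting_scores = [12, 30, 45, 60, 75]

print("all high bugs:", percentile_70(all_high_scores))
print("long-waiting bugs:", percentile_70(long_waiting_scores))
```

If the long-waiting bugs sit well below the overall figure, a threshold somewhere between the two values separates the bugs that get fixed from the bugs that only get postponed.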

The fourth element. Keeping everyone informed

An automatic notification system is a must-have. You would only need to monitor the filters and check the bugs from time to time, while the system would keep everyone informed on every important action.

This is how we set up the system — and what I would suggest you do:

Create a separate channel in Slack (or maybe you use a different messenger) for notifications. Add all the process actors — product managers, project managers, QA managers.
Enable notifications for every part of the process in this channel:

A week before lowering, post a selection of planned-to-be-lowered and planned-to-be-closed bugs + tag the corresponding product manager;
As soon as a selection of bugs is lowered/closed, automatically post these bugs with the current priority/status;
When 3 months have passed since the RICE was set, post information about the RICE reset for these bugs;
Automatically post a notification for a product manager about bugs that have a number of reports at or above your threshold.
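As an illustration of the first notification in the list, here is a sketch that builds the weekly digest text. It only formats messages and does not call Slack or JIRA; `PlannedChange`, `weekly_digest`, the bug keys and the manager handles are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PlannedChange:
    key: str               # ticket key
    product_manager: str   # handle to tag in the channel (hypothetical)
    current_priority: str  # "high", "medium" or "low"
    planned_on: date       # when the bug will be lowered or closed


def weekly_digest(changes, today):
    """Build one message per product manager for changes due within 7 days."""
    horizon = today + timedelta(days=7)
    by_pm = defaultdict(list)
    for change in changes:
        if today <= change.planned_on <= horizon:
            by_pm[change.product_manager].append(change)

    messages = []
    for pm in sorted(by_pm):
        lines = [f"@{pm}: bugs to be lowered or closed by {horizon.isoformat()}"]
        for change in sorted(by_pm[pm], key=lambda c: c.key):
            action = "close" if change.current_priority == "low" else "lower"
            lines.append(
                f"- {change.key} ({change.current_priority}): "
                f"will {action} on {change.planned_on.isoformat()}"
            )
        messages.append("\n".join(lines))
    return messages


changes = [
    PlannedChange("BUG-7", "anna", "medium", date(2023, 11, 8)),
    PlannedChange("BUG-9", "anna", "low", date(2023, 12, 20)),
]
messages = weekly_digest(changes, today=date(2023, 11, 3))
print("\n\n".join(messages))
```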

Automaton, which I mentioned earlier, is also responsible for informing the process participants. Long story short: automate everything.

The whole process scheme

A bug prioritization approach + collecting users’ feedback + automated priority lowering + notifications approach = the bugs backlog management system. All the parts of the process are shown in the scheme below, where you can also see how they are connected with each other.

The fate of 300 bugs

You have probably already noticed that I explained the design of the process — but have not yet told you what we did about the existing 300 bugs.

We coped with it. Firstly, we put the oldest bugs through the product managers’ review (10 bugs a week). When there were only 30 bugs left, we started the automatic process with all the rules mentioned above.

We started going through the bugs backlog back in August 2022.

The numbers

We closed 297 bugs where:

there were 24 high bugs — they became medium, then low, and were later closed;
there were 267 medium bugs — they became low and were then closed;
6 low bugs were closed.

I must admit we started moving the high bugs through the lowering process in August 2023 (a year after we kicked off the backlog lowering). It took us some time to convince ourselves and the product managers to apply the lowering workflow to high bugs.

The conclusion

A transparent backlog is a result of many working processes, automation, and a mindset. The way you work with the bugs backlog affects many things — the morale of your team, the sprint workload — you name it.

Your efforts in cleaning up the backlog will lead you to better prioritization and focus since you will be taking into account the real users’ feedback combined with product managers’ reviews.

I wish you a clear bugs backlog and effective processes!

P.S. As always, kudos to Rita Kind-Envy for editing!