In February this year, I led the drive within my team to revamp how we deal with support tickets by introducing a zero-bug policy (ZBP). It’s a simple but revolutionary idea, which we’ve iterated on a lot to make it work as well as possible for our team. Given it’s now been several months since its introduction, I wanted to look back and reflect on the changes we’ve made and the impact they’ve had.
A long time ago, in a SQL Prompt team far away…
SQL Prompt is one of the most beloved tools in the Redgate portfolio and has by far the most users, so the team receives a large number of support requests and needs to manage them well. To concentrate on the most important things and get them done, we needed a simple but effective approach.
However, we didn’t have that. As a team, we reviewed any bug reports brought up during the previous week in our weekly hour-long (sometimes longer) triage meeting, and placed each new bug into one of two categories. The first, our support queue, contained bugs “we should fix as soon as possible”, while our support backlog contained bugs “we’d like to fix if we have time”. Seems pretty reasonable, right?
Our problem was that our support queue frequently contained upwards of 25 bugs, with no way of telling whether any given bug on the list was more important than another. This made sprint planning a total nightmare, as deciding which bugs to bring in for development was a seemingly arbitrary process, and bugs would often wait several months before they got fixed. This raised the question: “if we could afford to wait that long to fix them, were the bugs ever that important?”
Our support backlog was another issue entirely. The number of bugs generally sat around 50, and in our quarterly backlog reviews it was incredibly rare for bugs to make it out of the backlog onto the main support queue. In short, it was a place bugs went to die.
As a team, we were very aware of these issues, bringing them up in many a retrospective. We knew there had to be a better way.
What is a zero-bug policy?
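In a nutshell, the zero-bug policy is a no-nonsense bug management system: whenever a new bug comes in, you either fix it straight away or close it, so you never build up a backlog of bugs you don’t intend to fix.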
How did we implement it?
In one of my previous posts, I introduced Version 1.0 of the SQL Prompt Zero Bug Policy. To give us more flexibility when prioritising work, each bug we took on was given a one-month expiration date, rather than us dogmatically fixing every bug as soon as it arrived. If work hadn’t started on a bug after that first month, it likely never would, so we closed it. Our bug-fixing statistics showed that the vast majority of fixed bugs got fixed within the first month anyway, so this rule had very little impact on us.
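The one-month expiry rule is simple enough to express as a filter over open bugs. The sketch below is purely illustrative: the `Bug` record, its fields, the `bugs_to_close` helper and the ticket keys are all hypothetical, and it assumes “one month” can be approximated as 30 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "one month" is approximated as 30 days.
EXPIRY = timedelta(days=30)


@dataclass
class Bug:
    """Hypothetical minimal bug record; field names are illustrative only."""
    key: str
    accepted_on: date
    work_started: bool = False


def bugs_to_close(bugs, today):
    """Return accepted bugs that reached the expiry window with no work started."""
    return [
        bug for bug in bugs
        if not bug.work_started and today - bug.accepted_on >= EXPIRY
    ]


open_bugs = [
    Bug("PROMPT-1", date(2019, 2, 1)),                     # 32 days old, untouched
    Bug("PROMPT-2", date(2019, 2, 1), work_started=True),  # old, but in progress
    Bug("PROMPT-3", date(2019, 3, 1)),                     # only 4 days old
]
expired = bugs_to_close(open_bugs, today=date(2019, 3, 5))
print([bug.key for bug in expired])  # ['PROMPT-1']
```

Only bugs that nobody has started on are candidates for closing; anything already in progress is left alone regardless of age.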
We also needed a new way of deciding which bugs we felt we should fix and which should be closed, rather than relying on the fairly heavyweight weekly triage meeting we’d been using previously. To this end, we developed triage by emoji. This involved the team member currently handling our support escalations posting any new bugs into our dedicated triage channel on Slack. The other team members could inspect the bug and use one of three emoji reactions to convey their opinion:
- 👍 This bug is important and we should fix it.
- 👎 This bug is not important and we should close it.
- 😕 I’m not sure on this one, we should discuss and investigate further as a team.
Once the team came to a consensus on a bug, its ticket was updated accordingly.
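The post doesn’t spell out exactly what counted as consensus, so the sketch below assumes one possible rule: a unanimous 👍 means fix, a unanimous 👎 means close, and anything else (no votes yet, any 😕, or a split vote) means the team discusses it further. The `triage_outcome` function and its return values are hypothetical.

```python
from collections import Counter

FIX, CLOSE, UNSURE = "👍", "👎", "😕"


def triage_outcome(reactions):
    """Map the emoji reactions on one Slack post to a triage decision.

    Assumed consensus rule (not taken from the original process):
    unanimous 👍 -> "fix", unanimous 👎 -> "close", anything else
    (no votes, any 😕, or a split vote) -> "discuss".
    """
    counts = Counter(reactions)
    if not reactions or counts[UNSURE] > 0:
        return "discuss"
    if counts[FIX] == len(reactions):
        return "fix"
    if counts[CLOSE] == len(reactions):
        return "close"
    return "discuss"


print(triage_outcome(["👍", "👍", "👍"]))  # fix
print(triage_outcome(["👍", "👎"]))        # discuss
```

Treating any 😕 as a trigger for discussion reflects the intent of that reaction: a single unsure team member is enough to pause the decision.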
To get started with the ZBP, the first thing we did was go through all of our existing bugs and re-triage each one using our new process. All those that we deemed not important, we closed, leaving us with around a dozen “important” bugs. We then did a dedicated sprint where we worked our way through them, leaving us with a clean slate with which to move forward.
What early impacts did we see?
One of the key pillars of a ZBP is a high level of transparency between you and your users. For instance, previously we’d accept a bug and stick it on our backlog, telling the customer “we’ll get to it as soon as we can”, knowing full well we probably wouldn’t look at it again for six months and would then close it. In effect, we were lying to our customers, giving them false hope of a bug fix that was never going to come, and we didn’t feel comfortable with that.
Therefore, we made sure that every time we said “no” to a customer’s bug, we gave them a clear reason why. You might expect these customers to get angry and demand that we fix their bugs, but we’ve found the opposite. Almost all of them have been really understanding and appreciated the transparency. After all, most of them are software developers themselves and know that it’s impossible to ship software with no bugs whatsoever. One particular interaction from shortly after the introduction of the ZBP has stuck with me. We decided to decline a bug and shared our reasoning with the customer. His response was…
“Thank you for letting me know that. In response to your feedback, I’m thinking that your team’s decision is reasonable if the implementation of the fix may trigger further unforeseen issues. Thank you for taking time on this.”
Another big benefit we saw very quickly was a massive reduction in administration costs. Not only did we no longer have a weekly hour-long triage meeting dropped in the middle of our calendars, but we also spent very little time managing our bug queue, as the only bugs on it were the ones we’d decided were important enough to fix.
The final benefit we saw very quickly was that the average time it took the team to fix bugs and ship the fixes to customers plummeted. With fewer bugs in our queue, it became much easier to get them to the point where we could actually start work on fixing them, rather than waiting weeks or even months for them to be prioritised.
Iterating on lessons learned…
As with all things in agile software development, we were keen to iterate on our process. One of the big changes came about when our team switched from a Scrum-like two-week iteration workflow to a more flow-based, Kanban style of delivering software. Previously, in sprint planning we’d simply taken all the currently open bugs, brought them into each sprint and fitted our remaining work around them.
When we switched to Kanban, we decided to put all the bugs we’d agreed to fix at the top of the “To Do” column on our board. This marked them as the most important items, with the expectation that the next developer(s) to become free would pick up the top bug.
Early on, we noticed a problem with our triage by emoji process. Team members tended to simply add an emoji and move on, leaving the person on support with little to no context as to why a bug was considered not important enough to fix. Therefore, we introduced a rule that, regardless of how you voted, you had to add a comment explaining why you voted that way. This gave us two main benefits. Firstly, it made sure everyone was on the same page about what the bug actually was. We’ve seen several instances where a team member voted one way after misunderstanding the bug, then changed their opinion after discussing it in more detail. Secondly, it made sure each customer received a well-considered, detailed reason why their bug was either accepted or declined.
A couple of months later, we spotted another problematic trend. Even with a ZBP and only 3 or 4 open bugs at any one time, putting all bugs above everything else was causing some really important feature work and operational maintenance to be continually pushed back. Therefore, we agreed that bugs would be added to the bottom of the “To Do” column rather than the top. This meant they weren’t prioritised above the work already on the board, but would be prioritised above the next set of work brought in from the backlog. This simple change allowed us to strike a better balance between fixing important bugs and delivering value in other areas. It also removed the need for the one-month expiration date on bugs, as placing each bug on the board as soon as it’s accepted ensures it gets dealt with in a timely manner anyway.
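The difference between the two placement rules can be shown by treating the “To Do” column as an ordered list, with the top of the column first. In this sketch the `add_bug` and `pull_from_backlog` helpers and the item names are hypothetical; it only illustrates the ordering, not any particular board tool.

```python
def add_bug(todo, bug, at_top=False):
    """Return a new To Do column with an accepted bug placed in it.

    at_top=True models the first Kanban rule (bugs jump ahead of everything);
    at_top=False models the later rule (bugs go below the work already on the
    board, which still puts them ahead of anything pulled in afterwards).
    """
    return [bug] + todo if at_top else todo + [bug]


def pull_from_backlog(todo, backlog, n):
    """Move the next n backlog items onto the bottom of the To Do column."""
    return todo + backlog[:n], backlog[n:]


todo = ["feature-A", "maintenance-B"]
backlog = ["feature-C", "feature-D"]

todo = add_bug(todo, "bug-1")
todo, backlog = pull_from_backlog(todo, backlog, 1)
print(todo)  # ['feature-A', 'maintenance-B', 'bug-1', 'feature-C']
```

The bug no longer pre-empts work already on the board, but it still lands ahead of anything pulled in from the backlog later, which is why a separate expiry date stops being necessary.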
By far the biggest problem we faced was a tendency to be too lenient about which bugs we deemed important enough to fix. This was especially true when we didn’t have many bugs open, as we would sometimes take bugs on simply because we felt we could, rather than because we should. This led to us doing potentially less valuable work, and meant that whether a bug got fixed could depend heavily on when it was reported, rather than on how severe it actually was. I spent a while trying to figure out how to articulate this, and eventually came up with this question…
If you already had five really important bugs to fix before you got to work on this one, would you still choose to fix it?
Following this advice can be quite tricky. It’s a strange feeling declining to fix a bug when you have no other open bugs, but like anything, it gets easier with time and experience.
What’s next for Team Prompt?
We’ve seen some amazing benefits from the zero-bug policy over the past few months (our customer satisfaction has never been higher), so we’ll be sticking with it going into 2020, and we’ll continue to inspect and adapt it as required.
I’ve had many conversations with people, both inside and outside Redgate HQ, who’ve expressed an interest in trying out the policy on their own development teams. If you have any questions or comments, I’m more than happy to answer them! I’m a firm believer that a ZBP is a great way of dealing with support requests, regardless of your team or product, so I’m very excited to see how things go from here.