SQL Prompt goes on the offensive in the War on Bugs

Thomas Walsh
Published in Ingeniously Simple
4 min read · Feb 18, 2019

On the Prompt team, dealing with support tickets is an integral part of the job, and with so many monthly active users, we get more bug reports than most. SQL Prompt also acts as an entry-point into the Redgate ecosystem for many customers, so providing a good support experience is critical.

Unfortunately, this is something we’ve really struggled with over the past year. New bugs were always coming in faster than we could fix them, so the backlog of support items could get intimidating.

Here’s a summary of our old support workflow:

  • Incoming bugs were triaged in a meeting every Monday morning.
  • “High” priority tickets were added to the main support queue.
  • “Low/Medium” priority tickets were added to the support backlog.
  • Every two weeks in our iteration planning meeting, we’d decide as a team which items from the support queue we wanted to work on.
  • Every 3 months, we’d review the support backlog as a team, seeing if there were any tickets we felt should be brought into the main support queue.

There were several problems with this:

  • Triage meetings were disruptive, taking people away from their desks, and bugs often waited several days to be triaged.
  • Our triaging process marked lots of tickets as high priority. Many of these then waited months to be fixed, raising the question “were they really high priority?”
  • The support backlog meetings rarely resulted in any tickets being brought into the support queue and fixed. The backlog was, in effect, the place where bug reports went to die.

What did we do?

I went to the team with a proposal to try out a zero-bug policy (ZBP) as a potential way of alleviating these issues. I’d seen on the Redgate Tech Radar that another team was exploring it, and I was keen to try it out on Prompt. I won’t go into the specifics of what the ZBP entails (there’s an excellent blog post on the subject here), but it can be condensed to this:

If a bug is reported that you feel you should fix, then you fix it right away. If not, you close it.

I crowd-sourced feedback from the team on this, along with a slightly less brutal approach, and within a single one-hour meeting we reached a consensus on how we wanted to move forward. We also held a separate meeting to discuss “what makes a high-priority bug?”, as we agreed this was key to following our newly agreed process.

The new process

The team had a few concerns about simply closing bugs we couldn’t immediately prioritise, because we wanted to be a little more flexible in our planning. We therefore settled on giving each bug a one-month expiration date: if work hasn’t started on it within a month, the ticket gets closed. There’s obviously a risk that some customers will be disappointed when we tell them the bug they reported won’t be fixed, but we want to be as transparent as possible, and just sticking their bug on the backlog to die a slow death seemed dishonest on our part. Our metrics show that 70% of the bugs we fix get fixed in the first month anyway, so this keeps us honest with our customers and should also allow us to prioritise better moving forward.
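To make the expiration rule concrete, here is a minimal Python sketch of how it could be automated. This is not the tooling we use: the `Ticket` fields, the function names and the 30-day approximation of "one month" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumption: "one month" is approximated as 30 days.
EXPIRY = timedelta(days=30)


@dataclass
class Ticket:
    # Hypothetical ticket shape; a real support system will differ.
    ticket_id: str
    reported_on: date
    work_started_on: Optional[date] = None
    closed: bool = False


def should_expire(ticket: Ticket, today: date) -> bool:
    """True if the ticket is open, untouched, and older than EXPIRY."""
    if ticket.closed or ticket.work_started_on is not None:
        return False
    return today - ticket.reported_on > EXPIRY


def close_expired(tickets: List[Ticket], today: date) -> List[Ticket]:
    """Close every expired ticket and return the ones that were closed."""
    expired = [t for t in tickets if should_expire(t, today)]
    for ticket in expired:
        ticket.closed = True
    return expired
```

Returning the closed tickets, rather than closing them silently, leaves room for a follow-up step such as telling each customer that their bug won't be fixed.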

When it comes to triaging bugs, we agreed that the following four things should be considered:

  1. Is the bug on the red route?
  2. Is the bug a regression?
  3. Does the bug negatively impact the customer’s T-SQL code?
  4. Is the bug part of core functionality?

We’ve also scrapped the weekly triage meeting and are instead trialling something I’ve dubbed triage by emoji. The person on the support rota posts a link to a support ticket in our dedicated Slack channel, and at least two team members respond with an emoji: thumbs up for “high priority”, thumbs down for “low/medium priority” and a confused face for “I’m not sure”. If there’s disagreement, the relevant people discuss it until a consensus is reached, after which the ticket is updated accordingly.
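The voting rule can be sketched as a small Python function. This is only an illustration, not a Slack integration: the `Vote` enum and the `triage` function are hypothetical names, and treating a confused face as a trigger for discussion is an assumption, since we only spell out what happens when people disagree.

```python
from enum import Enum
from typing import List


class Vote(Enum):
    THUMBS_UP = "high"
    THUMBS_DOWN = "low/medium"
    CONFUSED = "not sure"


# At least two team members must respond before a ticket is triaged.
MIN_VOTES = 2


def triage(votes: List[Vote]) -> str:
    """Return 'high', 'low/medium', 'needs more votes' or 'discuss'."""
    if len(votes) < MIN_VOTES:
        return "needs more votes"
    distinct = set(votes)
    if distinct == {Vote.THUMBS_UP}:
        return "high"
    if distinct == {Vote.THUMBS_DOWN}:
        return "low/medium"
    # Assumption: mixed votes or any confused face means the
    # relevant people talk it through until they agree.
    return "discuss"
```

For example, `triage([Vote.THUMBS_UP, Vote.THUMBS_DOWN])` returns `"discuss"`, which matches the rule that disagreement is resolved by conversation rather than by majority vote.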

To get everyone on the same page, we used this process to re-triage all our existing bug reports so that we could start from a manageable state. We’ve trimmed a backlog of over 80 bugs down to just 15. Our aim for the future is to get this number down to fewer than ten and, more importantly, to keep it there.

If you have any questions, feel free to comment and I’ll be happy to answer them.
