Implementing a Zero Bug Policy

Louis-Amaury Chaïb
Published in partoo · Sep 7, 2022

All bugs must be taken care of within at most a week, regardless of their severity.

Raising the bar by lowering it… 🎯

I hope you had a good summer. Here at Partoo, our tech team has been working with all their energy to achieve a challenging goal: reach 0 bugs 🪲

Who came up with this challenge? Why did we do it? What does 0 bugs mean? How did we do it? What now? Let's go over all these questions!

The rationale behind this adventure

It started with Jonàs, our CTO, who, after discussing with peers, realised that our product, as good as it is, would have the "wow effect ✨" if we adopted a "zero bugs" strategy.

The philosophy is simple: bugs are treated as a priority, regardless of their importance. No bug backlog. Sentry and Service Desk empty.

A dream come true 🤩! I had personally pointed out to my former managers, several times, that rushing to deliver features to acquire new customers, at the expense of properly managing the bug backlog, was causing problems with the existing customer base and lowering the retention rate.
Needless to say I was thrilled to have the opportunity to contribute.

Happy employees ⇔ happy customers

It is often said that happy employees make happy customers; I tend to say it works in both directions.
Customer care teams find their work more satisfying when they don't have to face clients upset about unresolved bugs, which in turn reduces tension between the customer care and engineering teams.

I would add that engineers are empowered with the right to innovate and make mistakes, with a collective mindset where those mistakes are owned and quickly fixed.

The cost of non-quality

I've learned over the years that there is no single definition for quality:

  • The quality of service: you may have a poorly designed product but a kick-ass customer care team that handles all customer requests very well, so customers have mixed feelings about your company and product. Also, this doesn't scale well as the number of customers grows.
  • The quality of product: the product is efficient, simple to use, customers can be autonomous in (almost) everything they need to do. Customer satisfaction is usually high (you can measure it with NPS).
  • The quality behind the product: we're talking here about software architecture, proper codebase, error management…

I'm proud to say that at Partoo we were already in good shape on the first two points. We've got A-class customer care, product design and product management 😎.

On the technical side, however, a lot of errors were streaming into our logs, and some database queries were heavily impacting the application's performance.

Tools such as Datadog or Sentry, which we use to monitor the application, bill based on the volume of logs they receive, so you must keep these streams in check.

Moreover, a poorly maintained codebase becomes a burden that makes adding features slower and slower. That's why I insisted that our code quality, measured with Sonar, should be part of the target.

Setting up the goal

As in most companies, there was already an SLA for critical-severity customer bugs (opened in Service Desk), which must be resolved within a couple of hours.
But now our ambition is to go further and leave no bug unresolved for more than a week, even with the lowest severity level!

In Sentry, our error monitoring tool, no logged error should be left unresolved for more than a week. The deadline drops to 2 days if the error occurs more than a thousand times per day. The same goes for Datadog alerts. As I said, we must keep our error logs in check!
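To make the Sentry rule concrete, here is a minimal Python sketch of how a resolution deadline could be computed. It assumes the clock starts when the error is first seen and that the daily occurrence count is already known; `first_seen` and `occurrences_per_day` are hypothetical inputs, not fields of the Sentry API.

```python
from datetime import datetime, timedelta

# Thresholds from the policy: one week by default, 2 days for noisy errors.
DEFAULT_DEADLINE = timedelta(days=7)
HIGH_VOLUME_DEADLINE = timedelta(days=2)
HIGH_VOLUME_THRESHOLD = 1000  # occurrences per day ("more than a thousand")


def resolution_deadline(first_seen: datetime, occurrences_per_day: int) -> datetime:
    """Return the date by which an error must be resolved.

    Assumption: the deadline is counted from the moment the error is first seen.
    """
    if occurrences_per_day > HIGH_VOLUME_THRESHOLD:
        return first_seen + HIGH_VOLUME_DEADLINE
    return first_seen + DEFAULT_DEADLINE


def is_overdue(first_seen: datetime, occurrences_per_day: int, now: datetime) -> bool:
    """True if the error is still unresolved past its deadline."""
    return now > resolution_deadline(first_seen, occurrences_per_day)


# Usage: a noisy error first seen on August 1st is due on August 3rd.
print(resolution_deadline(datetime(2022, 8, 1), 1500))
print(is_overdue(datetime(2022, 8, 1), 10, datetime(2022, 8, 4)))
```

The same function would apply to Datadog alerts, provided an equivalent occurrence count is available.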

Sonar, our code quality scanner, classifies its results into 4 quadrants: Bugs, Code Smells, Vulnerabilities and Security Hotspots. Each quadrant is rated from A to E (maybe further, though I don't recall ever seeing a worse rating) depending on the number of findings and their criticality. Going down to 0 may be neither realistic nor efficient, but keeping an A score everywhere is pragmatically reachable and maintainable. Besides, except for Code Smells, that's almost equivalent to 0s anyway 😏

Sonar quadrants a few days before goal was reached
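The "A everywhere" target boils down to a simple check. The sketch below assumes the four ratings have already been read off the Sonar dashboard into a plain dictionary, and only handles the A to E scale; `failing_quadrants` is a hypothetical helper, not part of Sonar.

```python
# Ratings from best to worst, as displayed by Sonar.
RATING_ORDER = "ABCDE"


def failing_quadrants(ratings: dict[str, str], target: str = "A") -> list[str]:
    """Return the quadrants whose rating is worse than the target rating."""
    limit = RATING_ORDER.index(target)
    return sorted(
        quadrant for quadrant, rating in ratings.items()
        if RATING_ORDER.index(rating) > limit
    )


# Usage with illustrative ratings (not real project data).
ratings = {
    "Bugs": "A",
    "Vulnerabilities": "A",
    "Security Hotspots": "B",
    "Code Smells": "A",
}
print(failing_quadrants(ratings))  # ['Security Hotspots']
```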

And now that this ambition is public, there's no turning back, right? 😈

Kicking off the initial state

There's no way to reach such objectives while staying in cruise mode, i.e. continuing to develop features while cleaning the backlog down to 0.

We therefore decided that during the month of August, which is generally rather quiet for our customers and prospects, the tech team would stop all feature development and focus only on fixing bugs.

Organising upfront

Before going all in for a whole month of bug cleaning, the following steps are necessary:

  • communicate about it: within the tech team to align on the goals, but also with the rest of the company so the objective is also known and shared
  • clean up fossilised bugs: low-severity bugs that were never touched, are no longer relevant or cannot be reproduced; it's a good time to say goodbye to them
  • prepare all the bugs to be fixed and ensure they are dispatched to the appropriate teams. Triage was a shared responsibility: our QA Lead handled customer tickets and our SRE handled technical tickets (Sentry, Datadog and Sonar)
  • create a single point of visibility for all the bugs. Jira makes it possible to have a single board showing every team's tasks, with one swim-lane per team. It's a convenient way for everyone to follow the overall progress of the project, and it lets teams who finish early pick tickets where they can help.
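The triage split and the per-team swim-lanes boil down to two small rules, sketched here in Python. This only mirrors the logic; the actual board lives in Jira. The `Ticket` type, its fields, the team names and the ticket keys are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Technical sources are triaged by the SRE; anything else (e.g. Service Desk)
# is treated as a customer ticket triaged by the QA Lead.
TECHNICAL_SOURCES = {"Sentry", "Datadog", "Sonar"}


@dataclass
class Ticket:
    key: str     # hypothetical ticket key, e.g. "BUG-101"
    source: str  # "Service Desk", "Sentry", "Datadog" or "Sonar"
    team: str    # feature team the bug is dispatched to


def triager(ticket: Ticket) -> str:
    """Return who is responsible for triaging this ticket."""
    return "SRE" if ticket.source in TECHNICAL_SOURCES else "QA Lead"


def swim_lanes(tickets: list[Ticket]) -> dict[str, list[str]]:
    """Group ticket keys by team: one lane per team, as on a shared board."""
    lanes: dict[str, list[str]] = defaultdict(list)
    for ticket in tickets:
        lanes[ticket.team].append(ticket.key)
    return dict(lanes)


tickets = [
    Ticket("BUG-101", "Service Desk", "Reviews"),
    Ticket("BUG-102", "Sentry", "Messages"),
    Ticket("BUG-103", "Sonar", "Reviews"),
]
for t in tickets:
    print(t.key, "->", triager(t))
print(swim_lanes(tickets))
```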

There's also one thing that needs to be crystal clear for everyone: you know when you start, but you don't know when you'll stop. It may take a week, a month or 3 months, during which no feature development gets done.

Motivating the troops

Let's face it, the intrinsic motivation of reaching a common goal, or of going back to more interesting activities such as feature development or technical evolutions, may not be enough to keep everyone motivated for the time it may take.

We found it useful to gamify the project a little:

  • show the number of remaining bugs on a TV screen in the office: the sense of heading in the right direction brings additional motivation
  • set up a team or individual leaderboard ranking the number of bugs fixed per day/week
  • if you can afford it, offer some rewards: to the team who finishes their backlog first, recognising that they were probably the most effective at keeping their bugs in check even before the project; to the team who solves the most bugs overall (teams can go and solve other teams' tickets as soon as theirs are done); to the individual who solves the most bugs… take your pick!
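A leaderboard like this needs little more than a counter. The sketch below assumes each fix is recorded with its date, developer and team; the `Fix` record and the sample names are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple, Optional


class Fix(NamedTuple):
    fixed_on: date
    developer: str
    team: str


def leaderboard(fixes: list[Fix], by: str = "team",
                since: Optional[date] = None) -> list[tuple[str, int]]:
    """Rank teams (by="team") or developers (by="developer") by bugs fixed.

    Only fixes on or after `since` are counted, e.g. the start of the week.
    """
    counts = Counter(
        getattr(fix, by) for fix in fixes
        if since is None or fix.fixed_on >= since
    )
    return counts.most_common()


fixes = [
    Fix(date(2022, 8, 1), "alice", "Search"),
    Fix(date(2022, 8, 2), "bob", "Search"),
    Fix(date(2022, 8, 2), "carol", "Reviews"),
    Fix(date(2022, 8, 8), "carol", "Reviews"),
    Fix(date(2022, 8, 9), "carol", "Reviews"),
]
print(leaderboard(fixes))                          # overall team ranking
print(leaderboard(fixes, since=date(2022, 8, 8)))  # this week's ranking
```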

Results

A big moment of pride: the results were announced on August 31st. With a team of around 30 developers, and despite the holidays, a total of 601 bugs were solved during the month of August! And that's not counting the many tickets closed or deleted as duplicates or non-reproducible (over 800 tickets were opened in total during the project).

Time to celebrate 🎉

Tech team celebrating a big achievement

What's next?

Now that the initial state has been reached and the rules are clear, we have every hope of keeping our backlog at a bright 0 🤞💪

To sustain this effort, our QA Team and our SRE will review tickets on a weekly basis and prevent Feature Teams from starting a sprint if there are unresolved bugs older than a week.
As for Sonar, it's even possible to automatically block any new code that violates our Quality Gate; it'd be a shame not to use it 👌
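The weekly rule for Feature Teams amounts to a pre-sprint check, sketched below. It assumes a team's open bugs are available as a mapping from ticket key to opening date; `stale_bugs`, `can_start_sprint` and the ticket keys are hypothetical, and Sonar's own Quality Gate is not modelled here.

```python
from datetime import datetime, timedelta

MAX_BUG_AGE = timedelta(days=7)


def stale_bugs(open_bugs: dict[str, datetime], now: datetime) -> list[str]:
    """Return the keys of unresolved bugs opened more than a week ago."""
    return sorted(
        key for key, opened_at in open_bugs.items()
        if now - opened_at > MAX_BUG_AGE
    )


def can_start_sprint(open_bugs: dict[str, datetime], now: datetime) -> bool:
    """A team may start its sprint only if none of its open bugs is stale."""
    return not stale_bugs(open_bugs, now)


now = datetime(2022, 9, 12)
team_bugs = {
    "BUG-1": datetime(2022, 9, 9),  # 3 days old: fine
    "BUG-2": datetime(2022, 9, 1),  # 11 days old: blocks the sprint
}
print(stale_bugs(team_bugs, now))        # ['BUG-2']
print(can_start_sprint(team_bugs, now))  # False
```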
