Handling bugs at savedroid

Published in Inside savedroid, Dec 10, 2019

At savedroid, we are trying to create the best experience for our users across an ecosystem of products: a fiat savings app and a crypto app.

Over the last few months, we have run several usability tests and gathered our users' rich feedback about their journeys with us. On the product development side, we care a lot about the quality of service we provide to our users.

We have multiple processes to ensure quality in our apps:

A great Customer Support and QA team
Dogfooding for employees
Bug tracking with test.IO
Bug bounty program with HackerOne

Unfortunately, this is not enough.

Issues reported by our users and community are gathered by our beloved Customer Support Team. They are the first line of support for our users, and they do an outstanding job analyzing and reproducing these issues and telling the product team which ones are the most recurring and painful.

A few months ago, we decided to make a major change in how we handle all these topics. We learned a lot from this experience and want to share what worked and what did not.

Context

Before going into details, let’s see what motivated this change. Not so long ago, we had all developers concentrated in one big product team.

We basically implemented everything you can think of in this monster: app releases, marketing tools, documentation, back offices, landing pages, and so on.

This setup was becoming less and less future-proof for several reasons:

Poor code isolation, introducing many side effects
For a small change, the whole platform needed to be deployed
Lack of true ownership
etc…

So we started to look for solutions.

Solution

We decided to reorganize the product team into two new teams in response to this situation. Each is a group of individuals that takes care of the following topics:

Pioneers: constantly testing, researching, quick and dirty prototyping;
Bugbusters: production, quality/stability improvements, maintenance, scaling;

In practice

The teams have been working with very light processes: a Microsoft Teams channel, a Jira board with prioritization, and a daily standup.

We moved from an open-office layout to dedicated spaces for each team, expecting this to improve team focus and communication.

Over time, we added weekly sprint planning and a retrospective to plan and review all open tasks, with the following questions in mind:

Is it reproducible?
Is it relevant?
Does it belong to the team?
Does the criticality make sense?
Is it really important?

We set this up to make sure developers focus on the most critical tasks and do not waste time asking for information or investigating non-existent bugs.
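The review questions above can be pictured as a small checklist function. This is only a minimal sketch, assuming each answer is recorded as a yes/no field on the ticket. The `Ticket` class, its field names, and the `triage` function are hypothetical names chosen for illustration and are not tied to Jira or any other tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ticket:
    """Hypothetical ticket with one yes/no answer per review question."""
    title: str
    reproducible: bool
    relevant: bool
    belongs_to_team: bool
    criticality_makes_sense: bool
    important: bool


def triage(ticket: Ticket) -> List[str]:
    """Return the review questions the ticket fails (empty list: ready to work on)."""
    checks = [
        ("Is it reproducible?", ticket.reproducible),
        ("Is it relevant?", ticket.relevant),
        ("Does it belong to the team?", ticket.belongs_to_team),
        ("Does the criticality make sense?", ticket.criticality_makes_sense),
        ("Is it really important?", ticket.important),
    ]
    # Keep only the questions answered "no", in checklist order.
    return [question for question, passed in checks if not passed]


# Example: a ticket that cannot be reproduced and has a doubtful criticality.
unclear = Ticket(
    title="App freezes sometimes",
    reproducible=False,
    relevant=True,
    belongs_to_team=True,
    criticality_makes_sense=False,
    important=True,
)
open_questions = triage(unclear)
```

A ticket with a non-empty result would go back to its reporter with the open questions attached, rather than onto a developer's plate.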

Results & learnings

So far, the results are pretty encouraging. The backlog has shrunk significantly and the overall quality of our apps has improved.

This way of working is still teaching us a lot. Here are the most important points:

Having an operational team is awesome

The team is on the front line in case of an incident. From an employee perspective, having someone who acknowledges and investigates critical issues is healthy for everyone. This may sound stupid, but the simple fact of saying “Yes, we saw your ticket, we will have a look” is really reassuring, even if the issue is not fixed right away.

By receiving most of the issues, this team became the best placed to assess the urgency of issues and tasks.

Having full-time employees is a real benefit…

A lot of companies employ part-time developers, and we have nothing against that. However, in our case, there’s a real benefit in having a full-time core of devs. They build real expertise in our apps. It means they have an overview of the quality of the apps and can develop the most relevant tools to take them to the next step. It also allows them to detect complex issues and pitfalls.

Becoming real specialists means devs are able to help other teams when needed.

By following all the issues, they also gain a really sound overview of our platforms. This is not about how Medium-driven your technical stack is, nor about how good your code coverage is. It’s about how we deliver value to our users. They know what is working, what is not, and above all what is painful in day-to-day operations.

… but be careful regarding focus

Our team gets a lot of pressure from the outside. It’s hard to focus on anything but the short term.

Moreover, the most experienced devs are the most qualified to investigate production issues and will be called first in case of an emergency.

Fixing bugs means more bugs

After a few months, the number of incoming issues went up: our customer support team started to report more issues since we were allocating more bandwidth to them.

Everything cannot be fixed

At some point, we realized that our backlog would never entirely disappear, and that’s fine. Over several months, we encountered various scenarios that are not always simple to handle:

Some tasks are not worth fixing, because they won’t be relevant in the near future or represent too much work for their low criticality;
Some bugs cannot be fixed by design;
Some issues can be really tough. Every developer who tries to fix them gets stuck in the legacy code and fails at some point.

It’s OK not to fix everything. It’s better to refuse tickets than to let them die in your backlog, because at least people get an explanation.

The rotation system is a good experiment…

It spreads the pain across the whole development team, which is a sane way to work: you break it, you fix it. All devs get to see what our users are experiencing, how legacy systems behave, and what really matters on the operational side of things.

… but it’s not always great for developers

It’s not fun to work only on bugs: developers spend a lot of time understanding the issue, trying to reproduce it, investigating it, and finding a solution. These tasks can get very frustrating, especially when the person is not used to them. And sometimes, the task is just boring. Moreover, most developers are problem solvers and are really affected by their individual performance at closing issues. If they don’t succeed, they lose their motivation!

These tasks usually have nothing to do with a developer’s day-to-day work on other projects, which introduces context switching that is not easy to manage. That said, some developers see it as a way to escape their roadmap for a few days.

It does not solve the production health issue

By production health, we mean less critical issues that you can witness yourself in the apps or discover in the monitoring tools, such as performance issues or imperfect implementations.

Working on production health could be a good path to better quality of service, and it sometimes lets you find problems before they become visible to users.

But this is really time-consuming without the proper tooling, especially when you have two live apps. That said, the long-term goals of the team include some monitoring and performance improvements.

What it looks like today

The team is still performing well and the bug backlog is now reasonable. Our apps are more stable, which allows us to keep evolving them and implementing new features. The team is responsible for their operation and evolution.

If you enjoyed this post, please like it and recommend it to others. If you follow a different recipe, please share it with us in the comments below.