99.9% crash free sessions

Christian Dehning
Published in SHARE NOW TECH
Aug 21, 2017


As one of the Android developers of the car2go app, I am happy to share that we have achieved our long-term goal of 99.9% crash-free sessions 🎉. In fact, we had already passed it by quite a margin without even noticing.

In this blog post I want to share how we got there and the lessons I learned along the way.

Set a Goal

It all started in January of last year, when the six of us Android developers sat together and defined some "yearly" goals for ourselves. We wanted goals that would keep us focused on improving the quality of our app.

So we decided that we wanted to skip as few releases as possible, significantly increase our unit test coverage, and get rid of certain Android classes in our project (👋 AsyncTasks, 👋 Handlers and 👋 Fragments). Last but not least, we wanted to reach an awesome KPI of 99.9% crash-free sessions (we were at ~99.6% back then).

Track the Progress

Let’s stick to the crash-free sessions. The first thing we did was to set up a Wiki page. We put our goals at the top of it and then started to figure out where we could get these numbers from.

Since we already used Crashlytics from Fabric, tracking that number was easy. We looked at the crash-free sessions of the previous release whenever we released a new version to production and published that number in our Wiki.
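Crashlytics reports the crash-free sessions figure directly, so there is nothing to compute yourself. A quick calculation still helps put the goal in perspective. The sketch below assumes the metric is simply the share of sessions that did not end in a crash. `CrashFreeRate` and its methods are hypothetical helpers, not part of any library.

```java
// Hypothetical helper: assumes "crash-free sessions" means the share of
// sessions that did not end in a crash.
final class CrashFreeRate {
    private CrashFreeRate() {}

    /** Returns the percentage of sessions that did not crash. */
    static double crashFreePercent(long totalSessions, long crashedSessions) {
        if (totalSessions <= 0 || crashedSessions < 0 || crashedSessions > totalSessions) {
            throw new IllegalArgumentException("invalid session counts");
        }
        return 100.0 * (totalSessions - crashedSessions) / totalSessions;
    }

    /** Converts a crash-free percentage into crashed sessions per 1,000 sessions. */
    static double crashesPerThousand(double crashFreePercent) {
        return (100.0 - crashFreePercent) * 10.0;
    }
}
```

For example, 4 crashed sessions out of 1,000 gives 99.6%, while 99.9% corresponds to 1 crashed session per 1,000. Going from roughly 99.6% to 99.9% therefore means eliminating about three quarters of all crashing sessions.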

We could have used a more realistic measure, e.g. by setting a custom time range and checking how all of our app versions had performed since our last release. But we wanted to keep things as simple as possible when we started.
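The more realistic variant weights every app version still in use by its number of sessions within the chosen time range. Below is a minimal sketch. It assumes you can export per-version session and crash counts for that range. `VersionStats` and `AggregateCrashFree` are hypothetical names, not part of the Crashlytics API.

```java
import java.util.List;

// Hypothetical per-version counts for one time range (e.g. since the last release).
final class VersionStats {
    final String version;
    final long sessions;
    final long crashedSessions;

    VersionStats(String version, long sessions, long crashedSessions) {
        this.version = version;
        this.sessions = sessions;
        this.crashedSessions = crashedSessions;
    }
}

final class AggregateCrashFree {
    private AggregateCrashFree() {}

    /** Session-weighted crash-free percentage across all versions in the range. */
    static double overallCrashFreePercent(List<VersionStats> stats) {
        long total = 0;
        long crashed = 0;
        for (VersionStats s : stats) {
            total += s.sessions;
            crashed += s.crashedSessions;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no sessions in range");
        }
        return 100.0 * (total - crashed) / total;
    }
}
```

Consider the newest version with 9,000 sessions and 9 crashes (99.9%), plus an older version with 1,000 sessions and 5 crashes (99.5%). Together they give 99.86%. Older versions still in the field can therefore keep the overall number below the goal even when the latest release meets it.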

How to improve?

So here is an overview of what we did to improve that number:

  • Have your top crashes in your Sprint backlog: Make sure you reserve some time for them, so you don’t forget about your bugs when planning a sprint.
  • Get the priorities right: Depending on your project, a crash that affects hundreds of users every day is probably more important than building that one upcoming feature. Check with your Product Owner how to order things in your backlog. As long as you can explain your bugs to them, they will probably understand how critical those bugs are for your customers.
  • Do not hold back your fixes and push out another release: If you spot a bigger problem and manage to fix it just after a release is out, do not be afraid to ship another release right away. Sometimes a hotfix release is a better option than waiting for the next regular release, even though it means yet another update for your customers to download.
  • Test a lot: Write unit tests, hold testing sessions with your team, and make use of alpha/beta testing. Each of those stages will reveal different bugs and crashes. And the earlier you catch them, the earlier you can decide how to react. Some bugs will make you postpone your release a bit, others won’t. But at least you get the chance to make that call.
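For the first two points, a rough starting order for the backlog can come from ranking crashes by how many users they affect per day. The sketch below is illustrative only. `CrashIssue` and `CrashTriage` are hypothetical types, and the numbers would come from your crash-reporting dashboard. The final order is still something to agree on with your Product Owner.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical summary of one crash, e.g. copied from a crash-reporting dashboard.
final class CrashIssue {
    final String title;
    final int usersAffectedPerDay;

    CrashIssue(String title, int usersAffectedPerDay) {
        this.title = title;
        this.usersAffectedPerDay = usersAffectedPerDay;
    }
}

final class CrashTriage {
    private CrashTriage() {}

    /** Returns up to n crashes, ordered by users affected per day (highest first). */
    static List<CrashIssue> topCrashes(List<CrashIssue> issues, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must not be negative");
        }
        List<CrashIssue> sorted = new ArrayList<>(issues);
        sorted.sort(Comparator.comparingInt((CrashIssue c) -> c.usersAffectedPerDay).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```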

Don’t look away

Probably the easiest way to get rid of crashes is to catch exceptions everywhere and then simply ignore them. But this is not what we wanted to achieve with our goal. We wanted to improve the quality of our app.

And let’s be honest: If an exception is thrown in your app, your users have probably reached a state that you were not expecting. And who knows whether they can still finish what they were trying to do? That’s right, nobody knows.

That is why we decided to crash our app far more often than we did before. A big change for us was crashing whenever an RxJava Observable that should never complete actually completed. We have a lot of data streams within our app that are constantly updating, and a lot of our logic depends on those updates arriving.
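The idea of crashing when a stream that should never complete does complete can be sketched as a subscriber wrapper. To keep the example self-contained, it uses the JDK's `java.util.concurrent.Flow` interfaces instead of RxJava. `NeverCompletingSubscriber` is a hypothetical name. Publishers generally do not expect a subscriber's `onComplete` to throw, so the failure goes to a `crashHandler`, which by default simply rethrows. In an app you might route it to your global crash handling instead. In RxJava itself, the same check would live in the completion callback you pass to `subscribe`.

```java
import java.util.concurrent.Flow;
import java.util.function.Consumer;

/**
 * Hypothetical wrapper for a subscriber of a stream that should never complete
 * (e.g. a stream of live vehicle updates). Uses java.util.concurrent.Flow
 * instead of RxJava so the sketch is self-contained.
 */
final class NeverCompletingSubscriber<T> implements Flow.Subscriber<T> {
    private final String streamName;
    private final Flow.Subscriber<? super T> delegate;
    // Receives the failure; the default handler rethrows it on the calling thread.
    private final Consumer<RuntimeException> crashHandler;

    NeverCompletingSubscriber(String streamName, Flow.Subscriber<? super T> delegate) {
        this(streamName, delegate, e -> { throw e; });
    }

    NeverCompletingSubscriber(String streamName, Flow.Subscriber<? super T> delegate,
                              Consumer<RuntimeException> crashHandler) {
        this.streamName = streamName;
        this.delegate = delegate;
        this.crashHandler = crashHandler;
    }

    @Override public void onSubscribe(Flow.Subscription subscription) { delegate.onSubscribe(subscription); }
    @Override public void onNext(T item) { delegate.onNext(item); }
    @Override public void onError(Throwable throwable) { delegate.onError(throwable); }

    @Override public void onComplete() {
        // Completion breaks our assumption about this stream: fail loudly
        // instead of leaving the screen silently stuck on stale data.
        crashHandler.accept(new IllegalStateException(
                "Stream '" + streamName + "' completed but should never complete"));
    }
}
```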

Crashing on these completions made our crash-free sessions drop a lot at first. But it helped us find out where our error handling had to be improved to keep our Subscriptions alive. We were able to fix many critical issues that until then had been close to impossible to reproduce.
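The post does not say exactly how the error handling was improved. One common way to keep a long-lived subscription alive is to handle failures per update and turn them into values. That way a single bad update does not travel down the chain as an error and terminate the subscription. Below is a minimal sketch of that idea in plain Java. `UpdateResult` and `SafeMapper` are hypothetical names.

```java
import java.util.function.Function;

/** Hypothetical result of processing one update: either a value or the error it caused. */
final class UpdateResult<T> {
    final T value;                // set on success
    final RuntimeException error; // set on failure, null on success

    private UpdateResult(T value, RuntimeException error) {
        this.value = value;
        this.error = error;
    }

    static <T> UpdateResult<T> success(T value) {
        return new UpdateResult<>(value, null);
    }

    static <T> UpdateResult<T> failure(RuntimeException error) {
        return new UpdateResult<>(null, error);
    }

    boolean isSuccess() {
        return error == null;
    }
}

final class SafeMapper {
    private SafeMapper() {}

    /** Wraps a mapping step that may throw so that it reports failures as values instead. */
    static <A, B> Function<A, UpdateResult<B>> catching(Function<A, B> mapper) {
        return input -> {
            try {
                return UpdateResult.success(mapper.apply(input));
            } catch (RuntimeException e) {
                // Only this update failed; the stream and its subscription keep going.
                return UpdateResult.failure(e);
            }
        };
    }
}
```

Downstream code can then show an error state for the failed update while continuing to receive the updates that follow.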

As a nice side effect, we saved a lot of time we would otherwise have spent trying to reproduce these issues. In the end, that freed up enough time to get our crash-free sessions back to where we wanted them.
