Management for Data 101 — Prioritising Issues

Published in Inside Aircall · 6 min read · Sep 17, 2019

Preface: This is the second article in a series that talks about things we’ve learnt while building the data team at Aircall. The first article was about preparing your operational strategy.

It goes without saying that some of the things we talk about will have a certain SaaS flavour to them. However, these articles are not meant to serve as a guide but more like examples that we hope will help you structure your ideas.

The problem

Data stacks are tech stacks, and as such share the same weaknesses as every tech stack in the world. They need maintenance, and every now and then you find yourself in a position where you’ve accumulated a lot of technical debt and need to start getting rid of it.

If you’ve got a mountain of technical debt to tackle, the first step is to correctly isolate your issues and the various solutions you can apply to them.

Further, your problems may impact other teams or you may need help from various people — so it’s always helpful to have a general roadmap of your actions so you can coordinate with other actors.

Making molehills out of a mountain

This is when we turned to a simple framework that we call the IRS or Issue — Risk — Solution framework. The whole idea behind the framework is to take all of the problems you’re facing and break them into objective components that give you a clearer vision of what’s important, what’s easy and what’s not.

Tl;Dr — It looks a bit like this:

The astute among you will notice that this framework isn’t entirely unlike the RICE framework that Intercom uses to prioritise their feature requests.

Issues

The first and easiest step is to list your issues. Once you’ve listed them, go one step further and classify each one by the part of your software or work process it impacts. You end up with each problem broken into two parts: context and description.

To illustrate this with an example, let us say that we felt we were not controlling our coding standards well enough. We could classify the problem as follows:

Context: CI/CD
Description: Heterogeneous code base; a lack of imposed standards results in code that behaves differently in local and production environments

Risks

The next step is to evaluate the kind of problems that may arise because of these issues. However, in order to avoid the trap of being vague, we break the risk into three parts to add some objectivity: a description, a severity score and a frequency score.

We begin by describing the risk. To continue with the example above, we could say that we risk having a heterogeneous code base that does not behave as predicted when you change the environment.

With the description in place, we assign two scores from 1 to 5: one for how severe we consider the issue to be, and one for how often it happens. In this case, I’m going to assign 5 to the frequency since we push code multiple times per day. As for the severity, I’ll only give it a 3 because although it is a serious problem, I know that most people in my team have relatively similar local stacks and that offsets the risk a little bit.
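To make this concrete, here is a minimal sketch of how one issue and its risk could be recorded. The class name `Risk`, its field names and the range check are assumptions made for this example; only the 1 to 5 scale and the example values come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One issue plus its risk assessment (hypothetical layout)."""
    context: str      # part of the software or work process affected
    issue: str        # short description of the issue
    risk: str         # what could go wrong because of the issue
    severity: int     # score from 1 to 5
    frequency: int    # score from 1 to 5

    def __post_init__(self):
        # Reject scores outside the 1-5 scale described in the article.
        for name in ("severity", "frequency"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


# The coding-standards example, with the scores chosen in the text.
coding_standards = Risk(
    context="CI/CD",
    issue="Heterogeneous code base, lack of imposed standards",
    risk="Code behaves differently in local and production environments",
    severity=3,
    frequency=5,
)
```

Keeping the scores as plain integers makes it easy to combine and sort them later on.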

Solutions

Having elaborated upon the various problems we might face, it is time to think of what we can do about them. However, instead of coming up with a single overarching solution, we try to consider the various stages at which we can intercept the problem and what we can do at each of those stages.

Based on these stages, we classify the different kinds of solutions as follows. Remember that this is a guideline and you don’t have to fill out all of them for each problem. The idea is that we should at least be able to recover from all issues that have a big impact (severity ≥ 3), or ones that occur often (frequency ≥ 3), but it’s even better if we can avoid or pre-empt them.

Here goes:

Avoid: Ensure that the issue doesn’t come up, or at least reduce the frequency of recurrence. If you can find a good enough solution to avoid the issue, you can completely skip over the other three! But in most cases that’s just wishful thinking…
Raise: Raise a timely alert that can help you fix the issue pre-emptively or allow you to prepare for it to occur.
Minimise Impact: Contain the issue, call up backups, issue communication etc. and in general try to minimise the damage done by the issue once it has occurred.
Recover: Recover from the impact caused by the issue by backfilling data, restarting pipelines etc.
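The four stages, together with the guideline that every severe or frequent issue should at least have a Recover solution, could be sketched like this. The names `Stage`, `needs_recovery_plan` and `missing_recovery` are made up for this example, and the two solutions listed are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    """The four stages at which a problem can be intercepted."""
    AVOID = "avoid"
    RAISE = "raise"
    MINIMISE_IMPACT = "minimise impact"
    RECOVER = "recover"


def needs_recovery_plan(severity: int, frequency: int) -> bool:
    """Guideline from the text: big impact (severity >= 3) or frequent (frequency >= 3)."""
    return severity >= 3 or frequency >= 3


def missing_recovery(severity: int, frequency: int, solutions: dict) -> bool:
    """True when an issue should have a Recover solution but none is recorded."""
    return needs_recovery_plan(severity, frequency) and not solutions.get(Stage.RECOVER)


# Hypothetical solutions for the coding-standards example (severity 3, frequency 5).
solutions = {
    Stage.AVOID: "Enforce a linter and formatter in CI",
    Stage.RAISE: "Fail the build when checks do not pass",
}
print(missing_recovery(3, 5, solutions))  # True: no Recover entry yet
```

A check like this is only a reminder of where the table still has gaps; it says nothing about whether the recorded solutions are any good.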

The synthesis

Once you’ve taken the time to list out all your issues and solutions, you should end up with a table like this:

On the other side of the table, add four more columns for the solutions. Something like this:

Here’s how it looks if you put it all together:

How it looks to have your life summed up in one image

The next step is to add up the “Severity” and “Frequency” scores you assigned and come up with a single score for each issue. Some people like to weight the two rather than use a simple sum, but this is only a humble example. In any case, weight them if you must, but then come up with one single score per issue.
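As a rough sketch, the scoring and ranking could look like the following. The function name `final_score`, the weight parameters and issues B and C are assumptions for illustration; only the coding-standards scores come from the example above.

```python
def final_score(severity, frequency, severity_weight=1.0, frequency_weight=1.0):
    """Combine the two scores; the default weights give a simple sum."""
    return severity_weight * severity + frequency_weight * frequency


# (name, severity, frequency); issues B and C are hypothetical.
issues = [
    ("Hypothetical issue C", 2, 1),
    ("Coding standards", 3, 5),
    ("Hypothetical issue B", 4, 2),
]

# Highest final score first.
ranked = sorted(issues, key=lambda row: final_score(row[1], row[2]), reverse=True)

for name, severity, frequency in ranked:
    print(name, final_score(severity, frequency))
```

Passing, say, `severity_weight=2.0` makes severe issues climb the list faster than frequent ones, which is one way to express the weighting mentioned above.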

Shooting your troubles away

Now, we could end this article right here by saying that now that you have your ducks in a row (or your issues in the descending order of their final scores), you should start solving them one by one going from top to bottom.

But being the sophisticated and discerning selves that we are, we know that life is never that simple. All jokes aside, it is sometimes not practical to attack the issues in a strict order. You may have resource constraints, or you may be blocked by another team.

What you should do instead is take the first “chunk” of the issues you’ve scored — say the top third or the top quarter — and lay out the corresponding solutions on the friendly neighbourhood Effort-Impact matrix.
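Picking that first chunk is a one-liner once the issues are ranked. Here is a small sketch; the helper name `top_chunk` and the choice to always keep at least one issue are assumptions made for this example.

```python
def top_chunk(ranked_issues, parts=3):
    """Return the first 1/parts of an already ranked list (parts=3 for the top third,
    parts=4 for the top quarter), keeping at least one issue when the list is not empty."""
    if not ranked_issues:
        return []
    count = max(1, len(ranked_issues) // parts)
    return ranked_issues[:count]


# Nine hypothetical issues, already sorted by final score.
ranked = [f"issue {n}" for n in range(1, 10)]
print(top_chunk(ranked))           # the top third: issues 1 to 3
print(top_chunk(ranked, parts=4))  # the top quarter, rounded down: issues 1 and 2
```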

Effort-Impact Matrix (Hint: stay away from red)

Wrap up

The final layout of your solutions on the effort impact matrix should show you the way.

Get the quick wins (low effort, high impact) out first.
Start planning or laying the foundation for the major projects (high effort, high impact).
Use your fill-in jobs (low effort, low impact) to fill time where needed.
Consider letting go of the thankless tasks (high effort, low impact).
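If you prefer to keep the matrix in a spreadsheet or a script rather than on a whiteboard, the four quadrants above can be sketched as a small function. Scoring effort and impact from 1 to 5 and treating 3 and above as "high" are assumptions for this example, as is the function name `quadrant`.

```python
def quadrant(effort: int, impact: int, threshold: int = 3) -> str:
    """Place a solution on the effort-impact matrix.

    Scores at or above `threshold` count as high (an assumed cut-off).
    """
    high_effort = effort >= threshold
    high_impact = impact >= threshold
    if high_impact:
        return "major project" if high_effort else "quick win"
    return "thankless task" if high_effort else "fill-in job"


# Hypothetical solutions with (effort, impact) scores.
print(quadrant(effort=1, impact=5))  # quick win
print(quadrant(effort=5, impact=4))  # major project
print(quadrant(effort=1, impact=2))  # fill-in job
print(quadrant(effort=4, impact=1))  # thankless task
```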

Off into the sprint they go!

These articles aren’t meant to serve as a recipe for success, but more in the way of general ideas to get you started. If you have any feedback or other ideas to share, hit us up on social media! Check back in right here next week for the next article in the Management for Data 101 series, where we’ll talk about managing stakeholders.

If you enjoy reading about such things, do consider following Aircall on Medium or Twitter. We try our best to share our learnings as often as we can. Thanks!