“Data lifecycle” seems to turn up everywhere. Sometimes it’s shorthand for saying “everything done to or with data”, sometimes it gets a little diagram drawn out. And it’s mainly the little diagram that I take issue with, so that is what I shall be opinionated about today.
As a list of stuff that happens to and with data, the little diagram normally isn’t wrong. It covers some of the important things that occur and flags them up in a nice, prominent way. It’s easy to grasp. It’s simple.
You know the part I have a problem with? It’s the arrows. The arrows are lies.
Arrows suggest a sequence. They suggest progression. They suggest tidiness. And my experience of data in the wild is that, with good reason, its life is anything but tidy.
Here, let’s draw the lifecycle of a human:
I mean, it’s not wrong for everyone. These are major life events that some people will go through. Some people might even go through them in this order. And possibly there could be a situation where you might want to draw this picture (for an alien curious about major events that society recognises in a human lifetime).
But you wouldn’t use it if you were an expert. You wouldn’t base processes on it. And you wouldn’t use arrows.
So why does the data lifecycle diagram keep coming up time and time again? For me, it comes down to two things:
- The stuff that happens to and with data is really important to know about. It is something we want to be able to draw because we want to communicate it to people and make sure we understand it.
- People like tidy. Tidy makes things easy to grasp. And it makes things easy to sell. The diagram shows up frequently where people are trying to sell something (software, simplified processes) to people who know they need to care about data, but find the whole thing a bit complex.
These are not bad things.
The bad things happen when we start to assume that this little diagram actually represents reality. When we start to assume that Thing X will have occurred before Thing Y. That all data goes through a definable gate as it moves through the different, consecutive stages of its life.
Not every human gets married. Not every dataset gets published.
It also sets up a negative role for the idea of data management, as something that exists to force data through these sequential pipes of process. The diagram suggests that we manage data so that it follows the lifecycle.
I think we manage data so that it’s good, and trustworthy, and safe to use. And I think we do that throughout the data lifecycle, however we express it or draw it.
Yes, we need to talk about the things that happen to and with data. Yes, we maybe need to draw that as a picture sometimes. But the visual shorthand, with its promise of tidy and consecutive steps, can mislead as easily as it can inform when it’s used in a technical setting.
We owe it to our data to challenge this narrative where it becomes a lazy crutch for our thinking. Where it fundamentally doesn’t match reality. Where it brings about the wrong outcomes for our data and our businesses.
So no more pictures? Well, I’m a big fan of pictures and a few years ago I tried to draw what I felt the real data lifecycle is, based on my experience with well-established, feral operational data. Here’s what I drew in 2016:
Everyone is going to have a different perspective on this, and emphasise different things. For me it’s really important to recognise that a lot of stuff is happening at the same time. That for most operational data, there’s no easy “start”. It’s not comprehensive — it doesn’t even begin to show how data breeds and leaves a trail of new little datasets in its wake — but it felt like a picture of the most important things we needed to talk about.
But I got something wrong in that diagram. Data management isn’t a little process ticking along on its own. All these things need to happen with good data management.
So here’s what I’d draw now:
Good data management is not the harness that forces data to jump through the hoops of the data lifecycle. It’s not a step in the lifecycle, or a box to be ticked. It’s the entire context for the conversation, the canvas on which we draw the data lifecycle. Even if what we draw isn’t always tidy.