The Cycle of Encoding and Decoding
How does data visualization work?
Tl;dr: Data visualizations are basically the double encoding of a complex system. Thus, the readers of data visualizations need to go through specific steps of decoding in order to understand and interpret the data and the underlying system. To create better data visualizations, we as data designers, developers or data scientists need to understand the decoding process.
Many of the complex systems that we are dealing with today, such as production processes or customer lifetime journeys, are largely invisible. Only through the indirect route of data collection and analysis can we understand these systems to such an extent that we can identify and solve problems.
Data visualizations can be a friendly way to understand and access this data. They should also enable non-techies to read and interpret data.
A data visualization is the end result of a long and complex process in which many obstacles often have to be overcome: merging data from heterogeneous sources, cleaning it, transforming it into usable formats, and sometimes running statistical calculations. Finally, we look at the colorful and appealing presentation with pride and think: ‘Yay! For the first time, our data is visible!’ We have done all the hard work and serve the user the information on a silver platter. The user now has it easy and nothing can go wrong anymore, right?
However, the journey is not over yet. Data visualizations can often be misunderstood or even not understood at all. If you have observed this before, either as a reader or as a creator of visualizations, then you should read on.
In my experience, a lot can go wrong. I have been developing user interfaces for data products for over ten years, and I have learned the hard way that my visualizations sometimes don’t fulfill their purpose. The users don’t understand the visual language, can’t grasp the data structure, or can’t understand the meaning of the presented data. In some particularly ambitious business intelligence projects, we found that the visualizations were rarely used afterward. But why is that? And how can you do better?
Let’s take a step-by-step look at the process of creating and reading data visualizations, the cycle of encoding and decoding:
Step 1: Identify measurable objects
Creating a data visualization doesn’t start with the data, as is often assumed. It starts much earlier, with the system behind it. Behind all the data lies a system in the real world. It is usually invisible and very complex.
Take a webshop, for example. Nobody can simply see the streams of visitors. First, we need an idea, a mental map of this webshop, to decide which objects, properties and events we want to measure. Many things could be interesting: visitors, websites, shopping baskets, and products. But not everything can be measured technically, such as the thoughts of a webshop visitor while he decides not to buy a product in his shopping cart. How nice it would be to know that!
By identifying the objects and properties that we actually can and want to measure, we already make a selective choice as to which parts of the system will be represented in our data. If our mental map is incomplete, inaccurate or simply wrong, we might not ask the right questions or measure the right data.
Already in the first step of creating a data visualization, we have to admit that we cannot capture the full complexity of the system. The data is not a complete representation of the system; it is just a handful of tiny indicators.
Step 2: Define the data structure, collect data and apply statistics
Next, we create a data model. Here, too, a lot is lost, because a data model cannot capture every detail of the real world. Anyone who has ever done this knows that an inflated data model is a dangerous monster. That’s why we prefer to stay as simple as possible and leave out unimportant details. So, in the end, we get a neat and clean data model that holds only tiny bits of information about the system. By transforming the raw data into our data model, the depth of information gets reduced a second time.
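To make this second reduction concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration and not taken from a real project: the raw tracking event, its field names, and the `Order` model are assumptions. It only shows how mapping raw data onto a lean data model silently drops details.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical raw tracking event, as a webshop might log it (assumed fields).
raw_event = {
    "timestamp": "2024-03-02T14:37:12",
    "session_id": "a81f",
    "user_agent": "Mozilla/5.0",
    "referrer": "newsletter",
    "event": "order",
    "product": "jeans",
    "size": "XL",
    "quantity": 2,
    "scroll_depth": 0.8,
}


@dataclass(frozen=True)
class Order:
    """Deliberately simple data model: only four properties survive."""
    day: date
    product: str
    size: str
    quantity: int


def to_order(event: dict) -> Order:
    """Map a raw event onto the data model, discarding everything else."""
    return Order(
        day=datetime.fromisoformat(event["timestamp"]).date(),
        product=event["product"],
        size=event["size"],
        quantity=event["quantity"],
    )


order = to_order(raw_event)

# Fields of the raw event that never reach the data model.
kept = {"timestamp", "product", "size", "quantity"}
dropped = sorted(set(raw_event) - kept)
print(order)
print("lost in the data model:", dropped)
```

Even the timestamp loses its time of day here. Whether that matters depends on the questions the data is supposed to answer later.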
Step 3: Generate visualizations and data products
After we have collected data and processed it, we finally create our visualization. Again, the abundance of available information stored in data tables must be filtered, because chart types can technically display only a limited number of objects and features. But this is not the only reason we need to reduce the amount of data presented: Good data visualization design considers the human aspects, too. We need to focus on the message we want to communicate, on the needs of the audience, and on their perceptive skills. Thus we have to select a tiny part of the available data for each chart: maybe one object type and one or two features, like the products and their sales figures per day.
Up to this step, two things have happened that we should keep in mind: First, not all of the complexity of the system is visible in the visualization, and second, we have encoded the system twice. First, it was translated into the data, and then the data was translated into the visualization. That is why we call this process encoding and the producers of data visualizations encoders.
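The selection described above can be sketched in a few lines of Python. The order rows and the helper `sales_per_day` are hypothetical. The sketch assumes a chart that shows one object type (a product) and one feature (units sold per day), so every other column is filtered out before anything is drawn.

```python
from collections import defaultdict

# Hypothetical order rows: (day, product, size, quantity).
orders = [
    ("2024-03-01", "jeans", "M", 1),
    ("2024-03-01", "jeans", "XL", 2),
    ("2024-03-01", "shirt", "L", 1),
    ("2024-03-02", "jeans", "XL", 3),
    ("2024-03-02", "shirt", "M", 2),
]


def sales_per_day(rows, product):
    """Keep one product and one feature (units per day); ignore the rest."""
    totals = defaultdict(int)
    for day, name, _size, quantity in rows:
        if name == product:
            totals[day] += quantity
    return dict(sorted(totals.items()))


# The series a simple line or bar chart would display.
jeans_series = sales_per_day(orders, "jeans")
print(jeans_series)
```

The clothing size and all other products are gone from the result. A reader of the finished chart cannot recover them, which is exactly the filtering this step is about.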
Step 4: Crack translation key and identify objects and properties
Are we done now? Is the final data visualization the end of the process?
No, not at all, because now comes the second half, the decoding process. A decoder is a reader of a visualization. In order to understand the system and to get action-relevant insights, the decoder has to go backward from the data visualization to the data, then from the data to the mental model. This can be quite a challenge. The goal of the decoder is to confirm, refine or change her own mental model with the help of the data.
How can she do this? In the first step, the reader has to crack the translation key. What do the lines, points, positions, and colors mean? What do the axis labels and the legend say? At the same time, she tries to recognize the data structure. These points on the timeline, are they clicks per hour, per country or per product?
This skill, called graphicacy, cannot be taken for granted. It must be acquired by practicing reading many different chart types and understanding the principles of data structures. We must realize that many people already fail at this step. Graphicacy is unevenly distributed in the population, so it is important to know the level of your audience.
If our user has succeeded in decoding the underlying data structure, she can move on to the next step: understanding what the data actually means.
Step 5: Interpret and find meaning
In step 5, the decoder first needs to get an idea of the real-world objects that are represented by the data. What properties do they have? How are they related? Can she spot patterns? Statistical knowledge is helpful for this. Are these absolute or relative numbers? How large is the population? For example, what are the proportions and relationships between jeans, clothing sizes and orders?
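The difference between absolute and relative numbers can be illustrated with a small Python sketch. The order counts per clothing size below are invented for illustration only.

```python
# Hypothetical jeans orders per clothing size (absolute numbers).
orders_by_size = {"S": 40, "M": 120, "L": 160, "XL": 80}

# The population the shares refer to.
total = sum(orders_by_size.values())

# Relative numbers: each size's share of all jeans orders.
shares = {size: count / total for size, count in orders_by_size.items()}

for size, count in orders_by_size.items():
    print(f"{size}: {count} orders, {shares[size]:.0%} of {total}")
```

A chart of the absolute counts and a chart of the shares have the same shape here, but they answer different questions. Without knowing the total, a share says nothing about how many orders are actually behind it.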
However, understanding the quantitative information is not enough. Now she knows that something has happened, but not why. In many cases, this is an unsatisfactory situation. A decoder wants to learn about the inner workings of the system. For this, she needs to interpret the data. Interpretation basically means finding the ‘why’. Why do we sell so many oversized jeans? Why did we sell 17% more last Saturday than on the average Saturday?
This is where her domain knowledge, her mental map of the system, comes in. The context lost through the double encoding has to be added again. Maybe she knows a potential cause for the 17% increase on that Saturday: a marketing campaign? A change in the product presentation on the website?
We, the encoders, distilled and aggregated our knowledge of the world into a condensed and abstract form. In order to add meaning to this data, to find the why, the decoder needs to go down the ladder of abstraction, down to specific things. We don’t think in the abstract. We think in specific images, humans, events, locations, objects and stories. We understand the cause and effect in our world through stories.
So when the decoder interprets data, she tries to connect the bits of information to what she already knows, to her mental model of the system. Her mental model is built upon her own experiences and the stories she has heard. A mental model can contain abstract information, too, and it usually does. Once we understand the causality (the mechanics of a part of a system), we can abstract from the specific details, for example a single webshop visitor and his motivation to buy, to an aggregated view of many visitors and weekly patterns.
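A figure like "17% more than the average Saturday" is itself the result of a small calculation, which can be sketched as follows. The sales numbers are made up, and the sketch assumes that the baseline is the mean of the previous Saturdays, excluding the Saturday in question. The text does not specify how the average was built.

```python
from statistics import mean

# Hypothetical units sold on the four previous Saturdays (assumed baseline).
previous_saturdays = [190, 200, 210, 200]

# Hypothetical units sold last Saturday.
last_saturday = 234

baseline = mean(previous_saturdays)
relative_change = (last_saturday - baseline) / baseline

print(f"baseline: {baseline} units, change: {relative_change:+.0%}")
```

The calculation yields the "what" (+17% against a baseline of 200 units) but contains no trace of the "why". That part has to come from the decoder's knowledge of the system.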
“People assume that the world has a causal texture — that its events can be explained by the world’s very nature, rather than just being one damn thing after another.” — Steven Pinker, How the Mind Works
Adding meaning to the data is an invisible process that has to be accomplished by the decoder. The more the user already knows about the system, the easier this usually gets. Consider the shocking piece by The Washington Post about the staggering millennial wealth deficit. Although the chart type is somewhat unusual, I could decode it within a minute. This is not only because I have enough graphicacy and numeracy, but also because I already know a lot about the visualized objects and metrics: I know about humans in general, about generations and the names of these three generations, about wealth and how people manage to accumulate savings during their lives, about the economic developments in the U.S. during the last decades, and so on. I can go down the ladder of abstraction to the life stories of myself, my friends and many people I have read about, looking for potential causes for the wealth deficit. By reading this data visualization, I could add a new detail to my already detailed mental map of the world, re-sharpening a blurred area.
A healthy dose of critical thinking is also part of this process. What motives did the author have? Which data is missing? Can the data be correct? Were there measurement errors or uncertainties? Do I have a cognitive bias? Which statements can I derive from this and which not?
If the reader has enough domain knowledge, this process will happen almost effortlessly and unconsciously. But if her mental map of the system is not sufficient, or if the data points cannot be connected to it, she will probably fail in this step. We encoders have the responsibility to make sure that our audience has all the important bits of knowledge to decode our data visualization.
How can we achieve this? How can mental maps and models be visualized together with data? We data practitioners sometimes visualize data within models intuitively, but I would really like to dive deeper into this topic. This is definitely material for another article. If you have insights to share, I would be very happy to hear from you, so please contact me.
Step 6: Update mental map and story
If the data and mental model fit together, the decoder can now interpret the data and gain actionable insights. She can expand her knowledge and improve her mental map. She can ask further specific questions. What exactly the decoder does with the gained knowledge after a successful interpretation depends on the purpose this data visualization has for her. Thinking in Jobs to be Done categories can be helpful here. Data visualizations and data products are hired by users or businesses for a special job, typically one (or more) of these:
- Decision support: Do I have to take an umbrella tomorrow?
- System health indication: Is my webshop running as it should?
- Performance feedback loop: Did I meet my 5000 steps per day goal last month?
- Root cause analysis: Why did the sales rate drop in this region?
- Knowledge creation: Is particulate matter a thing where I live? How do its levels change over time?
- Trust building: They say this robot can detect incorrect invoices with a low error rate. Is this really true?
Sometimes data visualizations get this wrong. If the webshop’s dashboard shows key performance indicators, but the decoder wants to know which products’ sales rates are declining, she does not get what she needs. If a Medium writer wants to know how many readers read her article all the way through, showing only the average reading time will frustrate her.
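The reading-time example can be made tangible with a short Python sketch. All numbers are hypothetical: the per-reader reading times, the assumed length of the article, and the rule that a reading time at or above that length counts as a full read.

```python
from statistics import mean

# Hypothetical reading times in seconds, one entry per reader.
reading_times = [20, 30, 25, 300, 310, 15, 320, 20]

# Assumption: the article takes about 300 seconds to read in full.
FULL_READ_SECONDS = 300

# Metric the dashboard shows.
average_time = mean(reading_times)

# Metric the writer actually wants: share of readers who read to the end.
full_reads = sum(1 for t in reading_times if t >= FULL_READ_SECONDS)
full_read_rate = full_reads / len(reading_times)

print(f"average reading time: {average_time} s")
print(f"full-read rate: {full_read_rate:.1%}")
```

In this invented data set, the average of 130 seconds suggests that readers give up before the halfway point, while in fact three out of eight read the whole article. The average alone cannot do the job the writer hired the visualization for.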
“It takes a long time to translate information into useful knowledge.” — Nate Silver, The Signal and the Noise
It is of the utmost importance that we as data encoders and creators of data visualizations make every effort to find out the exact job the product is hired to do. If we design the data visualization product to fulfill the user’s need, the knowledge extracted in this process actually becomes useful.
If your users know little or nothing about the underlying system, translating naked numbers into useful information can get tough for them, even if the data is visualized in a clear and appealing way.
As we have seen, however, the road to useful knowledge is a long one, with many dangers lurking along the way. Only those who keep an eye on the complete cycle of encoding and decoding will be able to build a working data product or data visualization.
I am pleased that the Cycle of Encoding and Decoding (aka Data Design Guide) has been included in the study “Future Skills: A Framework for Data Literacy” (in German) by Hochschulforum Digitalisierung. The Data Design Guide serves as the basis for the framework for data literacy developed there. This framework sets the direction of education at German universities. “Analogous to the evaluation criteria for language skills, the competence framework developed here distinguishes between coding and decoding processes.”