Improving data analysis through storytelling and relevant visualization
Let’s sort out what data is, who should visualize it, and when it is necessary to do so. The article also covers types of data and types of comparison, including best practices for displaying data depending on the context.
What is Data?
If you look at Wikipedia, you can see the following explanation:
Data are individual items of information. A datum describes a single quality or quantity of some object or phenomenon. In analytical processes, data are represented by variables.
But here is another view of data, described by leading information designer Giorgia Lupi, who co-founded Accurat and visualizes data for companies around the world:
Data does not exist. Data is an instrument that we human beings created to observe, record and archive reality. It is a partial abstraction of reality. Data is not perfect. Our world is mostly random and messy. Collecting data doesn’t make it more perfect or controllable. Data is human. What to collect and what to leave out. The data do not solve the problems. It is a lens, a filter we can apply to see the patterns in our world, but it should never be the focus.
According to Lupi, we should focus not on numbers, algorithms or technologies, but on the information we want to convey through the visual display of data.
Who should visualise data?
You might think the answer to this question is obvious: a designer. In reality, data visualization does not belong to any single discipline. Many different people practice and contribute to it, such as:
- Statisticians
- Computer scientists
- Journalists
- Psychologists
- Graphic designers, etc.
Anyone who wants to make information clearer to an audience can visualize data.
Why visualise data?
Data enables the discovery of areas that are not well known or not yet developed, where new opportunities and new dangers may lie…
Visualization lets us expand areas of knowledge that are still insufficiently researched and exist only as collected but unprocessed arrays of information. By comparing or arranging different values, we can see patterns that may become rules in a particular field.
For example, photographer Jay Adler created a temperature spiral that shows the average temperature for each year since 1850. In it we can see the impact of greenhouse gases on the atmosphere of our planet. The diagram helps us see what dangers we may face in the future.
The second reason to visualize information is that it becomes more convenient to work with.
The human brain finds it much easier to work with data that is already ordered than to process raw data first and only then make comparisons.
The following illustration shows how our brain behaves when it has to work through the information it sees on its own, compared with when it is enough to just look at it without spending time on processing.
Table vs Graphics
It is difficult to process a large amount of data by reading a table. The main purpose of a table is to display exact values. The tool that lets us take a broader look at the information in the table is a graph.
Successful visualization allows us as viewers to see the story and draw conclusions from the data. This story helps us understand how one object or phenomenon is related to another. The information in these stories has its own visual code, which helps us understand it better.
To show data, we convert it into objects such as columns, bubbles and lines, and by changing their proportions or colors, we make them easy to compare.
Does this mean that “a picture is worth a thousand words”?
According to data visualization specialist Alberto Cairo, this may be true in two cases:
- If the story can be told graphically better than verbally.
- If the picture is well designed.
You could stare at a table of numbers all day and never see what would be immediately obvious when looking at a good picture of those same numbers.
Here is an example. The table below compares domestic and international ticket sales in the United States for 2009.
Can we describe this data in a way that lets us draw a conclusion? We can compare the figures for different months, but we cannot see any patterns here.
It is much easier to use a graph, so that the data can be explained as a story. The graph below shows that sales of international flights are stable, except for a single month. Sales of tickets for domestic flights are cyclical: they peak at the end of each quarter and then fall.
A graph makes it much easier for us to start telling a story and draw conclusions.
Non-quantitative data
Although data visualization is characterized by displaying relationships between quantitative data, it can also depict non-quantitative relationships.
For example, this chart was used to see how one person relates to another. We can also draw quantitative conclusions from this visualization and describe a particular person's circles of communication.
Another example of successful visualization is data from applications that track jogging or cycling routes. This data can be plotted on a city map to show the most popular routes, which makes it possible to plan infrastructural changes that improve conditions in the areas used by the largest number of people.
How to design the visuals of your data?
There are four components in good data visualization:
- Information
- Story
- Purpose
- Visual form
In the following diagram, we can see that using only some of the components gives an undesirable result. For example, combining information, purpose and visual form without a story gives an uninteresting visualization, because it will be difficult to tell a story or draw a qualitative conclusion.
The path to successful visualization consists of five steps:
- Collect the data
- Form the idea
- Determine the type of comparison
- Select the type of graph
- Connect data to the stories they represent
Let’s consider each of these steps in more detail.
Usually, we collect data from the tables in which it is stored. The structure of those tables helps us understand which data will be used in our visualizations.
Next, we need to understand exactly what idea we want to convey to the end user. We need to notice the patterns, understand how important they are, and see how they will benefit the people who work with them.
Then we need to choose the data to compare and how we will compare it.
Types of data comparison
Choosing the correct chart for your data is important to avoid confusion and to accurately convey your data. There are five main types of data comparison:
- Component
- Item
- Time-series
- Frequency
- Correlation
It is worth remembering that there can be many types of graphs, but they all process data according to one of the above types of comparisons.
Component comparison
In a component comparison, we show the size of each component as a percentage of a whole. A good example is a pie chart.
With a large number of components, however, the information becomes difficult to compare because the segments are too similar in size or color. It is therefore better to use this type of comparison with a small number of components.
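As a minimal sketch of the preparation step behind a pie chart, the function below turns raw values into percentages of the whole and merges the smallest components into a single "Other" slice so that the chart keeps few segments. The function name `component_shares`, the `max_slices` parameter and the sales figures are assumptions made up for this illustration.

```python
def component_shares(values, max_slices=5):
    """Return (label, percent) pairs, merging the smallest components into 'Other'."""
    total = sum(values.values())
    # Largest components first, so the merged tail holds the smallest ones.
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > max_slices:
        head, tail = ranked[:max_slices - 1], ranked[max_slices - 1:]
        ranked = head + [("Other", sum(v for _, v in tail))]
    return [(label, round(100 * v / total, 1)) for label, v in ranked]


# Made-up unit sales per product, for illustration only.
sales = {"A": 500, "B": 250, "C": 150, "D": 50, "E": 30, "F": 20}
for label, percent in component_shares(sales, max_slices=4):
    print(f"{label}: {percent}%")
```

Merging the tail is only one possible design choice. It keeps the chart readable, at the cost of hiding the detail of the small components.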
Item comparison
In this approach, the viewer can see how objects relate to each other: are they the same as, bigger than, or smaller than the others? This is exactly what we want to display with this type of chart. A bar chart works well here.
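An item comparison can be sketched even without a plotting library. The example below draws a horizontal bar chart in plain text, with bars sorted from largest to smallest so the ranking is visible at a glance. The helper `text_bar_chart` and the regional figures are hypothetical and serve only as an illustration.

```python
def text_bar_chart(items, width=20):
    """Render one text bar per item, largest first, scaled to `width` characters."""
    largest = max(items.values())
    label_width = max(len(name) for name in items)
    lines = []
    for name, value in sorted(items.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<{label_width}} | {bar} {value}")
    return lines


# Made-up sales per region, for illustration only.
regions = {"North": 40, "South": 100, "East": 70}
chart = text_bar_chart(regions, width=10)
print("\n".join(chart))
```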
Frequency comparison
A frequency comparison shows how many objects fall into a series of consecutive numerical ranges, that is, how a given quality is distributed across the objects. A column chart works best in this case.
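The counting behind a frequency comparison can be sketched as follows: each value is assigned to a fixed-width range, and the number of values per range becomes the height of one column. The function name `frequency_table`, the bin width and the list of ages are assumptions for illustration.

```python
def frequency_table(values, bin_width):
    """Count how many values fall into each range [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width  # lower edge of the range holding v
        counts[start] = counts.get(start, 0) + 1
    # Ranges with no values are simply absent from the result.
    return [(start, start + bin_width, counts[start]) for start in sorted(counts)]


# Made-up ages of survey respondents, for illustration only.
ages = [23, 27, 31, 35, 38, 36, 41, 44, 52, 29, 33]
for low, high, count in frequency_table(ages, 10):
    print(f"{low}-{high - 1}: {'#' * count}")
```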
Time comparison
Use this approach when you want to show how objects change their qualities over time, so that the viewer can see which direction the trend is moving in and notice any pattern.
A line chart works best here.
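To illustrate how a line-like form exposes a trend, here is a rough sketch that compresses a time series into a one-line "sparkline" made of Unicode block characters. It stands in for a real line chart and is not a replacement for one. The name `sparkline` and the monthly values are invented for this example.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest


def sparkline(series):
    """Map each value to one of eight block heights between the series min and max."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1  # avoid division by zero for a flat series
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) * top / span)] for v in series)


# Made-up monthly values, for illustration only.
monthly = [3, 4, 6, 5, 8, 10]
print(sparkline(monthly))
```

Even in this crude form, the upward direction and the single dip are visible immediately, which is harder to see in the raw list of numbers.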
Correlation comparison
If you need to show whether a pattern exists between different parameters, in order to understand how changes in one relate to changes in another, consider a correlation comparison. The best tool for this is a scatter chart.
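Alongside a scatter chart, the strength of a linear relationship can be summarized with a single number. The sketch below computes the Pearson correlation coefficient, which is one common measure and is not named in the article. The function name and the sample data are assumptions for illustration. A value near 1 or -1 indicates a strong linear pattern, but it does not by itself show that one parameter causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up paired measurements, for illustration only.
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 60, 61, 70, 77]
print(round(pearson(hours_studied, test_score), 2))
```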
For most graphs, stick with the basic chart types of bar and line graphs, since they are more commonly used and thus easier for people to understand.
For time-series, frequency and correlation comparisons, we can use two types of charts, depending on the amount of information we want to display. For time-series and frequency comparisons, use a column chart (histogram) if the values are few (six or seven, say) and a line chart if there are more. For correlation comparisons, it is better to use a bar chart for a small number of values and a scatter chart for a large number.
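The selection rules from this section can be condensed into a small lookup function. This is only a sketch of the guidance above: the function name `suggest_chart` is hypothetical, and the cut-off of seven values is taken from the "six or seven" rule for time-series and frequency comparisons and merely assumed to apply to correlation as well.

```python
def suggest_chart(comparison, n_values=0, few=7):
    """Suggest a chart type for a comparison type and a number of values."""
    comparison = comparison.lower()
    if comparison == "component":
        return "pie chart"
    if comparison == "item":
        return "bar chart"
    if comparison in ("time-series", "frequency"):
        # Few values: column chart (histogram); many values: line chart.
        return "column chart" if n_values <= few else "line chart"
    if comparison == "correlation":
        # Assumed threshold: the article only says "small" versus "large".
        return "bar chart" if n_values <= few else "scatter chart"
    raise ValueError(f"unknown comparison type: {comparison}")


print(suggest_chart("time-series", n_values=6))
print(suggest_chart("time-series", n_values=24))
print(suggest_chart("correlation", n_values=200))
```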
Summary
We reviewed which types of graphs to use in which cases. However, it is worth remembering that the key part of a visualisation is a story that helps the viewer better understand the information displayed.