Visualization horrors in the age of COVID-19
Data Visualization is the most underrated aspect of Data Science, and it is too often reduced to taking a bunch of data, feeding it to Excel, matplotlib or ggplot, and pasting the results. In reality, as a Data Scientist, you should design your visualizations as if the viewer had decided to ignore the text completely and look only at the images, and make sure that the message you want to convey is still evident.
You have found the perfect visualization when it sends the right message immediately and, at the same time, keeps giving you insights the longer you stare at it, just like a painting by Panini. The more you look at the painting, the more details you recognize in each of its many “sub-paintings”.
On the other hand, poorly designed visualizations can be useless, unintentionally funny, and sometimes harmful. That is particularly true in this period, in which bad communication can steer public opinion on a subject as delicate as human healthcare.
A true thing badly expressed becomes a lie.
After one painting and one quote, I think we’ve had enough references! Now we can finally dive into some of the many visualization horrors that spread during this pandemic.
The decline that doesn’t exist
The first visualization horror is the masterpiece from the Georgia Department of Public Health website, which aims at showing the distribution of infections over time in 5 counties of the state.
We can immediately see that the number of cases is decreasing over time. Good news? Not really.
If you look carefully, you will notice that the x-axis is not in chronological order: the dates were arranged to give the appearance that the infection trend is decreasing. If you arrange the axis correctly, the message is very different.
But that’s not all. The order of the counties changes from one date to the next, meaning that, unless you are the world champion of “Where’s Wally?”, you are going to have a hard time reading it. Unfortunately, this piece of art has recently been corrected and can no longer be admired on their website.
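Both problems can be avoided with a few lines of data preparation before plotting. The following is a minimal sketch in Python, not a reconstruction of the original chart: the county names and case counts are invented for illustration, and chronological_table and plot_grouped_bars are hypothetical helper names. It sorts the dates chronologically and keeps the counties in the same order within every group of bars.

```python
from datetime import date

# Hypothetical daily case counts, keyed by date and then by county.
# County names and numbers are invented for illustration only.
cases = {
    date(2020, 5, 2): {"County B": 40, "County A": 55, "County C": 31},
    date(2020, 4, 30): {"County A": 62, "County C": 48, "County B": 50},
    date(2020, 5, 1): {"County C": 35, "County B": 44, "County A": 58},
}


def chronological_table(cases):
    """Return (dates, counties, rows): dates ascending, one fixed county order."""
    dates = sorted(cases)  # date objects sort chronologically
    counties = sorted({name for day in cases.values() for name in day})
    rows = [[cases[d].get(name, 0) for name in counties] for d in dates]
    return dates, counties, rows


def plot_grouped_bars(dates, counties, rows):
    """Draw one group of bars per date (requires matplotlib)."""
    import matplotlib.pyplot as plt

    width = 0.8 / len(counties)
    fig, ax = plt.subplots()
    for j, county in enumerate(counties):
        xs = [i + j * width for i in range(len(dates))]
        ax.bar(xs, [row[j] for row in rows], width=width, label=county)
    # Center each date label under its group of bars.
    centers = [i + width * (len(counties) - 1) / 2 for i in range(len(dates))]
    ax.set_xticks(centers)
    ax.set_xticklabels([d.isoformat() for d in dates])
    ax.set_ylabel("New cases")
    ax.legend()
    return fig


dates, counties, rows = chronological_table(cases)
```

Calling plot_grouped_bars(dates, counties, rows) then draws the bars with time running left to right and each county keeping the same position and color on every date.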
The Axes of Baskervilles
The first time I saw this plot from the channel FOX 31, it struck me with its simplicity and a mysterious aura worthy of a Sherlock Holmes novel.
Can you spot what’s wrong with this plot?
It took me some time to realize that the scale of the y-axis is not uniform, but above all, there is no pattern that can explain why the labels of that axis end up looking like this.
The only way to obtain such a result is by manually choosing each of those values.
And the question that immediately follows is: was this weird labeling done on purpose? The answer is even more mysterious because, unlike in the previous plot, this version of the y-axis does not look much different from the correct one. I cannot find a reason for doing such a thing.
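A quick way to catch this kind of axis is to look at the gaps between consecutive labels: on a linear scale they should all be the same. Here is a small sketch of that check in Python. The hand-picked labels below are invented for illustration and are not the values from the FOX 31 plot; tick_steps and is_uniform are hypothetical helper names.

```python
def tick_steps(ticks):
    """Return the gaps between consecutive axis labels."""
    return [b - a for a, b in zip(ticks, ticks[1:])]


def is_uniform(ticks):
    """True if evenly spaced labels also represent evenly spaced values."""
    return len(set(tick_steps(ticks))) <= 1


# Invented labels that are drawn at equal distances but jump unevenly.
hand_picked = [0, 30, 60, 90, 100, 130, 160, 200]
# Labels that a linear axis with a step of 30 would produce.
regular = list(range(0, 241, 30))

print(tick_steps(hand_picked), is_uniform(hand_picked))
print(tick_steps(regular), is_uniform(regular))
```

The check uses exact equality, which is fine for integer labels; for floating-point labels a tolerance would be needed. Plotting libraries such as matplotlib choose evenly spaced ticks on a linear axis by default, so an uneven axis usually means the labels were typed in by hand.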
As easy as pie… chart
The USA is not the only source of these works of art. Italy tried to steal the leading role with a document from the region of Sardinia in which all the pie charts are like the following.
Every pie chart in the document has a category that occupies exactly half of it. The probability of that happening by chance in every single plot is incredibly tiny!
Well, unless you include the total as a category. In that case, you can waste half of the space whenever you like. All the other categories shrink to half their size, and some of them, like FFP1, become barely visible!
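The arithmetic behind this effect is easy to check. The sketch below, in Python, compares the slice sizes with and without the total. The counts are invented for illustration (only the FFP1 category name comes from the document), and slice_shares is a hypothetical helper name.

```python
def slice_shares(counts, drop=()):
    """Return each category's share of the pie in percent, skipping keys in `drop`."""
    parts = {name: value for name, value in counts.items() if name not in drop}
    whole = sum(parts.values())
    return {name: 100 * value / whole for name, value in parts.items()}


# Invented counts; "Total" is simply the sum of the other three rows.
masks = {"Surgical": 600, "FFP2": 300, "FFP1": 100, "Total": 1000}

with_total = slice_shares(masks)                      # total drawn as a slice
without_total = slice_shares(masks, drop=("Total",))  # total left out

print(with_total)
print(without_total)
```

With the total drawn as a slice, it always takes 50% of the pie and every real category is halved: in this made-up example FFP1 drops from 10% to 5%.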
The good example
To end on a positive note, I want to share a good example. During this period of emergency, there has been an explosion of insightful visualizations. Most of them rely strongly on interactivity, which makes it possible to condense a lot of information while letting users decide which aspect of the data they want to explore.
One example is a plot from ourworldindata.org that shows the progress of the infection in many countries all over the world. We have seen a lot of graphs of this type, but this one stands out for the amount of information it contains, its aesthetics, and the control it gives the user.
If you want to explore and play with it, you can go to their page.
For instance, by moving the cursor to the South America label on the right, you can see that even though detected infections in those countries grew more slowly at the beginning, they now show a very strong upward trend. Why? Did the infection itself arrive late there, or only its detection? Why South America? When will they reach the peak? These are only some of the questions that these visualizations can suggest.
We, as humans, evolved to recognize visual patterns very quickly. This matters in a world in which we see thousands of images every day: we don’t have the chance to observe every single one of them carefully, so it becomes easy to spread misconceptions by simply changing the order of an axis.
Making good visualizations is incredibly hard, especially on a tight schedule. However, with practice and a little bit of creativity, one can not only avoid mistakes like the ones we have seen, but also create something engaging, expressive, and, most of all, fair.
If you have come across some horrible or wonderful visualizations as well, share them in the comments!
This is a blog post published by the PoliMi Data Scientists community. We are a student association of Politecnico di Milano that organizes events and writes resources on Data Science and Machine Learning topics.
If you have suggestions or want to get in touch with us, you can write to us on our Facebook page.