Intelligent Signals : Visualising Data

Sayon Dutta
Published in CustomerGlu
Apr 2, 2017

Before going into the details of data visualisation, let’s start with a famous example graphic.

Charles Joseph Minard’s map displaying the movements and the number of Napoleonic troops during the Russian campaign (1812–1813), as well as the temperature on the return path.

The graphic above shares several important details at once: the size of Napoleon’s army during the advance and during the retreat, the temperature on the return path, and the names of the places along the route with their dates. That is a great deal of information in one graphic. In this article I will walk you through the important principles one must consider for data visualisation.

Alberto Cairo and Edward Tufte, two great pioneers of the field, have distilled past experience and common fallacies in visualising data into core principles that are now considered fundamentals of data visualisation.

Tufte’s heuristics of data-ink ratio, chartjunk, sparklines and lie factor, and Cairo’s Five Qualities of Great Visualisations (truthful, functional, beautiful, insightful, and enlightening), are well worth considering when one is trying to either report or judge a visualisation.

Visualization Wheel mentioned in The Functional Art by Alberto Cairo

In his book, The Functional Art, Alberto Cairo provides a tool for thinking about design trade-offs when building information graphics, and he calls this tool the Visualisation Wheel.

The trade-offs that Cairo considers are as follows:

  1. Abstraction and Figuration : The level of figuration in a graphic depends on how far the phenomenon is depicted using physical representations, such as photographs or drawings. As the representation shifts from the real phenomenon towards something less real and more conceptual, the emphasis shifts from figuration to abstraction.
  2. Functionality and Decoration : A completely functional graphic has no decorations and is likely to be a direct representation of the data. A highly decorated graphic, on the other hand, is artistically embellished. Decorations may increase the time a viewer spends considering the visual, but exploring those nuances can sometimes form mental associations, thereby increasing familiarity and memorability.
  3. Density and Lightness : These relate to the amount of information being portrayed. This dimension comes into play depending on the audience: readers who engage heavily with the content value density, whereas those who want a quick narrative prefer lightness, i.e. an overview rather than depth.
  4. Multidimensional and Uni-dimensional : A multidimensional graphic explains a phenomenon along a larger number of dimensions, allowing the viewer to explore different aspects of it. A uni-dimensional graphic instead focuses on one or a few dimensions, i.e. it explores the phenomenon in limited ways.
  5. Originality and Familiarity : There are many familiar kinds of information graphics, such as bar charts, line charts and scatter plots. The thinking required here is about which of these representations best fits which information about the phenomenon. In certain cases one type of graphic, say a scatter plot, is better than a bar chart, or vice versa. Judging this is where the trade-off between originality and familiarity matters.
  6. Novelty and Redundancy : Redundancy in a graphic means showing the same information in several different ways. For example, in a bar chart, colouring the bars to indicate which one is higher repeats what the bar heights already show, so the colour is redundant. Too much of this makes a graphic complex and can confuse readers. Novelty, by contrast, means describing each piece of information in the graphic in only one way.

Let’s now discuss Edward Tufte’s graphical heuristics. But first of all, what is a heuristic?

A heuristic is a rule or protocol meant to guide your decision making in a practical way rather than an optimal or perfect one. Heuristics are meant to be followed until you have a reason to deviate from them.

Tufte’s graphical heuristics are as follows:

(left) Graphic with lower data-ink ratio, (right) Graphic with higher data-ink ratio
  1. Data-Ink Ratio : Tufte regards data-ink as the non-erasable core of a graphic, the ink that is essential for the variables shown to make sense. He defined the data-ink ratio as the amount of data-ink divided by the total ink required to print the graphic. To raise the ratio, remove the useless ink, i.e. the elements that add no new information to the graphic.
  2. Chartjunk : Tufte described artistic decorations on statistical graphs as weeds in the data graphic and named them chartjunk. He suggested there are really three kinds of chartjunk:
  • Unintended Optical Art : Excessive shading or patterning of chart features, as in the graphic alongside, causes visual fatigue. These effects are called moiré patterns. It is better to label the chart elements directly than to rely on heavy patterning.
  • Grid : Tufte suggested that grid lines are usually unnecessary and do not count as data-ink. Thinning or removing grid lines makes it easier to see the data. Direct labelling of data is another great way to reduce this form of chartjunk.
Diamonds Were A Girl’s Best Friend, a famous graphic (by Nigel Holmes) featured in Time Magazine in 1982
  • Duck : These are non-data creative graphics in the chart, whether line art or photographs. Newspapers and news magazines use them often. One well-known graphic artist, Nigel Holmes, used the duck to display data in the memorable graphic Diamonds Were A Girl’s Best Friend (shown alongside). It shows the price trend of diamonds from 1978 to 1982: one can easily forget the amounts in dollars, but the spike in the trend is easy to remember owing to the shape of the woman’s leg.
Sparktweet

3. Sparklines : Tufte also suggested items called sparklines and referred to them as data words, a way to bridge the gap between text and figures. He suggested that sparklines could not only be placed directly in text but also be embedded in tables along with the data they describe. For example, a time series can be accompanied by a sparkline giving an overview of the trend over time. Sparklines have found many different uses, and video games are a great example. A modern example, called the sparktweet, uses Unicode block characters to display a bar graph inside the 140 characters allotted by Twitter.

4. Lie factor : Defined as the size of an effect shown in the graphic divided by the size of the effect actually in the data. A value far from one means the graphic misleads its observers. For example, the graphic alongside, from Time Magazine in 1979, shows barrels of oil and the price of oil over six years. It raises several questions: why are the barrels different sizes, and is it the volume of the barrel that represents the growth in cost, or its height?
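
Two of the heuristics above, the data-ink ratio and the lie factor, are plain ratios, so they can be sketched in a few lines of Python. This is only an illustration: the function names, the choice to measure an effect as a relative change, and all the numbers are assumptions made for this example rather than anything taken from a real chart.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of a graphic's ink that presents data (a value between 0 and 1)."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    return data_ink / total_ink


def effect_size(first, second):
    """Relative change from first to second, e.g. 1.0 for a doubling.

    Measuring the effect as a relative change is an assumption of this sketch.
    """
    return (second - first) / first


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Hypothetical ink measurements: 30 units of data-ink out of 40 in total.
print(data_ink_ratio(30, 40))

# Hypothetical price that doubles from 10 to 20, drawn as a barrel that is
# scaled by 2 in every direction: its height doubles, its volume grows 8-fold.
print(lie_factor(1, 2, 10, 20))  # reading the barrel by its height
print(lie_factor(1, 8, 10, 20))  # reading the barrel by its volume
```

With these made-up numbers, a barrel read by its height has a lie factor of 1, whereas the same barrel read by its volume has a lie factor of 7, which is why the question of height versus volume matters.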

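The sparktweet idea mentioned among the heuristics above, a bar graph built from Unicode block characters, can be sketched as follows. The function name, the linear scaling to eight levels and the sample values are assumptions for illustration, not a reference implementation.

```python
# Unicode block characters, ordered from the lowest bar to the highest.
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as a one-line bar graph of block characters."""
    values = list(values)
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal: draw a flat line at the lowest level.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    # Scale each value linearly onto the available block levels.
    return "".join(BLOCKS[round((v - lo) / span * top)] for v in values)


# Hypothetical values, used only to show the call.
print(sparkline([1, 2, 3, 5, 8, 13, 8, 5]))
```

Because the output is ordinary text, it can sit inside a sentence, a table cell or a tweet, which is the property that makes sparklines behave like data words.
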
Next we dive into Cairo’s Five Qualities of Great Visualisations.

The first quality of a good visualisation is whether it is truthful. There are techniques in data science and information visualisation that can be used to shine a light on specific pieces of data, and the same techniques can be misused to mislead readers and invite false inferences.

Left (original chart), Right(line chart for better understanding)

Let’s check the example of a chart put out by the National Cable and Telecommunications Association. This association lobbies on behalf of US cable companies, and it provided a graphic suggesting that, after regulations were relaxed, cable companies invested four times as much in the industry. The original chart has been taken down, but Cairo provides an example of what was visualised. Firstly, the title was intended to draw the reader to a conclusion, i.e. that less regulation means more industry investment. There are further problems with the graphic. It is unclear whether the amounts are adjusted for inflation, which is a big issue when charting financial data. The time spans for the two bars are different: one covers three years, while the other covers four. The bars have the same width but do not represent the same period of time. Data for 1997 and 1998 seems to be missing. And the chart stops in 2003, although it was tweeted in 2014. For a more honest view of the data, check the line graph provided: there were years of sustained industry investment under regulation, a slight drop after deregulation, then a massive spike in spending, and then a significant fall after 2002, much of which was excluded from the original chart. The lesson here is that there may be no absolute truth, but some displays are more truthful than others, and those are the ones to use.
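
The problem of bars that cover different time spans can be made concrete with a little arithmetic. In the sketch below every figure is hypothetical and is NOT the association’s data; which bar covers three years and which covers four is also an assumption. The totals were merely chosen so that the raw comparison gives the "four times" headline.

```python
def per_year(total, n_years):
    """Average amount per year for a total that covers n_years years."""
    if n_years <= 0:
        raise ValueError("n_years must be positive")
    return total / n_years


# Hypothetical totals in arbitrary units; NOT the association's figures.
regulated_total, regulated_years = 30.0, 3
deregulated_total, deregulated_years = 120.0, 4

# Comparing raw totals, as two bars of equal width invite the reader to do.
naive_ratio = deregulated_total / regulated_total

# Comparing like with like: the average per year in each period.
per_year_ratio = per_year(deregulated_total, deregulated_years) / per_year(
    regulated_total, regulated_years
)

print(f"ratio of totals:   {naive_ratio:.1f}x")
print(f"ratio per year:    {per_year_ratio:.1f}x")
```

Even before adjusting for inflation or filling in the missing years, putting both periods on a per-year basis changes the size of the claimed effect, here from four times to three times with these made-up numbers.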

The second quality of a visualisation to consider is whether it is functional. Direct labelling of a graph is one way to increase comprehension, and the heuristics discussed above are another way to increase functionality.

The third quality of a visualisation is whether it is beautiful. This requires knowing a great deal about your audience, as perceptions vary from person to person.

The fourth quality of a visualisation is whether it is insightful. The graphic shouldn’t just replicate data; rather, it should offer the viewer some inference, giving them a eureka moment. A common mistake is to put unnecessary additions into the graphic, which increases complexity and reader fatigue. A graphic that puts the right result up front reaches even readers who do not have the time for a full review.

The fifth and final quality of a visualisation is whether it is enlightening.

I hope these fundamentals give newcomers to the field of data science a sound basis for a proper and innovative approach to data visualisation.

P.S.: This article draws on my general experience in the field and on my recent study of this MOOC.
