Craft of data visualization

An important asset for a data professional.

Ravikiran Durbha
Data And Beyond
4 min read · Aug 26, 2023


Storytelling is a requisite skill for a data professional, and data visualization is a core component of storytelling. It is not enough to derive insights; they must also be delivered. Data professionals therefore need to understand the underlying principles of presenting information.

Before we look at the principles, a brief detour into the history of the field.

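Data visualization has a long history. In 1644, Michael Florent van Langren is believed to have produced the first visualization of statistical data, a chart that placed various estimates of the difference in longitude between Toledo and Rome along a single line. Presented this way rather than in a table, the wide range in the data was something readers could grasp instinctively.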

It would be remiss not to mention William Playfair, who is credited with inventing several of the charts most often used today (line, bar, and area charts) around 1786, when he was just 27 years old. In 1801, he also introduced the pie chart and the circle graph as ways of presenting data.

Source: Wikipedia — Playfair’s time series charts

Another well-known example dates back to 1854. Dr. John Snow, an anesthetist, mapped cholera cases during an outbreak in London. The map made it clear that the cases were clustered around a water pump, which led to the discovery that cholera is a water-borne disease and helped save lives. It remains an inspiring example of how a visualization can expose a pattern that raw records hide.

Source: Guardian, Dr John Snow’s map showing more Cholera cases around a water pump

In the latter half of the 20th century, John Tukey and Edward Tufte pushed the boundaries of data visualization. Tukey did so with his statistical approach of exploratory data analysis, and Tufte with his book “The Visual Display of Quantitative Information”, which helped bring refined visualization techniques to an audience beyond statisticians.

Data visualization is meant for human consumption, so its foundational principles lean on theories of human perception. It turns out that we can perceive certain attributes with very little attention, such as differences in line length, shape, orientation, distance, and color (hue). These are often called preattentive attributes. Other tasks demand more attention. For example, counting how many times a symbol repeats in a block of text takes effort, but if that symbol appears in a different color, size, or orientation, we can pick it out with far less attention.

Stephen Few, founder of Perceptual Edge and author of many books on data visualization, described the following principles for displaying quantitative information, which can be used as a standard when designing reports.

Summarized below are the use cases for different types of charts.

Line chart:

  1. Time-series — A single variable is captured over a period of time.

Bar chart:

  1. Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance by salesperson.
  2. Deviation: Categorical subdivisions are compared against a reference, such as actual vs. budgeted expenses for several departments of a business in a given time period. A bar chart can show how far each actual amount falls above or below its reference amount.
  3. Nominal comparison: Categorical subdivisions are compared in no particular order, such as sales volume by product code.
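
To make the ranking and deviation cases concrete, here is a minimal Python sketch of the numbers such bar charts would encode. The names and figures are hypothetical, invented only for illustration, and the text bars stand in for whatever charting tool you actually use.

```python
# Hypothetical sales figures, invented purely for illustration.
sales = {"Asha": 120, "Ben": 95, "Chen": 140, "Dara": 80}

# Ranking: order the categories by value, largest first.
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)

# Draw one text bar per category; each '#' stands for 10 units.
for name, value in ranked:
    print(f"{name:<5} {'#' * (value // 10)} {value}")

# Deviation: actual minus reference (budget) for each department.
# Both dictionaries are hypothetical as well.
budget = {"Ops": 200, "IT": 150, "HR": 90}
actual = {"Ops": 230, "IT": 140, "HR": 90}
deviation = {dept: actual[dept] - budget[dept] for dept in budget}

for dept, diff in deviation.items():
    print(f"{dept:<4} {diff:+d}")
```

Sorting before drawing is what turns a nominal comparison into a ranking; the deviation values are signed, so a bar chart of them would extend above or below a zero baseline.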

Pie-chart:

  1. Part-to-whole — Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%). A pie chart can show the comparison of ratios, such as the market share represented by competitors in a market.
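
A part-to-whole comparison comes down to dividing each category by the total. The short sketch below assumes made-up unit sales for three hypothetical competitors.

```python
# Hypothetical unit sales per competitor, invented for illustration.
units = {"A": 50, "B": 30, "C": 20}

total = sum(units.values())

# Each slice of the pie is the category's share of the whole, in percent.
share = {name: 100 * count / total for name, count in units.items()}

for name, pct in share.items():
    print(f"{name}: {pct:.1f}%")
```

The shares should always add up to 100%, which is a quick sanity check before drawing the chart.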

Histogram (a type of bar chart):

  1. Frequency distribution: Shows the number of observations of a particular variable that fall within each interval.
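
The counting behind a histogram can be sketched in a few lines of Python. The observations and the bin width of 10 are assumptions chosen for the example; real data usually needs some experimentation with the interval size.

```python
from collections import Counter

# Hypothetical observations (for example, ages), invented for illustration.
values = [21, 25, 29, 32, 34, 35, 38, 41, 47, 52]
bin_width = 10  # assumed interval size

# Map each value to the lower edge of its interval, then count per interval.
counts = Counter((v // bin_width) * bin_width for v in values)

# One text bar per interval, in ascending order of the interval's lower edge.
for lower in sorted(counts):
    print(f"{lower}-{lower + bin_width - 1}: {'#' * counts[lower]}")
```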

Box-plot:

  1. A box-plot helps visualize key statistics about a distribution, such as median, quartiles, outliers, etc.
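
The statistics a box-plot draws can be computed with Python's standard `statistics` module, as in the sketch below. The data is hypothetical, and the 1.5 × IQR fences are the common Tukey convention for flagging outliers. Different tools compute quartiles slightly differently, so treat the exact quartile values as one reasonable choice rather than the only one.

```python
import statistics

# Hypothetical sample, invented for illustration.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4)

# Interquartile range and the conventional 1.5 * IQR outlier fences.
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Points beyond the fences would be drawn as individual outlier marks.
outliers = [x for x in data if x < low_fence or x > high_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, outliers={outliers}")
```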

Scatter-plot:

  1. Correlation: Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months.
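
What the eye judges from a scatter-plot can be summarized as a single number, the Pearson correlation coefficient. Below is a minimal sketch. The `pearson` helper and the monthly figures are hypothetical, written for this example rather than taken from real unemployment or inflation data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical monthly figures, invented for illustration.
unemployment = [4.0, 4.5, 5.0, 5.5, 6.0]   # X
inflation = [3.2, 3.0, 2.6, 2.5, 2.1]      # Y

r = pearson(unemployment, inflation)
print(f"r = {r:.2f}")
```

A value near +1 means the variables tend to move in the same direction, a value near -1 means they move in opposite directions, and a value near 0 means there is no clear linear relationship.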

Cartogram:

  1. Comparison of a variable across a map or layout, such as the unemployment rate by state.

In general, it is useful to remember John Tukey’s words when presenting any data visualization: “The greatest value of a picture is when it forces us to notice what we never expected to see”.
