Evaluating Expressiveness and Effectiveness of Informative Charts

Analysis of 8 recent, widely shared Malaysian Covid-19 charts on Facebook

Pei Seng Tan
ViTrox-Publication
Nov 13, 2020


Image Source: PM Tips

To become a certified data analyst, I believe it is essential to know how to visualize data expressively and effectively so that charts deliver informative messages to readers.

Before we get to the case studies, we need to answer two questions.

How do we measure expressiveness in the context of data visualization?

According to Tamara Munzner (2014), author of the book “Visualization Analysis and Design”,

A chart is considered expressive if it expresses all of, and only, the information in the dataset attributes.

Dataset attributes normally fall into 3 main types: nominal, ordinal and quantitative. Nominal and ordinal attributes are both qualitative; nominal attributes have no implicit ordering, whereas ordinal attributes do. Ordinal and quantitative attributes both have an implicit ordering, but arithmetic operations are meaningful only on quantitative attributes, not on ordinal ones. Examples of nominal, ordinal and quantitative attributes include:

  • Nominal — Types of car, Gender
  • Ordinal — Date, Size of T-shirt
  • Quantitative — Revenue, Profit

If an attribute is ordered, it should appear ordered in the chart. Conversely, unordered data should not be shown in a way that perceptually implies an ordering that does not exist. For instance, if the x-axis is a nominal attribute and the y-axis is a quantitative attribute, a bar chart should be drawn instead of a line chart. If the x-axis is an ordinal attribute and the y-axis is a quantitative attribute, a line chart should be drawn instead of a bar chart.
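The rule of thumb above can be written down as a small lookup. This is a minimal sketch of my own, not part of Munzner's book: the names `AttributeType` and `suggest_chart` are hypothetical, and it only covers the two cases discussed here (a nominal or ordinal x-axis against a quantitative y-axis).

```python
from enum import Enum


class AttributeType(Enum):
    """The three attribute types described above."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    QUANTITATIVE = "quantitative"

    @property
    def is_ordered(self) -> bool:
        # Ordinal and quantitative attributes have an implicit ordering.
        return self is not AttributeType.NOMINAL

    @property
    def supports_arithmetic(self) -> bool:
        # Arithmetic is meaningful only on quantitative attributes.
        return self is AttributeType.QUANTITATIVE


def suggest_chart(x: AttributeType, y: AttributeType) -> str:
    """Hypothetical helper: apply the rule of thumb to a categorical x and a quantitative y."""
    if x is AttributeType.QUANTITATIVE or not y.supports_arithmetic:
        raise ValueError("the rule covers only a nominal/ordinal x and a quantitative y")
    # Connecting points with a line implies an ordering, so use a line only for ordered x.
    return "line chart" if x.is_ordered else "bar chart"


print(suggest_chart(AttributeType.NOMINAL, AttributeType.QUANTITATIVE))  # bar chart
print(suggest_chart(AttributeType.ORDINAL, AttributeType.QUANTITATIVE))  # line chart
```

Months of the year, for example, are ordinal, so a sales-by-month chart falls into the line-chart branch.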

Image Source: Microsoft 365

The figure above shows a common mistake: the analyst combined a chart of units sold by month with a chart of total transactions by month. Because the x-axis variable (month) is an ordinal attribute, the figure should combine two line charts rather than a line chart and a bar chart, so it is a bad example.

Plotting two line charts on the same axes may cause confusion, so we need to understand the rankings of magnitude and identity channels (covered below). In this case, giving the two lines different colours makes the visualization both expressive and effective.

How do we measure effectiveness in the context of data visualization?

A chart is considered effective if the importance of each attribute matches the salience of the channel used to encode it.

The figure below shows the channel rankings for ordered (quantitative and ordinal) attributes and categorical (nominal) attributes.

Image Source: Tamara Munzner (2014)

As shown, the channels for ordered attributes are known as magnitude channels whereas the channels for categorical attributes are known as identity channels.

The higher a channel appears in the figure, the more effective it is.

According to Tamara Munzner (2014), the most important attributes should be encoded with the most effective channels so that they are the most noticeable, and decreasingly important attributes can then be matched with less effective channels.
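The matching principle above can be sketched as a greedy assignment: walk through the attributes from most to least important and give each one the best channel not yet used, drawing from the magnitude ranking for ordered attributes and the identity ranking for categorical ones. The channel lists are transcribed from Munzner's ranking figure and should be checked against the original; `assign_channels` is a hypothetical helper, and treating every channel as independent (for example, ignoring that spatial region and position both use space) is a simplification of my own.

```python
# Channel rankings transcribed from Munzner's (2014) figure, most effective first.
MAGNITUDE_CHANNELS = [
    "position on common scale",
    "position on unaligned scale",
    "length (1D size)",
    "tilt/angle",
    "area (2D size)",
    "depth (3D position)",
    "color luminance",
    "color saturation",
    "curvature",
    "volume (3D size)",
]
IDENTITY_CHANNELS = ["spatial region", "color hue", "motion", "shape"]


def assign_channels(attributes):
    """Greedily give each attribute the best unused channel of the matching kind.

    `attributes` is a list of (name, is_ordered) pairs sorted by decreasing importance.
    """
    used = set()
    assignment = {}
    for name, is_ordered in attributes:
        ranking = MAGNITUDE_CHANNELS if is_ordered else IDENTITY_CHANNELS
        # Pick the most effective channel that no more important attribute has taken.
        channel = next((c for c in ranking if c not in used), None)
        if channel is None:
            raise ValueError(f"no channel left for {name!r}")
        used.add(channel)
        assignment[name] = channel
    return assignment


print(assign_channels([("sales", True), ("product", False), ("quarter", True)]))
```

This toy assignment only illustrates the ordering principle; real charts also have to weigh which channels can be combined without interfering.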

Image Source: SAS

In the figure above, for example, the product attribute is more important than the quarter attribute, so product is placed on the x-axis and quarter is shown as differently coloured blocks. The trade-off is that readers who want the total sales for each quarter cannot read it directly from the chart; they have to add up the same-coloured blocks across all products themselves.
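The extra arithmetic left to the reader can be made explicit with a small sketch. The sales figures below are hypothetical (the SAS chart's values are not reproduced here); the point is that per-product totals fall out of the chart's layout, while per-quarter totals need an additional aggregation across every bar.

```python
from collections import defaultdict

# Hypothetical sales figures laid out the way the chart groups them:
# one bar per product, one coloured block per quarter.
sales = {
    "Product A": {"Q1": 10, "Q2": 12, "Q3": 8, "Q4": 15},
    "Product B": {"Q1": 7, "Q2": 9, "Q3": 11, "Q4": 6},
    "Product C": {"Q1": 4, "Q2": 5, "Q3": 6, "Q4": 9},
}

# Per-product totals are what the chart shows directly (the height of each bar).
product_totals = {product: sum(q.values()) for product, q in sales.items()}

# Per-quarter totals require adding the same-coloured blocks across every bar,
# which is the mental arithmetic the chart leaves to the reader.
quarter_totals = defaultdict(int)
for quarters in sales.values():
    for quarter, value in quarters.items():
        quarter_totals[quarter] += value

print(product_totals)       # {'Product A': 45, 'Product B': 33, 'Product C': 24}
print(dict(quarter_totals))  # {'Q1': 21, 'Q2': 26, 'Q3': 25, 'Q4': 30}
```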

Case Studies:

Example 1:

The tables below show the numbers of new and total Covid-19 cases in different clusters. Do you think they are a good example of data visualization?

Image Source: Ministry of Health Malaysia

According to Tamara Munzner (2014), there are 4 common dataset types: tables, networks, fields and geometry. A table is therefore one of the dataset types rather than a type of visualization chart, so presenting the raw table should not be relied on when the goal is expressive and effective data visualization.

Image Source: Tamara Munzner (2014)

The tables above contain 3 attributes: clusters, number of Covid-19 cases, and type of Covid-19 case (new or old). The total number of Covid-19 cases is the sum of the old and new cases. Because clusters are a nominal attribute and the number of Covid-19 cases is a quantitative attribute, a stacked bar chart should be plotted, as shown in the figure below.

Figure: Self-drawn stacked bar chart (covers only the content of the left table)

The total number of Covid-19 cases is shown on the x-axis, and the numbers of old and new Covid-19 cases are shown in different colours.

The clusters are sorted by the number of new cases, as in the tables above.

Rumah Merah has the highest number of new Covid-19 cases and Penjara Kepayan has the highest number of total Covid-19 cases.

Compared with a table, a stacked bar chart displays quantitative information better because readers are not distracted by a wall of numbers. It also gives a better overview of the dataset.
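The steps behind the stacked bar chart in this example (deriving old cases as total minus new, then sorting clusters by new cases) can be sketched without any plotting library. The cluster names and numbers below are hypothetical placeholders, not the MOH figures, and the text rendering is only a rough stand-in for a real chart.

```python
# Hypothetical cluster figures in the same shape as the MOH tables:
# (cluster, new cases, total cases). The numbers are illustrative only.
clusters = [
    ("Cluster A", 12, 150),
    ("Cluster B", 40, 95),
    ("Cluster C", 5, 60),
]


def stacked_segments(rows):
    """Split each total into (old, new) segments and sort by new cases, descending."""
    segments = [(name, total - new, new) for name, new, total in rows]
    return sorted(segments, key=lambda seg: seg[2], reverse=True)


def render(segments, scale=5):
    """Print a rough text stacked bar chart: '#' for old cases, '+' for new cases.

    Each symbol stands for `scale` cases; floor division hides any remainder.
    """
    width = max(len(name) for name, _, _ in segments)
    for name, old, new in segments:
        bar = "#" * (old // scale) + "+" * (new // scale)
        print(f"{name:<{width}} {bar} {old + new}")


render(stacked_segments(clusters))
```

With these placeholder numbers, the cluster with the most new cases is not the one with the largest total, which is exactly the kind of contrast the stacked bars make visible at a glance.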

Example 2:

The figure below shows the distributions of new Covid-19 cases in different districts of Selangor. Do you think that the chart below is a good example of data visualization?

Image Source: Ministry of Health Malaysia

In my opinion, yes. The data analyst grouped the numbers of new Covid-19 cases into ranges and represented each range with a different colour, taking advantage of the rainbow scale: red, yellow and green rank the seriousness of the Covid-19 cases from high to low.

If you convert the rainbow scale to grayscale, its colours fall at distinct levels of luminance and saturation in a fixed order.

Image Source: Fundamentals of Data Visualization

Therefore, hue can also serve as a magnitude channel when the grayscale version of the rainbow scale is used as the reference. Red cannot be replaced with other colours such as blue or pink, as that would reduce the expressiveness and effectiveness of the entire chart.

However, the light-coloured rings are unnecessary and should be removed to avoid confusing readers.
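The grouping step in this chart, where each district's number of new cases is placed into a range and each range gets a colour, can be sketched with Python's `bisect` module. The thresholds and district figures below are hypothetical; the actual ranges used in the MOH chart are not reproduced here.

```python
from bisect import bisect_left

# Hypothetical thresholds: the actual ranges used in the MOH chart are not reproduced here.
THRESHOLDS = [10, 40]                 # inclusive upper bounds of the green and yellow groups
COLOURS = ["green", "yellow", "red"]  # ordered from low to high seriousness


def colour_for(new_cases: int) -> str:
    """Return the colour group for a district's number of new cases."""
    # bisect_left counts the thresholds strictly below new_cases, so each
    # threshold acts as an inclusive upper bound for its group.
    return COLOURS[bisect_left(THRESHOLDS, new_cases)]


# Hypothetical district figures, for illustration only.
districts = {"District 1": 3, "District 2": 25, "District 3": 60}
print({name: colour_for(n) for name, n in districts.items()})
# {'District 1': 'green', 'District 2': 'yellow', 'District 3': 'red'}
```

Keeping the colour list in the same order as the thresholds is what lets the colour act as an ordered (magnitude) encoding rather than an arbitrary label.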

Example 3:

If a table is the only option for presenting the dataset, how could the table shown in the figure below be improved?

Image Source: Ministry of Health Malaysia

In my opinion, this table can be improved in two ways:

  1. Add a ranking index to the left of the “country” column.
  2. Derive a new feature named “percentage of Covid-19 cases by country”.

A ranking index is crucial when readers want to know the worldwide ranking of countries that sit in the middle of the table. Can a reader tell the worldwide rankings of Singapore, Denmark or Malaysia by number of Covid-19 cases within 5 seconds? Without a ranking index, certainly not.

From the table, readers know that the US ranked first worldwide in terms of the number of Covid-19 cases. However, what percentage of worldwide Covid-19 cases does the US account for? Again, this question cannot be answered directly without further calculation.
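Both suggested columns are cheap to derive. Below is a minimal sketch with hypothetical country names and counts (not the figures in the MOH table); `add_rank_and_share` is a hypothetical helper. It accepts an optional worldwide total, because a table that lists only some countries would overstate each country's share if the listed rows were used as the denominator.

```python
def add_rank_and_share(case_counts, world_total=None):
    """Return (rank, country, cases, % of total) rows, highest count first.

    If world_total is omitted, the sum of the listed countries is used instead,
    which overstates each share when the table lists only some countries.
    Ties receive consecutive ranks in this simple version.
    """
    total = world_total if world_total is not None else sum(case_counts.values())
    ordered = sorted(case_counts.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, country, n, round(100 * n / total, 2))
        for rank, (country, n) in enumerate(ordered, start=1)
    ]


# Hypothetical figures, for illustration only.
cases = {"Country C": 300, "Country A": 9000, "Country D": 100, "Country B": 600}
for row in add_rank_and_share(cases):
    print(row)
# (1, 'Country A', 9000, 90.0)
# (2, 'Country B', 600, 6.0)
# (3, 'Country C', 300, 3.0)
# (4, 'Country D', 100, 1.0)
```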

Examples 4 and 5:

The table below shows the number of new Covid-19 cases in different states of Malaysia. Based on the previous examples, can you suggest how to improve it?

Image Source: Ministry of Health Malaysia

Compared with the previous example, sorting is missing here. The states are listed in no particular order, so readers have to spend more time working out which state has the highest or lowest number of new Covid-19 cases. Nor can readers directly answer questions such as which state ranks fifth in Malaysia by number of new Covid-19 cases.

The example below makes the same mistake: the reporter did not sort the data when preparing this informative content.

Image Source: PocketTimes

Example 6:

The figure below shows the distributions of new Covid-19 cases in different districts of Selangor, Kuala Lumpur and Putrajaya.

Image Source: China Press

Compared with the charts in Examples 4 and 5, the reporter did a good job here, as the Selangor districts are ordered by their number of new Covid-19 cases.

Compared with the chart in Example 2, the reporter separates the new Covid-19 cases for Selangor, Kuala Lumpur and Putrajaya before breaking Selangor's cases down by district, which reflects the difference in status between states and federal territories.

For non-Malaysian readers: Selangor is one of Malaysia's states, whereas Kuala Lumpur and Putrajaya are two of its three federal territories. They should not be mixed when reporting the daily number of new Covid-19 cases.

The reporter also states clearly in the title that the content covers the data of Selangor, Kuala Lumpur and Putrajaya.

Example 7:

The figure below shows the total number of Covid-19 cases in China and Southeast Asian countries.

Image Source: China Press

The chart used in this example, a stacked bar chart, is appropriate because the country name is a nominal attribute and the total number of Covid-19 cases is a quantitative attribute. The countries with the highest and lowest numbers of Covid-19 cases are placed at the top and bottom respectively, so the reporter has taken ordering into account.

Example 8:

The figure below shows the summary of Covid-19 cases in Malaysia on 31st Oct 2020.

Image Source: Ministry of Health Malaysia

This informative content gives a quick summary of Covid-19 cases, but it is not expressive and effective enough as data visualization. Consider these questions:

  1. Is the relationship between total confirmed cases and new cases the same as the relationship between total confirmed cases and total deaths, total active cases and total recovered cases?
  2. What is the relationship between new cases and active cases? Do new cases contribute to active cases?

In a mindmap or spider diagram, all child nodes have the same relationship with their parent node. That is not the case here.

Instead of using a mindmap, a tree diagram should be used to visualize the dataset expressively and effectively in this example.

According to the informative content above, we can summarize the rules as follows:

  1. Total Confirmed Cases = Total Death Cases + Total Active Cases + Total Recovered Cases
  2. New Death Cases is a subset of Total Death Cases
  3. New Cases is a subset of Total Active Cases
  4. New Recovered Cases is a subset of Total Recovered Cases
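Because the four rules are precise, the tree can be represented as data and checked automatically before it is drawn. The sketch below is my own illustration: the `Node` class and `validate`/`show` helpers are hypothetical, and the numbers are placeholders rather than the figures from 31st Oct 2020. Rule 1 is checked as a sum at the root; rules 2 to 4 are checked as "the new figure cannot exceed its parent total".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    value: int
    children: list[Node] = field(default_factory=list)


def validate(root: Node) -> list[str]:
    """Return a list of rule violations (empty if the tree is consistent)."""
    problems = []
    # Rule 1: total confirmed = total deaths + total active + total recovered.
    if root.value != sum(child.value for child in root.children):
        problems.append("Rule 1 violated: totals do not add up")
    # Rules 2-4: each "new" figure is a subset of its parent total.
    for child in root.children:
        for grandchild in child.children:
            if grandchild.value > child.value:
                problems.append(f"{grandchild.label} exceeds {child.label}")
    return problems


def show(node: Node, depth: int = 0) -> None:
    """Print the tree with one level of indentation per depth."""
    print("    " * depth + f"{node.label}: {node.value}")
    for child in node.children:
        show(child, depth + 1)


# Placeholder numbers, for illustration only.
tree = Node("Total Confirmed Cases", 1000, [
    Node("Total Death Cases", 20, [Node("New Death Cases", 1)]),
    Node("Total Active Cases", 180, [Node("New Cases", 50)]),
    Node("Total Recovered Cases", 800, [Node("New Recovered Cases", 60)]),
])
show(tree)
print(validate(tree))  # []
```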

By using these rules, we can construct a tree diagram easily as shown below.

Figure: Self-drawn tree diagram

In the tree diagram, all the relationships are clearly defined and readers can understand them at a glance. The blue boxes represent the new figures (new recoveries, new cases and new deaths).

Final words

The main objective of data visualization is to deliver messages to readers expressively and effectively. How beautiful a chart looks is a lower priority than its expressiveness and effectiveness.

References:

  1. Munzner, T. (2014). Visualization analysis and design. CRC Press.
