Exploring the Power of Visualizations: Informative Vs. Manipulative

Shubham Sharma
14 min read · Sep 30, 2023

Photo by Isaac Smith on Unsplash

Summary

We took a dataset and created two informative and two manipulative visualizations using the Five Design Sheet methodology. The steps were: understanding the dataset, brainstorming different types of visualizations by sketching them, shortlisting a few, sketching refined versions of them, and creating the final versions using altair. We will walk through the design process, share the artifacts from each step, and explain the thought process behind the decisions we made.

In the remainder of this post, we will look at how visualizations can be informative or manipulative. Sometimes manipulation through dark patterns in the design process is intentional and at other times it is accidental. Either way, as readers we need to be aware of it and pay attention.

Dataset

The process started by identifying the dataset to use and fully comprehending it. To identify the best dataset for our project, we thoroughly explored several options, and brainstormed potential insights and manipulations. Ultimately, the Greenhouse Gas Emissions dataset emerged as our top pick, thanks to the comprehensive and innovative ideas we could draw from it.

The following sheet contains ideas brainstormed for two datasets: Police Complaints (red) and Greenhouse Gas Emissions (green).

We decided to use a dataset from The Organization for Economic Co-operation and Development (OECD) that reports greenhouse gas emissions for its member countries. What we really liked about it was that it is quite comprehensive. For example, it has data about each member country for each year from 1990 to 2021, and the data is broken out by pollutant (e.g., carbon dioxide, methane, etc.) and by source (e.g., energy, agriculture, etc.). The emissions per capita and per unit of GDP are included as well. What we did not like about it was that it contains complete data for only around 50 countries. For instance, data about top polluting countries such as China and India is incomplete. Out of the countries with complete data, we ended up choosing three: the United States of America, Russia, and Japan. These countries have high levels of emissions and represent three different continents. We did not want to choose too many countries, to avoid cluttering the visualizations.
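The completeness check described above can be sketched in pandas. This is only an illustration: the column names (`country`, `year`, `emissions`) and the values are placeholders, not the real OECD schema or figures.

```python
import pandas as pd

# Hypothetical long-format extract; names and numbers are placeholders.
rows = [
    {"country": "A", "year": 1990, "emissions": 10.0},
    {"country": "A", "year": 1991, "emissions": 11.0},
    {"country": "A", "year": 1992, "emissions": 12.0},
    {"country": "B", "year": 1990, "emissions": 5.0},
    {"country": "B", "year": 1992, "emissions": 6.0},  # 1991 is missing
]
df = pd.DataFrame(rows)


def countries_with_complete_data(frame, first_year, last_year):
    """Return countries with a non-null value for every year in the range."""
    valid = frame.dropna(subset=["emissions"])
    in_range = valid[valid["year"].between(first_year, last_year)]
    # Count distinct years per country and compare with the expected count.
    counts = in_range.groupby("country")["year"].nunique()
    n_expected = last_year - first_year + 1
    return sorted(counts[counts == n_expected].index)


complete = countries_with_complete_data(df, 1990, 1992)
subset = df[df["country"].isin(complete)]
```

With the real data, the range would be 1990 to 2021, and the three chosen countries would then be picked from the resulting list.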

Rough Sketches — Informative Visualizations

Once we understood the variables in the dataset, we brainstormed about the questions we could ask and sketched out as many visualizations as we could think of, appropriate for the type of data present in our dataset. We came up with a few options for informative visualizations and a few for manipulative visualizations. Sketching was quick and allowed us to brainstorm the pros and cons of each option before going further.

We followed the Five Design Sheet methodology to ensure a structured approach. Each group member went through this process and the results were enlightening.

The following sheet captures the rough sketches for informative visualizations.

With the unfortunate impact of Covid on our group, user testing our designs posed a challenge. We had to rely on picture exchanges among team members for feedback, which, while not ideal, provided valuable insights to refine our final designs.

  1. One of the options we considered for an informative visualization was a line chart of per capita emissions on the y axis, year on the x axis, and the color of the line representing the country. When comparing countries with vastly different populations, per capita numbers are important to consider.

The user testing feedback received for this chart was:

  • Per capita is better than raw totals for emissions
  • Some indication of population of a country (using size) along with per capita information would be helpful
  • Allows comparing apples to apples

Per capita information was already present in the data set with no additional transformation required. In the end, we decided not to pick this visualization since we could not think of a useful aggregation or transformation to use with it.

2. Another option for an informative visualization we sketched was a stacked area chart where each area is color coded to represent a country and the area is proportional to the amount of emissions produced by that country over a period of time. While it is visually more appealing and colorful, we found that it can be confusing as well.

The user testing feedback received for this chart was:

  • Some readers may think that all of the areas start on the x axis, meaning the country at the top has the most amount of emissions
  • Area looks great but may be harder to accurately compare and assess

Due to these drawbacks, we decided not to use it for informative visualization.

3. Yet another option we considered for an informative visualization was a stacked bar chart to highlight the proportion of emissions caused by different pollutants. In this chart, the x axis would be the year and the y axis would be the emissions from a country for that year, drawn as a bar with different color segments corresponding to different pollutants.

The user testing feedback received for this chart was:

  • Different color segments start at different levels, making it hard to compare detailed breakdown data between two different bars
  • The larger the number of segments in a bar, the harder it is to comprehend

In addition, the data was directly from the dataset with no transformation required. Due to these reasons, we did not pick this for an informative visualization.

4. Another option considered for informative visualization was a heatmap in which the year would be on the x axis, country on the y axis, and each cell of the matrix would be colored on a scale, with the intensity of the color corresponding to the magnitude of emissions for the corresponding country and year.

The user testing feedback received for this chart was:

  • It gives a broad sense of the magnitude of emissions across time
  • For just three countries, it may not highlight anything interesting unless the numbers change drastically year to year

We confirmed it by creating a heatmap using altair, which showed that each country had a different shade of cells, but not much variation across time for a given country. So we dropped the idea of using a heatmap for our informative visualization.

The two options we shortlisted from the sketches for informative visualizations are explained later in this post.

Rough Sketches — Manipulative Visualizations

The following sheets explore rough sketches for manipulative visualizations.

Despite the intentionally misleading nature of this visual, we kept the appropriate marks and channels for the data types, so that it was easily interpreted and communicated the relationship between Quantitative Interval, Quantitative Ratio, and Nominal data.

In crafting this manipulative design, two main principles of misleading visualizations stood out:

  1. Data transformation: Using a dual y-axis gives the false impression that the U.S. and Japan have similar greenhouse gas outputs, but if you look closely you can see they are simply plotted on different scales. Zooming into 2018–2020 captures the moment where emissions dip and then suddenly spike, hinting at a narrative where Covid-19 is the cause. The truth is open to interpretation, as these were the only countries we could find that had this dip and spike for the time period.
  2. Misleading title/labels: The title directly implies a correlation with the pandemic. Arrows pointing to the spike at the year 2020, labeled “Covid”, further reinforce this implication.

The following sheet captures additional rough sketches for manipulative visualizations.

  1. One of the options for a manipulative visualization was to use a pie chart, where each slice of the pie represented total emissions for one country over the time period, and the color of the slice represented the country. We have learned that pie charts are not well suited to human perception, since it is harder for our brains to compare two dimensional areas in a circle than to compare the one dimensional heights of bars in a bar chart.

The user testing feedback received for this chart was:

  • Include numbers (or percentage) in the slices of the pie chart
  • Consider including name of country also in the slices in addition to the color coding and see how it looks

One of the requirements was that the manipulation had to be subtle and not easily recognizable, so we decided to not use a pie chart.

2. Another manipulative visualization we considered was to use a variable in the dataset that was a bit unconventional and not widely understood. The variable INDEX_1990 is used to report emissions values relative to the value in 1990 for a given country. So for example, for the USA, the value for year 1990 for this variable is 100. For the year 1991, if the emissions went up by 5 percent, the value would be 105, and so on. So the values for all countries start off at 100 for the year 1990. Our initial thought was that it could deceive readers who are not used to reading charts with relative numbers, and cause them to reach incorrect conclusions about what the chart is showing.

The user testing feedback received for this chart was:

  • Relative (to 1990) emissions information on y axis is confusing
  • All countries cannot have the same emissions (100) in 1990
  • Make sure you include a good explanation of how to interpret this information

We did not pick this one since we were not sure whether the deception would be considered subtle or obvious.
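To make the INDEX_1990 idea concrete, here is a small worked example of how such a relative index could be derived from absolute values. The OECD dataset already ships this variable, so the calculation is only illustrative. The countries and numbers are made up.

```python
import pandas as pd

# Placeholder absolute emissions for two hypothetical countries.
df = pd.DataFrame({
    "country": ["P", "P", "P", "Q", "Q", "Q"],
    "year": [1990, 1991, 1992, 1990, 1991, 1992],
    "emissions": [200.0, 210.0, 190.0, 50.0, 45.0, 60.0],
})

# Look up each country's 1990 value and align it with every row.
base = df.loc[df["year"] == 1990].set_index("country")["emissions"]
df["base_1990"] = df["country"].map(base)

# Index = value relative to the 1990 baseline, scaled so 1990 equals 100.
df["index_1990"] = df["emissions"] * 100 / df["base_1990"]
```

Country P emits four times as much as Q in absolute terms, yet both start at 100 on the indexed scale, and Q ends higher (120 versus 95). That is the kind of reading that could mislead someone who is not used to relative numbers.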

The two options we shortlisted from the sketches for manipulative visualizations are explained later in this post.

Refined Sketches — Informative Visualizations

The following sheet captures the refined sketches for two shortlisted informative visualizations.

  1. Our first informative visualization was a horizontal bar chart that shows the total emissions for each country over the time horizon, and also shows the sources of emissions in different colors in each bar. There are ten different sources of emissions such as energy, transportation, agriculture and waste.

The user testing feedback received for this chart was:

  • Looks good as long as there are not too many segments in the bar which will make it look cluttered
  • Horizontal bar chart is better than vertical, makes it easier to compare data
  • Breaking down emissions data by source is very useful
  • Aggregation of data using sum is useful to see the big picture

2. The second informative visualization is a line chart for emissions with a mean line marker. The x axis has the year, the y axis has the emissions. There is a color coded line for each country, where each point represents the emissions value for a year.

The user testing feedback received for this chart was:

  • Visualizing the mean line is useful, to see how yearly values compare with it
  • Perhaps show the mean (number) next to the mean line
  • Maybe useful to differentiate when yearly values are below or above the mean (different color or something similar)

Refined Sketches — Manipulative Visualizations

The following sheet captures the refined sketches for the two shortlisted manipulative visualizations.

  1. Our first manipulative visualization is a chart that claims to show that COVID-19 caused an increase in emissions.

The user testing feedback received for this chart was:

  • Had to look carefully at the graph to understand (especially y axis labels)
  • Confusing to have different y axis numbers on left and right
  • COVID may not be the only factor that caused this change in emissions

2. The second manipulative visualization is a line chart that has cumulative emissions information with misleading title and labels.

The user testing feedback received for this chart was:

  • Is the increase always linear as shown in the sketch or does it flatten?
  • Hopefully this is not real data, just some dummy data for creating the sketch
  • Colors look very similar to each other

The next step was to create the final four visualizations using altair.

Final Visualizations

The following is a screenshot of our first informative visualization. This chart uses bars as marks. Color of the bar is used for encoding source of emissions, and length (size) of bar segments is used to encode the amount of emissions. It uses length of bar (1D size) as a magnitude channel to encode the emissions. The effectiveness rank of this magnitude channel is pretty high and is a good choice. The chart uses color as an identity channel to encode the source of emissions, which is a nominal variable, and this is a good choice. The expressiveness score is high since it is not showing too little or too much information at once. We believe a horizontal stacked bar chart is a great option for this visualization. Using a one dimensional variable such as length or size is great for human perception to compare and comprehend. The data transformation we used is to calculate the sum of total emissions for each country over the years, and to include the breakdown showing the source of emissions. The labels of the axes and the legend clearly show what is being displayed, with minimal chance for misinterpretation. One thing we could improve is to add tooltips so that hovering over a segment of a bar shows the exact numbers, which helps in comparisons.

The screenshot of our second informative visualization is shown below. It shows the yearly emissions amounts for each of the three countries over the time period, and the mean values as well. This chart uses lines as marks. Color of the line is used for encoding the country, and position of points on the line is used to encode the amount of emissions. It uses position (1D size) as a magnitude channel to encode the level of emissions. The effectiveness rank of this magnitude channel is pretty high and is a good choice. The chart uses color as an identity channel to encode the country, which is a nominal variable, and this is a good choice. The expressiveness score is high since it is showing the time series values and the mean.

A line chart is a great way to represent time series data, and the color coding of the country corresponding to the lines makes it easy to follow the chart. We added a transformation of calculating and showing the mean value for each country which allows the reader to quickly see when the yearly values are below or above the mean. The labels for the axes and the legend make it easy to follow the chart, with no room for misinterpretation. One thing we could improve is to add tooltips so that hovering over a point on a line shows the exact emissions number for that year and country, which helps in more accurate comparisons.

Our first manipulative visualization shown below suggests a correlation between Covid and Emissions. There are multiple deception techniques used to mislead the readers. The title asserts that there is a strong correlation between COVID-19 and emissions. This blog post explains how the title of a visualization can bias the reader. The label COVID points to the place where the emissions increased back to normal levels from 2020 to 2021, instead of pointing to the place where the emissions decreased between 2019 and 2020 as a result of COVID-19. This research paper details how visualizations played a critical role in forming the narrative around COVID-19. A second deception technique used is different scales on the left and right side of the y axis, to make the reader believe that emissions for USA and Japan are the same in 2020, when they are very different. A third technique used is zooming in to a short period of four years to magnify the decrease and increase in emissions (slope of line) before and after 2020. This technique is mentioned in this blog post as the Truncated Y-axis deception. Our visualization uses cherry-picking and causal inference techniques explained in this blog post. Use of dual axes is an example of a dark pattern known as “Misleading the witness”.

One major strength of this graph is its subtlety. At first glance, it seems legitimate, with appropriate marks, channels, and encodings. The discrepancies only become clear upon careful inspection, where you’ll notice two differently-scaled y-axes and a questionably high amount of zoom for the few countries shown. Also, there’s no reference point to tell us whether the numbers on the y-axis are high or low to begin with, so they seem to be random.

One improvement would be to replace Japan with a country that has a sharper emissions spike in 2020. Japan's emissions rose only gradually after 2020, so it is not the most extreme example. We could also use an additional deception technique, such as a sequential color scheme for representing categorical data such as country.

Our second manipulative visualization below shows the emissions over the years for the three countries. This chart uses lines as marks. Color of the line is used for encoding the country, and position of points on the line is used to encode the amount of emissions. It uses position (1D size) as a magnitude channel to encode the level of emissions. The effectiveness rank of this magnitude channel is pretty high and is a good choice. The chart uses color as an identity channel to encode the country, which is a nominal variable, and this is a good choice. The expressiveness score is high since it is not cluttered or trying to show too much information at once.

At first glance, it seems to imply that the numbers shown are yearly numbers, but they are actually cumulative. This deception technique is referenced in this blog post as Cumulative graphs. The y axis label misleads by not mentioning that it is cumulative. The sensational title of the chart feeds into the readers’ preconceived opinions about increasing global emissions which makes them believe these are yearly numbers. As mentioned in this blog post, the title has a significant influence on how readers interpret visuals. The colors are also manipulated to use a sequential single-hue color scheme that is usually reserved for quantitative data, but here it is used for nominal data (country). This color scheme implies some quantitative relationship between the countries, which is not true. This paper talks about how using incorrect coloring schemes can lead to false conclusions. This blog talks about which color scale to use in visualizations. The overall design of the visualization is good which makes it trustworthy, and the deception is subtle, making it hard to spot. One thing we could improve is to add tooltips so that hovering over a point on a line shows the exact cumulative emissions number for that year and country, which makes it more precise and accurate.

Reflections And Future Work

It was a great project for applying what we learned from all the readings about good and bad visualizations to a real dataset. We learned the importance of sketching to quickly create something that can be shown and discussed with teammates and other folks to get quick feedback. Incorporating feedback and refining the sketches is also quick and easy. Creating the final visualizations using altair was not very difficult due to its intuitive and powerful capabilities. Creating manipulative visualizations was more difficult, but it was also fun to try out different ways to deliberately create confusing and misleading charts.

If we had more time on the project, we would work on the following:

  • Try out data from different / more countries
  • Explore other similar datasets with more complete data for all countries
  • Create more types of informative and manipulative visualizations
  • Add tooltips to each of the final visualizations
  • Try out various color schemes