How I Redesigned a #MakeoverMonday Visualization in 5 Steps using Tableau
Here is how I redesigned a visualization in just 5 steps.
- Explore and Assess the Data
- Analyze Things to Improve in the Current Visualization
- Choosing the Best Visualization Chart(s)
- Choosing the Data Story Type (i.e. Animated Data Story, Presentation, Dashboard)
- Redesigning the Visualization
1. Explore and Assess the Data
The visualization presents key wind power data collected for each U.S. state in 2018. Its goal is to highlight wind power use in the United States in 2018. The dataset contains the following features (variables), with each one's type noted beside it.
- # of Wind Turbines — Quantitative
- Equivalent Homes Powered — Quantitative
- Installed Capacity (MW) — Quantitative
- Total Investment ($) — Quantitative
- Wind Projects Online — Quantitative
- State — Categorical
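The type assignment above can also be checked programmatically. Below is a minimal Python sketch. It assumes the data has been loaded as rows of strings (for example with the standard `csv.DictReader`) and labels a column quantitative when every non-empty value parses as a number. The helper names and sample rows are hypothetical placeholders, not the actual dataset.

```python
from typing import Dict, List, Optional


def parse_number(text: str) -> Optional[float]:
    """Return text as a float, ignoring '$' and thousands separators, else None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def classify_columns(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Label a column 'Quantitative' if all its non-empty values are numeric."""
    types = {}
    for column in rows[0]:  # assumes at least one row
        # Skip blanks so missing values don't make a numeric column look categorical.
        values = [row[column] for row in rows if row[column].strip()]
        numeric = bool(values) and all(parse_number(v) is not None for v in values)
        types[column] = "Quantitative" if numeric else "Categorical"
    return types


# Hypothetical sample rows, not real figures from the dataset.
rows = [
    {"State": "State A", "Installed Capacity (MW)": "1,200", "Total Investment ($)": "$2,400,000,000"},
    {"State": "State B", "Installed Capacity (MW)": "", "Total Investment ($)": "$150,000,000"},
]
print(classify_columns(rows))
```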
During the assessment stage, I evaluated the dataset for limitations and biases that can occur at three stages: data collection, data processing and data insights. The biases found at each stage are briefly discussed below.
The following biases were found at the data collection stage: Selection, Response and Missing Variable.
Selection and Response bias are possible because of non-response: a few states did not respond, chose not to respond, or had no data for the variables needed.
Missing Variable bias is possible because additional features, such as the price of electricity or weather patterns in each state, could have been collected to further improve the analysis and the final recommendation.
The following biases were found at the data processing stage: Outliers, Distribution Understanding and Missingness Understanding.
Outliers were found in a handful of variables, as shown below, and in the remaining variables as well (also shown below). They have no significant impact on the results of the analysis.
All quantitative variables have unimodal, right-skewed distributions. The skew is unlikely to have a large effect on the results; however, further investigation is needed to rule out hidden biases.
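The outlier and skewness checks described above can be sketched with Python's standard `statistics` module. The post doesn't say which outlier rule was used. This sketch assumes Tukey's fences (1.5 × IQR beyond the quartiles) and the moment-based skewness coefficient, where a positive value indicates a right skew. The function names and sample values are illustrative.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def skewness(values: Sequence[float]) -> float:
    """Moment-based skewness; > 0 means a longer right tail (right skew).

    Assumes the values are not all equal, so the standard deviation is > 0.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


# Hypothetical values for one variable; not taken from the dataset.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
print(iqr_outliers(sample))  # [40]
print(skewness(sample) > 0)  # True: the long tail is on the right
```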
Further analysis of the distributions led to the visualization shown below. I wanted to see whether the distributions reflected a plausible reality, or whether the data had been filtered or tampered with in a way that would affect the validity of the results.
As seen in the top-left plot (red) below, the $0–5B investment interval has the highest count while showing moderate Installed Capacity values. One possible explanation is that the reported investment figures were undervalued, making the projects appear cheaper than they really were. The same pattern can be seen in the remaining plots.
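A more direct way to probe the undervaluation concern is to compare investment per megawatt across states. A state reporting far less investment per MW than its peers is worth a closer look. This is a minimal Python sketch, not the method used in the post. The 0.5 × median threshold, the function names and the sample figures are all hypothetical. A low ratio is only a lead to investigate, not proof of misreporting.

```python
import statistics
from typing import Dict, List, Optional


def cost_per_mw(investment_usd: Dict[str, float],
                capacity_mw: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Investment per megawatt for each state, skipping missing or zero capacity."""
    return {
        state: investment_usd[state] / capacity_mw[state]
        for state in investment_usd
        if capacity_mw.get(state)  # None (missing) and 0 are both skipped
    }


def flag_low_cost(ratios: Dict[str, float], fraction: float = 0.5) -> List[str]:
    """States whose cost per MW is below `fraction` times the median (hypothetical rule)."""
    median = statistics.median(ratios.values())
    return sorted(state for state, ratio in ratios.items() if ratio < fraction * median)


# Hypothetical figures for illustration only.
investment = {"State A": 1.5e9, "State B": 1.4e9, "State C": 0.3e9, "State D": 2.0e9}
capacity = {"State A": 1000, "State B": 1000, "State C": 1000, "State D": None}
print(flag_low_cost(cost_per_mw(investment, capacity)))  # ['State C']
```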
Missingness was found in the following states’ data. This could simply reflect a lack of reported data. However, after a little research I found that most of these states were run by Republicans in 2018 (8 of the 9, or about 89%), so I classified the missingness as Missing Not at Random (MNAR). It does not have a significant impact on the results, because complete data exists for 82% of the states. Even so, it would be ideal to obtain complete and accurate data for the states listed below.
- Virginia (Democrat)
- South Carolina (Republican)
- Mississippi (Republican)
- Louisiana (Republican)
- Kentucky (Republican)
- Georgia (Republican)
- Florida (Republican)
- Arkansas (Republican)
- Alabama (Republican)
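The percentages quoted for the missing data can be checked with a few lines of Python. This sketch uses the nine states and party labels exactly as listed above. It assumes the dataset covers the 50 U.S. states. Eight of nine works out to about 88.9%, and 41 of 50 states with complete data gives 82%.

```python
# States with missing data and the party labels listed above (2018).
missing_states = {
    "Virginia": "Democrat",
    "South Carolina": "Republican",
    "Mississippi": "Republican",
    "Louisiana": "Republican",
    "Kentucky": "Republican",
    "Georgia": "Republican",
    "Florida": "Republican",
    "Arkansas": "Republican",
    "Alabama": "Republican",
}
TOTAL_STATES = 50  # assumption: the dataset covers the 50 U.S. states

republican = sum(party == "Republican" for party in missing_states.values())
republican_share = republican / len(missing_states)
complete_share = (TOTAL_STATES - len(missing_states)) / TOTAL_STATES

print(f"Republican share of missing states: {republican_share:.1%}")  # 88.9%
print(f"States with complete data: {complete_share:.0%}")  # 82%
```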
The following bias was considered at the data insights stage: Confounding Variables.
Confounding variables could be a source of bias at this stage; however, I found their presence to be negligible.
2. Analyze Things to Improve in the Current Visualization
As a rule of thumb, people accurately understand data encoded as position (differences in x- and y-position, e.g. scatterplots) and length (differences in bar heights, e.g. bar charts and histograms).
The following aspects of the original visualization violate the principles of effective visualization:
- Confusing windmill heights — these lead to inaccurate reading of the values, and a Lie Factor (a mismatch between the effect shown and the effect in the data) is possible.
- Too crowded — too many encodings are used in a single plot.
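Tufte's Lie Factor makes the first concern measurable. It divides the size of the effect shown in the graphic by the size of the effect in the data. Each "size of effect" is the relative change between two values, and a faithful graphic scores 1. The Python sketch below uses hypothetical numbers, because the windmill dimensions in the original are not given here. It shows why icons scaled in both width and height can mislead readers who judge them by area: doubling a value quadruples the icon's area.

```python
def effect_size(first: float, second: float) -> float:
    """Tufte's 'size of effect': relative change from the first value to the second."""
    return abs(second - first) / first


def lie_factor(data_first: float, data_second: float,
               graphic_first: float, graphic_second: float) -> float:
    """Effect shown in the graphic divided by effect in the data (1.0 = faithful)."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Hypothetical example: a value doubles from 100 to 200.
# If only the icon's height doubles and readers judge height, the graphic is faithful.
print(lie_factor(100, 200, 1.0, 2.0))  # 1.0
# If the icon is scaled in width and height, its area quadruples (1 -> 4).
print(lie_factor(100, 200, 1.0, 4.0))  # 3.0
```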
The original visualization can be seen below.
3. Choosing the Best Visualization Chart(s)
I used the following principles to choose the best possible chart types:
- Focus on simplicity, remove the extras
- Choose visual encodings that show insights effectively
- Bar charts are best for comparison
Since State is the only categorical variable in the dataset and the rest are quantitative, the redesigned visualization uses the following chart types:
- Bar Chart — Butterfly Bar Chart
- Pie Chart
A butterfly bar chart was chosen because it places two measures back to back for each state, which makes comparisons effective while using less space. A sample butterfly bar chart that I created for this dataset can be seen below:
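A butterfly bar chart is essentially two bar charts placed back to back around a shared category axis. The Python sketch below renders one as text to show the idea. It scales each side to its own maximum because the two measures (for example, Installed Capacity in MW and Total Investment in $) use different units. The function name and sample values are hypothetical. In Tableau, the same layout is commonly built by placing the two measures side by side and reversing the axis of the left-hand one.

```python
from typing import Dict, List, Tuple


def butterfly_rows(data: Dict[str, Tuple[float, float]], left_label: str,
                   right_label: str, width: int = 20) -> List[str]:
    """Render a text butterfly chart: the left measure grows leftward, the right rightward.

    Each side is scaled to its own maximum (assumed > 0), since the measures have different units.
    """
    left_max = max(left for left, _ in data.values())
    right_max = max(right for _, right in data.values())
    name_width = max(len(state) for state in data)
    lines = [f"{left_label:>{width}} {'':^{name_width}} {right_label}"]
    for state, (left, right) in data.items():
        left_bar = "#" * round(width * left / left_max)
        right_bar = "#" * round(width * right / right_max)
        # Right-align the left bar so both bars meet at the shared state label.
        lines.append(f"{left_bar:>{width}} {state:^{name_width}} {right_bar}")
    return lines


# Hypothetical values: (installed capacity in MW, investment in $ billions).
sample = {"State A": (100, 5.0), "State B": (50, 10.0)}
for line in butterfly_rows(sample, "Capacity (MW)", "Investment ($B)", width=16):
    print(line)
```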
A pie chart was created to show the proportion of equivalent homes powered by wind energy by state. For simplicity, only the five states with the most homes powered by wind energy are shown. This chart is used in the redesigned visualization.
One insight from the chart: of the approximately 13.5 million homes powered by wind energy in the USA in 2018, Texas powered 46%, or roughly 6.2 million homes.
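The proportions behind the pie chart can be sketched as follows. The 46% figure for Texas is a share of all homes powered by wind nationally. For that reason, this Python sketch divides by the national total rather than by the combined total of the top five. It returns the remaining states' share separately, in case an "Other" slice is wanted; whether the dashboard uses one isn't stated. The function name and sample numbers are hypothetical.

```python
from typing import Dict, Tuple


def top_n_shares(homes_by_state: Dict[str, float], n: int = 5) -> Tuple[Dict[str, float], float]:
    """Shares of the national total for the top-n states, plus the share of all other states."""
    total = sum(homes_by_state.values())
    ranked = sorted(homes_by_state.items(), key=lambda item: item[1], reverse=True)
    top = {state: homes / total for state, homes in ranked[:n]}
    other = sum(homes for _, homes in ranked[n:]) / total
    return top, other


# Hypothetical counts (in thousands of homes), not figures from the dataset.
homes = {"A": 460, "B": 200, "C": 100, "D": 90, "E": 80, "F": 40, "G": 30}
top, other = top_n_shares(homes)
print({state: f"{share:.0%}" for state, share in top.items()})
print(f"Other: {other:.0%}")  # Other: 7%
```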
4. Choosing the Data Story Type (i.e. Animated Data Story, Presentation, Dashboard)
In this section I discuss why I chose this data story type for the redesigned visualization.
First, I will discuss why an Animated Data Story was not used as the medium for the redesign. There are eight types of data story, listed below with the reason each one was not chosen:
- Change Over Time — explores the time factor - which is not present in the data
- Hierarchy Drill Down — explores different levels of categories - which is not present in the data
- Zoom In / Out — found in geographical data, zooming into state or city views - which is not present in the data
- Contrasting Values — comparing the most vs the least, comparing the extreme opposites - not appropriate for this analysis
- Intersections — the crossover of values mostly in time-series plots - not appropriate for this analysis
- Different Factors — complex metric broken down into subfactors - not appropriate for this analysis
- Outliers — the nature of the outliers, why they occur and what can be deduced from them - not appropriate for this analysis
- Correlations — how two measures change together and what could be behind that - not appropriate for this analysis
Second, I will discuss why a Presentation was not used as the medium for the redesign. A Presentation requires a detailed analysis of the features, which is not needed here because only a handful of features (variables) are available. Had the dataset contained more data, a Presentation would have been considered as an option.
A Presentation has the following components:
- Problem Statement
- Building Issue Trees and a Ghost Deck — an analysis roadmap built on structured, hypothesis-driven analyses
- Limitations and Biases
- Actionable Recommendation
Lastly, I will discuss why a Dashboard was chosen as the medium for the redesign. A dashboard is the most effective medium for telling this data story, because the dataset is small and has only a few features.
The redesigned dashboard will be able to answer questions about:
- Installed Capacity and Investment by States in 2018
- Number of Wind Turbines and Wind Projects Online by States in 2018
- Proportion of homes powered by wind energy in the Top 5 States
5. Redesigning the Visualization
First, I explored several wireframes for the dashboard design. After getting feedback, I chose the final design shown below.
The butterfly bar chart on the left displays Installed Capacity and Investment by State in 2018.
The butterfly bar chart on the right displays Number of Wind Turbines and Wind Projects Online by State in 2018.
The pie chart in the middle displays the Proportion of Homes Powered by Wind for the Top 5 States.
Both butterfly charts can be filtered with a dynamic filter that limits them to the top N states, where N is chosen by the user. This way, the user can see the top-most and bottom-most states by wind power production.
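The logic of the dynamic filter can be sketched in Python. Rank the states by a measure and keep the first N, in descending order for a Top N view or ascending order for a Bottom N view. This illustrates the logic only, not Tableau's implementation. In Tableau itself this is typically set up with a Top filter on the State field whose N is bound to a user-editable parameter. The function name and values are hypothetical.

```python
from typing import Dict, List


def top_or_bottom_n(values_by_state: Dict[str, float], n: int, bottom: bool = False) -> List[str]:
    """Keep the n highest-ranked states by a measure, or the n lowest if bottom=True."""
    # sorted() is stable, so tied states keep their original order.
    ranked = sorted(values_by_state, key=values_by_state.get, reverse=not bottom)
    return ranked[:n]


# Hypothetical installed capacity (MW) per state.
capacity = {"State A": 500, "State B": 900, "State C": 100, "State D": 700}
print(top_or_bottom_n(capacity, 2))               # ['State B', 'State D']
print(top_or_bottom_n(capacity, 2, bottom=True))  # ['State C', 'State A']
```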
A snapshot of the redesigned visualization can be seen below:
My redesigned data visualization does the following better than the original visualization:
- Low graphicacy requirement — the redesigned visualization can be understood by people with low graphicacy (limited experience reading charts and graphs).
- Bar charts and a pie chart convey the message accurately, unlike the original visualization, where windmill height encoded the values inaccurately and was not an ideal length encoding.
- Color encoding is used effectively to make the values stand out, and bright colors are avoided to reduce eye strain.
- Dashboard layout in the redesigned visualization is spacious and well balanced unlike the original visualization.
- The redesigned visualization dashboard is interactive, giving the user the option to explore the data from different viewpoints.
- The redesigned visualization uses appropriate color encoding in annotations to help deliver the insights more effectively. The annotations are balanced in quantity and size, helping the user extract key insights.
Thank You for Reading!
I had a wonderful time sharing my thought process and the skills I have gained along my data visualization journey. If you want to get started with data storytelling and learn more about data visualization, make sure to follow me on Medium and let’s connect on LinkedIn.
There are many more data visualizations waiting to be redesigned on #MakeoverMonday, so be sure to check it out and start your own data visualization journey.