How to compare values between groups using data visualization?
Data visualization is the art of data analysis and an important part of data exploration. Without it, we would have a hard time interpreting datasets, because our eyes can’t process a lot of information at once. The job of data visualization is to make that information easier for us to understand.
In this post, I would like to share how to compare values between groups using data visualization. The key phrase is “to compare values”, because data visualization has many uses, such as comparing values, looking at how data is distributed, showing changes over time, looking at relationships or correlations, mapping geographical data, and so on. As usual, we will use Google Sheets to show the general concepts; you can implement them with any tool you prefer, such as R, Python, or Tableau.
Column chart
The main concepts for this type are:
- Used to compare values across multiple categories
- Categories on X-axis (horizontal) and values on Y-axis (vertical)
- There are three variants: standard column charts, stacked column charts, and 100% stacked column charts.
Let’s practice using a column chart with this dataset.
From Image 1, we can compare the `transaction_value` between each `product_category`, but we don’t get the total `transaction_value` for each `product_category`, because it’s split by `transaction_state`. Let’s see the next example.
From Image 2, we can see the total `transaction_value` for each `product_category`, because the `transaction_state` values are stacked, while we can still compare the `transaction_value` across each `product_category`. Cool, right? Don’t be surprised; there is one more type of column chart after this.
From Image 3, we get a different insight: although Electronic has the highest `transaction_value` for `transaction_state == refunded` ($2,096), in terms of proportion the highest is Books (almost 50% of its total `transaction_value`).
So, which type of column chart should we use? It depends.
- If we want a clear comparison between categories in terms of absolute numbers, we can use the standard column chart.
- If we want a clear comparison of the total for each category, we should use the stacked column chart.
- If we want a clear comparison of proportions within each category, we should use the 100% stacked column chart.
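The three column-chart variants differ only in how the same numbers are aggregated before plotting. Here is a minimal Python sketch of that arithmetic, using only the standard library and a small made-up dataset. The numbers, the `completed` state, and the helper function names are hypothetical, chosen to mirror the pattern described for Image 3; they are not the dataset behind Images 1 to 3. The standard chart plots each (category, state) value, the stacked chart’s bar height is the category total, and the 100% stacked chart plots each state’s share of that total.

```python
from collections import defaultdict

# Hypothetical rows (product_category, transaction_state, transaction_value).
# These numbers are made up for illustration; they are NOT the post's dataset.
rows = [
    ("Books", "completed", 500),
    ("Books", "refunded", 450),
    ("Electronic", "completed", 6000),
    ("Electronic", "refunded", 2000),
    ("Fashion", "completed", 1200),
    ("Fashion", "refunded", 300),
]

def by_category_and_state(rows):
    """Standard column chart: one bar per (category, state) pair."""
    table = defaultdict(dict)
    for category, state, value in rows:
        table[category][state] = table[category].get(state, 0) + value
    return dict(table)

def category_totals(table):
    """Stacked column chart: the full height of each stack is the category total."""
    return {category: sum(states.values()) for category, states in table.items()}

def category_shares(table):
    """100% stacked column chart: each segment is its share of the category total."""
    totals = category_totals(table)
    return {
        category: {state: value / totals[category] for state, value in states.items()}
        for category, states in table.items()
    }

table = by_category_and_state(rows)
totals = category_totals(table)
shares = category_shares(table)

# The largest refunded amount and the largest refunded share can belong to
# different categories, which is what the 100% stacked chart makes visible.
top_refund_value = max(table, key=lambda c: table[c].get("refunded", 0))
top_refund_share = max(shares, key=lambda c: shares[c].get("refunded", 0))
print(totals)            # {'Books': 950, 'Electronic': 8000, 'Fashion': 1500}
print(top_refund_value)  # Electronic
print(top_refund_share)  # Books
```

With these toy numbers, Electronic has the biggest refunded amount while Books has the biggest refunded share (450 / 950, roughly 47%), the same kind of contrast the 100% stacked chart highlights.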
I hope this gives you a sense of how to use the column chart. Let’s move on to our next type of comparison data visualization!
Bar chart
The main concepts for this type are:
- It’s similar to a column chart; the difference is that the categories are on the Y-axis (vertical) and the values are on the X-axis (horizontal).
- We need this type when category names are long or when there are more than 2 (sub-)categories.
- Its sub-types mirror those of the column chart: the standard bar chart, the stacked bar chart, and the 100% stacked bar chart.
Let’s practice using a bar chart with this dataset.
From Image 4, it’s easier to compare the categories when the data is visualized like this, because as the number of categories keeps increasing, comparing the values gets harder if we keep using a column chart. Try it yourself and tell me in the comments below how you see it! :D
From Image 5, we can compare each category when stacked, just like in the stacked column chart. The feel is quite similar to the column chart; as mentioned previously, the difference is that transposing the visualization into a bar chart makes it easier to read.
From Image 6, we can compare the proportions within each category, stacked to 100%.
So, which type of bar chart should we use? It depends. The `if-then` conditions are the same as for column charts. :)
For the last example, let’s say there are ≤ 2 categories, BUT the names are very long, for example: “Food approved by Food and Drug Administration” and “Food not yet approved by Food and Drug Administration”. The visualization would look like this:
In terms of readability, a bar chart works better than a column chart here. Try adding more categories on your own to see how helpful the bar chart is when you have long category names and many categories at the same time. Tell us your thoughts!
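To see why long names favor a horizontal layout, here is a minimal text-only sketch in plain Python (no plotting library assumed). Only the two category names come from the example above; the values and the `text_bar_chart` helper are hypothetical. Each category gets its own row along the vertical axis, so the full name fits on one line, while the bar length encodes the value along the horizontal axis.

```python
# Hypothetical values; only the category names come from the example above.
data = {
    "Food approved by Food and Drug Administration": 120,
    "Food not yet approved by Food and Drug Administration": 45,
}

def text_bar_chart(data, width=40, char="#"):
    """Render a horizontal bar chart as text: categories on the Y-axis
    (one row each), values on the X-axis (bar length)."""
    label_width = max(len(label) for label in data)
    max_value = max(data.values())
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value.
        bar = char * round(value / max_value * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)

chart = text_bar_chart(data)
print(chart)
```

In a plotting library such as matplotlib, the corresponding function is `Axes.barh`, which likewise places the categories on the Y-axis, where long labels have room to be read horizontally.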
Scatter plot
The main concepts for this type are:
- We use this type when both the X-axis and the Y-axis hold numeric values and we have categories to compare.
- Usually, it can tell us the correlation between X and Y for each category.
- It uses points (dots) to represent each data point we have.
Let’s practice using a scatter plot with this dataset.
From Image 8, please look carefully:
- The minimum value of the X-axis (`father_height_cm`) is 150 and of the Y-axis (`children_height_cm`) is 130. We can set these values from Edit Chart in Google Sheets.
- We can compare the relationship between `father_height_cm` and `children_height_cm` for each continent.
From Image 9, we can see the correlation between the categories and the values around the square markers we made. This means that, in the dataset we used for the scatter plot above, father and children heights (in cm) for Asia are, on average, lower than on other continents.
As you may notice, in Google Sheets I changed the marker symbol for the Asia data from a dot (circle) to a square, because I want my audience to focus on it, so I made it look different from the others. You can use the Edit Chart feature in Google Sheets to create yours!
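If you want to check what the scatter plot shows with numbers rather than by eye, one option is to compute the mean heights and the Pearson correlation coefficient for each continent. Below is a minimal sketch in plain Python; the data points are hypothetical (not the dataset behind Images 8 and 9), and `pearson` and `summarize_by_continent` are illustrative helper names.

```python
import math
from collections import defaultdict
from statistics import fmean

# Hypothetical (continent, father_height_cm, children_height_cm) points,
# made up for illustration; NOT the dataset behind Images 8 and 9.
points = [
    ("Asia", 160, 150), ("Asia", 165, 154), ("Asia", 170, 159),
    ("Europe", 175, 168), ("Europe", 180, 171), ("Europe", 185, 177),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # assumes neither sequence is constant

def summarize_by_continent(points):
    """Hypothetical helper: mean heights and correlation for each continent."""
    groups = defaultdict(lambda: ([], []))  # continent -> (fathers, children)
    for continent, father, child in points:
        groups[continent][0].append(father)
        groups[continent][1].append(child)
    return {
        continent: {
            "mean_father_cm": fmean(fathers),
            "mean_child_cm": fmean(children),
            "r": pearson(fathers, children),
        }
        for continent, (fathers, children) in groups.items()
    }

summary = summarize_by_continent(points)
for continent, stats in summary.items():
    print(f"{continent}: father {stats['mean_father_cm']:.1f} cm, "
          f"child {stats['mean_child_cm']:.1f} cm, r = {stats['r']:.2f}")
```

A coefficient close to 1 means that, within that continent, taller fathers tend to have taller children; comparing the means is the numeric counterpart of seeing one continent’s cluster sit lower on the chart.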
Line chart
The main concepts for this type are:
- We use this type when the X-axis represents time, the Y-axis holds numeric values, and we have categories to compare.
- Usually, it shows how the categories compare over the time period covered by the dataset.
- We can use color to differentiate the categories, along with a clear legend showing which line belongs to which category.
Let’s practice using a line chart with this dataset.
From Image 10, we can compare the `transaction_amount` each month for each category. A line chart is better for comparisons over time because the “line” shows the “trend” more easily than other types of charts, such as column, bar, or scatter plots.
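Before drawing a line chart, the data usually needs one value per category per time step. Here is a minimal Python sketch of that preparation, assuming a hypothetical list of (date, category, transaction_amount) records with made-up values; `monthly_series` is an illustrative helper name. It sums the amount by month for each category, and each resulting list becomes one line on the chart.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions (date, category, transaction_amount); illustrative only.
transactions = [
    (date(2023, 1, 5), "Books", 100),
    (date(2023, 1, 20), "Books", 50),
    (date(2023, 1, 11), "Electronic", 400),
    (date(2023, 2, 3), "Books", 80),
    (date(2023, 2, 14), "Electronic", 650),
    (date(2023, 3, 9), "Books", 120),
    (date(2023, 3, 28), "Electronic", 300),
]

def monthly_series(transactions):
    """Return {category: [((year, month), total), ...]} sorted by month,
    i.e. one line per category with time on the X-axis."""
    totals = defaultdict(lambda: defaultdict(int))
    for day, category, amount in transactions:
        month = (day.year, day.month)  # bucket each transaction by month
        totals[category][month] += amount
    return {category: sorted(months.items()) for category, months in totals.items()}

series = monthly_series(transactions)
for category, points in series.items():
    print(category, points)
```

Months with no transactions for a category simply have no point in this sketch; depending on your data, you may want to fill them with 0 so a line doesn’t silently skip a month.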
Summary
There are many other types of data visualization commonly used for comparing categories, such as area charts, pie charts, and bubble charts, but the four types we discussed above are the basic ones. As long as you understand that the right choice depends on the situation, you can improvise and pick a better visualization for your data storytelling.
I hope this post gives you clarity on the basics of using data visualization to compare categories in your dataset. Have fun with your data analysis, folks! :D