How to compare values between groups using data visualization?

Mochamad Kautzar Ichramsyah
Published in CodeX · 6 min read · Nov 2, 2022
Photo by Giorgio Tomassetti on Unsplash

Data visualization is the art of data analysis and an important part of data exploration. Without it, we would have a hard time interpreting datasets, because human eyes normally can’t process a lot of information at once. The job of data visualization is to make the data easier for us to understand.

In this post, I would like to share how to compare values between groups using data visualization. The key part is “to compare values”, because data visualization has many other uses, such as looking at how data is distributed, showing changes over time, looking at relationships or correlations, displaying geographical data, and so on. As usual, we will use Google Sheets for the general concept, but you can implement it with any tool you prefer, such as R, Python, Tableau, and so on.

Column chart

The main concepts for this type are:

  1. Used to compare values across multiple categories
  2. Categories on X-axis (horizontal) and values on Y-axis (vertical)
  3. There are column charts, stacked column charts, and 100% stacked column charts.

Let’s use this dataset to practice our skills to use a column chart.

Image 1. Column chart example

From Image 1, we can compare the transaction_value between each product_category, but we don’t get the total transaction_value for each product_category because it’s split by transaction_state. Let’s see the next example.

Image 2. Stacked column chart example

From Image 2, we can see the total transaction_value for each product_category because the transaction_state values are stacked, and we can still compare the transaction_value between each product_category. Cool, right? Don’t be surprised, we have 1 more type of column chart after this.

Image 3. 100% stacked column chart example.

From Image 3, we get a different insight. For example, product_category Electronic has the highest transaction_value for transaction_state == refunded ($2,096), but in terms of proportion, the highest is Books (almost 50% of its total transaction_value).
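The difference between the absolute view and the proportional view can be checked with a few lines of arithmetic. The sketch below uses made-up numbers (not the dataset behind Image 3), an assumed `success` state next to `refunded`, and a hypothetical helper named `refund_share`; only the column names come from the example above.

```python
# Hypothetical transaction_value per product_category and transaction_state.
# The numbers and the "success" state are invented for illustration only.
data = {
    "Electronic": {"success": 9000, "refunded": 2000},
    "Books": {"success": 1100, "refunded": 1000},
    "Fashion": {"success": 4000, "refunded": 500},
}


def refund_share(states):
    """Return the refunded value as a fraction of the category total."""
    total = sum(states.values())
    return states["refunded"] / total if total else 0.0


# Category with the largest refunded amount in absolute terms.
highest_amount = max(data, key=lambda c: data[c]["refunded"])
# Category with the largest refunded share of its own total.
highest_share = max(data, key=lambda c: refund_share(data[c]))

print(highest_amount)  # Electronic
print(highest_share)   # Books
```

With these numbers, the largest refunded amount and the largest refunded share belong to different categories, which is exactly the kind of insight a 100% stacked chart makes visible.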

So, which type of column chart do we need to use in the future? It depends.

  1. If we want to see a clear comparison between each category in terms of numbers, we can use the standard column chart.
  2. If we want to see a clear comparison when it is summed as the total in each category, we should use the stacked column chart.
  3. If we want to see a clear comparison in terms of proportion rate for each category, we should use the 100% stacked column chart.
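If you prefer Python over Google Sheets, the three variants could be sketched roughly as below. This is only an illustration under assumptions: it uses matplotlib, the data is made up, and the helper names `stacked_segments` and `plot_columns` are hypothetical.

```python
def stacked_segments(series, as_percent=False):
    """Return [(name, bottoms, heights)] for a stacked column chart.

    series maps a sub-category name to one value per category.
    With as_percent=True every column is rescaled so it sums to 100.
    """
    n = len(next(iter(series.values())))
    totals = [sum(vals[i] for vals in series.values()) for i in range(n)]
    bottoms = [0.0] * n
    segments = []
    for name, vals in series.items():
        heights = [
            (100.0 * v / t if t else 0.0) if as_percent else float(v)
            for v, t in zip(vals, totals)
        ]
        segments.append((name, list(bottoms), heights))
        # The next sub-category starts where this one ends.
        bottoms = [b + h for b, h in zip(bottoms, heights)]
    return segments


def plot_columns(categories, series, mode="grouped"):
    """Draw a "grouped", "stacked", or "percent" (100% stacked) column chart."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    x = list(range(len(categories)))
    if mode == "grouped":
        width = 0.8 / len(series)
        for k, (name, vals) in enumerate(series.items()):
            ax.bar([xi + k * width for xi in x], vals, width=width, label=name)
        # Put each tick in the middle of its group of columns.
        ax.set_xticks([xi + 0.4 - width / 2 for xi in x])
    else:
        for name, bottoms, heights in stacked_segments(series, mode == "percent"):
            ax.bar(x, heights, bottom=bottoms, label=name)
        ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.legend()
    return fig


# Made-up transaction_value per product_category, split by transaction_state.
categories = ["Electronic", "Books", "Fashion"]
series = {"success": [9000, 1100, 4000], "refunded": [2000, 1000, 500]}
```

Calling `plot_columns(categories, series, mode="stacked")` and then `matplotlib.pyplot.show()` should display the chart; switching `mode` to `"grouped"` or `"percent"` gives the other two variants.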

I hope you get a sense of how to use the column chart. Let’s move to our next type of comparison data visualization!

Bar chart

The main concepts for this type are:

  1. It’s similar to a column chart; the difference is that categories are on the Y-axis (vertical) and values are on the X-axis (horizontal).
  2. We need this type when the category names are long or there are more than 2 sub-categories.
  3. The sub-types of the bar chart mirror those of the column chart: standard bar chart, stacked bar chart, and 100% stacked bar chart.

Let’s use this dataset to practice our skills to use a bar chart.

Image 4. Bar chart example.

From Image 4, it’s easier to compare each category when we visualize the data like this: as the number of categories keeps increasing, it becomes harder to compare the values with a column chart. You can try it yourself and share in a comment below how you see it! :D

Image 5. Stacked bar chart example.

From Image 5, we can compare each category when stacked, just like in the previous visualization. The vibe is quite similar to the column chart; the difference, as mentioned previously, is that the comparison is easier to read when the visualization is transposed into a bar chart like this.

Image 6. 100% stacked bar chart example.

From Image 6, we can compare each category while stacked to a 100% proportion rate.

So, which type of bar chart do we need to use in the future? It depends. You can refer to the `if-then` conditions for column charts above; they are the same. :)

For the last example, let’s say there are only 2 categories or fewer, BUT the names are very long, for example: “Food approved by Food and Drug Administration” and “Food not yet approved by Food and Drug Administration”. The visualization would look like this:

Image 7. Comparison between bar chart and column chart when the category has a long name.

In terms of readability, it’s better to use a bar chart than a column chart. You can try adding more categories on your own to see how helpful the bar chart is when you have long category names and a lot of categories at the same time. Tell us your thoughts!
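The rule of thumb from this section can be written down as a small Python sketch. The helper names `prefer_bar_chart` and `plot_comparison` are hypothetical, the thresholds (15 characters, 8 categories) are arbitrary assumptions you should tune for your own charts, and the drawing part assumes matplotlib.

```python
def prefer_bar_chart(labels, max_label_len=15, max_categories=8):
    """Rule of thumb: go horizontal when names are long or categories are many.

    Both thresholds are assumptions for illustration, not fixed rules.
    """
    too_long = any(len(label) > max_label_len for label in labels)
    too_many = len(labels) > max_categories
    return too_long or too_many


def plot_comparison(labels, values):
    """Draw a bar chart (horizontal) or a column chart (vertical)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    if prefer_bar_chart(labels):
        ax.barh(labels, values)
        ax.invert_yaxis()  # keep the first category at the top
    else:
        ax.bar(labels, values)
    return fig


labels = [
    "Food approved by Food and Drug Administration",
    "Food not yet approved by Food and Drug Administration",
]
print(prefer_bar_chart(labels))             # True
print(prefer_bar_chart(["Books", "Toys"]))  # False
```

With only two categories, the long names alone are enough to tip the choice toward the bar chart, which matches what we saw in Image 7.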

Scatter plot

The main concepts for this type are:

  1. We use this type when both the X-axis and Y-axis are numeric values and we have categories to compare.
  2. Usually, it can tell us the correlation between X and Y for each category.
  3. It uses points (dots) to visualize the data points we have.

Let’s use this dataset to practice our skills to use a scatter plot.

Image 8. Scatter plot example

From Image 8, please look carefully.

  1. The minimum value of the X-axis (father_height_cm) is 150 and of the Y-axis (children_height_cm) is 130. We can set these values from Edit Chart in Google Sheets.
  2. We can compare each continent for the relationship between father_height_cm and children_height_cm.

Image 9. Focusing on the correlation between father and children height in Asia.

From Image 9, we can see the correlation for the category inside the square we drew. It means that, in the dataset we used for the scatter plot above, father and children heights in Asia are on average lower than on other continents.

In Google Sheets, if you notice, I’ve changed the symbol for the Asia data from a dot (circle) to a square, because I want my audience to focus on it. You can look into the Google Sheets Edit Chart function to create yours!
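For readers who want to try the same idea in Python, here is a minimal sketch assuming matplotlib. The rows are invented (they are not the dataset behind Image 8), the continents other than Asia are assumptions, and `group_points` and `plot_scatter` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical rows: (continent, father_height_cm, children_height_cm).
rows = [
    ("Asia", 165, 150),
    ("Asia", 168, 155),
    ("Europe", 180, 170),
    ("Europe", 176, 165),
    ("Africa", 172, 160),
]


def group_points(rows):
    """Collect the x and y values of each category into separate lists."""
    groups = defaultdict(lambda: ([], []))
    for continent, father, child in rows:
        xs, ys = groups[continent]
        xs.append(father)
        ys.append(child)
    return dict(groups)


def plot_scatter(rows, highlight="Asia", x_min=150, y_min=130):
    """Scatter plot per continent, with a square marker for the highlighted one."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    for continent, (xs, ys) in group_points(rows).items():
        marker = "s" if continent == highlight else "o"  # square vs. circle
        ax.scatter(xs, ys, marker=marker, label=continent)
    ax.set_xlim(left=x_min)    # axis minimum, like the Edit Chart setting
    ax.set_ylim(bottom=y_min)
    ax.set_xlabel("father_height_cm")
    ax.set_ylabel("children_height_cm")
    ax.legend()
    return fig
```

The `highlight` parameter plays the same role as changing the symbol in Edit Chart: one category gets a different marker so the audience looks there first.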

Line chart

The main concepts for this type are:

  1. We use this type when the X-axis is related to time, the Y-axis is a numeric value, and we have categories to compare.
  2. Usually, it lets us compare each category across the time period covered by the dataset.
  3. We can use color to differentiate the categories, or a clear legend to show which line belongs to which category.

Let’s use this dataset to practice our skills to use a line chart.

Image 10. Line chart example

From Image 10, we can compare the transaction_amount each month for each category. The line chart is better for comparisons over time because the “line” shows the “trend” more easily than other chart types, such as a column chart, bar chart, or scatter plot.
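A line chart needs one series per category, ordered by time. The sketch below shows one way to prepare that in Python and draw it with matplotlib; the records are made up, the `YYYY-MM` month format is an assumption, and `monthly_series` and `plot_lines` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical long-format records: (month, category, transaction_amount).
records = [
    ("2022-02", "Books", 140),
    ("2022-01", "Books", 120),
    ("2022-01", "Electronic", 300),
    ("2022-03", "Books", 130),
    ("2022-02", "Electronic", 280),
    ("2022-03", "Electronic", 350),
]


def monthly_series(records):
    """Return (months, {category: [amount per month]}) with months in time order."""
    # "YYYY-MM" strings sort chronologically when sorted as plain text.
    months = sorted({month for month, _, _ in records})
    totals = defaultdict(float)
    for month, category, amount in records:
        totals[(category, month)] += amount
    categories = sorted({category for _, category, _ in records})
    # A month with no record for a category is filled with 0.0 here;
    # that is a simplification and may not suit every dataset.
    series = {c: [totals.get((c, m), 0.0) for m in months] for c in categories}
    return months, series


def plot_lines(records):
    """Draw one line per category, with months on the X-axis."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    months, series = monthly_series(records)
    fig, ax = plt.subplots()
    for category, amounts in series.items():
        ax.plot(months, amounts, marker="o", label=category)
    ax.set_xlabel("month")
    ax.set_ylabel("transaction_amount")
    ax.legend()  # the legend tells which line belongs to which category
    return fig
```

Sorting the months before plotting matters: a line connects points in the order they are given, so unsorted records would draw a zigzag instead of a trend.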

Image 11. Comparing Column, Bar, Scatter, and Line Plot.

Summary

There are many other types of data visualization commonly used for comparing categories, such as Area Charts, Pie Charts, Bubble Charts, and so on, but the four types we discussed above are the basic ones. As long as you understand which one to use in which situation, you can improvise and choose a better visualization for your data storytelling.

I hope this post gives you clarity about the basics of data visualization, in terms of comparing categories in your dataset. Have fun with your data analysis, folks! :D

About the author: Mochamad Kautzar Ichramsyah writes for CodeX. Data analytics professional with 10 years of experience at tech companies in Indonesia.