3 steps of preparation before visualization
What should you do to prepare when you get data that you will use for visualization?
If you are new to data visualization, you may not know where to start once you have the data in hand, or what to do to prepare. Good preparation makes your goals clear before you start, and it leads to better visualizations.
Clarify your queries or visualization goals
The first step you need to take is to clarify your goal for the visualization. Do you want to compare numerical differences between species, or do you want to observe the trend of a species over time?
In terms of purpose, visualizations fall into four broad categories: comparison, relationship, distribution, and composition. Comparison sets the values or trends of several categories side by side. Relationship examines the correlation between two or more factors. Distribution examines how variables are distributed in time or space. Composition shows how the parts relate to the whole and how those proportions change over time.
For example, if I had sales data for each music format in the US (as described in Zoey’s blog), the visualization would belong to the composition category. The goal is to explore how each format’s share of total music sales varies from year to year.
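To make the composition idea concrete, here is a minimal Python sketch that turns raw sales into each format's share of the yearly total. The format names and figures are hypothetical placeholders, not values from the dataset mentioned above, and `composition_shares` is just an illustrative helper name.

```python
# Hypothetical sales figures in arbitrary units (not real data).
sales_by_year = {
    2017: {"Streaming": 60.0, "CD": 30.0, "Vinyl": 10.0},
    2018: {"Streaming": 75.0, "CD": 15.0, "Vinyl": 10.0},
}


def composition_shares(year_sales):
    """Return each format's share of the year's total as a fraction of 1."""
    total = sum(year_sales.values())
    if total == 0:
        raise ValueError("total sales must be non-zero")
    return {fmt: value / total for fmt, value in year_sales.items()}


shares_by_year = {
    year: composition_shares(year_sales)
    for year, year_sales in sales_by_year.items()
}

# Print one line per year, with shares formatted as percentages.
for year, shares in sorted(shares_by_year.items()):
    line = ", ".join(f"{fmt}: {share:.0%}" for fmt, share in shares.items())
    print(year, line)
```

Shares like these are what a stacked area chart or a 100% stacked bar chart would plot for a composition question.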
Explore and Pre-process your data
Explore your dataset to decide which attributes you want to use and which queries those attributes answer. You also need to determine what kind of attribute each one is (e.g. categorical or numerical), as this directly affects which charts you can choose. Different queries may use different attributes, and some attributes may be shared between queries, so you also need to work out how the attributes relate to each other. Finally, you will need to pre-process the data, for example by converting formats. You can do this with a range of tools, such as Python, Tableau, or Excel.
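As a rough illustration of sorting attributes into categorical and numerical, the sketch below applies a simple heuristic: a column counts as numerical only if every non-empty value parses as a number. The rows and the helper name `infer_attribute_kind` are made up for this example, and the heuristic is deliberately naive (numeric codes such as postal codes would be labeled numerical).

```python
def infer_attribute_kind(values):
    """Return 'numerical' if every non-empty value parses as a number,
    'categorical' otherwise, and 'unknown' for an empty column."""
    non_empty = [v for v in values if str(v).strip() != ""]
    if not non_empty:
        return "unknown"
    for value in non_empty:
        try:
            float(value)
        except (TypeError, ValueError):
            return "categorical"
    return "numerical"


# Hypothetical rows, as they might arrive from a CSV export (all strings).
rows = [
    {"quarter": "Q3 2018", "product": "Phone", "revenue": "100.5"},
    {"quarter": "Q4 2018", "product": "Tablet", "revenue": "42"},
    {"quarter": "Q4 2018", "product": "Phone", "revenue": "120.0"},
]

# Pivot the rows into columns, then classify each column.
columns = {name: [row[name] for row in rows] for name in rows[0]}
kinds = {name: infer_attribute_kind(vals) for name, vals in columns.items()}
print(kinds)
```

Note that a date-like column such as `quarter` comes back as categorical here, which is a hint that it still needs to be converted before it can be used as a time axis.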
For example, if you want to explore how Apple’s sales of different types of electronics change over time (the figures are published in Apple’s quarterly earnings reports), you would use time (quarter), product type, and revenue. Here is a screenshot of Apple’s Q4 2018 report.
In this case, product type is categorical data and revenue is numerical data. Note that the quarter attribute needs to be pre-processed: the Change Data Type function in Tableau can convert the strings in the original data into a standard date format.
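If you prefer to do this conversion in Python rather than Tableau, a minimal sketch could look like the following. It assumes the quarter labels look like "Q4 2018" and treats them as calendar quarters; both are assumptions for illustration. A company's fiscal quarters may not line up with calendar quarters, in which case the month mapping would need adjusting. The function name `quarter_to_date` is hypothetical.

```python
import re
from datetime import date

# Assumed label format: "Q<1-4> <four-digit year>", e.g. "Q4 2018".
_QUARTER_PATTERN = re.compile(r"^\s*Q([1-4])\s+(\d{4})\s*$")


def quarter_to_date(label):
    """Convert a label like 'Q4 2018' to the first day of that calendar quarter."""
    match = _QUARTER_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized quarter label: {label!r}")
    quarter = int(match.group(1))
    year = int(match.group(2))
    first_month = 3 * (quarter - 1) + 1  # Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
    return date(year, first_month, 1)


print(quarter_to_date("Q4 2018"))
```

Once the labels are real dates, they sort chronologically and can be placed on a continuous time axis.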
Determine the type of chart
Many tutorials on choosing a chart type filter the charts by the purpose of the visualization first and then settle on one based on the number of variables, as in the following link:
How to Choose the Right Chart for Your Data
However, I personally prefer to start with the number of variables, then combine it with the purpose of the visualization, and finally decide on the chart type. The following are my personal reflections, for reference only. By number of variables, problems can be broadly classified as: single variable, two variables, multiple variables, and special variables. Below I briefly describe a few chart types that suit different numbers of variables. I will not cover every chart; this is only a way of thinking about how to choose one.
For univariate problems, a heat map is a good choice. For example, setting aside time and city, suppose we only want to compare the temperature of different cities, and not the number of people, the number of universities, and so on. Temperature is the only variable of interest, so this is a univariate problem, and a heat map shows very clearly how the temperature varies across cities.
Bar charts are widely used to show the count in each category. Beyond the basic bar chart, it is worth knowing the variants, such as horizontal bar charts and stacked bar charts, because different types of bar charts can have very different effects in different problem contexts.
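Under the hood, a heat map maps each value to a color intensity. The sketch below shows one common way to do that, min-max normalization to the range 0 to 1; the city names and temperatures are hypothetical, and a plotting tool would normally do this step for you.

```python
def normalize(values):
    """Min-max normalize a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: place everything in the middle of the scale.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical temperatures in degrees Celsius (not real measurements).
temperatures = {"City A": 10.0, "City B": 20.0, "City C": 30.0}

# Pair each city with its intensity: 0.0 = coolest color, 1.0 = warmest color.
intensities = dict(zip(temperatures, normalize(list(temperatures.values()))))
print(intensities)
```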
Meanwhile, if we want to emphasize a certain category, we can draw that category in a different color.
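One simple way to do this in code is to build a list of colors, one per bar, with a single accent color for the category you want to stand out. This is a sketch; the category names, the color names, and the helper `highlight_colors` are illustrative choices.

```python
def highlight_colors(categories, highlighted, base="lightgray", accent="tomato"):
    """Return one color per category, using the accent color for the
    highlighted category and the base color for all others."""
    return [accent if category == highlighted else base for category in categories]


categories = ["A", "B", "C", "D"]  # hypothetical category labels
colors = highlight_colors(categories, highlighted="C")
print(colors)
```

Many plotting libraries accept a per-bar color list like this; for example, matplotlib's `bar` takes it through its `color` argument.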
Scatter diagrams are suited to analyzing problems with two variables. The position of a point in the diagram carries meaning. For example, in the figure above, a point in the upper right corner represents Milton or Ferson, which is both higher and larger in diameter.
Bi-Directional Bar Chart
Similarly, the Bi-Directional Bar Chart is suitable for comparing two variables, because it lets us clearly observe the variation or correlation between them.
In the above chart, the June and July sales of items A, B, C and D are compared side by side. We can clearly observe the variation between June and July sales for each category.
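The layout of such a chart is easy to see in a plain-text sketch: one series grows to the left of a shared label column and the other grows to the right. The sales numbers below are hypothetical, not the values from the chart above, and the sketch assumes small non-negative integers and the same items in both months.

```python
# Hypothetical unit sales per item (not the values from the chart).
june = {"A": 4, "B": 7, "C": 3, "D": 6}
july = {"A": 5, "B": 6, "C": 5, "D": 2}


def bidirectional_rows(left, right):
    """Build text rows with left-series bars growing leftward and
    right-series bars growing rightward from a shared label column."""
    width = max(left.values())
    rows = []
    for key in left:
        left_bar = ("#" * left[key]).rjust(width)  # right-align so bars meet the label
        right_bar = "#" * right[key]
        rows.append(f"{left_bar} | {key} | {right_bar}")
    return rows


rows = bidirectional_rows(june, july)
print("\n".join(rows))
```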
Similarly, you can use the Bi-Directional Bar Chart to compare salary and scoring for different NBA players. In such a chart you can often observe that the higher a player’s salary, the better the player’s scoring ability.
For multiple variables, the radar chart is a good choice. Radar charts are often used to show the combined ability of a particular individual, with each ability represented as one dimension. In a radar chart we can clearly see which areas the individual is weaker in.
For example, the radar chart above shows a student’s performance in various subjects. We can see that this student scores much lower in sport and R-coding than in other subjects.
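If you want a number to go with the visual impression of "higher salary, better scoring", one option is the Pearson correlation coefficient. The sketch below computes it with the standard library only; the salary and scoring values are invented for illustration and say nothing about real players.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: salary (millions) and points per game for four players.
salaries = [5, 10, 20, 30]
points = [8, 12, 18, 25]

r = pearson(salaries, points)
print(f"r = {r:.3f}")
```

A value near 1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.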
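The geometry behind a radar chart is simple: each dimension gets its own evenly spaced axis around a circle, and the score sets the distance from the center. Here is a minimal sketch of that calculation. The subjects and scores are hypothetical, and the choices of starting at the top and going clockwise are just one convention.

```python
import math


def radar_vertices(scores, max_score=100.0):
    """Return (label, x, y) for each dimension, with axes evenly spaced
    around a unit circle, starting at the top and going clockwise."""
    n = len(scores)
    points = []
    for i, (label, score) in enumerate(scores.items()):
        angle = math.pi / 2 - 2 * math.pi * i / n
        radius = score / max_score  # 0 = center, 1 = outer edge
        points.append((label, radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Hypothetical scores out of 100 (not the values from the chart above).
scores = {"Math": 90, "English": 85, "Sport": 40, "R-coding": 45, "History": 80}

vertices = radar_vertices(scores)
weakest = sorted(scores, key=scores.get)[:2]  # the two lowest-scoring subjects
print(weakest)
```

Connecting the vertices in order gives the familiar polygon, and the dents in that polygon correspond to the weakest dimensions.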
For data containing location information, map charts are a great choice if you want to explore questions about geographic location.
For example, in the above chart, we can easily observe the sales of a product in each state in the U.S. A map chart like this can help a company adjust its sales strategies in time.
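The suggestions in this section can be collected into a small lookup, sketched below. It only encodes the chart types discussed in this article, and the grouping reflects my reading of the text: in particular, placing the bar chart under a single variable is an assumption, and the function names are made up for the example.

```python
# Chart types mentioned in this article, grouped by variable count.
# The grouping is an interpretation of the text, not a fixed rule.
CHART_SUGGESTIONS = {
    "single": ["heat map", "bar chart"],
    "double": ["scatter diagram", "bi-directional bar chart"],
    "multi": ["radar chart"],
    "location": ["map chart"],
}


def variable_group(n_variables, has_location=False):
    """Classify a problem by its number of variables, treating location
    data as the 'special variable' case."""
    if n_variables < 1:
        raise ValueError("need at least one variable")
    if has_location:
        return "location"
    if n_variables == 1:
        return "single"
    if n_variables == 2:
        return "double"
    return "multi"


def suggest_charts(n_variables, has_location=False):
    """Return candidate chart types for the given kind of problem."""
    return CHART_SUGGESTIONS[variable_group(n_variables, has_location)]


print(suggest_charts(2))
print(suggest_charts(3, has_location=True))
```

The lookup gives candidates only; the purpose of the visualization (comparison, relationship, distribution, or composition) still decides which candidate fits best.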
In summary, after you get the data you need to use for your visualization, you need to: 1. Clarify your queries or visualization goals. 2. Explore and Pre-process your data. 3. Determine the type of chart.
Once you have completed these three steps, you are ready to start visualizing. Next, let’s begin our visualization journey and create wonderful visualizations with tools like Tableau.
 Data Society. Four questions you should ask before visualizing your data. https://medium.com/the-data-experience/four-questions-you-should-ask-before-visualizing-your-data-cd20a302eb65
 Steve Doig. Basic Steps in Working with Data. https://datajournalism.com/read/handbook/one/understanding-data/basic-steps-in-working-with-data
 How to Choose the Right Chart for Your Data. https://infogram.com/page/choose-the-right-chart-data-visualization