Colors, Methods and Mistakes in Spatial Data Visualization
Around ten days ago, I gave a presentation at an event organized by the Data Science Istanbul group in Istanbul, Turkey. The seminar focused on data visualization, and my presentation covered spatial data visualization.
I wanted to share my presentation here, since we weren’t able to host everyone.
I covered the following topics:
- Color Psychology
- Power of Contrast & Boundaries
- Importance of Classification
- Scale: Decision & Visualization
- Interpolation vs Aggregation
- Volume != Area
- Heatmaps != Hot-Spots
- Where “0” Starts?
- 5% of Your Audience
- Worst Mistake
I hope you enjoy it. Please let me know if you would like to add to or fix anything in my slides.
Color Psychology
There are many resources online about color psychology. Every color carries multiple meanings and feelings, so you should be careful about which colors you use in a data visualization. People expect to see rivers in blue, parks in green and danger in red.
Power of Contrast & Boundaries
What you see is usually defined by what you see around it. Your colors should contrast with each other as much as possible, and you need to separate your data points with the right visual boundaries. Some good and poor examples are provided by ESRI. As you can see in the slide above, some maps are harder to understand and some are easier. What makes a visualization easier to understand is usually good use of contrasting colors and boundaries.
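One way to put a number on "contrast" is the contrast ratio used in web accessibility guidelines (WCAG). The slides do not mention this measure, so treat it as an assumption on my part. The sketch below is only a rough check between two fill colors, not a full cartographic test.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as (r, g, b) in 0-255."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG-style contrast ratio: 1.0 for identical colors, 21.0 for black on white."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


BLACK, WHITE, YELLOW = (0, 0, 0), (255, 255, 255), (255, 255, 0)
print(round(contrast_ratio(BLACK, WHITE), 1))   # strongest possible contrast
print(round(contrast_ratio(YELLOW, WHITE), 2))  # yellow on white is barely separable
```

A ratio close to 1 means two neighbouring areas will blur together unless you draw a boundary between them.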
Importance of Classification
The slide above shows three different maps that are visualized from the same data using the same number of classes. However, each of the three tells a different story.
The first map is classed with the Geometric Interval algorithm, which optimizes both the number of data points in a class and the variance within the class.
The second map is classed with the Equal Intervals algorithm, which simply divides the data range into intervals of equal width.
The third map is classed with the Natural Breaks algorithm, which keeps the variance within each class to a minimum while increasing the variance between classes.
The analyst’s job is to choose the classification algorithm that conveys the message of the data correctly. One mistake, and you are telling a completely different story.
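To make the difference concrete, here is a minimal sketch of two of these methods in plain Python. The sample values are made up, and `natural_breaks` is a small exact dynamic-programming version of the Jenks/Fisher idea, not the implementation used by ESRI or any other GIS package. Geometric Interval is left out because I don't want to guess at its exact rules.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k classes that split the data range into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)] + [hi]


def natural_breaks(values, k):
    """Upper bounds of k classes minimizing within-class squared deviation.

    Exact dynamic programming over the sorted values; O(k * n^2), so it is
    only meant for small inputs. Assumes k <= len(values).
    """
    xs = sorted(values)
    n = len(xs)
    # Prefix sums give the squared deviation of any slice xs[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):  # sum of squared deviations from the mean of xs[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i
    # Walk the cut table backwards to recover each class's upper bound.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(xs[j - 1])
        j = cut[c][j]
    return bounds[::-1]


values = [1, 2, 3, 4, 5, 20, 22, 24, 90, 100]  # made-up, skewed sample
print(equal_interval_breaks(values, 3))  # [34.0, 67.0, 100]
print(natural_breaks(values, 3))         # [5, 24, 100]
```

With equal intervals, eight of the ten values land in the first class and the middle class is empty. Natural breaks separates the three visible clusters. Same data, same number of classes, two different maps.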
Scale: Decision & Visualization
This is a mistake that I come across very frequently and have made myself multiple times in the past.
Usually due to a lack of the necessary datasets, people tend to use boundary-based demographics to explain the feasibility of businesses. The map above shows the highly populated neighbourhoods in red and the sparsely populated neighbourhoods in green.
Imagine that you are starting a local business and trying to maximize the population that can reach you within a 10-minute walk. If you use large boundaries for your demographic analysis, you are likely to make a big mistake, because region “A” in the yellow neighbourhood actually has more population than region “B” in the red zone. I’ve seen banks making decisions with datasets that have even worse resolution. Big mistake!
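A toy example of the same effect, with entirely hypothetical building-level numbers. It also assumes straight-line distance is a fair stand-in for a 10-minute walk, which a real analysis would replace with a street network.

```python
import math

# Hypothetical building-level records: (x_m, y_m, residents, neighbourhood).
buildings = [
    (100, 100, 400, "yellow"),   # dense cluster in a mostly empty neighbourhood
    (150, 120, 350, "yellow"),
    (120, 160, 300, "yellow"),
    (2000, 2000, 50, "yellow"),
    (3000, 300, 250, "red"),     # residents spread evenly over a larger area
    (3600, 900, 250, "red"),
    (4200, 300, 250, "red"),
    (4800, 900, 250, "red"),
    (5400, 300, 250, "red"),
    (6000, 900, 250, "red"),
]

WALK_RADIUS_M = 800  # assumed reach of a 10-minute walk, as the crow flies


def neighbourhood_totals(records):
    """Population summed per neighbourhood, i.e. the coarse boundary view."""
    totals = {}
    for _, _, residents, name in records:
        totals[name] = totals.get(name, 0) + residents
    return totals


def reachable_population(records, site, radius=WALK_RADIUS_M):
    """Residents living within `radius` metres of `site`."""
    sx, sy = site
    return sum(
        residents
        for x, y, residents, _ in records
        if math.hypot(x - sx, y - sy) <= radius
    )


site_a = (120, 120)    # inside the "yellow" neighbourhood
site_b = (4500, 600)   # inside the "red" neighbourhood

print(neighbourhood_totals(buildings))          # red has the larger total
print(reachable_population(buildings, site_a))  # 1050
print(reachable_population(buildings, site_b))  # 500
```

The red neighbourhood has more residents overall, yet site A reaches more than twice as many people on foot.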
Interpolation vs Aggregation
The first thing we need to know is that interpolation is predictive while aggregation is descriptive, and one cannot be substituted for the other.
The map above shows your customers. Each point represents a customer, and you know how much revenue you are making from each of them. You want to see the regions where you sell the most.
You can color your dots with a colormap to create a more understandable map:
It’s still really hard to see where you are selling the most. However, the visualization makes you believe that the right side of the region has more revenue.
By gridding the whole region into equal-sized cells, we can aggregate our points to improve our visualization.
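The gridding step can be sketched in a few lines. The customer coordinates and revenues below are invented, and `aggregate_to_grid` is just an illustrative name. It assumes square cells and planar coordinates.

```python
from collections import defaultdict


def aggregate_to_grid(points, cell_size):
    """Sum revenue per square grid cell.

    points: iterable of (x, y, revenue); cell_size: edge length in the same
    units as x and y. Returns {(col, row): total_revenue} for non-empty cells.
    """
    totals = defaultdict(float)
    for x, y, revenue in points:
        cell = (int(x // cell_size), int(y // cell_size))
        totals[cell] += revenue
    return dict(totals)


# Hypothetical customers: (x, y, revenue).
customers = [
    (0.5, 0.5, 100.0),
    (0.7, 0.2, 50.0),
    (1.5, 0.5, 20.0),
    (3.2, 2.8, 400.0),
]
grid = aggregate_to_grid(customers, cell_size=1.0)
for cell, total in sorted(grid.items()):
    print(cell, total)
```

Cells without customers simply do not appear in the result. That is the descriptive part: nothing is invented for places where we have no data.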
Now it’s a lot easier to see where we sell the most. After this step there is a big pitfall that almost every cartographer falls for: we tend to create heatmaps to produce more visually pleasing maps.
Heatmaps are by default the result of different interpolation algorithms, and they try to predict values for areas where we have none.
As seen in the slide above, we interpolated our dataset to create a more visually pleasing map, but we also broke the data and started to show places that normally have no revenue as generating revenue.
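Here is a small illustration of how that happens. I am assuming inverse distance weighting (IDW) as the interpolation method, since the slides do not say which one the heatmap used, and the three customers are made up.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = math.hypot(px - x, py - y)
        if distance == 0:
            return value  # exactly on a sample: return the observed value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical customers: (x, y, revenue). Nobody is located at (5, 5).
customers = [(0.0, 0.0, 100.0), (10.0, 0.0, 300.0), (0.0, 10.0, 200.0)]

observed_at_5_5 = sum(r for x, y, r in customers if (x, y) == (5.0, 5.0))
predicted_at_5_5 = idw(customers, 5.0, 5.0)
print(observed_at_5_5)   # aggregation: no customers here, so no revenue
print(predicted_at_5_5)  # interpolation: a revenue figure appears anyway
```

Interpolation is the right tool for something continuous, like temperature between weather stations. Revenue is not continuous: it only exists where a customer exists.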
Volume != Area
This is a common mistake that we make every day. When you look at this map, you feel like company ABC has more market share than company XYZ. This is due to the land area of the states. Administrative boundaries are usually meaningless for this kind of dataset.
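A quick arithmetic check shows how far apart the two readings can be. All state names, areas and customer counts here are hypothetical.

```python
# Hypothetical states: (name, leading company, land area in km^2, customers).
states = [
    ("State 1", "ABC", 500_000, 10_000),
    ("State 2", "ABC", 400_000, 15_000),
    ("State 3", "XYZ", 50_000, 60_000),
    ("State 4", "XYZ", 30_000, 40_000),
]

AREA, CUSTOMERS = 2, 3  # column positions in each record


def share_by(records, column):
    """Each company's share of the chosen column across all records."""
    totals = {}
    for record in records:
        company = record[1]
        totals[company] = totals.get(company, 0) + record[column]
    grand_total = sum(totals.values())
    return {company: value / grand_total for company, value in totals.items()}


area_share = share_by(states, AREA)
customer_share = share_by(states, CUSTOMERS)
print(area_share)      # what the eye reads from a filled map
print(customer_share)  # what the business actually cares about
```

In this made-up case ABC colors more than 90% of the map while holding only 20% of the customers.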
Heatmaps != Hot-Spots
It’s often a mistake to use heatmaps, because every time you want to use a heatmap, you are probably trying to answer a question that can be answered by another algorithm.
Heatmaps are always subjective: the end results are highly affected by the parameters you set, and they ignore statistical significance. Hot-spots, on the other hand, are objective and use algorithms to find statistically significant regions where your high values occur.
It’s safe to say that in heatmaps, your outliers have a greater effect on your end result, as seen on the map above.
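The slides do not name a specific hot-spot algorithm. One widely used statistic is Getis-Ord Gi*, so the sketch below assumes that one. It uses binary weights inside a fixed distance band and a made-up 7 x 7 grid, and it skips refinements such as multiple-testing correction that real tools apply.

```python
import math


def getis_ord_gi_star(points, band):
    """Gi* z-score for each (x, y, value) point.

    Neighbours are all points within `band` distance, including the point
    itself, with binary weights. Scores above about 1.96 suggest a hot spot
    at roughly the 95% confidence level; below -1.96, a cold spot.
    """
    n = len(points)
    values = [v for _, _, v in points]
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for xi, yi, _ in points:
        local_sum = 0.0
        w_sum = 0  # sum of weights; equals the sum of squared weights when binary
        for xj, yj, vj in points:
            if math.hypot(xi - xj, yi - yj) <= band:
                local_sum += vj
                w_sum += 1
        numerator = local_sum - mean * w_sum
        denominator = std * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        # A zero denominator means no variation to test against.
        scores.append(numerator / denominator if denominator else 0.0)
    return scores


# Hypothetical grid: value 1 everywhere except a 3 x 3 block of 10s.
grid = [
    (x, y, 10.0 if 2 <= x <= 4 and 2 <= y <= 4 else 1.0)
    for x in range(7)
    for y in range(7)
]
scores = getis_ord_gi_star(grid, band=1.5)
hot = [(x, y) for (x, y, _), z in zip(grid, scores) if z > 1.96]
print(hot)
```

Unlike a heatmap, the output is a test result per location: a cell is flagged only when it and its neighbours are high enough, relative to the whole dataset, to be unlikely by chance.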
Where “0” Starts?
When you look at this map, you feel like almost half of the area is actually sea. However, due to the color ramp and classification parameters that I used, we see areas that are not water as water. The correct visualization should be as follows:
It is really important to set the right color for the point where your values turn from negative to positive, bad to good, low to high, etc. If you pick your color ramp and classification parameters wrong, you are transmitting a completely different message.
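One way to guard against this is to pin the color change to the meaningful value instead of to the middle of the data range. A minimal sketch, assuming the map shows elevation with negative values under water. The four colors are arbitrary picks for illustration.

```python
def lerp_color(c0, c1, t):
    """Linear blend between two RGB tuples for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


DEEP_WATER = (8, 48, 107)      # assumed colors, chosen only for illustration
SHALLOW_WATER = (198, 219, 239)
LOWLAND = (199, 233, 192)
HIGHLAND = (0, 68, 27)


def elevation_color(value, vmin, vmax, zero=0.0):
    """Color for `value`, switching hue family exactly at `zero`.

    Values below `zero` always use the water ramp and values at or above it
    always use the land ramp, no matter how lopsided vmin and vmax are.
    Assumes vmin < zero < vmax.
    """
    if value < zero:
        t = (value - vmin) / (zero - vmin)
        return lerp_color(DEEP_WATER, SHALLOW_WATER, min(max(t, 0.0), 1.0))
    t = (value - zero) / (vmax - zero)
    return lerp_color(LOWLAND, HIGHLAND, min(max(t, 0.0), 1.0))


print(elevation_color(-1, vmin=-100, vmax=3000))  # just below sea level: still blue
print(elevation_color(0, vmin=-100, vmax=3000))   # sea level: first land color
```

A single ramp stretched from -100 to 3000 would put its color change near 1450, so everything below that height would read as water.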
5% of Your Audience
Almost 5% of your audience is color blind, most of them are men, and green-to-red color ramps are a nightmare for them. Most color-blind people see the big map in yellow to dark green tones, as you can see in the same slide, and they see those three maps as almost identical. There are also other kinds of color blindness, which are really rare. If you want to check your visuals for color-blind friendliness, there are great tools online!
Yes, I know that I made this mistake earlier :(
Worst Mistake
This is a mistake that I’ve deliberately made so far.
Legends are boring but necessary. We often leave them out of our visualizations, because somehow we believe that our visuals are perfect and will be understood by everyone at a single glance. They are important, and if you forget to add legends then you don’t have any excuse for misunderstandings of your visuals.
These are the common mistakes that I have come across and made myself in the past. Please share your comments if you have more.