Photo by Simon Migaj on Unsplash

Visualization Quick Tip: Relative Heatmaps

I recently worked with a large data set of generated points from many different iOS device screen sizes, covering both iPhone and iPad (shown below). The goal was to understand how the points were distributed, especially since I did not collect the data myself. With approximately 2.4 million records, displaying the points as a scatterplot is both infeasible and unhelpful.

iOS Device Types (Many different screen sizes)

To handle this many points, a technique called hexbinning can be applied. It groups the points into hexagonal bins and colors each bin by the number of points it contains, producing a discrete heatmap. Here’s the first pass:

# Python code (assumes df is a pandas.DataFrame with 'X' and 'Y' columns)
import matplotlib as mpl
import matplotlib.pyplot as plt

# Log color scale so sparse bins stay visible next to dense ones
plt.hexbin(df['X'], df['Y'], gridsize=50, cmap=mpl.cm.jet,
           norm=mpl.colors.LogNorm())
plt.title('Dot values (in screen points)')
cb = plt.colorbar()
cb.set_label('Frequency')
# Invert the y-axis to match the iOS origin in the upper-left
ax = plt.gca()
ax.invert_yaxis()
ax.xaxis.tick_top()
ax.yaxis.tick_left()
ax.title.set_position([0.5, 1.04])
plt.show()
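As a rough illustration of what the binning step computes, here is a minimal sketch that counts points per cell with NumPy. It uses square cells via `np.histogram2d` rather than hexagons, and the points are synthetic stand-ins (the distribution parameters are made up for this example), so it only mirrors the idea behind `plt.hexbin`, not its exact output.

```python
import numpy as np

# Synthetic stand-in for the real points (hypothetical values,
# not the data set described in this post).
rng = np.random.default_rng(0)
n_points = 100_000
x = rng.normal(loc=200.0, scale=60.0, size=n_points)
y = rng.normal(loc=350.0, scale=120.0, size=n_points)

# Count how many points land in each cell of a 50 x 50 grid.
# hexbin does the same kind of counting, but over hexagonal cells.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=50)

# Every point falls into exactly one cell.
total = int(counts.sum())

# Dense and sparse cells can differ by orders of magnitude, which is
# why a log color scale (LogNorm) helps: a linear scale would wash
# out the sparse cells.
occupied = counts[counts > 0]
print("points binned:", total)
print("min / max count in occupied cells:", occupied.min(), occupied.max())
```

Hexagons are usually preferred over squares for this kind of plot because each cell's neighbors are all at the same distance, but the counting logic is the same.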

From this first heatmap, the edges of the common screen sizes and orientations can be seen, from the iPhone 4S to the iPad Pro. Unfortunately, the smaller iPhone screen outlines are contaminated by points chosen on the larger devices, so mixing the screen sizes obscures how the points were actually chosen for this data collection. Thankfully, the data set also includes columns for the screen size of each device. By displaying the heatmap relative to the screen size, the picture becomes quite different.

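# Divide each coordinate by its device's screen size to get values in [0, 1]
plt.hexbin(df['X'] / df['screen.W'],
           df['Y'] / df['screen.H'],
           gridsize=60, cmap=mpl.cm.jet, norm=mpl.colors.LogNorm())
plt.title('Dot values (relative to screen size)')
# Note: unlike the first plot, the y-axis is not inverted here
plt.axis([0, 1, 0, 1])
plt.colorbar().set_label('Frequency')
plt.show()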
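Before trusting a relative plot like this, it can be worth checking that the divided coordinates really land in the unit square. The sketch below uses a tiny made-up DataFrame with the same column names; the values are hypothetical. Whether `screen.W` and `screen.H` in the real data reflect the device's current orientation is an assumption that this kind of check can help probe, since rows outside [0, 1] may hint at swapped width and height.

```python
import pandas as pd

# Tiny hypothetical sample; only the column names follow the post's DataFrame.
df = pd.DataFrame({
    "X": [10.0, 160.0, 500.0, 700.0],
    "Y": [20.0, 284.0, 300.0, 1000.0],
    "screen.W": [320.0, 320.0, 768.0, 1024.0],
    "screen.H": [568.0, 568.0, 1024.0, 768.0],
})

# Same normalization as the relative heatmap.
rel_x = df["X"] / df["screen.W"]
rel_y = df["Y"] / df["screen.H"]

# Flag rows whose relative coordinates fall outside the unit square.
in_bounds = rel_x.between(0, 1) & rel_y.between(0, 1)
print(f"{(~in_bounds).mean():.1%} of rows fall outside the unit square")
print(df[~in_bounds])
```

In this toy sample the last row has a Y value larger than its screen height, which is the kind of row that would be silently clipped by `plt.axis([0, 1, 0, 1])`.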

This relative heatmap is much clearer. We can now see not only that the points were chosen independently of screen size but also where they cluster. Keeping in mind what each column means in the broader context of the data enables a better visualization of the details.
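The claim that the points are independent of screen size can also be checked numerically rather than by eye. One possible approach, sketched here on synthetic data, is to build a normalized 2D histogram of the relative coordinates for each screen size and compare it with the pooled histogram. The two screen sizes, the sample distributions, and the `relative_histogram` helper are all hypothetical choices for this example, and total variation distance is just one reasonable comparison metric.

```python
import numpy as np
import pandas as pd

# Synthetic data: the same relative distribution on two hypothetical screens.
rng = np.random.default_rng(1)
sizes = [(320.0, 568.0), (768.0, 1024.0)]
frames = []
for w, h in sizes:
    n = 20_000
    frames.append(pd.DataFrame({
        "X": rng.uniform(0, 1, n) * w,
        "Y": rng.beta(2, 5, n) * h,
        "screen.W": w,
        "screen.H": h,
    }))
df = pd.concat(frames, ignore_index=True)


def relative_histogram(group, bins=20):
    """Return a normalized 2D histogram of the relative coordinates."""
    counts, _, _ = np.histogram2d(
        group["X"] / group["screen.W"],
        group["Y"] / group["screen.H"],
        bins=bins, range=[[0, 1], [0, 1]],
    )
    return counts / counts.sum()


pooled = relative_histogram(df)

# Total variation distance between each screen size and the pooled data:
# 0 means identical distributions, 1 means no overlap at all.
distances = {}
for (w, h), group in df.groupby(["screen.W", "screen.H"]):
    distances[(w, h)] = 0.5 * np.abs(relative_histogram(group) - pooled).sum()

for size, d in distances.items():
    print(size, round(float(d), 3))
```

Small distances for every screen size would support the visual impression; a screen size with a noticeably larger distance would be worth plotting on its own.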


Was this helpful? Have comments or improvements? Let’s continue the discussion below!

Written by Christian Di Lorenzo

Christian, polyglot programmer @LifeOmic, data science unicorn in training, multi-instrumentalist, philosophologist, homely tutor, and ad hoc comedian — rcd.ai
