
Visualization Quick Tip: Relative Heatmaps
I recently worked with a large data set of generated points recorded on many different iOS device screen sizes, including iPhone and iPad (seen below). The goal was to understand how the points were distributed, especially since I did not collect the data myself. With approximately 2.4 million records, displaying the points as a scatterplot is both infeasible and unhelpful.

To remedy this issue of too many points, a technique called hexbinning can be applied to generate a discrete heatmap that indicates the frequency of points in a given region by color. Here’s the first pass:

# Python Code (assumes df is a pandas.DataFrame)
import matplotlib as mpl
import matplotlib.pyplot as plt

plt.hexbin(df['X'], df['Y'], gridsize=50, cmap=mpl.cm.jet,
           norm=mpl.colors.LogNorm())
plt.title('Dot values (in screen points)')

cb = plt.colorbar()
cb.set_label('Frequency')

# Invert axis to correspond to iOS origin in upper-left
axis = plt.gca()
axis.set_ylim(axis.get_ylim()[::-1])
axis.xaxis.tick_top()
axis.yaxis.tick_left()
axis.title.set_position([0.5, 1.04])

plt.show()
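To make the idea of binning concrete, here is a minimal sketch of the counting step using square cells instead of hexagons. It is only an analogue of what hexbin does, not its implementation. The helper name bin_counts and the synthetic points (uniform over a 320 x 480 area) are assumptions for illustration and are not part of the original data set.

```python
import numpy as np

def bin_counts(x, y, bins=50):
    """Count points per cell of a regular square grid.

    This is a square-bin analogue of hexbin: each cell holds the number
    of points that fall inside it, which a heatmap then maps to a color.
    """
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    return counts, x_edges, y_edges

# Synthetic stand-in data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 320, size=10_000)
y = rng.uniform(0, 480, size=10_000)

counts, x_edges, y_edges = bin_counts(x, y, bins=50)
```

Every point lands in exactly one cell, so the counts add up to the number of records. With real data the counts can differ by orders of magnitude between cells, which is why a logarithmic color scale such as LogNorm is a reasonable choice.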
From this visual, the edges of the common screen sizes and orientations can be seen, from the iPhone 4S to the iPad Pro. Unfortunately, the smaller iPhone screen outlines are contaminated by points chosen on the larger devices. With every screen size overlaid in absolute coordinates, the plot does not give a clear picture of how the points were actually chosen for this data collection. Thankfully, the data set also includes columns for the screen size of each device. Displaying the heatmap relative to the screen size changes the picture considerably.

plt.hexbin(df['X'] / df['screen.W'],
           df['Y'] / df['screen.H'],
           gridsize=60, cmap=mpl.cm.jet, norm=mpl.colors.LogNorm())
plt.title('Dot values (relative to screen size)')
plt.axis([0, 1, 0, 1])
plt.colorbar().set_label('Frequency')
plt.show()
This visualization is much better. We can now clearly see not only that the points were chosen independently of screen size but also where the points are clustered. Keeping in mind what each of these columns means enables a better visualization of the details.
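For readers without a comparable data set, the normalization step can be tried on synthetic data. The sketch below is an assumption-laden stand-in: the screen sizes in SCREENS, the uniform point distribution, and the helper names make_synthetic_points and to_relative are all hypothetical. Only the column names X, Y, screen.W, and screen.H follow the snippets above.

```python
import numpy as np
import pandas as pd

# Hypothetical screen sizes in points (width, height); illustrative only,
# not taken from the original data set.
SCREENS = [(320, 480), (375, 667), (768, 1024)]

def make_synthetic_points(n_per_screen=1000, seed=0):
    """Build a DataFrame of uniformly random points for each screen size."""
    rng = np.random.default_rng(seed)
    frames = []
    for w, h in SCREENS:
        frames.append(pd.DataFrame({
            'X': rng.uniform(0, w, n_per_screen),
            'Y': rng.uniform(0, h, n_per_screen),
            'screen.W': w,
            'screen.H': h,
        }))
    return pd.concat(frames, ignore_index=True)

def to_relative(df):
    """Divide each coordinate by its own screen dimension."""
    return pd.DataFrame({
        'rel_x': df['X'] / df['screen.W'],
        'rel_y': df['Y'] / df['screen.H'],
    })

df = make_synthetic_points()
rel = to_relative(df)
```

After this step every point lies in the unit square regardless of device, so rel['rel_x'] and rel['rel_y'] can be passed to plt.hexbin in the same way as the relative coordinates above.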
Was this helpful? Have comments or improvements? Let’s continue the discussion below!
