The Misconstruction of Feature Heatmaps

Fred Simard - CEO @ RE-AK
7 min read · Nov 4, 2022


Drawing a heatmap from an ensemble of points might sound easy, and to a great extent it is. The basic method consists in generating a 2D matrix and adding a ‘1’ wherever there has been an observation. Depending on the level of smoothness you desire, you then convolve it with a smoothing kernel, such as a Gaussian, et voilà! But let’s see what happens when we try to draw a feature heatmap instead of a position one.

Code Sample

A code sample is included at the bottom of this article for those who want to get a full grasp of the process.

The Basics

Not everyone has experience with heatmaps, so let me run quickly through the basics (if you are already familiar with the process, you can skip to the next section).

Heatmaps are generally used to represent a density of some sort using a color code. They are particularly useful when overlaid on top of a spatial context, such as a venue floor plan.

Figure 1. The heatmap of positions of participants at the Grand Salon de l’Homme 2021.

The heatmap above shows how much time participants spent at each position on the plan. The steps to compute such a map are:

  1. Have a sample of the positions of each participant on the map (position data is inherently noisy).
  2. Map them onto a grid of appropriate resolution.
  3. Add each observation to the grid (+1 per observation).
  4. Normalize the scale for display.
  5. Smooth for the look.

The values can then be converted to pixel color values (RGBA) using a colormap and overlaid on top of a reference map. (I won’t go into the details of this very interesting step in this article, but I plan on revisiting it.)
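As an illustration, here is a minimal sketch of steps 1 to 5 for a 2D position heatmap. The grid size, the made-up participant positions and the use of scipy.ndimage.gaussian_filter are my own choices for this sketch; the script at the bottom of this article uses a plain NumPy 1D equivalent.

import numpy as np
from scipy.ndimage import gaussian_filter

# step 1: noisy participant positions (x, y), in map coordinates (made-up data)
positions = np.random.normal(loc=(300, 200), scale=15, size=(500, 2))

# step 2: a grid of appropriate resolution (rows = y, columns = x)
grid = np.zeros((400, 600))

# step 3: +1 for each observation that falls on the grid
for x, y in positions:
    ix, iy = int(round(x)), int(round(y))
    if 0 <= iy < grid.shape[0] and 0 <= ix < grid.shape[1]:
        grid[iy, ix] += 1

# step 4: normalize the scale for display
grid /= grid.max()

# step 5: smooth for the look, with a Gaussian kernel
grid = gaussian_filter(grid, sigma=5)

The resulting array can then be mapped to RGBA values with a matplotlib colormap and blended over the reference map, as described above.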

The problem

When it comes to position heatmaps, the process is straightforward and the result is accurate. But if you are trying to map a third variable (a.k.a. a feature) in space, for example the average engagement or arousal at each position, a problem arises.

First of all, if you simply sum up the feature value of each observation, the same way we did for position density, the result will be heavily biased by the position density itself.

To illustrate this, we have designed a toy problem to work with. For the sake of simplicity and clarity, the problem is developed using a 1D array, but the same observations and conclusions hold for a 2D heatmap.

Problem Parameters

The parameters have been selected to highlight the issue at hand. Specifically, distribution (2) has a low feature value but was heavily sampled; it is the one for which the behavior will be the most misleading. Distribution (1) has a higher feature value than (2) but has been sampled less, and distribution (3) has the highest feature value and is sampled on the same scale as (1).

Table 1. Parameters of the problem. Each distribution is a noisy position with an average feature value. Distribution (1) doesn’t have anything in particular, while distribution (2) has a particularly large number of observations and distribution (3) has a particularly large feature value.

Distribution | Position | Number of observations | Average feature value
(1) | 500 | 10 | 70
(2) | 1000 | 150 | 20
(3) | 1900 | 30 | 150
Figure 2. As we can see from a basic summation, none of the distributions land on their expected value; furthermore, the relative amplitudes of the features are not reported accurately either. This approach is a bit idiotic, and I don’t think many would use it to produce a heatmap, but it sets the stage.

Intuitively, computing the average value at each location of the grid (equation 1: the sum of feature values in a cell divided by the number of observations in that cell), following steps 1 to 4 of the previous section, is a good way to move past this predicament, and it does indeed give you the correct average at each sampled location (Fig. 3).

Figure 3. In this figure, the averages appear correct. The graph is a bit sharp, but the values are correct. Is the problem solved?

All grid values now represent the intended value (step 4), the average of the variable. But the data is quite sharp, and it is usually better to even it out using a smoothing filter (step 5). And boom, here comes the problem again (Fig. 4): distribution (2), which has a lower feature value, now appears to be on par with distribution (1), which is an incorrect representation.

Figure 4. In this figure, the numbers are far from the expected values, and again distribution (2) appears to have a higher average than distribution (1), which is incorrect.

The problem comes from the statistical nature of the sampling. While computing the single-position average gave us the correct number, distribution (2) still occupies more cells of the map, because samples are randomly sprinkled around the average position. When the smoothing kernel is convolved, these singular values build up and corrupt the combined value.
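To see the mechanism in isolation, here is a minimal sketch with made-up numbers (a simple box kernel and two artificial clusters, not the parameters of the toy problem above): a cluster spread over many cells with a low per-cell average ends up with a smoothed peak almost as high as a cluster occupying few cells with a much higher average.

import numpy as np

kernel = np.ones(11) / 11  # a box kernel, for clarity

# cluster A: 3 occupied cells, each holding the correct per-cell average of 70
avgA = np.zeros(50)
avgA[24:27] = 70

# cluster B: 9 occupied cells, each holding the correct per-cell average of 20
avgB = np.zeros(50)
avgB[20:29] = 20

# smoothing the per-cell averages sums neighbouring cells, so the peak
# reflects average * occupancy rather than the average itself
print(np.convolve(avgA, kernel, mode="same").max())  # ~19.1, true average is 70
print(np.convolve(avgB, kernel, mode="same").max())  # ~16.4, true average is only 20

After smoothing, the two clusters look comparable even though their true averages differ by a factor of 3.5, which is exactly what happens to distributions (1) and (2) in Figure 4.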

The solution

The solution consists in postponing the average until after smoothing, so as to take into account the noisy nature of the sampling (a condensed code sketch follows at the end of this section):

  1. Accumulate the position counts and the feature values on two independent grids.
  2. Smooth each grid independently, using the same kernel.
  3. Divide the smoothed feature sum by the smoothed sample count, cell by cell.
  4. Normalize the scale for display.
  5. Smooth again for a better look.
Figure 5. For all distributions, we find the correct values and the map is smooth (maybe a bit too smooth in this specific example, but that is open for debate as a function of the desired outcome).

Now we find the appropriate values on a smoothed graph. This process still suffers from some issues:

  • Smoothing twice will reduce the spatial precision of the map. It is possible to work with the kernel parameters to achieve the best compromise.
  • The spurious effect can still show up, as the kernel parameters need to be adapted to the position uncertainty (noise).
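
Condensed into a small helper, the procedure looks like the sketch below. The function name and the zero-division safeguard are my own; the full script at the bottom of this article walks through the same steps one by one.

import numpy as np

def feature_heatmap(position_counts, feature_sums, kernel):
    """Smooth the counts and the feature sums independently, then divide."""
    smoothed_counts = np.convolve(position_counts, kernel, mode="same")
    smoothed_sums = np.convolve(feature_sums, kernel, mode="same")
    # divide only where there are observations, leave the rest at zero
    return np.divide(smoothed_sums, smoothed_counts,
                     out=np.zeros_like(smoothed_sums),
                     where=smoothed_counts != 0)

A second smoothing pass (step 5) can then be applied to the returned array, as the script below does.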

In conclusion

I don’t know how much of this is a real problem for data scientists. I know that it has puzzled me for some time; I also know that the Google API doesn’t provide an intrinsic mechanism to deal with this; and finally, I didn’t find a simple explanation of how to plot feature heatmaps on the web. I hope this guide helps you produce more accurate heatmaps!

"""This script demonstrate a trap of feature heatmap. It has been written
along side the medium article (The Misconstruction of Heatmaps).
In summary, it shows that noisy position density can lead to a bad estimation
of the feature value at that location. The issue is amplified with the degree
of unbalance of the position distribution.
The script first illustrate the problem (demo 1), then show an intuitive (demo 2a), but
flawed solution (demo 2b) and finally the solution (demo 3).
This work is published as open material, with no specific licensing. Feel free to share,
use and/or modify this code as you please. I'd appreciate if you could pass on the copyright.
Author: Frédéric Simard, RE-AK 2022 (fs@re-ak.com)"""import os, sys
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import font_manager as fm
SAVE_FIGURES = True# formatting function, please ignore
def setOneGraphStyle(plt):
    fpath = 'Quicksand-Regular.ttf'
    quicksandRegularFontProp = fm.FontProperties(fname=fpath)
    plt.setp(plt.gca().get_xticklabels(), fontproperties=quicksandRegularFontProp)
    plt.setp(plt.gca().get_yticklabels(), fontproperties=quicksandRegularFontProp)
    plt.gca().set_xlabel(plt.gca().get_xlabel(), fontproperties=quicksandRegularFontProp)
    plt.gca().set_ylabel(plt.gca().get_ylabel(), fontproperties=quicksandRegularFontProp)
    plt.gca().set_title(plt.gca().get_title(), fontproperties=quicksandRegularFontProp)
    plt.gca().title.set_fontsize(16)
    plt.gca().spines['right'].set_visible(False)
    plt.gca().spines['top'].set_visible(False)
    plt.gcf().set_size_inches(8, 6)
# Setup of the problem: we have three distinct positions, far enough apart that
# they don't influence one another
coordinates = [500, 1000, 1900]
# Our "participants" spent some time at each position.
# The amount of time spent is voluntarily disproportionate at
# one position, due to, let's say, participants waiting in line.
timeSpent = [10, 150, 30]
# In the context of this experiment, we are sampling one hypothetical feature value.
# Notice how it is minimal at the position where people spent a lot of time;
# again, one value is voluntarily more important than the others.
FEATURE_STD_DEV = 1.0
featureAvgValues = [70, 20, 150]
# we are defining a Gaussian kernel to compute local averaging
sigma = 60
gx = np.arange(-3*sigma, 3*sigma, 1)
gaussian = np.exp(-(gx/sigma)**2/2)
gaussian /= np.sum(gaussian)
# these are the grids used to build the heatmaps; 1D arrays are used
# so that it's easier to see, but the results would be the same over a 2D array
xs = np.arange(3000)
featureHeatmap = np.zeros((3000,))
positionHeatmap = np.zeros((3000,))
# for all coordinates
for idx, position in enumerate(coordinates):
    # we add up the values recorded
    for i in range(timeSpent[idx]):

        # add some noise to position
        pos = int(position + np.random.normal(0, 10))

        # add 1 to position heatmap
        positionHeatmap[pos] += 1

        # add some noise to feature value
        # (we use the absolute value to prevent negative values, but given the parameters, it shouldn't matter much)
        featureHeatmap[pos] += np.abs(np.random.normal(featureAvgValues[idx], FEATURE_STD_DEV))
# Demo 1 - raw summed values are obviously wrong: if we spend more time at a location and add up
# all feature values, we get a sum that doesn't even remotely represent the real feature
# parameter, especially when the position density is unbalanced
plt.plot(featureHeatmap)
#plt.fill_between(x=xs, y1=featureHeatmap, where= (-1 < xs)&(xs < 1),color= "b")#,alpha= 0.2
plt.title("Demo 1 - The nature of the problem")
plt.xlabel("Position")
plt.ylabel("Feature Value")
setOneGraphStyle(plt)
plt.savefig("Demo1.png")
plt.show()
# Demo 2a - Dividing by the position heatmap is an intuitive solution, and before we smooth the data
# it appears like a valid solution
featureHeatmap2a = np.divide(featureHeatmap, positionHeatmap, out=np.zeros_like(featureHeatmap), where=positionHeatmap!=0)
plt.plot(featureHeatmap2a)
plt.title("Demo 2a - An Intuitive Solution...")
plt.xlabel("Position")
plt.ylabel("Feature Value")
setOneGraphStyle(plt)
plt.savefig("Demo2a.png")
plt.show()
# Demo 2b - but due to the probabilistic nature of the sampling, smoothing the data after averaging yields incorrect results...
# that's probably the least intuitive demonstration and a trap many are at risk of falling into
featureHeatmap2b = np.convolve(featureHeatmap2a, gaussian, mode="same")
plt.plot(featureHeatmap2b)
plt.title("Demo 2b - ...And Yet a Flawed One")
plt.xlabel("Position")
plt.ylabel("Feature Value")
setOneGraphStyle(plt)
plt.savefig("Demo2b.png")
plt.show()
# Demo 3 - the solution is to smooth both feature and position densities, before dividing them
featureHeatmap3a = np.convolve(featureHeatmap, gaussian, mode="same")
positionHeatmap3a = np.convolve(positionHeatmap, gaussian, mode="same")
featureHeatmap3b = np.divide(featureHeatmap3a, positionHeatmap3a, out=np.zeros_like(featureHeatmap3a), where=positionHeatmap3a!=0)
# then you can smooth again to remove sharp edges
# (mode="same" keeps the output aligned with the position axis)
featureHeatmap3b = np.convolve(featureHeatmap3b, gaussian, mode="same")
# and now we find a correct representation of the data
plt.plot(featureHeatmap3b)
plt.title("Demo 3 - The Correct Method")
plt.xlabel("Position")
plt.ylabel("Feature Value")
setOneGraphStyle(plt)
plt.savefig("Demo3.png")
plt.show()
# With this, keep in mind the following:
# * the proper smoothing values depend heavily on the distribution of position uncertainty (noise)
# * akin to the uncertainty principle in particle physics, if you increase the smoothing value, you are more
#   correct about the feature value, but less about the position. Decreasing the smoothing value has the opposite effect.
#
# If your position density doesn't suffer from large biases (very rare), you don't need to worry much, but whenever
# you have a high position density at some location, your feature heatmap is bound to be wrong, unless you apply the correct procedure.
