Spatio-Temporal Data Visualization: My Top 3 Techniques from Experience

Samvardhan Vishnoi
Oct 8, 2023


We have all been there. We want to visualize our data, but our data is a long temporal list for multiple places. Plotting them all on the same graph can look, to put it mildly, messy. Alternatively, we can make a separate temporal plot for each place, which means a long list of anywhere from ~50 (one per US state) to ~195 (one per country) plots. That makes it hard to gain insights, and even harder to communicate them down the line.

After finding myself in this position during multiple projects, I’m happy to share the top 3 techniques that have helped me tackle this problem successfully!

1. Pivot + Heatmap: A powerful combo for discrete behaviors!

Let’s assume you can make a powerful simplification here: you don’t need the precise value of your measurable at each time point. You just want to know (and communicate) whether it’s low, medium, or high, or where it falls in any other set of discretely definable ranges (medium-high, medium-low, etc.).

If this is the case, then a pivot table followed by a seaborn heatmap can be ALL you need to put everything neatly in one single plot. Let’s see how.

Assuming, like me, you are working with a pandas dataframe, you can use the pivot function to get a table of values for each place at each time.

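# Here is a simple example where I pivot my State-Year level
# data on Arrest Reporting: one row per state, one column per year.
# (Newer pandas versions require index/columns/values as keyword arguments.)
pivot = df.pivot(index='STNAME', columns='YEAR', values='REPORT%')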

Just make sure your dataset has exactly one row per (place, time) pair, i.e. the number of rows equals (# of places) × (# of time points per place), so that it can be reshaped into a rectangle. If any pair appears more than once, pivot raises a ValueError. Most of the time, this means a simple groupby on [Place, Time] with your measurable aggregated as you see fit (commonly mean, median, sum, etc.).
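To make the reshaping step concrete, here is a minimal sketch on a tiny made-up dataset. The column names STNAME, YEAR and REPORT% mirror the example above, but the values are invented. The groupby collapses a duplicated State-Year row before pivoting, since pivot itself refuses duplicates.

```python
import pandas as pd

# Toy long-format data (invented values): note Ohio/2000 appears twice.
df = pd.DataFrame({
    'STNAME':  ['Ohio', 'Ohio', 'Ohio', 'Utah', 'Utah'],
    'YEAR':    [2000, 2000, 2001, 2000, 2001],
    'REPORT%': [80.0, 90.0, 70.0, 20.0, 50.0],
})

# Collapse to exactly one row per (place, time) pair, averaging duplicates.
tidy = df.groupby(['STNAME', 'YEAR'], as_index=False)['REPORT%'].mean()

# Reshape: places as rows, time points as columns.
pivot = tidy.pivot(index='STNAME', columns='YEAR', values='REPORT%')
print(pivot)
```

If you prefer a single call, df.pivot_table(index='STNAME', columns='YEAR', values='REPORT%', aggfunc='mean') does the aggregation and the reshape together.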

Next, you can pass this pivot table directly to seaborn’s heatmap, which gives us:


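import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

fig, ax = plt.subplots(figsize=(15, 10))
# A 3-colour palette (red, yellow, green) with vmin/vmax fixed to 0-100 bins
# the cells into three equal-width levels instead of a continuous gradient.
sns.heatmap(
    pivot,
    cmap=sns.color_palette('RdYlGn', 3),
    vmin=0, vmax=100,
    cbar_kws={'label': 'ArrestReport%', 'boundaries': np.linspace(0, 100, 4)},
    linewidths=0.4, linecolor='white',
    ax=ax,
)
plt.show()

"""
The colorbar here is formatted to 3 levels:
High (green), Medium (yellow), and Low (red).

Note: you can choose from a range of discrete colormaps based on your need,
or even define your own!
"""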

Instead of a jumble of line plots, we now have discretely identifiable behaviors for ~50 states over ~35 years, all at a single glance. Beautiful!
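The colorbar boundaries np.linspace(0, 100, 4) above split the 0-100 range into three equal-width bins (0-33.3, 33.3-66.7, 66.7-100). If you also want those levels as explicit labels in your data, for example to count how many state-years fall into each, a minimal sketch with pd.cut could look like this. The small pivot table is invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented State x Year table of reporting percentages (0-100).
pivot = pd.DataFrame(
    {2000: [85.0, 20.0], 2001: [70.0, 50.0]},
    index=pd.Index(['Ohio', 'Utah'], name='STNAME'),
)

bins = np.linspace(0, 100, 4)        # edges at 0, 33.3, 66.7 and 100
labels = ['Low', 'Medium', 'High']

# pd.cut bins 1-D data, so label each year's column separately.
levels = pivot.apply(
    lambda col: pd.cut(col, bins=bins, labels=labels, include_lowest=True)
)

# Count how many state-years fall into each level.
flat = pd.cut(pivot.to_numpy().ravel(), bins=bins, labels=labels, include_lowest=True)
counts = pd.Series(flat).value_counts()

print(levels)
print(counts)
```

include_lowest=True matters here: without it, a value of exactly 0 would fall outside the first bin and come back as NaN.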

2. An Animated Scatterplot: A must for any exploration toolkit.

Let’s face it: we are not always going to be lucky enough to care only about high or low values. Often, we need the precise relationship between two variables, with a fitted line and all (phew). This can get hectic quickly, especially during exploration, when you don’t know this relationship yet and need to repeat every line fit you can imagine for every place, just to be sure it’s universal!

A Plotly animated scatterplot can be your refuge here. Simply plot your two variables against each other (for example, your measurable against a place-level attribute such as population), and pass the name of the remaining Time (or Place) column to the animation_frame parameter of Plotly’s scatter.

"""
A simple example where I plot Arrests vs Place-Population on a log scale,
revealing a linear relationship, with Time as a slider variable.
"""
import plotly.express as px
px.scatter(df, x = 'POP', y = 'ARRESTS', animation_frame = 'YEAR', trendline = 'ols',trendline_color_override= 'red')

Voilà! Now, instead of repeating this process for every year, I have a slider for my Time variable, allowing me to quickly explore the temporal validity of my spatial trend, or vice versa. Here I explored a linear trend (guess what, I sort of knew it already), but you can explore many common relationships through the trendline parameter. Furthermore, you can map your marker symbols or colors to any other column in your dataframe, adding categorical exploration to the mix. I highly recommend going through the Plotly scatter documentation and adapting it to your visualization needs.

"""
Since not every place in my working example has good values (Coverage)
due to bad Arrests reporting, I added this information on value confidence using the color parameter
"""

px.scatter(df, x = 'POP', y = 'ARRESTS', animation_frame = 'YEAR', trendline = 'ols', color = 'COVERAGE', color_continuous_scale=['red','lightgreen','green'])

This helps me tremendously! For the year 1988, I can clearly see that the poor linear fit is driven by places with bad Arrest reporting (in red), which furthers my confidence in my chosen log-linear fit. On the less visual side, Plotly also lets you extract your fit’s parameter values for every year, allowing deeper dives into goodness-of-fit.
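Plotly’s own route to the fitted parameters is px.get_trendline_results(fig), which returns a DataFrame of statsmodels results. If you would rather not depend on statsmodels, here is a minimal sketch of the same per-year idea using numpy.polyfit on log-transformed data. This is a swapped-in technique, not what Plotly does internally. The data is synthetic and built to follow ARRESTS = 0.01 * POP**0.9 exactly, so every year should come back with a slope of 0.9 and an intercept of -2 (log10 of 0.01).

```python
import numpy as np
import pandas as pd

# Synthetic data (invented): ARRESTS = 0.01 * POP**0.9 in every year.
rng = np.random.default_rng(0)
pop = rng.uniform(1e5, 1e7, size=30)
df = pd.DataFrame({
    'YEAR': np.repeat([1988, 1989, 1990], 10),
    'POP': pop,
    'ARRESTS': 0.01 * pop ** 0.9,
})

def loglog_fit(group):
    """Fit log10(ARRESTS) = slope * log10(POP) + intercept for one year."""
    slope, intercept = np.polyfit(np.log10(group['POP']),
                                  np.log10(group['ARRESTS']), deg=1)
    return pd.Series({'slope': slope, 'intercept': intercept})

# One row of fitted parameters per year.
fits = df.groupby('YEAR')[['POP', 'ARRESTS']].apply(loglog_fit)
print(fits)
```

On real data, a year whose slope drifts far from the others is a good candidate for a closer look on the slider.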

3. An Animated Geoplot: A fun but powerful visualization tool.

Last but not least, we have the animated geoplot. When working with geographical data, the place label alone hides the full picture. Location often matters, and places next to each other can be heavily correlated in behavior. Exploring spatial correlation models requires a deep dive into what exactly this relationship looks like. The models are complicated and can vary in space, change over time, etc., so we should never expect physics as elegant as the inverse-square law of gravity (sigh). Being able to visualize this on a map, however, can be a powerful precursor to choosing the right model. Let’s see how.

Suppose I wanted to explore the spread of a virus worldwide. Common sense says that spatial correlations are worth exploring here. Luckily, we can again use plotly, this time to make an animated choropleth map.


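import numpy as np
import plotly.express as px

# log1p (= log(1 + x)) keeps months with zero deaths finite; np.log would give -inf
dfg['Log Monthly Deaths'] = np.log1p(dfg['Monthly Deaths'])
dfg = dfg.sort_values('YM')  # animation frames follow the order of the rows

# custom_color_scale is a user-defined list of colors (its definition isn't shown)
fig = px.choropleth(dfg, locations='Code', color='Log Monthly Deaths',
                    animation_frame='YM', color_continuous_scale=custom_color_scale)
fig.show()

"""
The gif here shows the monthly spread of COVID deaths
from 2020-2023 on a log scale.
Note: a log transform makes the data less skewed, so colors are
more uniformly distributed, which helps interpretability.
"""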

Note that in the figure I combined many slider time points into a GIF (which I usually embed in my PowerPoints!). The visual clearly reveals how quickly the virus spread across the world, and how it resurfaced in cycles within countries. That said, we have to remember that physical boundaries are less relevant in the age of modern air travel. Hence, any sophisticated spatial model here would have to incorporate worldwide flight data, hinting at network-based temporal approaches.
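If you want a number to go with the map’s visual impression that neighbouring places behave alike, one standard choice is Moran’s I, a spatial autocorrelation statistic (my addition here, not something the choropleth computes). Below is a minimal NumPy sketch, assuming you already have a spatial weights matrix w where w[i, j] = 1 if places i and j are neighbours and 0 otherwise. The four-place chain is a toy example: positive values mean neighbours tend to be similar, negative values mean they tend to differ.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I for values x (length n) and an n x n spatial weights matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    n = len(x)
    num = (w * np.outer(z, z)).sum()      # sum over i, j of w_ij * z_i * z_j
    den = (z ** 2).sum()
    return (n / w.sum()) * num / den

# Toy example: four places on a line, each adjacent to the next.
w = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
smooth = morans_i([1, 2, 3, 4], w)       # neighbours similar
alternating = morans_i([1, 4, 1, 4], w)  # neighbours differ
print(round(smooth, 3), round(alternating, 3))  # 0.333 -1.0
```

Computing this per month, with a weights matrix built from shared borders (or, following the point above, from flight connections), gives a time series of how spatially clustered the spread is.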

To conclude, each visualization project will present its own needs and challenges. No single spatio-temporal visualization toolkit can be all-encompassing, but the one I’ve shared today has been my personal bread and butter, and it has withstood the test of time. Hopefully it can be a powerful addition to your own, and please don’t forget to mention your own favorite in the comments!
