A tour through Visualization zoo — Time series and Geospatial Maps

Shubham Goyal
Published in AI Skunks · 9 min read · Mar 14, 2023

This article discusses in detail some of the most useful visualization techniques for time series and geospatial maps covered in the research paper “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure”.

Topics we will discuss:

  1. Introduction to Time Series Visualization
  2. Types of charts:
    - Index Charts
    - Stacked graphs
    - Small Multiples
    - Horizon graphs
  3. Maps
    - Flow Maps
    - Choropleth Maps
    - Graduated Symbol Maps

Introduction to Time Series visualization

Time series visualization is a technique used to analyze and understand time series data through visual means. By creating visualizations of time series data, we can identify patterns and trends in the data that might not be immediately apparent from looking at the raw numbers.

Visualizations can take many forms, including line charts, bar charts, area charts, and scatter plots. The choice of visualization type depends on the specific characteristics of the data being analyzed and the analysis goals.

Examples of time series data include stock prices, weather data, and website traffic statistics.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate random time series data
dates = pd.date_range('2022-01-01', periods=100, freq='D')
values = np.random.randint(10, 100, size=(100,))

# Create data frame
df = pd.DataFrame({'date': dates, 'value': values})

# Set date column as index
df.set_index('date', inplace=True)

plt.figure(figsize=(12, 6))

# Create line chart of time series data
plt.plot(df.index, df['value'])

# Add title and axis labels
plt.title('Random Time Series Visualization')
plt.xlabel('Date')
plt.ylabel('Value')

# Display chart
plt.show()
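
Raw daily values are often noisy, so a common first step toward spotting a trend is to overlay a smoothed version of the series. The following is a minimal sketch of that idea. It assumes a 7-day window (an arbitrary choice for illustration), a fixed random seed for reproducibility, and a hypothetical column name `rolling_mean`.

```python
import numpy as np
import pandas as pd

# Reproducible random daily series (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
dates = pd.date_range('2022-01-01', periods=100, freq='D')
df = pd.DataFrame({'value': rng.integers(10, 100, size=100)}, index=dates)

# 7-day rolling mean; the first 6 rows are NaN because the window is incomplete
df['rolling_mean'] = df['value'].rolling(window=7).mean()

print(df.head(10))
```

Plotting `df['rolling_mean']` on the same axes as `df['value']` should make any slow-moving pattern easier to see than the raw line alone.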

Types of Charts

a) Index Charts

An index chart shows relative change rather than raw values. Time is on the x-axis, and the y-axis shows each series as a percentage change from its value at a chosen reference (index) date. Because every series starts from the same baseline, index charts make it easy to compare trends across series whose absolute values differ widely. They are particularly useful for data recorded at regular intervals, such as daily, weekly, or monthly data.

Index charts can be used in a wide range of fields, including finance, economics, and environmental science, to name just a few. In finance, for example, index charts are often used to track changes in stock prices over time.
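
The arithmetic behind an index chart can be sketched on a tiny example before applying it to a full data frame. The helper name `index_to_reference` and the two price lists are hypothetical and used only for illustration.

```python
def index_to_reference(values, ref_position=0):
    """Return the percentage change of each value relative to values[ref_position]."""
    ref = values[ref_position]
    if ref == 0:
        raise ValueError("reference value must be non-zero")
    return [(v / ref - 1) * 100 for v in values]

# Two hypothetical price series on very different scales
stock_a = [50, 55, 45, 60]
stock_b = [2000, 2100, 1900, 2500]

indexed_a = index_to_reference(stock_a)
indexed_b = index_to_reference(stock_b)

print([round(x, 1) for x in indexed_a])  # [0.0, 10.0, -10.0, 20.0]
print([round(x, 1) for x in indexed_b])  # [0.0, 5.0, -5.0, 25.0]
```

Once indexed, both series start at 0% and can share one y-axis even though their raw prices differ by a factor of 40.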

# Creating a function for indexing
import pandas as pd

def create_indexed_columns(date, df, top_level_name=""):
    """Returns indexed columns (% change from the reference date) for given dataframe"""

    # find position of the date that is closest to our reference date
    # (Index.get_loc no longer accepts method= in pandas 2.0+, so use get_indexer)
    closest_date_index = df.index.get_indexer([pd.Timestamp(date)], method="nearest")[0]

    # get the values in the initial columns for the reference date
    reference_values = df.iloc[closest_date_index]

    # express each column as percentage change from its value at the reference date
    inter_df = df.div(reference_values) * 100 - 100

    # create a MultiIndex for the intermediate df using the date as top-level index
    closest_date = df.index[closest_date_index]
    inter_df.columns = pd.MultiIndex.from_product(
        [[top_level_name if top_level_name else closest_date], inter_df.columns])

    return inter_df, closest_date

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate random time series data
dates = pd.date_range('2022-01-01', periods=100, freq='D')
values_1 = np.random.randint(10, 100, size=(100,))
values_2 = np.random.randint(20, 200, size=(100,))

# Create data frame
df = pd.DataFrame({'date': dates, 'value_1': values_1, 'value_2': values_2})

# Set date column as index
df.set_index('date', inplace=True)

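inter_df, df_index = create_indexed_columns('2022-01-01', df)
plt.figure(figsize=(12, 6))
# Plot each indexed column with its original name so the legend has entries
for col in inter_df.columns:
    plt.plot(inter_df.index, inter_df[col], label=col[-1])
plt.title('Random Time Series Index Chart')
plt.xlabel('Date')
plt.ylabel('% change from reference date')
plt.legend()
plt.show()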

b) Stacked Charts

Time series stacked charts are a type of chart that allows you to visualize changes in the composition of multiple variables over time. Stacked charts use multiple layers of data to represent the total value of a variable at each point in time, and the relative proportion of each layer shows how the variable is made up of different components.

Stacked charts are particularly useful when the components add up to a meaningful total, whether that total varies over time or is fixed at 100%, as with the market share of different companies or the percentage of sales by product category. By stacking the data, you can see how the different components contribute to the total over time and how their relative importance changes.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate random time series data
dates = pd.date_range('2022-01-01', periods=100, freq='D')
values_1 = np.random.randint(10, 100, size=(100,))
values_2 = np.random.randint(20, 200, size=(100,))

# Create data frame
df = pd.DataFrame({'date': dates, 'value_1': values_1, 'value_2': values_2})

# Set date column as index
df.set_index('date', inplace=True)

# Create stacked chart of time series data
plt.stackplot(df.index, df['value_1'], df['value_2'], labels=['Value 1', 'Value 2'])

# Add title and axis labels
plt.title('Random Time Series Stacked Chart')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()

# Display chart
plt.show()
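
For share-style data such as market share, the layers are usually normalized so that they sum to 100% at every time point. The following is a minimal sketch of that normalization step. The sales figures and category names are made up for illustration.

```python
import pandas as pd

# Hypothetical daily sales for three product categories
sales = pd.DataFrame(
    {'A': [30, 60, 50], 'B': [50, 40, 30], 'C': [20, 100, 20]},
    index=pd.date_range('2022-01-01', periods=3, freq='D'),
)

# Divide each row by its total so the layers always sum to 100%
shares = sales.div(sales.sum(axis=1), axis=0) * 100

print(shares.round(1))
```

The normalized frame can then be passed to the same stacking call as before, for example `plt.stackplot(shares.index, shares.T.values, labels=list(shares.columns))`. The trade-off is that changes in the overall total are no longer visible.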

c) Small Multiples

Small multiples are a series of small charts or graphs that each show a subset of the data. By breaking the data up into smaller chunks, small multiples allow us to see the patterns and trends in the data more clearly. Small multiples can be used to visualize time series data in a way that is both informative and visually appealing.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate random time series data
dates = pd.date_range('2022-01-01', periods=100, freq='D')
values_1 = np.random.randint(10, 100, size=(100,))
values_2 = np.random.randint(20, 200, size=(100,))

# Create data frame
df = pd.DataFrame({'date': dates, 'value_1': values_1, 'value_2': values_2})

# Set date column as index
df.set_index('date', inplace=True)

# Create small multiples of time series data
fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(8, 6))
axs[0].plot(df['value_1'])
axs[0].set(title='Value 1 Time Series', xlabel='Date', ylabel='Value')
axs[1].plot(df['value_2'])
axs[1].set(title='Value 2 Time Series', xlabel='Date', ylabel='Value')

# Add space between subplots
fig.tight_layout()

# Display chart
plt.show()
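
When the panels are meant to be compared with each other, it often helps to give them a shared y-axis and to build the grid in a loop instead of one panel at a time. This is a sketch under those assumptions. The helper `plot_small_multiples` and the series names are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_small_multiples(df, ncols=2):
    """Draw one small line chart per column of df on shared x and y axes."""
    n = len(df.columns)
    nrows = -(-n // ncols)  # ceiling division
    fig, axs = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, sharey=True,
                            figsize=(4 * ncols, 2.5 * nrows), squeeze=False)
    flat = axs.ravel()
    for ax, col in zip(flat, df.columns):
        ax.plot(df.index, df[col])
        ax.set_title(col)
        ax.tick_params(axis='x', labelrotation=45)
    # Hide any unused panels in the grid
    for ax in flat[n:]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axs

# Four random-walk series with a fixed seed for reproducibility
rng = np.random.default_rng(0)
dates = pd.date_range('2022-01-01', periods=100, freq='D')
df = pd.DataFrame(rng.normal(size=(100, 4)).cumsum(axis=0), index=dates,
                  columns=['series_1', 'series_2', 'series_3', 'series_4'])

fig, axs = plot_small_multiples(df)
```

Calling `plt.show()` afterwards displays the grid. A shared y-axis makes magnitudes comparable across panels, while independent axes (as in the example above this one) emphasize the shape of each series.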

d) Horizon Graphs

A horizon graph is a compact variant of the area chart that increases the density of a time series display while preserving resolution. Its purpose is to let viewers compare changes in the data over time in a fraction of the vertical space an ordinary area chart would need.

In a horizon graph, the value range is divided into several horizontal bands of equal height. Negative values are mirrored (or offset) so that they share the same baseline as positive values, and the two are distinguished by hue, for example blue for positive and red for negative. The bands are then layered on top of each other, with darker shades representing values further from zero.

Horizon graphs are useful for visualizing time series data with large numbers of observations over a long period of time, or many series at once, as they can display a lot of information in a compact space. They take some practice to read, but once learned they make large positive and negative excursions easy to spot.
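
The band-splitting step can be sketched on a handful of numbers. The helper `horizon_bands` is hypothetical and implements the mirrored variant, in which negative values are flipped onto the positive baseline.

```python
import numpy as np

def horizon_bands(values, n_bands=3):
    """Split a series into layered bands for a horizon graph (mirrored variant).

    Returns (band_height, positive_layers, negative_layers). Each layers array
    has shape (n_bands, len(values)) with entries in [0, band_height].
    """
    values = np.asarray(values, dtype=float)
    band_height = np.abs(values).max() / n_bands
    offsets = np.arange(n_bands)[:, None] * band_height
    positive = np.clip(values - offsets, 0, band_height)
    negative = np.clip(-values - offsets, 0, band_height)  # mirrored about zero
    return band_height, positive, negative

band_height, pos, neg = horizon_bands([1.0, 4.0, 6.0, -2.0, -5.0], n_bands=3)

print(band_height)  # 2.0
print(pos)          # rows: [1 2 2 0 0], [0 2 2 0 0], [0 0 2 0 0]
print(neg)          # rows: [0 0 0 2 2], [0 0 0 0 2], [0 0 0 0 1]
```

Each row is one layer. Drawing the rows as filled areas on the same baseline, from the first to the last and in progressively darker shades, produces the horizon effect.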

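import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate random time series data
dates = pd.date_range('2022-01-01', periods=100, freq='D')
values = np.random.randn(100).cumsum()

# Create data frame
df = pd.DataFrame({'date': dates, 'value': values})

# Set date column as index
df.set_index('date', inplace=True)

# Create horizon graph of time series data
fig, ax = plt.subplots(figsize=(8, 2))

# Split the value range into bands of equal height
n_bands = 3
band = df['value'].abs().max() / n_bands

# Layer the bands on a shared baseline: blue for positive values,
# red for negative values mirrored upward, darker for larger magnitude
for i in range(n_bands):
    pos = (df['value'] - i * band).clip(0, band)
    neg = (-df['value'] - i * band).clip(0, band)
    ax.fill_between(df.index, 0, pos, color=plt.cm.Blues((i + 1) / n_bands), linewidth=0)
    ax.fill_between(df.index, 0, neg, color=plt.cm.Reds((i + 1) / n_bands), linewidth=0)

# Set axis limits, labels and title (the y-axis only spans one band height)
ax.set_ylim(0, band)
ax.set_yticks([])
ax.set(xlabel='Date', ylabel='Value', title='Horizon Graph of Time Series Data')

# Display chart
plt.show()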

Geospatial Maps

a) Flow Maps

Flow maps are a type of visualization that represent movement or flows of entities, such as people, goods, or information, between locations. They are useful for visualizing patterns of movement and identifying hotspots or hubs of activity.

Flow maps typically use lines or arrows to represent the flow of entities, with the width of the lines or arrows proportional to the volume of entities being moved. The lines or arrows are usually drawn between geographic locations, such as cities, regions, or countries, with the flow direction indicated by the direction of the lines or arrows.
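
The width rule described above can be sketched as a small helper. The function `flow_width` and its pixel limits are assumptions made for illustration. It scales width linearly with volume and keeps a minimum width so that small flows stay visible.

```python
def flow_width(volume, max_volume, min_width=1.0, max_width=10.0):
    """Map a flow volume to a line width in pixels, growing linearly with volume."""
    if max_volume <= 0:
        raise ValueError("max_volume must be positive")
    # Clamp to [0, 1] so out-of-range volumes cannot produce extreme widths
    fraction = min(max(volume / max_volume, 0.0), 1.0)
    return min_width + fraction * (max_width - min_width)

# Hypothetical flow volumes between pairs of locations
volumes = [5, 50, 100]
widths = [round(flow_width(v, max(volumes)), 2) for v in volumes]

print(widths)  # [1.45, 5.5, 10.0]
```

The result could be passed as the `weight` argument of `folium.PolyLine`, which sets the stroke width in pixels.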

import pandas as pd
import numpy as np
import folium

# Generate random data
n_flows = 50
origins = np.random.normal(loc=[34.05, -118.24], scale=[0.2, 0.2], size=(n_flows, 2))
destinations = np.random.normal(loc=[34.05, -118.24], scale=[0.2, 0.2], size=(n_flows, 2))
volumes = np.random.uniform(low=0, high=100, size=n_flows)

# Create DataFrame
flows = pd.DataFrame({'origin_lat': origins[:, 0],
                      'origin_lon': origins[:, 1],
                      'dest_lat': destinations[:, 0],
                      'dest_lon': destinations[:, 1],
                      'volume': volumes})

# Create map ('Stamen Toner' may not be available as a built-in tile set in
# recent folium releases, so a built-in CartoDB basemap is used instead)
m = folium.Map(location=[34.05, -118.24], zoom_start=10, tiles='CartoDB positron')

# Add flows to map, with line width proportional to volume
for index, row in flows.iterrows():
    origin = (row['origin_lat'], row['origin_lon'])
    dest = (row['dest_lat'], row['dest_lon'])
    weight = row['volume']
    folium.PolyLine(locations=[origin, dest], weight=weight/10, color='red', opacity=0.5).add_to(m)

# Display map
m

Figure: Lines representing geospatial data

b) Choropleth Maps

Choropleth maps are a type of map that use color or shading to represent the variation of a particular variable across geographic regions. The name “choropleth” comes from the Greek words “choros” (region) and “plethos” (multitude), reflecting that the map shows a quantity for each region.

Choropleth maps are particularly useful for displaying data that is aggregated at a regional level, such as population density, income, or voting patterns. By using color or shading to represent the variation of the data, choropleth maps can quickly convey patterns and trends that might be difficult to see in a table or a chart.

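import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

# Load the map data
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0; on newer
# versions, download the Natural Earth countries file and pass its path to
# gpd.read_file (the country-name column may then be named differently)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Load the data to be displayed
data = {'country': ['France', 'Germany', 'Spain', 'Italy'],
        'value': [10, 20, 30, 40]}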

# Convert the data to a Pandas DataFrame
df = pd.DataFrame(data)

# Join the map data and the data to be displayed
merged = world.merge(df, left_on='name', right_on='country')

# Create the choropleth map
fig, ax = plt.subplots(figsize=(10, 6))
merged.plot(column='value', cmap='Blues', ax=ax, legend=True)

# Add a title
ax.set_title('Example Choropleth Map')

# Remove the axis
ax.axis('off')

# Show the plot
plt.show()
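
Choropleth maps are often easier to read when values are grouped into a small number of color classes instead of a continuous ramp. The following sketch shows one way to do that, using quantile classes. The region names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical regional values (for example, a rate per 1,000 inhabitants)
df = pd.DataFrame({
    'region': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'value': [3, 8, 15, 22, 31, 47, 60, 95],
})

# Assign each region to one of four quantile classes (0 = lowest, 3 = highest)
df['color_class'] = pd.qcut(df['value'], q=4, labels=False)

print(df)
```

Each class then maps to one shade of the color scheme. GeoPandas can apply a similar classification at plot time through the `scheme` argument of `plot` (for example `scheme='quantiles'`), which relies on the optional mapclassify package.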

c) Graduated Symbol Maps

Graduated symbol maps are a type of thematic map that display quantitative data for different geographic locations using proportional symbols. The size of each symbol on the map represents the value of the variable being visualized. Because viewers judge flat symbols by area, the radius of a circle is typically scaled by the square root of the data value (or by the cube root for symbols drawn as volumes), so that differences in size between the symbols are visually meaningful.
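
The square-root rule can be sketched as follows. The helper `symbol_radius` and the 40 km maximum radius are assumptions chosen for illustration.

```python
import math

def symbol_radius(value, max_value, max_radius=40_000.0):
    """Return a radius such that the circle's area is proportional to value.

    max_radius is the radius given to the largest value (here in metres,
    the unit folium.Circle uses); it is an arbitrary illustrative choice.
    """
    if max_value <= 0 or value < 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)

print(symbol_radius(1000, 1000))  # 40000.0
print(symbol_radius(250, 1000))   # 20000.0
```

A value that is a quarter of the maximum gets half the radius, and therefore a quarter of the area.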

import folium
import numpy as np
import pandas as pd

# Generate random data
n_points = 50
data = pd.DataFrame({
    'latitude': np.random.uniform(low=35, high=45, size=n_points),
    'longitude': np.random.uniform(low=-120, high=-80, size=n_points),
    'value': np.random.randint(low=1, high=1000, size=n_points)
})

# Create a map centered on the United States
m = folium.Map(location=[39.50, -98.35], zoom_start=4)

# Add circles to the map for each data point
for i, row in data.iterrows():
    folium.Circle(
        location=[row['latitude'], row['longitude']],
        radius=row['value'] ** 0.5 * 1000,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.5,
        tooltip='Location {}: {}'.format(i, row['value'])
    ).add_to(m)

# Display the map
m

That brings us to the end of this article. If you want to try the code, feel free to use the Colab notebooks listed below.

Link to colab —
1) Detailed overview of above theory

2) Worked example for time series visualizations with financial dataset —

If you liked the article please give a clap! Thanks for your time :)
