Multi-dimensional Spatiotemporal Data Exploration with TPFlow

By yichong
Published in VisUMD, Dec 13, 2019 (4 min read)

Or, how I became an excellent spatiotemporal hunter with minimal effort.

The TPFlow user interface.

Consider a vast, boundless ocean, with millions of fish swimming with the current. Imagine you are asked to analyze the traces of every fish and find the most distinctive one. This task essentially amounts to analyzing a multi-dimensional spatiotemporal dataset, where each value is indexed by temporal, spatial, and other domain-specific dimensions. Only the most skilled hunter and the most experienced data analyst could overcome the obstacles to accomplish this task.

Fortunately, the path to becoming an excellent spatiotemporal hunter is easier with the TPFlow system developed by Dongyu Liu, Panpan Xu, and Liu Ren. Their 2018 paper “TPFlow: Progressive partition and multidimensional pattern extraction for large-scale spatio-temporal data analysis” introduces a novel algorithm that searches for the best way to slice multi-dimensional ST (short for spatio-temporal) data, and a novel interactive system that helps people discover, compare, and verify patterns in that data.

Steps That You Don’t Need to Do

Modelling traffic flow data as a three-dimensional tensor.

The first step is to model the ST data as a tensor. A tensor is a multi-dimensional array in which each entry holds one ST measurement. For example, we can use a small cuboid to represent a single traffic flow value, with one axis for the location, one for the day, and one for the hour. The entry at position (i, j, k) then holds the traffic volume at location i on the j-th day during hour k, and stacking all of these small cuboids builds up the multi-dimensional traffic flow dataset.
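To make the indexing concrete, here is a minimal sketch in Python with NumPy. The records, the dimension sizes, and the variable names are made up for illustration and do not come from the paper.

```python
import numpy as np

# Hypothetical records: (location index, day index, hour index, traffic volume).
records = [
    (0, 0, 8, 120.0),
    (0, 0, 9, 150.0),
    (1, 0, 8, 80.0),
    (1, 5, 14, 200.0),
]

# Assumed sizes: 2 locations, 7 days, 24 hours.
n_locations, n_days, n_hours = 2, 7, 24

# tensor[i, j, k] = traffic volume at location i on day j during hour k.
tensor = np.zeros((n_locations, n_days, n_hours))
for loc, day, hour, volume in records:
    tensor[loc, day, hour] += volume

print(tensor.shape)      # (2, 7, 24)
print(tensor[1, 5, 14])  # 200.0
```

Every cell that has no record stays at zero, so a sparse log of measurements becomes a dense array that can be sliced along any of its three dimensions.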

Automatic tensor decomposition.

The second step is to group similar slices of the tensor together. For example, traffic volume may have a different geospatial distribution on weekdays than on weekends. However, in the original stack of cuboids, the cuboids for a Saturday can sit far away from the cuboids for a Sunday in another week. This step therefore finds similarities among the slices and groups similar ones together, which is equivalent to moving the Saturday cuboids next to the Sunday cuboids.
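The grouping idea can be illustrated with a toy example. The sketch below is not the authors' decomposition algorithm: it swaps in plain k-means over flattened day slices, and the function name `group_slices`, the synthetic weekday and weekend profiles, and all sizes are assumptions made for illustration.

```python
import numpy as np

def group_slices(tensor, axis, n_groups, n_iter=20):
    """Cluster the slices of `tensor` along `axis` with plain k-means."""
    # One row per slice, e.g. one row per day holding all location x hour values.
    slices = np.moveaxis(tensor, axis, 0).reshape(tensor.shape[axis], -1)
    # Deterministic farthest-point initialisation, starting from the first slice.
    centers = [slices[0]]
    for _ in range(n_groups - 1):
        dists = np.min([np.linalg.norm(slices - c, axis=1) for c in centers], axis=0)
        centers.append(slices[np.argmax(dists)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each slice to its nearest center, then recompute the centers.
        d = np.linalg.norm(slices[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        for g in range(n_groups):
            if np.any(labels == g):
                centers[g] = slices[labels == g].mean(axis=0)
    return labels

# Synthetic data: 3 locations x 7 days x 24 hours (days 5 and 6 are the weekend).
rng = np.random.default_rng(0)
hours = np.arange(24)
weekday = np.exp(-0.5 * ((hours - 8) / 1.5) ** 2) + np.exp(-0.5 * ((hours - 17) / 1.5) ** 2)
weekend = np.exp(-0.5 * ((hours - 13) / 3.0) ** 2)
tensor = np.zeros((3, 7, 24))
for day in range(7):
    profile = weekend if day >= 5 else weekday
    tensor[:, day, :] = 100 * profile + rng.normal(0, 1, size=(3, 24))

labels = group_slices(tensor, axis=1, n_groups=2)
print(labels)  # the five weekdays share one label, the two weekend days the other
```

Passing a different `axis` groups locations or hours instead of days, which is the same operation viewed along another dimension of the tensor.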

These steps don’t sound too hard for you, right? However, when there are thousands of cuboids in front of you, things quickly go downhill.

The good news is that you don’t have to go through those steps anymore, because the algorithm proposed by Liu and team will do that for you.

How Does TPFlow Work?

The TPFlow user interface.

TPFlow is an interactive system that lets you partition the data directly. As shown in the picture, hovering over a tree node (a) pops up a menu of options: the dimension to partition on, the number of clusters to create, and the clustering algorithm to use. The system then performs the partitioning automatically.
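One way to picture the tree of partitions is as a small data structure in which every node remembers which indices it covers along each dimension. The `PartitionNode` class below is a hypothetical sketch, not TPFlow's actual implementation, and the cluster labels passed to `split` are hard-coded here rather than computed by a clustering algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class PartitionNode:
    # Maps a dimension name to the list of indices this node covers.
    indices: dict
    children: list = field(default_factory=list)

    def split(self, dimension, labels):
        """Create one child per cluster label along `dimension`."""
        groups = {}
        for idx, label in zip(self.indices[dimension], labels):
            groups.setdefault(label, []).append(idx)
        for label in sorted(groups):
            child_indices = dict(self.indices)
            child_indices[dimension] = groups[label]
            self.children.append(PartitionNode(child_indices))
        return self.children

root = PartitionNode({
    "location": [0, 1, 2],
    "day": list(range(7)),
    "hour": list(range(24)),
})

# Assumed labels: days 0-4 in cluster 0 (weekdays), days 5-6 in cluster 1 (weekend).
weekday_node, weekend_node = root.split("day", [0, 0, 0, 0, 0, 1, 1])
print(weekday_node.indices["day"])  # [0, 1, 2, 3, 4]
print(weekend_node.indices["day"])  # [5, 6]
```

Because each child keeps the full index lists for the other dimensions, it can be split again along a different dimension, which is what makes the partitioning progressive.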

On each node, TPFlow uses a diverging color scheme from blue to red to visualize discrepancies: blue suggests a low discrepancy, while red indicates a high one. From these colors, you can quickly gauge how faithful the patterns are for each partition, and refine the partition if necessary.
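As a rough illustration of such a color scale, the sketch below maps a discrepancy score to an RGB color. The function name, the [0, 1] score range, and the white midpoint are assumptions; the exact colors and the discrepancy measure TPFlow uses are not specified here.

```python
def discrepancy_to_rgb(value, low=0.0, high=1.0):
    """Map a discrepancy score to an (r, g, b) tuple on a blue-white-red scale."""
    # Normalise to [0, 1] and clamp values outside the expected range.
    t = (value - low) / (high - low)
    t = min(max(t, 0.0), 1.0)
    blue, white, red = (0, 0, 255), (255, 255, 255), (255, 0, 0)
    if t < 0.5:
        start, end, w = blue, white, t / 0.5
    else:
        start, end, w = white, red, (t - 0.5) / 0.5
    # Linear interpolation between the two anchor colors.
    return tuple(round(a + (b - a) * w) for a, b in zip(start, end))

print(discrepancy_to_rgb(0.0))  # (0, 0, 255): low discrepancy, blue
print(discrepancy_to_rgb(0.5))  # (255, 255, 255): midpoint, white
print(discrepancy_to_rgb(1.0))  # (255, 0, 0): high discrepancy, red
```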

Additionally, TPFlow employs several basic chart types to display the latent trends and distributions along the spatial (geographical), temporal, categorical, and numerical dimensions. Details are given in the figure below.

Patterns in different dimensions.

Examples

Liu and team demonstrate three usage scenarios with real-world data:

  1. Regional sales data analysis;
  2. Customer in-store traffic data analysis for brick-and-mortar retailers; and
  3. The New York taxi trip OD (origin-destination) data.

You can see more details of these experiments in the video below.

TPFlow: Progressive Partition and Multidimensional Pattern Extraction for Spatio-Temporal Data

Compared to other algorithms, their algorithm consistently achieves a lower cost when partitioning on the month and product dimensions, which suggests that it finds more reasonable ways to partition the tensor and represents the patterns more faithfully.
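To give a feel for what a partitioning cost can look like, here is a small sketch. The cost used in the paper is not defined in this article; the stand-in below, the sum of squared deviations of each slice from its group's mean, is an assumption chosen only to show why a sensible grouping scores lower than an arbitrary one. The function name and the toy data are hypothetical.

```python
import numpy as np

def partition_cost(slices, labels):
    """Sum of squared deviations of each slice from its group's mean slice."""
    slices = np.asarray(slices, dtype=float)
    labels = np.asarray(labels)
    cost = 0.0
    for g in np.unique(labels):
        members = slices[labels == g]
        cost += float(((members - members.mean(axis=0)) ** 2).sum())
    return cost

# Four toy slices: two with low values, two with high values.
slices = [[10, 10], [11, 9], [50, 52], [49, 51]]

good = partition_cost(slices, [0, 0, 1, 1])  # similar slices grouped together
bad = partition_cost(slices, [0, 1, 0, 1])   # dissimilar slices mixed
print(good, bad)  # 2.0 3286.0
```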

This article is based on the following paper:

  • Liu, D., Xu, P., & Ren, L. (2018). TPFlow: Progressive partition and multidimensional pattern extraction for large-scale spatio-temporal data analysis. IEEE Transactions on Visualization and Computer Graphics, 25(1), 1–11.
