Visualising large spatiotemporal data in web applications

Jul 6, 2018

TL;DR: Mapbox recently released v0.46.0 of its GL JS API. This tutorial shows how to use it to visualize large spatiotemporal datasets, using the change in NYC Citibike trips over time as an example of Mapbox GL JS’s capabilities.

Demo: here

Source code: here

Many objects and events that can be visualized on a map change in both space and time. Examples of Urbica’s projects built on spatiotemporal data include the map of GULAG History and the NYC bike share system rebalancing visualisation.

When designing a cartographic application with a lot of data, always keep its client-side performance in mind, because it heavily affects user experience. Nobody likes slow and laggy maps.

The Mapbox team has built Mapbox GL JS, a very efficient, high-performance JavaScript library for interactive, customizable vector web maps. It takes map styles, applies them to vector tiles, and renders them using WebGL.

Problems arise when you want to display spatial data that changes over time. There are several ways to approach this.

Slicing time values into multiple attributes

The first option is to add attributes to the vector tile that represent the value changing over time. For example, suppose an attribute rides_per_hour varies over the 24 hours of a day. You can split it into attributes like rides_00, rides_01, rides_02, …, rides_23. Having all of the time slices in a tile lets you style a layer according to what the user selected in the UI.
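As a rough sketch, assuming every feature carries all 24 rides_XX attributes and the layer already exists, the hour picked in the UI can be turned into a data-driven expression that reads the matching attribute with get. The helper name ridesColorExpression, the thresholds, and the colors are hypothetical.

```javascript
// Hypothetical helper: build a circle-color expression that reads the
// per-hour attribute (rides_00 ... rides_23) for the selected hour.
// Thresholds and colors are placeholder values.
function ridesColorExpression(hour) {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError(`hour must be an integer in 0..23, got ${hour}`);
  }
  // 7 -> "rides_07", 23 -> "rides_23"
  const attribute = `rides_${String(hour).padStart(2, "0")}`;
  return ["step", ["get", attribute], "#429EFF", 10, "#F0FF3D", 50, "#FF94A6"];
}
```

Switching hours would then be a single call such as map.setPaintProperty('rides', 'circle-color', ridesColorExpression(7)), where 'rides' is a hypothetical layer id. No new tiles are requested, because every slice is already inside the tile, which is exactly why the tile keeps growing.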

The problem is that the tile grows with every time slice you add, and sooner or later you will hit the tile size limit. So this solution only works for small datasets.

Requesting tiles for every time slice

Another option is to request a separate tile for every time slice. Each time slice selected in the UI triggers a request for a new tile layer that holds only the attributes needed.
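A minimal sketch of this approach, assuming a hypothetical tile server at tiles.example.com that serves one vector tile set per time slice under a /rides/<slice>/{z}/{x}/{y}.pbf path (the URL layout and the helper name sliceSource are illustrative, not part of the original app):

```javascript
// Hypothetical tile server; the URL layout is an assumption for illustration.
const TILE_SERVER = "https://tiles.example.com";

// Source definition for one time slice, e.g. "h07" for 07:00-08:00.
// Every slice points at a separate tile set, so the same geometry is
// downloaded (and cached) once per slice.
function sliceSource(slice) {
  return {
    type: "vector",
    tiles: [`${TILE_SERVER}/rides/${encodeURIComponent(slice)}/{z}/{x}/{y}.pbf`],
  };
}
```

Each selection would then add a new source, for example map.addSource(`rides-${slice}`, sliceSource(slice)), plus a layer that reads from it.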

The issue here is that the geometry is duplicated in every one of those requests, which can seriously bloat the tile cache.

Building GeoJSON layers on time slice change

Yet another option is to load the geometry once and request only the time slice data. With both the geometry and the time slice data on the client, you can build GeoJSON there and update an existing source with its setData method.
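A sketch of that client-side step, assuming the geometry is a GeoJSON FeatureCollection whose features carry a station_id property, and that a slice arrives as rows shaped like { station_id, trips_count } (the same shape as the trips_count rows used later in this tutorial). The helper name buildSliceGeoJSON is hypothetical.

```javascript
// Hypothetical helper: merge static station geometry with one time slice and
// return a new FeatureCollection suitable for GeoJSONSource#setData.
function buildSliceGeoJSON(geometry, sliceRows) {
  // Index the slice rows by station ID for O(1) lookup.
  const countsById = new Map(sliceRows.map((row) => [row.station_id, row.trips_count]));
  return {
    type: "FeatureCollection",
    // Copy every feature; the input geometry is left untouched.
    features: geometry.features.map((feature) => ({
      ...feature,
      properties: {
        ...feature.properties,
        // Stations missing from the slice get 0 trips.
        trips_count: countsById.get(feature.properties.station_id) ?? 0,
      },
    })),
  };
}
```

The result would be passed to map.getSource('stations').setData(...). Note that every call copies all features and re-uploads them to the map, which is the per-request cost this approach pays.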

The problem here is that you have to rebuild the GeoJSON and push it to the map on every user request. That can hurt performance, and, again, everyone hates slow maps.

Using Feature State

Mapbox recently released Mapbox GL JS v0.46.0. This release introduced the Map#setFeatureState method and the feature-state expression to support interactive styling.

John Firebaugh presenting feature state at Mapbox’s Locate 2018 conference

A feature’s state is not part of the GeoJSON or vector tile data; it is a plain JavaScript object that can be set programmatically at runtime on each feature, identified by its unique feature ID.

The feature-state expression retrieves a property value from the current feature’s state. It returns null if the requested property is not present on the feature’s state. Note that feature-state can only be used with paint properties that support data-driven styling.
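For example, because feature-state yields null for features that have no state yet, a paint expression can wrap it in coalesce to supply a numeric fallback before step compares it against thresholds. This is a sketch: the helper name featureStateColor is hypothetical, the thresholds and colors mirror the ones used in the final layer of this tutorial, and the coalesce fallback is our own assumption rather than something the original layer does, so check that your Mapbox GL JS version accepts feature-state inside coalesce.

```javascript
// Hypothetical helper: a circle-color expression driven by feature state.
// ["feature-state", prop] evaluates to null when the state lacks `prop`,
// so "coalesce" substitutes `fallback` before "step" sees the value.
function featureStateColor(prop, fallback = 0) {
  return [
    "step",
    ["coalesce", ["feature-state", prop], fallback],
    "#429EFF", 2,
    "#80FF9E", 5,
    "#F0FF3D", 10,
    "#FF94A6", 20,
    "#C399FF",
  ];
}
```

The expression would be used as the value of a paint property, e.g. "circle-color": featureStateColor("trips_count"), since feature-state is only allowed in paint properties.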

This approach solves the problems mentioned above:

You can have as many time slices as you want;
You don’t have to duplicate geometry, because you request only the time slice data;
You don’t have to build a new GeoJSON layer every time, because you simply update the state of features that already exist on your map.

Visualizing NYC Citibike trip data using Mapbox feature state

First, we need to collect and process the data. Citibike provides trip history data under the NYCBS Data Use Policy. Navigate to the trip data bucket and choose the date range you need. We will use trips taken in May 2018.

curl https://s3.amazonaws.com/tripdata/201805-citibike-tripdata.csv.zip -o ./tripdata.zip
unzip ./tripdata.zip

We use PostgreSQL as our database, so we need to create a trips table and load the trips into it.

-- create trips table
create table trips (
  "tripduration" numeric,
  "starttime" timestamp,
  "stoptime" timestamp,
  "start station id" varchar,
  "start station name" varchar,
  "start station latitude" numeric,
  "start station longitude" numeric,
  "end station id" varchar,
  "end station name" varchar,
  "end station latitude" numeric,
  "end station longitude" numeric,
  "bikeid" varchar,
  "name_localizedValue0" varchar,
  "usertype" varchar,
  "birth year" varchar,
  "gender" varchar
);

-- copy trips data to trips table
copy trips from '/path-to-your-data/201805-citibike-tripdata.csv' delimiter ',' csv header;

Now we need to create a trips_count table to store trip counts aggregated by station ID and timestamp.

-- create trips_count table
create table trips_count (
  station_id varchar,
  trips_count integer,
  ts timestamp
);

We also recommend using TimescaleDB for easier time-series manipulation. It lets us turn trips_count into a TimescaleDB hypertable that partitions the data by timestamp.

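-- enable the TimescaleDB extension (it must be installed on the server)
create extension if not exists timescaledb;

-- create timescaledb hypertable
select create_hypertable('trips_count', 'ts');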

Now we can aggregate our trip data by counting all the trips that ended at each station within each hour.

-- aggregate trips into trips_count table
insert into trips_count
select
  "end station id" as station_id,
  count(date_trunc('hour', stoptime)) as trips_count,
  date_trunc('hour', stoptime) as ts
from trips
group by
  station_id, date_trunc('hour', stoptime);
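To make the aggregation concrete, here is the same counting logic sketched in plain JavaScript. It is for intuition only, since the database does the real work. It assumes each trip has already been parsed into an object with hypothetical endStationId and stoptime fields, with stoptime as a "YYYY-MM-DD HH:MM:SS" string like the CSV’s. The function name is hypothetical too.

```javascript
// Plain-JavaScript equivalent of the SQL aggregation, for intuition only.
// Each trip is assumed to look like
// { endStationId: "3100", stoptime: "2018-05-06 01:17:42" }
// (hypothetical field names; the CSV columns are "end station id" and "stoptime").
function countTripsPerStationHour(trips) {
  const counts = new Map();
  for (const { endStationId, stoptime } of trips) {
    // Equivalent of date_trunc('hour', stoptime): keep "YYYY-MM-DD HH".
    const ts = `${stoptime.slice(0, 13)}:00:00`;
    // Equivalent of "group by station_id, date_trunc('hour', stoptime)".
    const key = `${endStationId}|${ts}`;
    const row = counts.get(key) ?? { station_id: endStationId, trips_count: 0, ts };
    row.trips_count += 1;
    counts.set(key, row);
  }
  // One row per (station, hour), in first-seen order.
  return [...counts.values()];
}
```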

With the trips_count table in place, we can query time slices from it. We use PostgREST, a RESTful API server for PostgreSQL databases. PostgREST lets us query data with HTTP GET requests and filter rows using query parameters.

GET /trips_count?ts=eq.2018-05-06%2001:00:00

[{"station_id": "3100", "trips_count": 3, "ts": "2018-05-06T01:00:00"}, …]
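A small sketch for producing these requests on the client, for example to drive a time slider over May 2018. The helper names sliceQuery and hourlySlices are hypothetical, and the sketch assumes the Date’s UTC fields hold the wall-clock value stored in the ts column (a timestamp without time zone). basePath defaults to the path in the request above; the app itself reaches the API through /api/trips_count.

```javascript
// Hypothetical helper: build the PostgREST query for one hourly time slice.
// The Date's UTC fields are used as the wall-clock value stored in the
// `ts timestamp` column, which has no time zone.
function sliceQuery(date, basePath = "/trips_count") {
  const pad = (n) => String(n).padStart(2, "0");
  const ts =
    `${date.getUTCFullYear()}-${pad(date.getUTCMonth() + 1)}-${pad(date.getUTCDate())} ` +
    `${pad(date.getUTCHours())}:00:00`;
  // encodeURI turns the space into %20 and leaves the colons as-is.
  return `${basePath}?ts=eq.${encodeURI(ts)}`;
}

// Hypothetical helper: every hourly slice in [start, end).
function hourlySlices(start, end) {
  const slices = [];
  for (let t = start.getTime(); t < end.getTime(); t += 60 * 60 * 1000) {
    slices.push(new Date(t));
  }
  return slices;
}
```

For May 2018, hourlySlices(new Date(Date.UTC(2018, 4, 1)), new Date(Date.UTC(2018, 5, 1))) yields 31 × 24 = 744 slices, one per slider position.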

Now that we are all set, let’s create our map! First, we need to add Mapbox GL JS to the page.

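<link href='https://api.tiles.mapbox.com/mapbox-gl-js/v0.46.0/mapbox-gl.css' rel='stylesheet' />
<script src='https://api.tiles.mapbox.com/mapbox-gl-js/v0.46.0/mapbox-gl.js'></script>
<!-- container for the map; its id is passed to mapboxgl.Map -->
<div id='map' style='width: 100%; height: 100vh;'></div>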

To use any of Mapbox’s tools, APIs, or SDKs, you need a Mapbox access token. Mapbox uses access tokens to associate API requests with your account. You can find all of your access tokens, create new ones, or delete existing ones on your API access tokens page.

mapboxgl.accessToken = 'your access token here';

Now we can initialize an empty map.

const map = new mapboxgl.Map({
  container: 'map',
  zoom: 12,
  center: [-73.9774, 40.7391],
  style: 'mapbox://styles/mapbox/dark-v9',
  hash: true
});

Citibike provides an API that lists all available stations at https://layer.bicyclesharing.net/map/v1/nyc/stations. We can query it to get the stations’ geometry and create a map data source. NB: setFeatureState identifies features by ID, so you have to give every feature an id before adding the source.

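let stationsGeoJSON;

// sources and layers can only be added once the map style has loaded
map.on('load', () => {
  fetch('https://layer.bicyclesharing.net/map/v1/nyc/stations')
    .then(r => r.json())
    .then((stations) => {
      // setFeatureState addresses features by ID, so copy each station's
      // station_id property into the GeoJSON feature id
      stations.features.forEach((station) => {
        station.id = station.properties.station_id;
      });

      stationsGeoJSON = stations;

      map.addSource('stations', {
        type: 'geojson',
        data: stations
      });
    });
});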

Now we can add a layer that uses the stations source. Add it once the source exists, for example right after the map.addSource call.

map.addLayer({
  "id": "stations",
  "type": "circle",
  "source": "stations",
  "paint": {
    "circle-radius": [
      "interpolate", ["linear"], ["zoom"],
      11, 1,
      22, 4
    ],
    "circle-stroke-color": "#fff",
    "circle-stroke-width": 1,
    "circle-stroke-opacity": 0.2
  }
});

Ta-da, here are Citibike stations on the map:

Now we can query trip counts from the PostgREST API and store them as the station features’ state.

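// ts is the selected time slice, e.g. '2018-05-06 01:00:00'
fetch(`/api/trips_count?ts=eq.${encodeURI(ts)}`)
  .then(r => r.json())
  // index rows by station ID; {} is the initial accumulator, which also
  // keeps reduce from throwing when the slice is empty
  .then(rows => rows.reduce((acc, d) => ({ ...acc, [d.station_id]: d }), {}))
  .then((data) => {
    stationsGeoJSON.features.forEach(({ id }) => {
      const datum = data[id];
      // setFeatureState merges into the existing state, so stations with no
      // trips in this slice are reset to 0 instead of keeping the old count
      const state = { trips_count: datum ? datum.trips_count : 0 };
      map.setFeatureState({ id, source: 'stations' }, state);
    });
  });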

This code queries trip counts for the given time slice, indexes them by station ID, and sets each station’s feature state with the setFeatureState method.
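If you want to reuse the state-setting step, for example to replay cached slices while a slider moves, it can be pulled out of the fetch chain. This is a sketch under the assumption that map is anything with a setFeatureState(feature, state) method (a mapboxgl.Map in the app), features are the station features with ids already set, and rows have the PostgREST shape { station_id, trips_count }. The name applySliceState is hypothetical.

```javascript
// Hypothetical helper: apply one time slice to the map's feature state.
function applySliceState(map, features, rows, source = "stations") {
  // Index rows by station ID (as strings) so each feature lookup is O(1).
  const countsById = new Map(rows.map((row) => [String(row.station_id), row.trips_count]));
  for (const { id } of features) {
    // Stations absent from this slice had no trips ending there in that hour.
    const trips_count = countsById.get(String(id)) ?? 0;
    map.setFeatureState({ source, id }, { trips_count });
  }
}
```

In the app this would be called as fetch(...).then(r => r.json()).then(rows => applySliceState(map, stationsGeoJSON.features, rows)). Keeping it independent of fetch also makes it easy to test with a stub map object.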

Let’s update our layer definition (replacing the earlier addLayer call, since two layers cannot share the id "stations") to paint station features with different colors based on their trips_count state. The step expression picks a color from the value returned by feature-state, which is null when the property is not present on the feature’s state.

map.addLayer({
  "id": "stations",
  "type": "circle",
  "source": "stations",
  "paint": {
    "circle-opacity": 1,
    "circle-color": [
      "step",
      ["feature-state", "trips_count"],
      "#429EFF", 2,
      "#80FF9E", 5,
      "#F0FF3D", 10,
      "#FF94A6", 20,
      "#C399FF"
    ],
    "circle-radius": [
      "interpolate", ["linear"], ["zoom"],
      11, 1,
      22, 4
    ],
    "circle-stroke-color": "#ffffff",
    "circle-stroke-width": 1,
    "circle-stroke-opacity": 0.2
  }
});
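To read the step expression in this layer, or to draw a matching legend, the same thresholds can be evaluated in plain JavaScript. A sketch with hypothetical names: the base color applies below the first stop, and each stop’s color applies from that stop value upward (inclusive), which is how step treats its stops.

```javascript
// Same thresholds as the "circle-color" step expression above.
const TRIP_COLOR_BASE = "#429EFF";
const TRIP_COLOR_STOPS = [
  [2, "#80FF9E"],
  [5, "#F0FF3D"],
  [10, "#FF94A6"],
  [20, "#C399FF"],
];

// Hypothetical helper: the color a station with `count` trips is painted.
function tripsColor(count) {
  let color = TRIP_COLOR_BASE;
  // Stops are ascending, so the last stop that count reaches wins.
  for (const [stop, stopColor] of TRIP_COLOR_STOPS) {
    if (count >= stop) color = stopColor;
  }
  return color;
}
```

For instance, a station with 1 trip is painted #429EFF, one with exactly 2 trips #80FF9E, and anything from 20 trips up #C399FF.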

And that’s it! Now our stations are painted according to their trips_count state. Play with the live demo at citibike-nyc.urbica.co or build your own app.

Stepan Kuzmin, Urbica CTO
¯\_(ツ)_/¯

About Urbica: By combining our experience in data analysis and user interface design, we help develop services and products that solve the business challenges of our customers.