Exploring Geospatial Data with kepler.gl

Shan He
Published in vis.gl
Aug 26, 2019

Co-authors: Gabriel Durkin, Sina Kashuk

kepler.gl is an advanced geospatial visualization tool open sourced by Uber’s visualization team in 2018 and contributed to the Urban Computing Foundation in early 2019.

Figure 1. Using kepler.gl to visualize San Francisco building footprint

At Uber, kepler.gl is the de facto tool for geospatial data analysis. In a previous article, we introduced kepler.gl for Jupyter Notebook. In this article, we showcase how data scientists at Uber use kepler.gl to understand massive amounts of aggregated geospatial data and derive insights that improve our business. All the analysis presented in this blog post is based on data aggregated with H3, Uber’s open source geospatial indexing system, at resolution 12, for locations with a minimum of 100 trips, using at least 6 months of data.
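The minimum-trip filter described above can be sketched in a few lines of Python. This is only an illustration, not Uber's pipeline: it assumes each trip has already been mapped to an H3 cell ID upstream (for example with the `h3` Python package), and the function name `cells_with_min_trips` and the sample cell IDs are hypothetical.

```python
from collections import Counter


def cells_with_min_trips(trip_cells, min_trips=100):
    """Count trips per cell and keep only cells that meet the threshold.

    trip_cells: iterable of cell IDs, one entry per trip. The IDs are
    assumed to be H3 indexes computed upstream; any hashable ID works.
    """
    counts = Counter(trip_cells)
    return {cell: n for cell, n in counts.items() if n >= min_trips}


# Hypothetical sample: one busy cell and one sparse cell.
sample = ["cell_a"] * 120 + ["cell_b"] * 40
kept = cells_with_min_trips(sample)
print(kept)
```

Dropping sparse cells before plotting keeps the map focused on locations with enough trips for the per-cell metrics to be meaningful.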

Figure 2. Maps… without maps. — Toronto request data with kepler.gl derived from aggregated rider GPS signals

Uber’s platform uses digital solutions to tackle transportation problems in the physical world, such as ridesharing and meal delivery. Gabriel Durkin and Sina Kashuk, data scientists on Uber’s Rider Geospatial Intelligence team, use kepler.gl to analyze trip data, specifically to understand the real-world challenge of driver-partners and riders locating each other for pick-up in a complex cityscape. Figure 2, above, illustrates how projecting pick-up data at high resolution can produce a map of Toronto based entirely on usage of the Uber app, without a single base map.

Figure 3. Visualizing highest concentrations of requests in New York City over a 24 hour period

The pick-up process of an Uber ride or Uber Eats meal is one of Sina and Gabriel’s biggest data science pain points. There are many geospatial challenges associated with the pick-up process, and our teams frequently use kepler.gl to inform the development of geospatial solutions to improve this part of the ridesharing and delivery experiences. A common visualization rendered for this type of problem solving is a time lapse animation, as depicted in Figure 3, above, to identify temporal trends and areas with a higher concentration of trip requests, which may be correlated with suboptimal pick-up experiences.

Figure 4. Success Rate: blue indicates successful (and red defective) pick-ups in San Francisco

Sina and Gabriel also created a ‘success’ metric that identifies places associated with quality pick-up experiences, i.e., spots with a minimum of cancellations, pick-up location errors or other types of defects. These were projected onto city maps as hexagonal spatial units visualized in kepler.gl using the H3 layer (Figure 4), and ingested by our pick-up spot recommendation engine, so that the future suggested pick-up locations are far from ‘low success’ areas.
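As a rough sketch of how such a per-hexagon metric might be prepared for an H3 layer, the snippet below computes a success rate per cell and serializes it as CSV. The exact definition of the metric is not given here, so the formula (share of pick-ups without a defect) is an assumption, as are the function names, the sample data, and the `hex_id` column name, which should be checked against the kepler.gl documentation.

```python
import csv
import io
from collections import defaultdict


def success_rate_by_cell(pickups):
    """pickups: iterable of (cell_id, had_defect) pairs, one per pick-up.

    Returns {cell_id: share of pick-ups without a defect}.
    This formula is an assumed stand-in for the real metric.
    """
    totals = defaultdict(int)
    defects = defaultdict(int)
    for cell, had_defect in pickups:
        totals[cell] += 1
        if had_defect:
            defects[cell] += 1
    return {cell: 1 - defects[cell] / totals[cell] for cell in totals}


def to_csv(rates):
    """Serialize the rates as CSV text with an assumed `hex_id` column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["hex_id", "success_rate"])
    for cell, rate in sorted(rates.items()):
        writer.writerow([cell, f"{rate:.2f}"])
    return buf.getvalue()


# Hypothetical pick-ups: cell_a has 1 defect in 10, cell_b has 3 in 4.
pickups = (
    [("cell_a", False)] * 9 + [("cell_a", True)]
    + [("cell_b", True)] * 3 + [("cell_b", False)]
)
rates = success_rate_by_cell(pickups)
csv_text = to_csv(rates)
print(csv_text)
```

A file like this, with real H3 indexes in the ID column, is the kind of tabular input that can be loaded into kepler.gl and colored by the metric.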

Figure 5. Creation of automated pick-up and drop-off zones based on historical trip experience. The maps visualize areas that have a high fraction of canceled trips; numbered circles annotate cluster centers so the upper and lower maps can be compared

This success metric can equally be mapped to the rider device request location (likely inside a building) rather than the “in the street” vehicle pick-up location. Two complementary maps are created, as depicted in Figure 5, for understanding the pick-up experience: (top) one depicting hexagonal heatmaps of Uber metrics and (lower) one depicting the physical landscape of buildings and city-block polygons for the same locations. The finalized maps, like those in Figure 5, are saved to an interactive HTML file that is shared with regional Uber Operations teams to flag and address complex or problematic pick-up areas. Numerical labels are applied to annotate the densest areas, and geofences are created around them (as depicted by the polygons in the lower plot). When requesting a trip in these ‘enhanced’ pick-up zones, riders will be given additional instructions to guide them through the process.

Another metric Gabriel and Sina assessed with kepler.gl when working on ways to improve the Uber pick-up experience was the estimated time of arrival (ETA) error. When users request a ride on the Uber platform, they expect their driver’s actual arrival time to match the ETA shown in the app as closely as possible. ETA errors result in poor user experiences. The two data scientists created maps to identify areas with higher ETA errors, allowing teams to better understand where the algorithms that produce the estimates perform well and where they need to improve.
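One simple way to turn ETA error into a mappable per-cell value is a mean absolute error, sketched below. The article does not say how the error is computed, so this definition (absolute difference between actual and estimated arrival time, averaged per cell) is an assumption, and the function name and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_eta_error(trips):
    """trips: iterable of (cell_id, estimated_seconds, actual_seconds).

    Returns {cell_id: mean absolute ETA error in seconds}.
    """
    errors = defaultdict(list)
    for cell, estimated, actual in trips:
        errors[cell].append(abs(actual - estimated))
    return {cell: mean(values) for cell, values in errors.items()}


# Hypothetical trips: cell_a is estimated well, cell_b is not.
trips = [
    ("cell_a", 180, 200),
    ("cell_a", 240, 230),
    ("cell_b", 300, 480),
]
eta_error = mean_abs_eta_error(trips)
print(eta_error)
```

Using the absolute value treats early and late arrivals alike; a signed error would instead show whether estimates in an area run consistently short or long.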

Figure 6. Comparing one week of ETA (left, red is high ETA) and Request volume (right, yellow is high density) in San Francisco

For example, in the very early morning, there seems to be a longer ETA in the Northeast, evidencing sparsity of vehicle supply (Figure 6). The lack of clustering and low density of requests throughout suburban neighborhoods clearly presents a challenge for dispatching new trips at that time of day.

The dual map in Figure 6, above, uses kepler.gl to discover correlations between arrival times and the volume of Uber requests in San Francisco. Both maps show the places of highest request volume, but the left map is colored by ETA (long ETAs in red, short ETAs in blue), and the right map is colored by request volume (high volume in yellow).

Gabriel and Sina are working with product managers on the Rider Team to build solutions that have spatial and temporal context-awareness. One possibility would be a notification sent to the rider at rush-hour, e.g. “Hey, you are in a busy area. You should request a few minutes in advance to avoid longer delays.” Insights derived via the lens of kepler.gl will drive the design of potential new Uber rider app geo-contextual features.

Figure 7. LEFT — Heterogeneity: Number of distinct riders divided by the number of trips aggregated in each hexagon & MIDDLE — ETA errors: green is low, purple is high & RIGHT- Avg trip distance: Average distance traveled of trips requested: darker green is longer

kepler.gl also provides a lens for understanding the movements and travel behaviors of people in cities. Figure 7 shows three different kepler.gl map layers for Manhattan. The left image projects place-heterogeneity: places with a higher ratio of distinct riders to trips are indicated in yellow (Fig. 7 LEFT). The heterogeneity, or public/private, metric usually indicates whether an area is more public and populated by visitors and tourists. For instance, this metric is close to 1.0 at airports because each rider typically takes a single trip from the airport, compared with < 0.6 in residential neighborhoods, where the same rider may request multiple trips, e.g., every weekday morning to work. In Figure 7, these residential regions appear dark red. Challenging pick-ups can result from a rider’s unfamiliarity with a new place, rather than any intrinsic property of the place itself. The heterogeneity metric projected via kepler.gl highlights these cases geospatially.
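The heterogeneity metric, defined in the Figure 7 caption as the number of distinct riders divided by the number of trips in each hexagon, can be sketched as follows. The function name, cell labels, and rider IDs are hypothetical and the data is made up to mirror the airport and residential cases described above.

```python
from collections import defaultdict


def heterogeneity_by_cell(trips):
    """trips: iterable of (cell_id, rider_id) pairs, one per trip.

    Returns {cell_id: distinct riders / number of trips}.
    """
    riders = defaultdict(set)
    counts = defaultdict(int)
    for cell, rider in trips:
        riders[cell].add(rider)
        counts[cell] += 1
    return {cell: len(riders[cell]) / counts[cell] for cell in counts}


# Airport-like cell: ten riders, one trip each.
airport = [("airport", f"rider_{i}") for i in range(10)]
# Residential-like cell: two riders, five trips each.
residential = [("home", "rider_a")] * 5 + [("home", "rider_b")] * 5

ratios = heterogeneity_by_cell(airport + residential)
print(ratios)
```

The airport-like cell scores 1.0 because every trip comes from a different rider, while the residential-like cell scores 0.2 because the same two riders account for all ten trips.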

The middle image of Manhattan represents ETA errors (Fig. 7 MIDDLE) and suggests that more isolated locations (near the water’s edge) lead to greater uncertainty in the estimated time of arrival of the dispatched cars.

By projecting ‘completed trip distance’ onto the geo-location of requests in kepler.gl (Fig. 7 RIGHT), additional geographical constraints imposed by the waterways surrounding Manhattan become apparent. For example, trips beginning at its southernmost tip tend to be longer, in part because riders must travel north to get anywhere.

Figure 8. Using kepler.gl in Jupyter Notebook to visualize geospatial data

The Jupyter Notebook platform combined with kepler.gl (Figure 8) allows data scientists like Sina and Gabriel to identify and develop an understanding of the geospatial nature of Uber’s data. That data patterns and clustering are apparent in kepler.gl maps is proof that the ‘prior art’ of collecting data in tables and spreadsheets alone ignores the confounding influence of spatial context.

Using kepler.gl, Gabriel and Sina were able to gain new insights, consistently increase pick-up quality, and improve rider experience across all the cities and continents where Uber operates.

Now that kepler.gl for Jupyter is open sourced, we are eager to see how it is adopted by other data scientists working in the geospatial domain and to learn about inventive new use cases developed by the community in the coming months.
