Playing With Uber’s Hexagonal Hierarchical Spatial Index, H3
NYC taxi data visualization
H3 is a geospatial indexing system developed and open sourced by Uber. It provides functions for converting latitude/longitude coordinates into H3 geospatial hexagonal tiles.
The core library is written entirely in C, but bindings are available for other languages, including Python, R, and Ruby.
As transportation engineers, we use zones and zoning in our analyses a lot. However, manually drawn partitions of a map have limitations: the center of a zone may not represent the center of its data points, zones have unequal numbers of neighbors at unequal distances, and the boundaries introduce other undesirable effects.
Hexagons offer several advantages for spatial analysis, and more of the use cases can be read here.
I used H3 to do some preliminary analysis on the taxi data made publicly available by the Taxi and Limousine Commission (TLC) in New York City (NYC).
I would like to thank everyone who shared their experiences with H3 and created tutorials. There are not many of them, but the ones that are available are really useful.
I downloaded taxi data from April 2016 because the data from that year contains the geographic coordinates of taxi pick-ups rather than pick-up location IDs.
The newer datasets (post 2016) released by the TLC only have pick-up location IDs, which can be mapped to a taxi zone. Here is the web page if you also want to download it.
Because the CSV file is large (1.74 GB), loading it with the pandas library may raise a MemoryError on some machines. I therefore trimmed it down by dropping the columns we don’t need and by changing the data types.
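A trimming step like the one described above might look like the following minimal sketch. It assumes the pick-up columns in the CSV are named tpep_pickup_datetime, pickup_longitude, and pickup_latitude, and that a batch of five million rows fits in memory. The helper name read_in_batches, the file name in the usage comment, and the defaults are illustrative assumptions, not the original code.

```python
import pandas as pd

# Assumed column names and dtypes; check them against the header of your CSV.
USECOLS = ["tpep_pickup_datetime", "pickup_longitude", "pickup_latitude"]
DTYPES = {"pickup_longitude": "float32", "pickup_latitude": "float32"}


def read_in_batches(csv_path, chunksize=5_000_000):
    """Read only the needed columns, one batch at a time.

    Returns a list of data frames, one per batch.
    """
    df_list = []
    reader = pd.read_csv(
        csv_path,
        usecols=USECOLS,                        # skip the columns we don't need
        dtype=DTYPES,                           # smaller floats than the float64 default
        parse_dates=["tpep_pickup_datetime"],
        chunksize=chunksize,                    # yields one data frame per batch
    )
    for chunk in reader:
        df_list.append(chunk)
    return df_list


# Usage (file name is an assumption):
# df_list = read_in_batches("yellow_tripdata_2016-04.csv")
```

Storing the coordinates as float32 halves their memory footprint compared with the default float64, at the cost of some precision, so keep float64 if exact coordinates matter for your analysis.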
The read loop should run for about three iterations. Now we have df_list, which contains the batch data frames. Let’s combine all of them into one.
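Combining the batches can be sketched with pandas.concat. The helper name combine_batches is hypothetical, and the two tiny data frames below are made-up stand-ins for the real batches, used only to show the call.

```python
import pandas as pd


def combine_batches(df_list):
    """Stack the batch data frames into one, renumbering the index from 0."""
    return pd.concat(df_list, ignore_index=True)


# Made-up stand-in batches; in the article, df_list comes from the batched read.
df_list = [
    pd.DataFrame({"pickup_longitude": [-73.98, -73.99], "pickup_latitude": [40.75, 40.76]}),
    pd.DataFrame({"pickup_longitude": [-73.97], "pickup_latitude": [40.74]}),
]

df = combine_batches(df_list)
print(df.shape)  # (3, 2)
```

Passing ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each batch's own row labels.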