Low-code spatial analysis based on H3 with CARTO and Snowflake

Published in

Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

8 min readMar 4, 2024

In our previous post, “Simplify Spatial Indexing with the Power of H3 — What the World Needs Now Is a Hexagonal Grid”, we introduced basic concepts and use cases around H3. It’s a hierarchical, multi-resolution discrete global grid system that enables lightning-fast spatial analysis at scale. That post shared how you can start with H3, including creating an H3 index, seamlessly leveraging unique Spatial Index properties (such as parent, child, and neighbor cells), and enriching tables with H3 cells with data from tables that store other spatial objects. In today’s post, we’ll explore three use cases to illustrate how you can embed H3 into your analytical pipelines to help you scale your analysis with the help of our Location Intelligence partner CARTO.

One of the great advantages of leveraging H3 in Snowflake with CARTO is that you can build advanced analytical pipelines in CARTO Workflows without the need to write code. We will also show how to create interactive dashboards with CARTO Builder with built-in capabilities to visualize the H3 grid on a map. CARTO Workflows allows you to build your transformations by dragging and dropping data and analytical components onto a canvas — with all the processing still happening in Snowflake! Let’s check it out in action.

Use Case #1 — Understanding the spatial distribution of taxi trips in NYC

H3 offers several advantages for spatial analysis, including its capability to visualize complex spatial patterns straightforwardly and understandably. The following example shows how helpful H3 can be in this regard. We’ll use NYC taxi pick-up and drop-off data (the NYC_TAXI_RIDES in the CARTO Academy Data listing), transforming 2.5 million taxi location points into one H3 frequency grid. This will allow us to compare the spatial distribution of pick-up and drop-off locations. This type of analysis is important for resource planning in industries where the geographical locations of origins and destinations differ.

Note that you can open the SQL preview panel at the bottom of your workflow to see what’s running “under the hood” in your Snowflake project. You can even copy the entire SQL code into a Snowflake worksheet (see below) and run it from there — or download the workflow code as an SQL file to deploy in your data warehouse, saving you from writing any code!

The steps to this workflow are:

Filter taxi trips to the morning only. As we can expect very different pick-up and drop-off patterns depending on the time of day, we’ll use a simple filter (PART_OF_DAY = Morning) to investigate this time specifically.
Convert pick-up and drop-off locations to a H3 count grid. The NYC_TAXI_RIDES table contains a string field for both the pick-up and drop-off locations, so for each of these, we need to:

Convert the pick-up and drop-off geographies into H3 indexes (H3 GeogPoint).
Use a GROUP BY component to calculate the total trips starting and finishing per H3 cell.

3. Join inputs together. In this stage, we join two H3 count grids based on the H3 index field.

4. Normalize and calculate the difference between pick-ups and drop-offs. In this section, we use two Normalize components to convert the counts of both the pick-up and drop-offs into a score from 0 to 1. Then, we use a Create Column to find the difference between these two normalized scores to determine if an area has relatively more pickups or drop-offs. Finally — we Save as Table to commit the results, which you can see visualized in the map below.

Use this link to open an interactive map

Do you notice any patterns here? We can see more drop-offs in the business district of Midtown — particularly along Park Avenue — and more pick-ups in the more residential areas, such as the Upper East and West Side, clearly reflecting the morning commute!

Follow the full tutorial for this use case here.

Use Case #2 — Using K-Rings to calculate areas of influence

Imagine you are an analyst working for a telco company. And your company wants to roll out cell tower improvements to a selection of our Colorado cell towers — but which? One approach would be to understand which cell towers serve the largest population. However, with over 120,000 cell towers in Colorado, calculating the local population would be slow.. right? Not with H3!

What typically makes this approach slow is that population data is conventionally derived by intersecting buffers with administrative geographies. These zones — such as census blocks and tracts — are typically made up of complex shapes, which make them heavy to store and slow to process.

However, by leveraging H3 we can replace geometries with far lighter string or integer IDs. For this, you can use the workflow below (and use datasets from the CARTO Academy Data).

You can recreate this workflow with the following steps:

Convert each cell tower to a H3 index: use the H3 from GeoPoint component (which calls the H3_POINT_TO_CELL function in Snowflake) to do this, and ensure you set a resolution of 8 (the same as the Spatial Features table).
Calculate K-rings: create a K-Ring of 1 around each cell tower’s H3 using H3 KRing (which calls the H3_GRID_DISTANCE under the hood) — this will generate a new field called KRing_Index. This is a bit like calculating a spatial buffer — but far quicker!
Join LTE coverage and Spatial Features using H3: set the join type to left, with the main table’s join (H3 KRing) column as KRing_Index and the secondary table (Spatial Features) as H3.
Calculate the total population per tower: use the Group by component to calculate the total population within a K-Ring of 1 for each cell tower. Set the grouping column to H3 and the aggregation column to POPULATION_JOINED, with the aggregation type as SUM.
Join results to origin table: all that’s left to do is run another Join (left) between H3 from GeoPoint (join column: H3) and Group by (join column: H3) so you have all of the key variables for each cell tower in one table.

You can explore the results of this analysis in the map below, made in CARTO Builder!

To recreate this workflow, use this template and drop it directly into your CARTO Workspace.

Use Case #3 — Understanding accident hotspots

In this final use case, we’ll explore which parts of Paris’ cycle network could most benefit from improved safety measures.

We’ll use two tables from the CARTO Academy Data: PARIS_BIKE_ACCIDENTS (to understand the accident hotspots) and PARIS_CYCLING_NETWORK (to link these to physical cycle infrastructure).

The process consists of four key steps:

Convert accidents to a H3 grid & filter to the study area. You can use the Draw Custom Features component to create a custom study area covering the city of Paris, and then H3 Polyfill to convert this to an array of H3 cell IDs. As we’re interested in street-level insights, we’ve used the resolution of 10. Next, we convert the locations of bike accidents to H3 cell IDs using H3 from GeoPoint, and Join the two tables together. Using an inner join type effectively filters the accidents grid to our study area.
Aggregate & calculate hotspots. Use the GROUP BY component with the column H3 to get accident frequency per hexagon. We then use a Getis Ord* component to calculate hotspots (statistically significant clusters of high data values) from this grid. Finally, a Simple Filter is used to only retain statistically significant H3 cells with a P value of less than 0.1 — meaning we can be 90% confident that the outputs are spatial hotspots.
Convert cycle network to H3. To transform these hotspots into actionable insights, we’ll now determine which parts of the cycle network infrastructure fall within accident hotspots and could benefit from some targeted improvements. But instead of using a spatial join to do this, we’ll again leverage H3. First, we run a 25-meter ST Buffer on the cycle network and employ H3 Polyfill to convert buffer areas to arrays of H3 cells — at this stage, we’ll make sure to enable “Keep table input columns.”
Filter cycle links to those in accident hotspots. Now we can use another inner INNER JOIN to join our cycle network H3 polyfilled-grid to the results of our hotspot. We’re now left with only H3 cells in an accident hotspot AND which cover the cycling network. We’ll now use one final GROUP BY with the following parameters:

Group by column: CARTODB_ID
Aggregation: GI (AVG), HIGHWAY (ANY) & GEOM_JOINED (ANY). You can also use ANY aggregation to retain any contextual information from the cycle links, such as highway name.

Now, we have a table consisting of cycle links that can be found in an accident hotspot, as well as their respective average GI* score, which indicates the strength of the hotspot. You can explore the results of this in the map below — we’ve also included the original accident locations and H3 hotspots.

Through exploring the map we can see that many of the hotspots of cycle accidents occur in the center of Paris. In particular, the residential street of Rue Mahler in the Marais has the highest average GI* score of any part of the cycle network. With 21 accidents in the H3 cell covering the Rue Mahler & Rue de Rivoli junction (higher than in any H3 cell in the city), this would appear to be an ideal candidate for some safety improvements.

Follow the full tutorial for this use case here.

That concludes our whistle-stop tour of three H3 use cases! If you’d like to put any of this into action, remember you can open a 14-day free trial of CARTO from Snowflake Partner Connect, learn more about H3, and kick-start your transformative spatial analytics journey. CARTO also allows Snowflake users to enroll in their H3 Acceleration program — find out more here!

Low-code spatial analysis based on H3 with CARTO and Snowflake

Use Case #1 — Understanding the spatial distribution of taxi trips in NYC

Use Case #2 — Using K-Rings to calculate areas of influence

Use Case #3 — Understanding accident hotspots

Written by Oleksii Bielov