Clustering Internal Search Using Elixir Livebook With H3

Bruno Bacarini
Published in Bounce Engineering
May 21, 2024

In the world of data exploration and analysis, the ability to seamlessly integrate different tools and technologies is invaluable. Elixir Livebook, with its interactive and collaborative notebook interface, has emerged as a powerful tool for such endeavors. Pairing it with Uber’s H3, a hierarchical hexagonal geospatial indexing system, opens up fascinating possibilities. In this blog post, we’ll go over how we at Bounce leveraged Livebook to run experiments on internal search data, clustering it using H3.

Fig. 1 — Elixir Livebook and Uber H3

What is Elixir Livebook?

Elixir Livebook is an interactive and collaborative notebook interface that allows users to write and run code, visualize data, and document their findings in real time. It’s built on top of the Elixir programming language and provides a seamless environment for exploratory data analysis, machine learning prototyping, and more.

Introducing H3

H3 is a powerful geospatial indexing system developed by Uber. It organizes the world into a hexagonal grid, allowing for efficient indexing and analysis of geospatial data at multiple resolutions. H3 is particularly well-suited for tasks like spatial aggregation, clustering, and geospatial visualization.

Combining Livebook and H3 for our experiment

Now, let’s dive into how we used Livebook and H3 to analyze and cluster internal search data. The first step is to install the required dependencies in Livebook.

Mix.install([
  {:kino_db, "~> 0.2.6"},
  {:req_bigquery, "~> 0.1"},
  {:kino_maplibre, "~> 0.1.10"},
  {:geo, "~> 3.6"}
])

Here at Bounce, we store our analytics data in BigQuery, which is why the ReqBigQuery plugin for Req is required. After installing the dependencies, we need to start a Goth (Google + Auth) process, which obtains the access tokens for our requests, and attach it to a Req connection.

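# Service account credentials, read from Livebook secrets (LB_ prefix).
credentials = %{
  # The key is stored with literal "\n" sequences, so restore real newlines.
  "private_key" => System.fetch_env!("LB_PRIVATE_KEY") |> String.replace("\\n", "\n"),
  "client_email" => System.fetch_env!("LB_CLIENT_EMAIL")
}

opts = [
  name: ReqBigQuery.Goth,
  http_client: &Req.request/1,
  source: {:service_account, credentials}
]

# Start Goth under Kino's supervision so it is tied to the notebook runtime.
{:ok, _pid} = Kino.start_child({Goth, opts})

# Req connection that authenticates against BigQuery through Goth.
conn =
  Req.new(http_errors: :raise)
  |> ReqBigQuery.attach(
    goth: ReqBigQuery.Goth,
    project_id: "name_of_our_project",
    default_dataset_id: ""
  )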
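The String.replace call on the private key is easy to overlook. A minimal sketch of what it does, assuming the secret is stored on a single line with literal backslash-n sequences (the key content below is a made-up placeholder, not a real key):

```elixir
# Hypothetical value, shaped like a key stored in a single-line secret.
raw = "-----BEGIN PRIVATE KEY-----\\nFAKEKEYDATA\\n-----END PRIVATE KEY-----\\n"

# "\\n" in Elixir source is two characters: a backslash followed by "n".
# "\n" is a single newline character.
normalized = String.replace(raw, "\\n", "\n")

IO.inspect(String.contains?(raw, "\n"), label: "raw has real newlines")
IO.inspect(String.split(normalized, "\n", trim: true), label: "lines after normalizing")
```

Without this step the PEM-formatted key would reach Goth as one long line, which is a likely cause of authentication failures when a key is pasted into a secret.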

At this point, we have installed and obtained all of the resources required to start the experiment. So, here’s the query we used to cluster internal searches using H3.

query = """
with source_fact_internal_searches as (
  select
    internal_searches.event_id,
    internal_searches.search_geom
  from
    internal_searches
  left join
    city_boundaries
  on
    ST_WITHIN(
      internal_searches.search_geom,
      city_boundaries.city_boundary
    )
  where
    DATE_TRUNC(PARSE_DATE('%Y%m%d', internal_searches.search_date), month)
      >= DATE_ADD(CURRENT_DATE(), interval -12 month) AND
    city_boundaries.city_nk IS NULL
),
h3_data as (
  select
    `carto-os`.carto.H3_FROMGEOGPOINT(search_geom, 9) as cluster_id,
    COUNT(event_id) as internal_searches_l12m_qty
  from
    source_fact_internal_searches
  group by cluster_id
  having cluster_id is not null
)
select
  cluster_id,
  ANY_VALUE(internal_searches_l12m_qty) as total_internal_searches,
  ST_ASGEOJSON(`carto-os`.carto.H3_BOUNDARY(cluster_id)) AS geom
from
  h3_data
group by cluster_id
"""

results = Req.post!(conn, bigquery: query).body

Let’s break down this query and explain what we are doing. First, we take all internal searches from the last twelve months (L12M). We left join them to our city boundaries using ST_WITHIN, which checks whether a search point falls inside a boundary, and then keep only the rows where city_nk IS NULL, that is, the searches that fall outside every city boundary.

select
  internal_searches.event_id,
  internal_searches.search_geom
from
  internal_searches
left join
  city_boundaries
on
  ST_WITHIN(
    internal_searches.search_geom,
    city_boundaries.city_boundary
  )
where
  DATE_TRUNC(PARSE_DATE('%Y%m%d', internal_searches.search_date), month)
    >= DATE_ADD(CURRENT_DATE(), interval -12 month) AND
  city_boundaries.city_nk IS NULL
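The left join followed by an IS NULL filter is an anti-join: it keeps the rows that found no match. A rough in-memory sketch of the same idea in Elixir, assuming a simple bounding-box check as a stand-in for ST_WITHIN (the module name, the sample city, and the sample points are all hypothetical):

```elixir
defmodule AntiJoinSketch do
  # Hypothetical stand-in for ST_WITHIN: a bounding-box check instead of a
  # real point-in-polygon test, to keep the sketch dependency-free.
  def within?({lng, lat}, %{min_lng: a, max_lng: b, min_lat: c, max_lat: d}) do
    lng >= a and lng <= b and lat >= c and lat <= d
  end

  # Keeps only the searches that match no city, mirroring
  # `left join ... where city_boundaries.city_nk is null`.
  def outside_all(searches, cities) do
    Enum.reject(searches, fn search ->
      Enum.any?(cities, &within?(search.point, &1.bbox))
    end)
  end
end

cities = [
  %{city_nk: "city_a", bbox: %{min_lng: -9.25, max_lng: -9.05, min_lat: 38.68, max_lat: 38.80}}
]

searches = [
  %{event_id: 1, point: {-9.14, 38.72}},
  %{event_id: 2, point: {-8.61, 41.15}}
]

outside = AntiJoinSketch.outside_all(searches, cities)
IO.inspect(Enum.map(outside, & &1.event_id), label: "searches outside every city")
```

Only the second search survives, because the first one falls inside the sample city's bounding box.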

Then, in the next CTE, we use CARTO's H3 functions for BigQuery to map each search coordinate to its H3 cell at resolution 9, and count the searches that land in each cell.

select
  `carto-os`.carto.H3_FROMGEOGPOINT(search_geom, 9) as cluster_id,
  COUNT(event_id) as internal_searches_l12m_qty
from
  source_fact_internal_searches
group by cluster_id
having cluster_id is not null
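The second argument to H3_FROMGEOGPOINT, 9, is the H3 resolution, which controls how fine the hexagons are. As a rough guide to what that choice implies, the sketch below uses H3's cell-count formula (2 + 120 × 7^res) to estimate the average cell size. The module name and the Earth surface constant are ours, for illustration only, and real cell areas vary around the average:

```elixir
defmodule H3Sketch do
  # Approximate surface area of the Earth in km^2 (assumed constant).
  @earth_area_km2 510_065_622

  # Total number of H3 cells at a resolution: 2 + 120 * 7^res.
  def cell_count(res) when res in 0..15, do: 2 + 120 * Integer.pow(7, res)

  # Mean cell area in km^2; individual cells deviate from this value.
  def avg_cell_area_km2(res), do: @earth_area_km2 / cell_count(res)
end

IO.inspect(H3Sketch.cell_count(9), label: "cells at resolution 9")
IO.inspect(Float.round(H3Sketch.avg_cell_area_km2(9), 3), label: "average cell area (km^2)")
```

At resolution 9 the average cell covers roughly a tenth of a square kilometre. A lower resolution would produce fewer, larger clusters, and a higher one more, smaller clusters.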

Last but not least, we use each cluster_id to get the boundary of its hexagon and convert it to GeoJSON.

select
  cluster_id,
  ANY_VALUE(internal_searches_l12m_qty) as total_internal_searches,
  ST_ASGEOJSON(`carto-os`.carto.H3_BOUNDARY(cluster_id)) AS geom
from
  h3_data
group by cluster_id

We end up with a result similar to this table (Fig. 2).

Fig. 2— Result of the query to cluster internal searches

Cool, we now have all the information we need, but we want to visualize it. Kino is a very handy library for rendering rich and interactive outputs directly from your Elixir code. As you might have noticed, we installed kino_maplibre with the dependencies.

To show the clusters on the map, we first need to build its input data: a GeoJSON FeatureCollection with one Feature per cluster.

columns = results.columns

to_map_struct = fn entry ->
  Enum.zip(columns, entry)
  |> Map.new()
end

cluster_parsed = Enum.map(results.rows, to_map_struct)

list =
  cluster_parsed
  |> Enum.map(fn e ->
    %{
      "type" => "Feature",
      "geometry" => Jason.decode!(Map.get(e, "geom")),
      "properties" => %{
        "total" => Map.get(e, "total_internal_searches"),
        "cluster_id" => Map.get(e, "cluster_id")
      }
    }
  end)

geojson = %{
  "type" => "FeatureCollection",
  "features" => list
}
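The query result arrives as a list of column names plus a list of rows, so the first step above zips each row with the column names to get one map per cluster. A small sketch with made-up data (the cell ids and counts are placeholders, not real results) shows the shape of that intermediate value:

```elixir
# Hypothetical sample, shaped like the `columns` and `rows` fields used above.
columns = ["cluster_id", "total_internal_searches"]
rows = [["cell_a", 42], ["cell_b", 7]]

# Pair each value with its column name, then turn the pairs into a map.
to_map = fn row -> columns |> Enum.zip(row) |> Map.new() end

parsed = Enum.map(rows, to_map)
IO.inspect(parsed, label: "one map per row")
```

Working with maps keyed by column name keeps the Feature-building step independent of the column order in the query.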

After we have the GeoJSON, we can use the MapLibre modules to build the map and its layers.

# Mix.install([
#   {:kino_maplibre, "~> 0.1.10"},
# ])

MapLibre.new(style: :street)
|> MapLibre.add_source("main", type: :geojson, data: geojson)
|> MapLibre.add_layer(
  id: "clusters",
  type: :fill,
  source: "main",
  paint: [
    fill_color: "#00000055"
  ]
)
|> MapLibre.add_layer(
  id: "cluster_size",
  source: "main",
  type: :symbol,
  layout: [text_field: ["get", "total"], text_size: 10],
  paint: [text_color: "red"]
)
|> Kino.MapLibre.info_on_click("clusters", "cluster_id")
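As an optional sanity check alongside the map, it can help to list the busiest clusters from the same features. This is a sketch of one way to do it, not part of the original notebook; the features below are hypothetical and only mimic the "properties" of the entries in geojson["features"]:

```elixir
# Hypothetical features, shaped like the entries of geojson["features"]
# (geometry omitted for brevity).
features = [
  %{"properties" => %{"cluster_id" => "cell_a", "total" => 42}},
  %{"properties" => %{"cluster_id" => "cell_b", "total" => 7}},
  %{"properties" => %{"cluster_id" => "cell_c", "total" => 19}}
]

# Sort clusters by search count, highest first, and keep the top two.
top =
  features
  |> Enum.map(& &1["properties"])
  |> Enum.sort_by(& &1["total"], :desc)
  |> Enum.take(2)

IO.inspect(top, label: "busiest clusters")
```

Comparing this list with the numbers rendered on the map is a quick way to confirm that the symbol layer shows the expected totals.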

This gives us the desired result: a map of all internal searches made outside our city boundaries, clustered using H3 (Fig. 3).

Fig. 3— Clustered internal searches rendered using Uber’s H3

Conclusion

At Bounce, we’re always experimenting and evolving our product based on data. By leveraging Elixir Livebook and H3, you can quickly build a visualization tool that aggregates geolocation data and makes it accessible to all stakeholders, empowering them with better and more readable data.

We are hiring! 💙 Check out our careers page
