Clustering Internal Search Using Elixir Livebook With H3
In the world of data exploration and analysis, the ability to seamlessly integrate different tools and technologies is invaluable. Elixir Livebook, with its interactive and collaborative notebook interface, has emerged as a powerful tool for such endeavors. Pairing it with Uber’s H3, a hierarchical hexagonal geospatial indexing system, opens up fascinating possibilities. In this blog post, we’ll go over how we at Bounce leveraged Livebook to run experiments on internal search data, clustering it using H3.
What is Elixir Livebook?
Elixir Livebook is an interactive and collaborative notebook interface that allows users to write and run code, visualize data, and document their findings in real time. It’s built on top of the Elixir programming language and provides a seamless environment for exploratory data analysis, machine learning prototyping, and more.
Introducing H3
H3 is a powerful geospatial indexing system developed by Uber. It organizes the world into a hexagonal grid, allowing for efficient indexing and analysis of geospatial data at multiple resolutions. H3 is particularly well-suited for tasks like spatial aggregation, clustering, and geospatial visualization.
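To make "multiple resolutions" concrete: each step up in resolution subdivides the grid roughly sevenfold. This small sketch needs no H3 dependency at all; the 122 base cells and the cell-count formula `2 + 120 * 7^r` come from the H3 documentation.

```elixir
defmodule H3Facts do
  # Total number of unique H3 cells at a given resolution, per the H3
  # docs: 122 base cells (110 hexagons + 12 pentagons), each finer
  # resolution multiplying the count by roughly 7.
  def cell_count(res) when res in 0..15 do
    2 + 120 * Integer.pow(7, res)
  end
end

H3Facts.cell_count(0) # => 122
H3Facts.cell_count(9) # => 4_842_432_842 (resolution 9 is used later in this post)
```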
Combining Livebook and H3 for our experiment
Now, let’s dive into how we used Livebook and H3 to analyze and cluster internal search data. The first step is to install the required dependencies in Livebook.
Mix.install([
  {:kino_db, "~> 0.2.6"},
  {:req_bigquery, "~> 0.1"},
  {:kino_maplibre, "~> 0.1.10"},
  {:geo, "~> 3.6"}
])
Here at Bounce, we use BigQuery to store our analytics data, which is why the req_bigquery plugin is required. After installing the dependencies, we need a Goth (Google + Auth) connection. Note that Livebook exposes notebook secrets as environment variables prefixed with LB_, which is where LB_PRIVATE_KEY and LB_CLIENT_EMAIL below come from.
credentials = %{
  "private_key" => System.fetch_env!("LB_PRIVATE_KEY") |> String.replace("\\n", "\n"),
  "client_email" => System.fetch_env!("LB_CLIENT_EMAIL")
}

opts = [
  name: ReqBigQuery.Goth,
  http_client: &Req.request/1,
  source: {:service_account, credentials}
]

{:ok, _pid} = Kino.start_child({Goth, opts})

conn =
  Req.new(http_errors: :raise)
  |> ReqBigQuery.attach(
    goth: ReqBigQuery.Goth,
    project_id: "name_of_our_project",
    default_dataset_id: ""
  )
At this point, we have installed and configured everything required to start the experiment. Here is the query we used to cluster internal searches using H3.
query = """
with source_fact_internal_searches as (
  select
    internal_searches.event_id,
    internal_searches.search_geom
  from
    internal_searches
  left join
    city_boundaries
  on
    ST_WITHIN(
      internal_searches.search_geom,
      city_boundaries.city_boundary
    )
  where
    DATE_TRUNC(PARSE_DATE('%Y%m%d', internal_searches.search_date), month)
      >= DATE_ADD(CURRENT_DATE(), interval -12 month) AND
    city_boundaries.city_nk IS NULL
),
h3_data as (
  select
    `carto-os`.carto.H3_FROMGEOGPOINT(search_geom, 9) as cluster_id,
    COUNT(event_id) as internal_searches_l12m_qty
  from
    source_fact_internal_searches
  group by cluster_id
  having cluster_id is not null
)
select
  cluster_id,
  ANY_VALUE(internal_searches_l12m_qty) as total_internal_searches,
  ST_ASGEOJSON(`carto-os`.carto.H3_BOUNDARY(cluster_id)) AS geom
from
  h3_data
group by cluster_id
"""
results = Req.post!(conn, bigquery: query).body
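Before breaking the query down, it helps to know the shape of `results`: ReqBigQuery decodes the response into a result struct exposing `columns` (the column names) and `rows` (one list of values per row), which the visualization code later relies on. A minimal sketch of that shape, with made-up values:

```elixir
# Illustrative shape of `results` (field names match what we use later
# in this post; the row values here are invented):
results = %{
  columns: ["cluster_id", "total_internal_searches", "geom"],
  rows: [["8928308280fffff", 42, ~s({"type":"Polygon","coordinates":[]})]]
}

# Zipping the column names with each row yields one map per row, which
# is exactly the transformation done in the visualization step below:
[row_map] =
  Enum.map(results.rows, fn row ->
    Enum.zip(results.columns, row) |> Map.new()
  end)

row_map["total_internal_searches"] # => 42
```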
Let’s break down this query and explain what we are doing. First, we take all internal searches from the last twelve months (L12M). Using ST_WITHIN as the join condition, we left-join each search against our city boundaries and keep only the searches that fall outside every city boundary (the rows where city_boundaries.city_nk IS NULL):
select
  internal_searches.event_id,
  internal_searches.search_geom
from
  internal_searches
left join
  city_boundaries
on
  ST_WITHIN(
    internal_searches.search_geom,
    city_boundaries.city_boundary
  )
where
  DATE_TRUNC(PARSE_DATE('%Y%m%d', internal_searches.search_date), month)
    >= DATE_ADD(CURRENT_DATE(), interval -12 month) AND
  city_boundaries.city_nk IS NULL
Then, in the subsequent CTE, we use a Carto extension (the CARTO Analytics Toolbox for BigQuery) to compute the H3 cell at resolution 9 for each internal search coordinate and count the searches per cell:
select
  `carto-os`.carto.H3_FROMGEOGPOINT(search_geom, 9) as cluster_id,
  COUNT(event_id) as internal_searches_l12m_qty
from
  source_fact_internal_searches
group by cluster_id
having cluster_id is not null
Last but not least, using the cluster_id, we get each cell’s boundary and transform it into GeoJSON:
select
  cluster_id,
  ANY_VALUE(internal_searches_l12m_qty) as total_internal_searches,
  ST_ASGEOJSON(`carto-os`.carto.H3_BOUNDARY(cluster_id)) AS geom
from
  h3_data
group by cluster_id
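Each `geom` value is a GeoJSON Polygon string, which we decode later with `Jason.decode!/1`. For a hexagonal cell the decoded shape looks roughly like this (the coordinates below are invented for illustration; a real ring is closed, so the first and last points coincide and a hexagon has 7 coordinate pairs):

```elixir
# Hypothetical decoded `geom` for one hexagonal H3 cell. The numbers
# are made up; GeoJSON polygon rings are closed (first point == last).
geom = %{
  "type" => "Polygon",
  "coordinates" => [
    [
      [-9.1520, 38.7230],
      [-9.1510, 38.7245],
      [-9.1495, 38.7245],
      [-9.1485, 38.7230],
      [-9.1495, 38.7215],
      [-9.1510, 38.7215],
      [-9.1520, 38.7230]
    ]
  ]
}

[ring] = geom["coordinates"]
length(ring) # => 7
```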
Then we ended up with something similar to this table (Fig. 2).
Cool, we now have all the information we need, but we want to visualize it. Kino is a very handy library for rendering rich, interactive outputs directly from Elixir code; as you might have noticed, we installed kino_maplibre among the dependencies.
To show the clusters on the map, we need to build a GeoJSON FeatureCollection as the data input.
columns = results.columns

to_map_struct = fn entry ->
  Enum.zip(columns, entry)
  |> Map.new()
end

cluster_parsed = Enum.map(results.rows, to_map_struct)

list =
  cluster_parsed
  |> Enum.map(fn e ->
    %{
      "type" => "Feature",
      "geometry" => Jason.decode!(Map.get(e, "geom")),
      "properties" => %{
        "total" => Map.get(e, "total_internal_searches"),
        "cluster_id" => Map.get(e, "cluster_id")
      }
    }
  end)

geojson = %{
  "type" => "FeatureCollection",
  "features" => list
}
After we have the GeoJSON, we can use the MapLibre modules to build the map and its layers.
MapLibre.new(style: :street)
|> MapLibre.add_source("main", type: :geojson, data: geojson)
|> MapLibre.add_layer(
  id: "clusters",
  type: :fill,
  source: "main",
  paint: [
    fill_color: "#00000055"
  ]
)
|> MapLibre.add_layer(
  id: "cluster_size",
  source: "main",
  type: :symbol,
  layout: [text_field: ["get", "total"], text_size: 10],
  paint: [text_color: "red"]
)
|> Kino.MapLibre.info_on_click("clusters", "cluster_id")
This will give us the desired result: a map with all internal searches clustered using H3 that are outside our city’s boundaries (Fig. 3).
Conclusion
At Bounce, we’re always experimenting and evolving our product based on data. By leveraging Elixir Livebook and H3, you can quickly build a visualization tool that aggregates geolocation data and makes it accessible to all stakeholders, empowering them with better and more readable data.
We are hiring! 💙 Check out our careers page