Geolocation search optimization

Astandri Koesriputranto
Bukalapak Data
Published in
8 min readJun 8, 2022

Introduction to geohash and map visualization with a geohash python library and folium, and how we in Bukalapak utilize geohash to find search points to optimize geolocation search.

Photo by delfi de la Rua on Unsplash

Search Optimization with Geohash

TL;DR

In this article, I want to share an interesting approach to solving an optimization problem that we had in Bukalapak. The approach itself is quite simple, but powerful enough, and can be used in many other potential use cases.

Problem:

  • We have a search API that can search any point of interest from 1 specific coordinate.
  • It has a cost for every API call (quite decent).
  • There is a limit to what the search API can do, it can only yield 60 results in 1 call.
  • Multiple calls in the same coordinates will yield the same result.
  • If, for example, I want to search “Warung Kopi” in Kota Bogor, we cannot just use a single coordinate (e.g center of Kota Bogor).
  • If I search in multiple randomized coordinates, there is a potentially redundant search area and also the result, and the cost will be inefficient as well.

Solution:

  • We can spread our search to specific areas across Kota Bogor.
  • One way to do it is by using Geohash mapping.
  • By using Geohash, we can reduce the redundancy of the search area and also the result.
  • Can cover more specific areas, means a higher chance to capture all possible “Warung Kopi” in Kota Bogor.
  • The final output will be some search points that we can use to search “Warung Kopi” in Kota Bogor.

I will try to explain this approach from the beginning by introducing geohash, geohash python library, how we can visualize the data, and slowly we will get into the solution part of the problem.

Note: The visualization part itself is not a mandatory component of the solution, but it will be nice to have if we want to present our solution, to our stakeholders for example.

Enjoy the ride!

What is Geohash

Based on Wikipedia, geohash definition is:

“Geohash is a public domain geocode system invented in 2008 by Gustavo Niemeyer[1] and (similar work in 1966) G.M. Morton,[2] which encodes a geographic location into a short string of letters and digits.”

With geohash, we split the earth into boxes with different dimensions. Check out this figure:

image from https://www.movable-type.co.uk/scripts/geohash.html

Now the question is, what it can be used for? Many… Especially any kind of features or analysis that is related to spatial data. For example:

  • Urban analytics
  • Neighborhood mapping and density analysis
  • Area plotting when you don’t have district mapping (polygons)

There are different precision in geohash mapping from 1–12.

  1. ≤ 5,000km × 5,000km
  2. ≤ 1,250km × 625km
  3. ≤ 156km × 156km
  4. ≤ 39.1km × 19.5km
  5. ≤ 4.89km × 4.89km
  6. ≤ 1.22km × 0.61km
  7. ≤ 153m × 153m
  8. ≤ 38.2m × 19.1m
  9. ≤ 4.77m × 4.77m
  10. ≤ 1.19m × 0.596m
  11. ≤ 149mm × 149mm
  12. ≤ 37.2mm × 18.6mm

Let’s see what these geohashes look like in Indonesia. We can check it out using this website: https://www.movable-type.co.uk/scripts/geohash.html

Indonesia geohash for precision level 1 (image by Author)

Looks like most area in Indonesia is mapped to q character for precision at 1. Let’s see Jakarta Geohash.

Jakarta Geohash at precision level 4 (image by Author)

In precision level at 4, looks like most areas of Jakarta will be mapped to qqgv, qqgu, and qquh.

Some spatial terminologies

Before we proceed further, I would like to introduce some terminologies that will be used along the way.

Latitude & Longitude

Latitude and Longitude are the units that represent the coordinates in the geographic coordinate system. (https://www.latlong.net/)

Polygon

A set of coordinates that are closed and form a shape. Usually used to show the border of an area/district/city.

Greater Jakarta Area polygon from Google Maps (image by Author)

Multipolygon

A set of polygons. Usually used to group some polygons of an area/district/city borders but they are in a separated closed shape (like separated by some distance, island, etc).

Kab. Jepara Multipolygon from Google Maps (image by Author)

Kab. Jepara, a sample of a city with a separated island, therefore will need a multi polygon to plot the borders.

GeoJson

GeoJSON is a format for encoding a variety of geographic data structures. GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon. (https://geojson.org/)

Shapefile

The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products. (https://en.wikipedia.org/wiki/Shapefile)

Geohash in python

Python-Geohash

There are several geohash python libraries, but in this case, we are using python-geohash==0.8.5. The library is quite old already, but it is still sufficient for our needs. You can install the library using pip as usual pip install python-geohash==0.8.5.

Let’s try to use the library. To encode the latitude and longitude data as a geohash, you can use encode function.

import geohashgeohash.encode(
latitude=-6.24,
longitude=106.7,
precision=4
)
Encode function (image by Author)

We can also decode the geohash which will return to the center point of the geohash.

geohash.decode('qqu57zr2v')
Decode function (image by Author)

To get the bounding box of the geohash, we can use bbox function.

geohash.bbox('qqu57zr2v')
Bounding Box or bbox function (image by Author)

There will be 4 points, S, W, N, and E. The combination pairs will form a rectangle that represents the geohash.

  • Upper_left_point = (N, W)
  • Upper_right_point = (N, E)
  • Lower_right_point = (S, E)
  • Lower_left_point = (S, W)

Visualization with Folium

So far we can get the latitude and longitude from the geohash and vice versa. Now let’s try to visualize the geohash in the map canvas. We will use the folium library to do the visualization.

pip install folium==0.12.0

First, we need to draw the map.

import foliumlat, long = geohash.decode(‘qqu57’) # get the center of the geohash# create a map canvas
m = folium.Map(
location=[lat,long], # set the center location of the map
zoom_start=9.5,
tiles=”CartoDB Positron”
)
m # show the map
Empty map canvas (image by Author)

In folium, after we have our canvas, we can add items on top of it. It can be markers, circles, rectangles, and many more with the add_to method. Let’s try to add our geohash box on top of it.

decoded = geohash.bbox(‘qqu57’) # decode the geohashW = decoded[“w”]
E = decoded[“e”]
N = decoded[“n”]
S = decoded[“s”]
# create each point of the rectangle
upper_left = (N, W)
upper_right = (N, E)
lower_right = (S, E)
lower_left = (S, W)
edges = [upper_left, upper_right, lower_right, lower_left]# create rectangle object and add it to the map canvas
folium.Rectangle(
bounds=edges,
color=”blue”,
fill_color=”green”,
weight=1,
popup=’qqu57',
).add_to(m)
m # show the map
Geohash visualization in map canvas (image by Author)

Solution Part: Search optimization with geohash

So far we have learned about the basics of geohash and the visualization using folium. In Bukalapak, we use those features to help us optimize our geolocation search project.

Get polygon data

Let’s begin by collecting the polygon of Kota Bogor. We can download the data from https://data.humdata.org/dataset/cod-ab-idn or any other source. The data that we need will be this one:

After you get the data, extract the zip file and load the administrative level 2 shape file. To load the data, you will need to install geopandas first. The recommended method is using conda, if you’re not using conda then you can still use pip but will have to make sure other dependencies are installed correctly.

Source: https://geopandas.org/en/stable/getting_started/install.html

conda install geopandas # run in terminal, if not installed yet

Then you can load the shapefile

import geopandas as gpdshapefile = gpd.read_file(“~/Downloads/idn_adm_bps_20200401_shp/idn_admbnda_adm2_bps_20200401.shp”)
Details of Indonesia administrative level 2 data (image by Author)

Then we can locate Kota Bogor to get the polygon.

shapefile.loc[shapefile[‘ADM2_EN’]==’Kota Bogor’]
Kota Bogor polygon data (image by Author)

Now let’s visualize the polygon on the map using folium.

Plotting Kota Bogor polygon with folium
Kota Bogor polygon visualized using folium (image by Author)

Processing the Geohash

After we have our Kota Bogor Polygon, we can then select which geohash is included in the Polygon. To be able to do that, we need to create a helper function called geohash_inside_polygon .

geohash_inside_polygon function

Then we run the function by supplying the Kota Bogor polygon.

geohash_inside_polygon(
[[[[point[1], point[0]] for point in polygon_kota_bogor.exterior.coords]]],
precision=5
)
Geohash inside Kota Bogor (image by Author)

Then we can visualize the geohash in folium like this:

Visualize a map of Kota Bogor along with the geohash
Kota Bogor polygon and geohash precision level 5 visualized (image by Author)

For precision level at 6, it will look like this:

Kota Bogor polygon and geohash precision level 6visualized (image by Author)

Select the search points

Let’s say we want to search quite deep in each of the Kota Bogor areas, so we select the precision level at 6. The final steps are to decode the geohashes inside the Polygon and then determine the search radius.

To decode, we can simply use:

search_points = [geohash.decode(gh) for gh in geohash_bogor]
Generated search points for Kota Bogor (image by Author)

Then, because we are using a precision level of 6 (1.22km × 0.61km), a radius of 0.6km — 1.2km should be fine. Let’s try a radius of 0.6km and visualize it.

You can do it easily by changing the last part from geohash_3.py gist example to:

folium.Circle(
location=point,
radius=600, # in meters
color=”green”,
fill_color=”yellow”,
weight=0.5,
).add_to(map_kota_bogor)

Finally, by using this approach, we will have 2 options as our solution:

  1. Using precision at 6, we will have ~150 points across Kota Bogor to do our search. So the estimated cost will be ~150 * X. This option is great if we are confident that in each Geohash precision 6 radius there will be many results that we can get.
  2. Another option is to use Geohash precision 5, we will have around 3 points in Kota Bogor. The estimated cost will be 3 * X. Contrary to the previous option, in this case, we don’t think there will be not many results, therefore we only search some bigger areas only.

Closing

Congratulations upon reaching this point!

I hope you learn a lot from this specific approach and also give you inspiration on how you can tackle similar problems for your projects.

You’ve learned about the Geohash, how you can use it in python, and also visualize it using folium, which is heavily related to spatial analysis.

Now it’s your turn to play around with it on your specific use case.

As always, happy learning!! 🚀🚀🚀

--

--