How to Filter Data by Map Shape

Learn how to filter locations by map shape using PostgreSQL and Elasticsearch.

Rafal Lisowski
EL Passion Blog
5 min read · Nov 14, 2017


Why Filtering by Map Shape is So Powerful

Imagine you are looking for a flat to rent. You’d like to live 5 mins from your favorite running route, 10 mins from your workplace, 15 mins from the best farmers’ market in town (or 1 min from your mother).

If you go to any estate website, the most precise filter available is usually a handful of checkboxes for districts you are only partially interested in. Commute time, tube zones, and the fact that the next street over already belongs to another borough are all things you have to work out yourself.

And what if you could filter the results by administrative areas, tube zones, postal districts or travel time to anywhere? Sounds better, right?

That’s what filtering by map shape is for.

How to Start

In this article, I use Ruby in all the examples for the sake of simplicity. Before we go further, please check out GeoJSON and WKT (a 2-minute read) if you’re not familiar with these data formats. We will use them later on.

Option 1: PostgreSQL + PostGIS

Requirements

First you need to install PostgreSQL with PostGIS on your machine. To play with our data we will use RGeo, a geospatial library for Ruby.

App

Generate a brand new Rails app with PostgreSQL as the database and add the following dependencies:
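
A minimal sketch of the Gemfile additions this walkthrough relies on (the PostGIS adapter plus RGeo and its format extensions; versions omitted):

    # Gemfile
    gem 'activerecord-postgis-adapter' # PostGIS column types for ActiveRecord
    gem 'rgeo'                         # geospatial primitives for Ruby
    gem 'rgeo-geojson'                 # GeoJSON <-> RGeo conversion
    gem 'rgeo-shapefile'               # reading ESRI shapefiles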

Switch the database adapter to postgis:
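
A sketch of the relevant part of config/database.yml; only the adapter line changes:

    default: &default
      adapter: postgis
      encoding: unicode
      pool: 5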

Run it! Now when you visit http://localhost:3000, you should see a nice Rails welcome page.

Prepare the Data to Be Filtered

There are a bunch of websites where you can download public domain map shapes, like https://data.london.gov.uk/ or http://geocommons.com/. I’m going to use postal districts in London.

In my case, the .shp file was in the wrong coordinate reference system. I used the ogr2ogr tool, which is part of the GDAL package, to convert the file to the WGS 84 standard. Converting between coordinate systems is quite common, so you may find the tool useful in the future.
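
If you need to do the same, the reprojection looks roughly like this (file names are illustrative; EPSG:4326 is the identifier for WGS 84):

    ogr2ogr -t_srs EPSG:4326 postal_districts_wgs84.shp postal_districts.shp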

Now that everything is in place, let’s create a database migration that will allow us to store shapes in PostgreSQL:
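
A sketch of such a migration, using the activerecord-postgis-adapter column types (table and column names are my own):

    # db/migrate/xxx_create_map_shapes.rb
    class CreateMapShapes < ActiveRecord::Migration[5.1]
      def change
        enable_extension 'postgis' unless extension_enabled?('postgis')

        create_table :map_shapes do |t|
          t.string :name, null: false
          # multi_polygon covers districts made of several disjoint areas;
          # geographic: true stores WGS 84 lon/lat values (SRID 4326)
          t.column :area, :multi_polygon, geographic: true, null: false
          t.timestamps
        end
      end
    end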

The model is dead simple:
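
Assuming the table from the migration above:

    # app/models/map_shape.rb
    class MapShape < ApplicationRecord
    end

The adapter takes care of serializing RGeo geometries into the area column and back.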

Import the Shapefiles

All files (.shp, .shx and .dbf) have to be in the same directory, even though the only path we pass to our importer is the .shp file.
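
A sketch of such an importer built on rgeo-shapefile (the attribute key 'name' is an assumption; it depends on the columns in your .dbf file):

    require 'rgeo/shapefile'

    class MapShapeImporter
      def self.import(shp_path)
        # The reader silently picks up the sibling .shx and .dbf files
        RGeo::Shapefile::Reader.open(shp_path) do |file|
          file.each do |record|
            MapShape.create!(
              name: record.attributes['name'], # key depends on the .dbf schema
              area: record.geometry
            )
          end
        end
      end
    end

    MapShapeImporter.import(Rails.root.join('db/data/postal_districts.shp').to_s)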

Locations are the last piece of data we are missing. Let’s add them quickly.

To make this easier to reuse, we can implement a db:seed task:
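
A sketch of db/seeds.rb, assuming a Location model with a geographic point column called lonlat (created with a migration analogous to the one above); the coordinates fall roughly inside London’s bounding box:

    # db/seeds.rb
    factory = RGeo::Geographic.spherical_factory(srid: 4326)

    1_000.times do |i|
      Location.create!(
        name: "Location #{i}",
        # random lon/lat roughly within Greater London
        lonlat: factory.point(rand(-0.5..0.3), rand(51.3..51.7))
      )
    end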

Finally, the filtering:
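
A sketch of the filter; I use ST_Covers here because, unlike ST_Contains, it works directly on geography columns:

    shape = MapShape.find_by!(name: 'SW1') # an illustrative district name

    locations = Location
      .joins('INNER JOIN map_shapes ON ST_Covers(map_shapes.area, locations.lonlat)')
      .where(map_shapes: { id: shape.id })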

You can also provide the map shape as a part of the query, so it does not have to be stored in PostgreSQL.
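
For example, taking a WKT polygon straight from the client (coordinates are illustrative):

    wkt = 'POLYGON((-0.15 51.48, -0.05 51.48, -0.05 51.53, -0.15 51.53, -0.15 51.48))'

    locations = Location.where(
      'ST_Covers(ST_GeogFromText(?), locations.lonlat)', wkt
    )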

Let’s check the speed with benchmark-ips:
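
A sketch of the benchmark (add gem 'benchmark-ips' to the Gemfile first):

    require 'benchmark/ips'

    shape = MapShape.find_by!(name: 'SW1')

    Benchmark.ips do |x|
      x.report('PostGIS, stored shape') do
        Location
          .joins('INNER JOIN map_shapes ON ST_Covers(map_shapes.area, locations.lonlat)')
          .where(map_shapes: { id: shape.id })
          .load # force the query to actually run
      end
    end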

As you can see, filtering locations by map shape with PostgreSQL + PostGIS works, but it is quite slow: only ~4 iterations per second.

Option 2: Elasticsearch

Let’s use the same app, but this time we’ll store the data in Elasticsearch.

If you’ve never used it, please read about Elasticsearch’s core concepts. Index, type and document are essential for us right now.

In a nutshell, an index holds the data, like a database; a type is like a table, so one index can contain many types; and a document is a single piece of data, like a record in a relational database. It’s an oversimplification, but it works as a mental model.

Let’s use the official Ruby gem to connect to Elasticsearch.

I assume the Elasticsearch service is installed and running on the default port.
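
Add the gem and set up a client (9200 is Elasticsearch’s default HTTP port):

    # Gemfile
    # gem 'elasticsearch'

    require 'elasticsearch'

    CLIENT = Elasticsearch::Client.new(url: 'http://localhost:9200')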

Import the shapes into Elasticsearch.
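
A sketch of the importer, assuming Elasticsearch 5.x (where mapping types still exist); the index and type names are illustrative:

    # Create the index with an explicit geo_shape mapping
    CLIENT.indices.create(
      index: 'map_shapes',
      body: {
        mappings: {
          shape: {
            properties: {
              name: { type: 'keyword' },
              area: { type: 'geo_shape', tree: 'quadtree', precision: '100m' }
            }
          }
        }
      }
    )

    # Re-read the shapefile and index each district as GeoJSON
    RGeo::Shapefile::Reader.open(shp_path) do |file|
      file.each do |record|
        CLIENT.index(
          index: 'map_shapes',
          type: 'shape',
          id: record.attributes['name'],
          body: {
            name: record.attributes['name'],
            area: RGeo::GeoJSON.encode(record.geometry)
          }
        )
      end
    end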

The vital part of the importer is the index mapping. We set up Elasticsearch to handle geographic data with the geo_shape datatype. Take a closer look at the precision parameter: picking the right value is crucial. It should be fine enough to fit your needs, but coarse enough to keep Elasticsearch reasonably fast; the lower the value, the slower Elasticsearch indexing and filtering become.

Now, the importer for locations:
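
A sketch, reading the locations back out of PostgreSQL and indexing each point as a geo_shape:

    CLIENT.indices.create(
      index: 'locations',
      body: {
        mappings: {
          location: {
            properties: {
              name:   { type: 'keyword' },
              lonlat: { type: 'geo_shape', tree: 'quadtree', precision: '100m' }
            }
          }
        }
      }
    )

    Location.find_each do |location|
      CLIENT.index(
        index: 'locations',
        type: 'location',
        id: location.id,
        body: {
          name: location.name,
          # a GeoJSON-style point: [longitude, latitude]
          lonlat: { type: 'point', coordinates: [location.lonlat.x, location.lonlat.y] }
        }
      )
    end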

As you can see, it’s very similar to the map shapes importer; we just use a different geo_shape input structure, this time a Point.

You have two options for filtering locations. As in PostgreSQL, you can send the map shape inside the query:
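
A sketch using a geo_shape query with an inline polygon (coordinates are illustrative):

    response = CLIENT.search(
      index: 'locations',
      body: {
        query: {
          bool: {
            filter: {
              geo_shape: {
                lonlat: {
                  relation: 'within',
                  shape: {
                    type: 'polygon',
                    coordinates: [[
                      [-0.15, 51.48], [-0.05, 51.48], [-0.05, 51.53],
                      [-0.15, 51.53], [-0.15, 51.48]
                    ]]
                  }
                }
              }
            }
          }
        }
      }
    )

    response['hits']['hits'] # the matching location documents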

Or you can use an already indexed map shape:
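
The same query, referencing a shape we indexed earlier via indexed_shape:

    response = CLIENT.search(
      index: 'locations',
      body: {
        query: {
          bool: {
            filter: {
              geo_shape: {
                lonlat: {
                  relation: 'within',
                  indexed_shape: {
                    index: 'map_shapes',
                    type: 'shape',
                    id: 'SW1',   # the document id we used when importing shapes
                    path: 'area' # the field holding the shape in that document
                  }
                }
              }
            }
          }
        }
      }
    )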

In both cases the geo_shape must be indexed by Elasticsearch. If the shape is part of the query, Elasticsearch indexes it at query time, so you get the result with a delay. I strongly advise pre-indexing wherever you can.

Let’s check the efficiency:
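
A sketch, assuming the two request bodies from the previous snippets are stored in inline_query and indexed_query:

    require 'benchmark/ips'

    Benchmark.ips do |x|
      x.report('shape sent in the query') do
        CLIENT.search(index: 'locations', body: inline_query)
      end

      x.report('pre-indexed shape') do
        CLIENT.search(index: 'locations', body: indexed_query)
      end
    end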

As you can see, querying locations by an indexed map shape in Elasticsearch is much faster than any other approach.

Pitfalls

PostgreSQL is much slower, but to be honest I’ve never used PostGIS in a production environment and have never dug deeply into PostGIS performance tuning. Adding spatial indices may improve the performance.
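
For example, GiST indices on the geography columns (a sketch, assuming the column names from earlier):

    # in a migration
    add_index :locations, :lonlat, using: :gist
    add_index :map_shapes, :area, using: :gist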

In Elasticsearch you have to watch out for CPU usage. Indexing geo shapes consumes a lot of CPU. If you execute many queries with the geo shape provided as part of the search query, you may kill your cluster. The same can happen if you store too many geo shapes in one index. In my case, a node sat at 100% CPU usage even when it wasn’t receiving any queries. The solution was to distribute the shapes across many indices, so that the load is balanced not only across shards but also across indices.

Case Study

In the app that we developed for Rightmove, we filter properties by administrative areas, tube zones or postal areas, providing more relevant content. We also use the Travel Time Platform API to search by commute destination and transport mode. Thanks to that, users can find a house or flat within the desired travel time of any place (like a workplace, an airport or the family home).


About the Author
Rafał is a Software Engineer at EL Passion. You can find him on GitHub or LinkedIn.

Find EL Passion on Facebook, Twitter and Instagram.
