Improving geo features to handle millions of daily requests

Geo features are commonly used, but at a certain scale their performance takes a serious hit. In this article, I will share a very interesting problem that we solved.

ilyess zenasni
HungerStation
Jan 16, 2023


Context

Delivery Area Polygon

Delivery areas are represented by polygons. A polygon is represented by a 3-dimensional array:

  • The innermost level is a float array of 2 values, representing a coordinate (latitude and longitude).
  • The middle level is an array of coordinates, representing a shape (a ring).
  • The outermost level is an array of shapes, representing a polygon with holes.
Use case of Polygon with holes
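To make the nesting concrete, here is a minimal sketch in Go of how such a structure could be modelled (the type names and coordinates are illustrative, not our actual data model):

package main

import "fmt"

// Coordinate is the innermost level: a pair of floats (latitude, longitude).
type Coordinate = [2]float64

// Shape is the middle level: a closed ring of coordinates.
type Shape = []Coordinate

// Polygon is the outermost level: an outer ring followed by optional holes.
type Polygon = []Shape

func main() {
    // A square delivery area with a smaller square hole cut out of it
    // (coordinates are made up for illustration).
    area := Polygon{
        { // outer ring
            {24.70, 46.65}, {24.70, 46.75}, {24.80, 46.75}, {24.80, 46.65}, {24.70, 46.65},
        },
        { // hole: an excluded residence
            {24.74, 46.69}, {24.74, 46.71}, {24.76, 46.71}, {24.76, 46.69}, {24.74, 46.69},
        },
    }
    fmt.Println("rings:", len(area), "outer ring vertices:", len(area[0]))
}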

There are many reasons why we use polygons instead of a simple radius:

  • Some delivery areas must not overlap with others, so they need a precisely defined zone.
  • Sometimes a geo point 1km away takes much more time to reach than a geo point 2km away, because of the road network.
  • Delivery areas can have holes, for example when we want to exclude a specific residence.

Delivery area: Sub Polygons

Delivery areas are composed of sub-polygons. Those sub-polygons represent multiple delivery boundaries depending on external factors such as the weather.

Use case of a delivery area with sub polygons

If a weather event hits a specific area, we want to shrink the delivery area based on the intensity of the event. The system should be flexible enough to shrink those delivery areas easily.

Expectations

We must provide a system that:

  • Provides the best user experience with high performance (<100ms at P99).
  • Provides accurate data to the user.
  • Provides tooling and reacts quickly to weather events.
  • Handles millions of requests per day.
  • and even more challenges…

As you can see, delivery areas are a much bigger and more interesting scope than they initially seem.

Tech

PostgreSQL ST_Intersects

Let me go deeper into the technical problem we were trying to solve. The main query fetches all delivery areas covering a given user location, and we were sometimes facing big latency spikes on it.

After investigation, we noticed that the database query, using ST_Intersects with PostgreSQL (PostGIS), was the root cause. At a certain scale, ST_Intersects becomes very expensive.
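To make that concrete, the query had roughly the following shape; this is a hedged sketch, and the table and column names (delivery_areas, polygon) are assumptions rather than the real schema:

package geo

import (
    "context"
    "database/sql"
)

// deliveryAreaIDsAt fetches every delivery area whose polygon intersects
// the user's location. ST_MakePoint takes longitude first, then latitude.
func deliveryAreaIDsAt(ctx context.Context, db *sql.DB, lon, lat float64) ([]int64, error) {
    rows, err := db.QueryContext(ctx, `
        SELECT id
          FROM delivery_areas
         WHERE ST_Intersects(polygon, ST_SetSRID(ST_MakePoint($1, $2), 4326))`,
        lon, lat)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var ids []int64
    for rows.Next() {
        var id int64
        if err := rows.Scan(&id); err != nil {
            return nil, err
        }
        ids = append(ids, id)
    }
    return ids, rows.Err()
}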

The performance is impacted by:

  • The number of polygons we have in the database.
  • Their size, meaning the number of vertices.
  • Their complexity, meaning how intricate the shape formed by those vertices is.

The goal was to considerably improve the performance of this query.

ST_Simplify

First we targeted quick wins. By reading the PostgreSQL documentation in detail, we found some tricks to improve performance. We can reduce the size of polygons by simplifying them, using ST_Simplify.

Example of Simplified polygon

ST_Simplify uses the Ramer–Douglas–Peucker algorithm to convert complex polygons into simpler ones with fewer vertices.

Visualization of the Ramer–Douglas–Peucker algorithm

Of course, by doing so we also reduce the precision of those areas. We agreed on a tolerance level with the business and ended up with a precision of 100m.

Some polygons shrank to a tenth of their original size, which significantly improved performance.
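A rough sketch of that pre-processing step, reusing the imports from the earlier query sketch; the extra column is an assumption, and a tolerance of 0.001 degrees is roughly 100m near the equator, so the real value would be tuned with the business:

// simplifyPolygons stores a simplified copy of each polygon next to the
// original, so reads can hit the cheaper version. Column names are assumptions.
func simplifyPolygons(ctx context.Context, db *sql.DB) error {
    _, err := db.ExecContext(ctx, `
        UPDATE delivery_areas
           SET simplified_polygon = ST_Simplify(polygon, 0.001)`)
    return err
}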

ST_Subdivide

Second, we can subdivide polygons using ST_Subdivide, which reduces their complexity. The idea is to split a big polygon into smaller and simpler pieces; PostgreSQL can evaluate them much faster this way.

Visualization of subdivided polygon

Doing so has zero impact on precision. However, the number of database entries increases considerably, since each polygon becomes multiple sub-polygons.

We multiplied the number of entries by 5, storing all of them in a new table, and got a huge performance improvement!
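A hedged sketch of how such a table could be filled (the table name delivery_area_parts and the 128-vertex limit are assumptions, not the production values):

// populateSubdividedAreas splits every delivery area into smaller parts that
// each keep a reference back to the original area.
func populateSubdividedAreas(ctx context.Context, db *sql.DB) error {
    _, err := db.ExecContext(ctx, `
        INSERT INTO delivery_area_parts (area_id, part)
        SELECT id, ST_Subdivide(polygon, 128)
          FROM delivery_areas`)
    return err
}

Queries then run against the parts table, and any match can be mapped back to its parent delivery area via area_id.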

Spatial engine

The quick wins above bought us time to implement a much more complex solution: we created a spatial engine. The idea is to perform this query directly with the instance's CPU and memory, avoiding database calls.

First, we developed an algorithm that quickly tells us whether a point is inside a polygon. There are multiple algorithms for this. The most famous, and one of the most performant, is the ray-casting algorithm, also known as the even-odd rule algorithm.

Visualization of odd even algorithm

The idea is to cast a ray in any direction starting from the point and count the number of intersections with the polygon's edges. If the count is odd, the point is inside the polygon; if the count is even, it is outside.

V1 Spatial engine, Point in Polygon
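As a rough sketch of the even-odd idea in Go, over go-geom coordinates (not the exact production implementation):

package engine

import geom "github.com/twpayne/go-geom"

// pointInRing casts a horizontal ray to the right of the point and counts
// edge crossings: an odd number of crossings means the point is inside.
func pointInRing(ring []geom.Coord, point *geom.Point) bool {
    x, y := point.X(), point.Y()
    inside := false
    for i, j := 0, len(ring)-1; i < len(ring); j, i = i, i+1 {
        xi, yi := ring[i][0], ring[i][1]
        xj, yj := ring[j][0], ring[j][1]
        // Does the edge (j -> i) cross the horizontal ray at height y?
        if (yi > y) != (yj > y) && x < (xj-xi)*(y-yi)/(yj-yi)+xi {
            inside = !inside
        }
    }
    return inside
}

// pointInPolygon combines the rings: the point must be inside the outer
// ring (index 0) and outside every hole.
func pointInPolygon(polygon *geom.Polygon, point *geom.Point) bool {
    rings := polygon.Coords() // [][]geom.Coord: outer ring, then holes
    if len(rings) == 0 || !pointInRing(rings[0], point) {
        return false
    }
    for _, hole := range rings[1:] {
        if pointInRing(hole, point) {
            return false
        }
    }
    return true
}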

We implemented that algorithm and ran some performance tests. With 50,000 polygons of 100 vertices each, the algorithm answered with 100% accuracy within 800ms, on a small machine with a 2.6GHz processor. The results were very promising but could be improved even further.

After a long investigation, we noticed that one line of code was taking a very long time to execute:

nbShapes := len(polygon.Coords())

len is a Go built-in function, which is very simple and should be very fast. polygon.Coords simply returns the coordinates of the polygon; its doc comment reads: // Coords returns all of g's coordinates. The go-geom library behind geom.Polygon is an optimized, well-designed and well-developed library. What could be the issue?

Well, by going deeper into the code and understanding it, we found that the function does exactly what it says and returns all of g's coordinates, but it does not keep the data organized that way in memory. In fact, the polygon is stored in memory as one big one-dimensional array holding the coordinates of all shapes, without separating them; the number of shapes and the size of each shape are stored in separate attributes.

So every call to Coords builds a brand-new 3-dimensional object: nested loops over shapes, vertices, and values, with tons of calls to allocating runtime functions. As you know, any such call is much more complex and time-consuming than it looks.

go-geom Coords() detail code

We updated the entire algorithm to work on the one-dimensional array directly, iterating over it with low-level index arithmetic.

V2 Spatial engine, Point in Polygon

Note how the pointInSinglePolygon function signature moved from:

func pointInSinglePolygon(poly []geom.Coord, point *geom.Point) bool

to:

func pointInSinglePolygon(poly []float64, start int, end int, point *geom.Point) bool
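Here is a hedged sketch, in the same package as the earlier ray-casting sketch, of how that flat-array version could look (go-geom exposes the flat data via FlatCoords(), with ring boundaries in Ends()); it shows the idea rather than the exact production code:

// pointInSinglePolygon tests one ring stored inside the flat coordinate
// array: poly is the polygon's FlatCoords() slice and [start, end) delimits
// the ring, with a stride of 2 between consecutive XY vertices.
func pointInSinglePolygon(poly []float64, start int, end int, point *geom.Point) bool {
    const stride = 2 // XY layout
    x, y := point.X(), point.Y()
    inside := false
    j := end - stride // previous vertex, starting from the last one
    for i := start; i < end; i += stride {
        xi, yi := poly[i], poly[i+1]
        xj, yj := poly[j], poly[j+1]
        if (yi > y) != (yj > y) && x < (xj-xi)*(y-yi)/(yj-yi)+xi {
            inside = !inside
        }
        j = i
    }
    return inside
}

No intermediate 3-dimensional slice is ever built: the ring boundaries come from the polygon's existing metadata, so the hot loop is just index arithmetic over one []float64.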

We also spread the computation across multiple goroutines to evaluate all polygons in parallel.

Goroutine calculation
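As a simplified sketch of that fan-out (the worker count and chunking strategy are illustrative, and it reuses the pointInPolygon sketch from above): each goroutine owns a distinct range of the result slice, so no locking is needed.

package engine

import (
    "sync"

    geom "github.com/twpayne/go-geom"
)

// checkAllPolygons splits the polygon set into chunks and tests each chunk
// against the point in its own goroutine.
func checkAllPolygons(polygons []*geom.Polygon, point *geom.Point, workers int) []bool {
    if workers < 1 {
        workers = 1
    }
    results := make([]bool, len(polygons))
    chunk := (len(polygons) + workers - 1) / workers
    var wg sync.WaitGroup
    for start := 0; start < len(polygons); start += chunk {
        end := start + chunk
        if end > len(polygons) {
            end = len(polygons)
        }
        wg.Add(1)
        go func(start, end int) {
            defer wg.Done()
            for i := start; i < end; i++ {
                results[i] = pointInPolygon(polygons[i], point)
            }
        }(start, end)
    }
    wg.Wait()
    return results
}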

We reran the test under the same conditions, 50,000 polygons of 100 vertices each. The algorithm answered with 100% correctness within 4ms, which is 200 times faster than before, an average of about 80 nanoseconds per 100-vertex polygon!

Keeping data in memory

Once we had a very optimized algorithm, we had to use it properly on the servers. At this point, the data lives in the database and the pods do not have it in memory. So we had to load all polygons into memory and make sure that the data stays up to date.

In fact, many events happen on the fly, like the shrinking events I mentioned earlier. To keep the data consistent, we have to make sure that no pod misses an event and that all pods are updated as soon as possible.

We created a queue that distributes all messages to all subscribers. When the database is updated, messages are produced. All pods consume those messages and update their memory accordingly.

When a pod is warming up, it has to load all polygons at once. We keep a timestamp of the last update, so the pod can retrieve any messages it missed during warm-up.
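A minimal sketch of the in-memory side, assuming a simple map keyed by area id (the real structure and message format are more involved):

package engine

import (
    "sync"
    "time"

    geom "github.com/twpayne/go-geom"
)

// polygonStore holds all delivery areas in memory, guarded by a RWMutex,
// plus the timestamp of the last applied change so a warming-up pod knows
// from which point it must replay missed messages.
type polygonStore struct {
    mu         sync.RWMutex
    areas      map[int64]*geom.Polygon
    lastUpdate time.Time
}

// apply is called by the queue consumer whenever a delivery area changes.
func (s *polygonStore) apply(id int64, p *geom.Polygon, at time.Time) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.areas[id] = p
    if at.After(s.lastUpdate) {
        s.lastUpdate = at
    }
}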

Of course, we stress tested it and put it behind feature flags, to have better control and easily fall back to the previous behaviour if anything goes wrong.

Release

The quick-win solutions drastically reduced the database load. We reduced P99 latency from an average of 400ms to 50ms, and P50 from 350ms to almost 1ms!

Latency dropped with new system

The spatial engine will give us even better numbers.

We are confident in its stability and can focus on the next challenges.

Next challenges

Scaling up is important: we want to reduce the warm-up time of pods and lower memory and CPU usage. To do so, we are planning to partition the data by location using GeoHash. That will also improve performance, since it reduces the number of polygons to check within a cluster.

Visualization of Geo Hash algorithm
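A hedged sketch of what such a partition key could look like, using one publicly available Go geohash library (the library choice and the 5-character precision, roughly a 5km cell, are assumptions):

package engine

import "github.com/mmcloughlin/geohash" // one possible library, not a final choice

// partitionKey maps a location to the cluster that owns the polygons
// around it, so a pod only loads and checks polygons for its own cells.
func partitionKey(lat, lng float64) string {
    return geohash.EncodeWithPrecision(lat, lng, 5)
}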

We will also optimize the algorithm by using an R-tree index to store the polygons' bounding boxes in memory. That allows us to fetch a filtered list of candidate polygons very quickly.

Visualization of R-TREE algorithm

Then we apply the regular ray-casting algorithm to verify point-in-polygon on that much smaller candidate set.
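A simplified sketch of that pre-filter idea, using plain bounding boxes; a real R-tree replaces the linear scan below with a tree lookup, but the filtering principle is the same:

package engine

// bbox is the axis-aligned bounding box an R-tree would index per polygon.
type bbox struct {
    minX, minY, maxX, maxY float64
    polygonID              int64
}

// candidates keeps only the polygons whose bounding box contains the point;
// the full ray-casting test then runs on this much shorter list.
func candidates(boxes []bbox, x, y float64) []int64 {
    var ids []int64
    for _, b := range boxes {
        if x >= b.minX && x <= b.maxX && y >= b.minY && y <= b.maxY {
            ids = append(ids, b.polygonID)
        }
    }
    return ids
}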

Conclusion

As you can see, these challenges are very interesting and complex. I highly recommend always taking a deep look at the code of the libraries you use. The reality is sometimes far from what we understand by reading surface-level information or documentation.

As we keep scaling, we will face more challenges, so we must always be proactive and stay three steps ahead.
