Efficient Technique To Fetch Data Based On Location

Sarvesh Nerurkar
7 min readMar 21, 2023

--

Have you ever wondered how companies like Swiggy, Zomato, Airbnb etc. show you nearby Restaurants/Hotels so quickly and correctly considering such a large base of users and millions of location data? If you want to know how it achieves such quick results internally? You have come to the right place 😃 .

Let me explain different approaches to solving this problem.

Just to explain the problem and solution in detail, let us consider an example of Zomato with some base values:-

  • 2 Million records of restaurants with Latitude/Longitude in the DB.
  • 1000 users/second queries for finding restaurants near them.

Well, we are building a system to support the above requirements. Let's explore solutions which would cater above requirements in an efficient way.

Solution 1 (Naive Approach):-

The most obvious way to solve this problem is to query on lat/long column by finding diagonal coordinates:-

Eg:- select * from restaurants  where lat>10.0091 and lat < 10.1110 and lon>81.9990 and lon< 82.0003;
  • It is very difficult to find restaurants in a radius rather than a square using a diagonal approach.
  • Also, calculating diagonal lat/long is an additional headache as the amount of change in degrees in terms of distance (KM) is different at the equator vs poles.
  • Even in terms of performance, too many less than and greater than queries won't work much efficiently with default b-tree indexes.

So, Solution 1 is not recommended in the production environment ❌.

Well don’t worry, we have got much better solutions, just keep reading 👨🏻‍💻.

Solution 2 (Using SQL Spatial Features):-

It's obvious that we will be storing restaurants lat/long along with other details of the restaurant in a table, but instead of saving lat/long in the float column, SQL DB provides support for spatial datatypes like geometry, point, geography.. etc which makes it easier to work on spatial data.

Let me explain how this works by taking PostgreSQL as an example. (Note:- Similar datatypes are provided by other SQLs also.)

Lat/Long data can be stored in a column of geometry data type. Geometry is the spatial datatype provided by Postgres. Our lat/long coordinates will be created as Point but stored in geometry column by casting it, as Point is a type of geometry. Postgres also support Point data type for the column, but keeping it as geometry makes it easier if you are using ORM frameworks in your application.

Let us create a simple table with records to demonstrate how this works.

create extension postgis;
create table restaurants (id int,name varchar(100), lat_long geometry);
insert into restaurants values(1,'Nisarg',ST_GeomFromText('Point(89.9999 10.0001)'));
select * from restaurants where ST_DWithin(lat_long,cast(ST_SetSRID(ST_MakePoint(89.9999,10.0001),4326) as geography),1000);
  • Here postgis extension is a spatial database extender for PostgreSQL which needs to be added to support spatial queries.
  • Table ‘restaurant’ is created with columns:- id, name and lat_long whose type is geometry.
  • Inserting data in restaurants with saving lat/long as point . Here we are casting point in the geometry column using the ‘ST_GeomFromText’ function.
  • When finding restaurants in a radius we are using the ‘ST_DWithin’ function which takes 3 parameters,1st is the column name ,2nd is the location as geography and 3rd is the radius in meters.
  • Note:- In the above query radius is 1000 meters.

Maintaining data in spatial data types makes querying for a radius easier as mentioned in the above query but will it solve the problem efficiently?

The answer is no!!! Just using spatial data type will not fetch radius queries efficiently, If you do explain analyse, you will see that it still uses traditional indexes and sequential scan to fetch data, which will be slower in the production environment.

Don’t worry we have got 🔥 Spatial Indexes 🔥 to the rescue!!

create index idx_restaurants_lat_long on restaurants using gist((lat_long::geography));

The spatial Index which I have used here is the gist.

Pro Tip:- While adding this index, need to cast lat_long as geography, since we are querying data using the ST_DWithin function which uses geography data type. If you add gist index without casting then it does not use spatial indexes for the ST_DWithin function(I have wasted a lot of time understanding this😅, hope this saves your time).

Before explaining how the spatial index works internally, let me show you the results of load testing using this solution:-

The above results mean that 99% of user requests were delivered within 1.2 seconds considering 1000 concurrent requests.

Well, the magic is performed using an R-Tree data structure.

The below diagram demonstrates how data is divided in R-Tree:-

  • R-Trees are made of a single root, internal and leaf nodes
  • The root has a pointer to the largest region in the special domain
  • The parent nodes will hold child nodes where child nodes completely overlap the region of the parent nodes
  • Leaf nodes hold data about MBR to the current object
  • MBR-Minimum bounding region is the minimum boundary box parameter surrounding the region under consideration

One small demonstration of R-Tree:-

Here Root Node has the entire area and child nodes have boxes within them. So when we query for a particular point, it discards other areas which does not contain that point, thus making it more efficient.

Let’s not deep dive into the working of R-Trees as this is a topic for another day.

Solution 3 (Using Redis):-

Solution 2 mostly solved the problem, but this can be further improved using Redis Geoapis 🔥

Before explaining what is Redis Geoapi, let me show you load testing results of this solution:-

It literally brings down the response time by half. Impressive isn’t it?😎

Redis provides GeoApis where we can add lat/long data and query on that data efficiently.

geoadd restaurants 73.7985838 18.5576988 "Blue Plate"
geoadd restaurants 73.7985838 18.5576970 "Juice Destination"
geoadd restaurants 72.7985838 18.5576970 "Meltish"
  • 1st parameter is the command geoadd to add lat long data in Redis.
  • 2nd parameter is key, in our case it's restaurants.
  • 3rd parameter is longitude.
  • 4th parameter is latitude.
  • 5th parameter is the value associated with that lat/long. In our case, it's the restaurant's name. It can be Id to our data in the table.
GEOSEARCH restaurants FROMLONLAT 73.7985838 18.5576970 BYRADIUS 1000 M ASC WITHDIST
1) 1) "Juice Destination"
2) "0.2370"
2) 1) "Blue Plate"
2) "0.2803"

The above query gives us all the records present in 1 Km of given lon/lat along with their distance in the closest first order. Neat right!!

Redis fetches data in O(N+log(M)) where N is the number of elements in the grid-aligned bounding box area around the shape provided as the filter and M is the number of items inside the shape

Our database of 2 million records can be combined with Redis by saving record id with lat/long in Redis.

Fetching ids from Redis for a particular radius and then fetching data of restaurants from db by querying those ids.

Well Well….

Let me explain in short, how Redis works internally to achieve such impressive results. The magic here is the geohash function with Sorted Sets.

Working:-

  • Whenever a new record is added using the geoadd function, Redis internally creates a hash value for that lon/lat using its geohash function and maintains that hash in the sorted set.
  • To demonstrate, the above restaurant's geo hash are:-
geohash restaurants 'Meltish' 'Blue Plate' 'Juice Destination'
1) "te7c8d663v0"
2) "tek3x666rt0"
3) "tek3x666rt0"
  • Here, as you can see if points are closer to each other then its geohash initial characters are similar. Each character in the geohash uniquely identifies a rectangular cell at varying resolutions.
  • When we use the geosearch function to fetch records in a particular radius, Redis first calculates its geohash and then discards part of the sorted set as its geohash prefix is different, thus reducing a lot of search space.

Well, If you have read this blog till here. I hope it had added some value to your knowledge😉. Thank you!!

--

--