Offloading Database Queries to Elasticsearch

Florian Hopf
Published in Ninja Van Tech · Jun 10, 2021 · 6 min read

PUDO points are locations where customers can pick up or drop off parcels, be it shops or parcel lockers. The PUDO team at Ninja Van is responsible for the different services that support the business in providing this offering to end customers. Our core application is Ninja Point, a web application used by our retail partners and dedicated parcel stores.

In this post I will describe how we improved the usability of one part of the application by switching queries from MySQL to Elasticsearch.

Order History

To help users identify which parcels are supposed to be in the store and which have already been processed, we provide a list view of orders where users can filter by date range and parcel status. We also display counts for the different status filters and allow users to search by tracking id.

Order History in Ninja Point

When the feature was built, it was decided to do most of the filtering, sorting and searching in the frontend. The backend would return orders for a certain date range in batches, fetched from a MySQL database and enriched with more data from another service. The frontend would then display all the records in a long list, compute the total and per-status counts for the filters and, if a filter was selected or a search term entered, hide any orders that did not match. The CSV export worked the same way, generating the CSV in the frontend from the data in the table.

This worked great as long as the data set was rather small (e.g. for our stores in Singapore). But some of our stores (e.g. in Indonesia) have to take care of a lot of parcels at once, and the history page then has to display thousands of them. Getting all the data from the backend could take quite some time, with users in some cases waiting up to 15 minutes to fetch thousands of records, only to end up with two entries in the table, the rest all being filtered out.

This solution clearly wasn’t scalable, so we set out to improve the user experience, especially with regard to page load time. Google Analytics gave us insight into how long rendering the list took, so we had a clear metric to improve.

19s average load time in Indonesia — some room for improvement

It was obvious that we had to move most of the processing to the backend to allow for reasonable page load times. We also wanted to move to a paginated solution so that we wouldn’t have to fetch all the data at once. A potential solution would have to meet the following requirements:

  • allow for easy pagination
  • allow querying by date range and parcel status
  • sort the data according to different columns
  • provide counts on the data to be used for the filter labels
  • allow for prefix and suffix matching on the tracking ids

We considered a few approaches to achieve this with MySQL, but especially the last requirement, even though doable, left too many doubts: suffix matching means a leading-wildcard LIKE, which a regular index can’t help with.

Introducing Elasticsearch

When you need to search on text, the rather obvious solution is to use a full-text search engine. Elasticsearch is the go-to search engine right now, and some team members were already familiar with it. Luckily it is also already used by other teams in the company, so our platforms team has plenty of experience operating it as well.
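To give an idea of why this fits, here is a minimal sketch of how the requirements above could map to a single Elasticsearch request, using the Python client. The index name, field names and example values are assumptions for illustration, not the actual Ninja Point schema.

```python
# A sketch of how the requirements could map to one Elasticsearch request
# (7.x-style Python client). Index name, field names and example values are
# assumptions, not the actual Ninja Point schema.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

body = {
    "from": 0,                                    # pagination
    "size": 20,
    "sort": [{"created_at": "desc"}],             # sortable columns
    "query": {
        "bool": {
            "filter": [
                {"range": {"created_at": {"gte": "2021-06-01", "lte": "2021-06-07"}}},
                # suffix match on the tracking id, assuming a keyword field
                {"wildcard": {"tracking_id": {"value": "*1234"}}},
            ]
        }
    },
    # per-status counts for the filter labels, computed over the whole date range
    "aggs": {"status_counts": {"terms": {"field": "status"}}},
    # the selected status only narrows the hit list, not the aggregation above
    "post_filter": {"term": {"status": "PENDING_PICKUP"}},
}

response = es.search(index="orders", body=body)
```

Using a post_filter for the selected status keeps the per-status counts computed over the whole date range while the hit list is narrowed; whether the real implementation does it exactly this way is an assumption on our side.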

To get data into Elasticsearch we are using Maxwell, a non-invasive tool that is also used by other teams in the company. Maxwell listens to the MySQL binlog and pushes all changes to a Kafka topic. A simple process can listen to this change stream and push the data to Elasticsearch.

Maxwell decouples indexing from the application

Using a change data capture solution like this has the nice property that you don’t need to worry too much about keeping your search engine in sync with your primary data store. You only need to process all change messages; there is no need to do any writes to Elasticsearch from your primary application.
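The indexing process mentioned above can stay very small. A minimal sketch, assuming a Python consumer built with kafka-python and the official Elasticsearch client (topic, index and field names are made up):

```python
# A minimal sketch of the indexing process: consume Maxwell change events from
# Kafka and mirror them into Elasticsearch. Topic, index and field names are
# made up for illustration.
import json

from kafka import KafkaConsumer            # kafka-python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
consumer = KafkaConsumer(
    "maxwell",                              # the topic Maxwell publishes to
    bootstrap_servers=["kafka:9092"],
    value_deserializer=lambda m: json.loads(m),
)

# each Maxwell message carries the table name, the change type and the row data
for message in consumer:
    event = message.value
    if event.get("table") != "orders":
        continue
    row = event["data"]
    if event["type"] in ("insert", "update"):
        es.index(index="orders", id=row["id"], body=row)
    elif event["type"] == "delete":
        es.delete(index="orders", id=row["id"], ignore=[404])
```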

As we only ever display orders that are assigned to a single PUDO point, we also used the routing feature of Elasticsearch to put all data for a point on a single shard. Even though this carries the risk of hot spots (shards that get assigned a lot more data than others), it allows you to have exact counts for aggregations, which you can’t rely on by default.
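In practice routing is just an extra parameter on both the index and the search request. A sketch under the same assumed schema:

```python
# Sketch of indexing and querying with routing so that all orders of one PUDO
# point land on the same shard; routing value and field names are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

order = {"id": 1, "pudo_point_id": 42, "status": "PENDING_PICKUP", "tracking_id": "NINJA0042"}
es.index(index="orders", id=order["id"], routing=str(order["pudo_point_id"]), body=order)

# searching with the same routing value touches only that one shard, so the
# terms aggregation counts are exact rather than merged approximations
es.search(index="orders", routing="42", body={
    "query": {"term": {"pudo_point_id": 42}},
    "aggs": {"status_counts": {"terms": {"field": "status"}}},
})
```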

Rolling It Out

For the migration we decided on an incremental approach. Because we already had a working page, it was fine to introduce Elasticsearch in the background and replace parts of the functionality with it. At first we just wanted to set up the indexing process and pick the smallest possible feature to implement. We decided that in a first increment only the total count of the results would come from Elasticsearch, keeping the rest of the functionality as it was.

The smallest possible change — count of results for the current query

This allowed us to start minimal: we only needed the data required for searching and filtering to be available, with no need to display anything from Elasticsearch besides the count. It also allowed for an easy rollback in case of issues: just revert the frontend change. And finally, the existing count was a bit annoying: due to the way the data was fetched, it would not show the correct number at the beginning but instead kept rising whenever a new batch of data arrived from the backend. We figured it would be a good improvement for users.
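The count-only increment boils down to a single request. A sketch, again with assumed index and field names:

```python
# Sketch of the first increment: only the total count for the current filters
# comes from Elasticsearch. Index and field names are assumed.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

resp = es.count(index="orders", body={
    "query": {
        "bool": {
            "filter": [
                {"term": {"pudo_point_id": 42}},
                {"range": {"created_at": {"gte": "2021-06-01", "lte": "2021-06-07"}}},
            ]
        }
    }
})
total = resp["count"]    # displayed immediately, independent of the batched fetch
```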

And we were right, the behaviour did improve. But there was a side effect we hadn’t accounted for. It turned out users relied on the changing count (which we considered annoying) to determine when the list was fully loaded. With Elasticsearch the final count was visible immediately … so users assumed loading was already finished and pressed the CSV export button. But because the export still took whatever was rendered in the frontend, it exported partial data.

We were able to quickly solve this issue by keeping the CSV export button disabled until all data had been fetched. But it also clearly showed that we had not fully understood the behaviour of our users in advance. Something to improve on.

Success

Incrementally we changed the rest of the page to fetch its data from Elasticsearch until no database queries were involved anymore. Due to the limited scope of each step we were able to roll out the change without introducing too much risk. And in the end, page load times improved dramatically, by nearly 20 times, leaving us with far happier users.

Once we had the data in Elasticsearch we also started to see more use cases across the application. For example, we now provide users with a tracking id search on the homepage, which they use to serve shippers better.
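Such a tracking id search is also where the prefix and suffix matching requirement pays off. A sketch of how it could look (index and field names assumed; an alternative is indexing a reversed copy of the id so two cheap prefix queries replace the leading wildcard):

```python
# A sketch of the tracking id search: wildcard and prefix queries cover suffix
# and prefix matching on a keyword field. Index and field names are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

es.search(index="orders", body={
    "query": {
        "bool": {
            "should": [
                {"prefix": {"tracking_id": {"value": "NINJA"}}},     # prefix match
                {"wildcard": {"tracking_id": {"value": "*0042"}}},   # suffix match
            ],
            "minimum_should_match": 1,
        }
    }
})
```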

Even though there is additional structural complexity involved, moving to a data store that is a good fit for your queries is often worth the investment.
