Building a search microservice with Machine Learning

Omar Djibo
Published in Loopio Tech · 7 min read · Apr 14, 2023

At Loopio, we rely heavily on offline experiments to test new hypotheses that can help us improve our users’ search experience. When an experiment is successful, our goal is then to bring it to production as soon as possible so that we can gather learnings through A/B testing on a subset of our users.

Read all about the details of our offline experimentation framework, and how we use Machine Learning (ML) to improve query understanding in Part 1 and Part 2 of our Search Series.

In the past, it was not always easy to bring those experiments to production. We knew that ML would take the search experience to the next level, because keyword searches are very limited and not context-aware, but our previous search infrastructure made it hard to act on the opportunities we had identified.

We decided to invest in a new search microservice, since ML solutions are typically developed in Python, and our legacy system was in PHP. The goal of this investment was to create a service that is more performant and makes integration with ML a lot easier. In this blog post, we will share what this new architecture looks like, and some of our learnings.

Limitations of the legacy architecture

In the legacy architecture, our main web application is connected to Elasticsearch to fulfill the search functionality. Elasticsearch is an incredibly powerful tool for indexing and searching data, but we found that as data volumes grew, indexing performance became a bottleneck. Whenever the index settings changed, every document had to be re-indexed, a process that spanned a few million documents and took hours to complete.
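For context, changing those settings generally means creating a brand-new index and copying every document into it. Here is a hedged sketch of that kind of full re-index using the Elasticsearch Python client (8.x-style keyword arguments; the index names and settings are purely illustrative):

```python
# Illustrative sketch of a full re-index after a settings change.
# Index names and the settings payload are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Analyzer and mapping changes typically require a brand-new index...
es.indices.create(
    index="documents_v2",
    settings={"analysis": {"analyzer": {"default": {"type": "english"}}}},
)

# ...and then copying every document from the old index into it. Across a few
# million documents, this copy is the step that takes hours.
es.reindex(
    source={"index": "documents_v1"},
    dest={"index": "documents_v2"},
    wait_for_completion=True,
)
```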

Since the existing web application is built in PHP, which has limited support for ML libraries and tooling, adding ML models to the process was cumbersome:

  • It would be tough to apply ML models in the search workflow: results from the Elasticsearch query would need to be retrieved and sent to an external API hosting the ML models, and the results returned from that API would then be displayed in the UI.
  • It would also be challenging to handle indexing: the same external API would need to be called to apply ML models to the data before re-indexing.

Defining requirements for the new solution

As described in our previous article, we rely heavily on ML experiments for search. So the main requirement of our new infrastructure was simply the ability to quickly integrate ML in the search life cycle. We wanted to have the ability to use ML in three stages:

  • Indexing: When a document is being indexed, we want to extract key phrases, identify entities or Part-of-Speech, and generate embeddings that unlock Semantic Search and the ability to match documents beyond keywords.
  • Query: We could use query understanding to expand the query with synonyms, fix spelling errors, or simply predict where the most relevant document would be based on the search terms.
  • Post retrieval: Once a set of documents is retrieved, we could re-rank them or even extract the exact answer that the user was looking for.

The figure below shows how a new feature can be introduced at different stages of a search system.
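As a rough, purely hypothetical illustration of those stages (the function names and stub bodies below are ours, not production code), the three hooks wrap around the core search engine call like this:

```python
# Hypothetical sketch of the three ML hook points; all model calls are stubbed out.

def enrich_document(doc: dict) -> dict:
    """Indexing stage: attach key phrases, entities, and an embedding to a document."""
    enriched = dict(doc)
    enriched["key_phrases"] = []  # e.g. output of a key-phrase extraction model
    enriched["embedding"] = []    # e.g. output of a sentence-embedding model
    return enriched

def understand_query(query: str) -> dict:
    """Query stage: fix spelling, expand synonyms, predict likely document sources."""
    return {"query": query, "synonyms": []}  # e.g. output of a query-understanding model

def post_process(query: str, hits: list) -> list:
    """Post-retrieval stage: re-rank the hits or extract an exact answer."""
    return hits  # e.g. re-ordered by a relevance or answer-extraction model
```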

Due to our indexing use cases, we also needed fast, high-throughput indexing, whether a single document was being processed or millions.

We first set out to explore the advantages and disadvantages of different solutions. Some of these solutions were also new to us, so we relied heavily on quick proofs of concept to help answer questions and guide our decisions.

Based on this work, we decided to build our service based on the following stack:

  • Elasticsearch would still be our main search engine.
  • Search requests would be served by a new Python API, built with the FastAPI framework, that reads documents from Elasticsearch.
  • Indexing documents into Elasticsearch would be handled by an event-driven pipeline based on AWS Kinesis and Lambda.

The Search API: based on FastAPI

FastAPI is a modern, high-performance web framework for building APIs in Python. We decided to use FastAPI primarily because of its performance, concurrency support, and ease of use. The Search API works as follows:

  • The main endpoint of this API is the /search endpoint, which accepts a search query and a set of filters, performs a request against Elasticsearch, and returns the results (a simplified sketch follows this list).
  • The API currently uses an ML model for query understanding. When a request is received, the search query is first fed into a model to generate some metadata, which is then used to build the final Elasticsearch DSL query.
  • The API also supports feature flagging, which lets us bring an experiment into production and perform A/B tests; experiments can be enabled or disabled at any time, so they run in a safe and controlled environment.
  • The new API also makes it easier to use ML models in the post-retrieval step.
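A heavily simplified, hypothetical version of that endpoint is sketched below; the field names, index name, and the stubbed query-understanding call are illustrative assumptions, not our production code.

```python
# Hypothetical, simplified /search endpoint; names and the Elasticsearch
# query shape are illustrative assumptions, not the production code.
from typing import Optional

from elasticsearch import AsyncElasticsearch
from fastapi import FastAPI, Query

app = FastAPI()
es = AsyncElasticsearch("http://localhost:9200")  # assumed cluster URL

def understand_query(query: str) -> dict:
    """Stub for the query-understanding model; returns metadata for the DSL query."""
    return {"expanded_terms": [query]}

@app.get("/search")
async def search(q: str = Query(...), library_id: Optional[int] = None):
    metadata = understand_query(q)

    # Build the Elasticsearch DSL query from the model's metadata plus filters.
    dsl = {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": " ".join(metadata["expanded_terms"]),
                        "fields": ["question", "answer"],
                    }
                }
            ]
        }
    }
    if library_id is not None:
        dsl["bool"]["filter"] = [{"term": {"library_id": library_id}}]

    response = await es.search(index="documents", query=dsl, size=20)
    return {"results": [hit["_source"] for hit in response["hits"]["hits"]]}
```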

The Search Indexing pipeline: powered by AWS Kinesis and Lambda

Our indexing data load is variable and we did not want it to impact search requests. Therefore, we decided to decouple the indexing functionality from the Search API. We opted for an event-driven approach. When a document changes in our platform, it would simply be sent to a message queue and processed immediately.

We decided to use Kinesis over other queues such as SQS and Kafka because it is managed by AWS, supports larger record sizes, and is cost-effective. SQS was our initial choice, but its 256 KB message size limit pushed us to Kinesis, which accepts records of up to 1 MB.

One thing to keep in mind when using Kinesis is the partition key. Kinesis uses sharding to process events in parallel, and in our case we needed to handle the same document being updated multiple times in quick succession. This is where a good partition key helps: we used the document id as the partition key, so all updates to a given document land on the same shard and are processed in the order in which they were sent/accepted.
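As a minimal sketch (the stream name and payload shape are assumptions for illustration), publishing a document-change event keyed by document id looks like this with boto3:

```python
# Minimal sketch: publish a document-change event to Kinesis, keyed by document id.
# The stream name and payload shape are illustrative assumptions.
import json

import boto3

kinesis = boto3.client("kinesis")

def publish_document_change(document_id: str, document: dict) -> None:
    kinesis.put_record(
        StreamName="search-indexing-events",
        Data=json.dumps({"document_id": document_id, "document": document}).encode("utf-8"),
        # Same partition key -> same shard, so updates to one document keep their order.
        PartitionKey=document_id,
    )
```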

Lambda was the perfect companion to Kinesis because it can easily scale with the volume of data in the stream. Another benefit of using Lambda with Kinesis is that Lambda has a built-in retry and error handling mechanism: if a Lambda function fails to index a record, it can be retried automatically. Since Lambda also supports container images, we can simply package all our dependencies and models into a container.
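A stripped-down, hypothetical handler gives a feel for how little glue code this takes (the index name, environment variable, and payload shape are assumptions):

```python
# Hypothetical, stripped-down Lambda handler for the Kinesis indexing pipeline:
# decode each record, enrich it with ML-derived fields, and index it.
import base64
import json
import os

from elasticsearch import Elasticsearch

es = Elasticsearch(os.environ["ELASTICSEARCH_URL"])  # assumed environment variable

def handler(event, context):
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        doc = payload["document"]

        # Placeholder for ML enrichment (key phrases, entities, embeddings).
        doc["key_phrases"] = []

        es.index(index="documents", id=payload["document_id"], document=doc)
```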

Lambda works well for smaller models that can be loaded in memory and do not require GPUs. For larger models in the future, we can simply host them on an external service (e.g. Amazon SageMaker) and call them over HTTP.
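For that future case, the model call simply moves out of process; here is a hedged sketch using the SageMaker runtime client (the endpoint name and payload are hypothetical):

```python
# Hedged sketch: call a larger model hosted on a SageMaker endpoint instead of
# loading it inside the Lambda. The endpoint name and payload are hypothetical.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

def embed_remotely(text: str) -> list:
    response = runtime.invoke_endpoint(
        EndpointName="document-embedding-endpoint",
        ContentType="application/json",
        Body=json.dumps({"text": text}).encode("utf-8"),
    )
    return json.loads(response["Body"].read())["embedding"]
```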

Below is a simplified version of what the new architecture looks like.

This new indexing solution now allows us to easily integrate ML at a large scale. We can now use ML to extract key phrases from a document, identify entities or Part-of-Speech, and also generate embeddings that unlock the ability to use Semantic Search and the ability to match documents beyond keywords.

Key Learnings from Building a Search Microservice

Our new search microservice is now faster, more reliable, and more flexible because we can easily bring new experiments to life. We did learn a few key things along the journey:

  • Set Service Level Agreements (SLAs) and test against them: Before deploying the service for the first time, run performance tests and define an SLA for how long a search request should take and how long it should take for updates to be fully indexed. This is important because we do not want the addition of new models to degrade the search experience: every change to the Elasticsearch index or introduction of a new model should be benchmarked against the current SLA (a small sketch of such a check follows this list).
  • Use feature flags: Feature flags are a great way to mitigate risk because they let you easily turn features on and off. They can also be used to gradually roll out changes to customers or perform online A/B tests.
  • Consider the user experience: Moving from synchronous to asynchronous updates can affect the user experience, especially in an existing system, because changes may no longer be instantly visible. It is very important to keep that in mind when switching to an event-driven approach.
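On the first point, even a small, hypothetical check like the one below (the URL, query, and threshold are illustrative) helps keep latency regressions from creeping in as new models are added:

```python
# Hypothetical latency check against a search SLA; URL, query, and threshold are illustrative.
import statistics
import time

import httpx

SEARCH_URL = "http://localhost:8000/search"
P95_SLA_SECONDS = 0.5  # example SLA: 95% of searches complete in under 500 ms

def test_search_latency_meets_sla():
    latencies = []
    with httpx.Client() as client:
        for _ in range(50):
            start = time.perf_counter()
            client.get(SEARCH_URL, params={"q": "security certifications"})
            latencies.append(time.perf_counter() - start)

    p95 = statistics.quantiles(latencies, n=20)[-1]  # approximate 95th percentile
    assert p95 < P95_SLA_SECONDS
```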

Conclusion

Our journey of improving search has led us to the creation of a new search microservice that is not only faster and more reliable but also enables seamless integration with ML solutions.

We have built a scalable and flexible solution that allows us to quickly bring new experiments to life and enhance the search experience for our users. As we continue to learn and iterate, it is essential to maintain a strong focus on performance, risk mitigation, and new ways to improve search.

This concludes our search series on how we approach search at Loopio. We are continuously on the lookout for talent, so check out the career opportunities available across Loopio’s Engineering, Product, and Design teams.
