How we built our new suggester using a self-hosted elasticsearch, leveraging the elasticsearch-operator

Markus Jevring
Published in Sesame Engineering
Apr 6, 2021

Sifting through the data

Sesame brings together healthcare providers of all specialties from across the US to provide the best and most affordable healthcare. The US is a large place. That’s a lot of providers. With each provider offering a range of services and specialties, how do you find someone who’s right for you? One answer to that question is search. In the piece below we take a close look at our search functionality and how we built it.

Our approach to search is similar to that of an auto-complete suggestion system in which you start typing into a search box and the system shows you suggestions that match what you have typed. The more you type, the more accurate the search becomes, giving you a short feedback cycle as your results are shown in real time. Once you select a suggestion, you are redirected to the search results that fit that suggestion.

Trying different things

At Sesame, we have tried a couple of different solutions to the problem of search. As we evolve as a company and face new challenges, the solutions we deploy need to change, as well.

Our first solution to this problem was to use a full-text search on a field in a PostgreSQL database. We would feed this database with our entire inventory — and update and delete it as that inventory changed. When a suggestion request arrived via our API, we would query the database and return what we found. At first, PostgreSQL was fast and familiar, but as our dataset grew and the application evolved, the queries became harder and harder to manage. After a while, our dataset outgrew SQL. Plus, we wanted to try some of the new features that elasticsearch provided. So, we decided to replace SQL with our next solution: Elastic App Search, a hosted solution provided by Elastic.
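For a sense of what that first approach looked like, here is a rough sketch of a prefix-matching full-text query over JDBC. The table, column, and connection details are hypothetical, not our actual schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PostgresSuggester {
    // Hypothetical schema: a "suggestions" table with a precomputed
    // tsvector column. The ":*" suffix turns the typed text into a
    // prefix match, which is what an autocomplete query needs.
    private static final String SQL = """
            SELECT name
            FROM suggestions
            WHERE search_vector @@ to_tsquery('simple', ? || ':*')
            ORDER BY ts_rank(search_vector, to_tsquery('simple', ? || ':*')) DESC
            LIMIT 5
            """;

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/inventory", "user", "password");
             PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, "col"); // what the user has typed so far
            stmt.setString(2, "col");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }
}
```

Queries in this style are manageable at first, but grow unwieldy as more fields, filters, and ranking concerns pile into the SQL.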

We chose this approach because it seemed to fit our use-case perfectly. It also gave us the ability to tune the result weights as well as use synonyms when searching. Using a hosted service meant that we could spend more time on feature development and less time on cluster management. Ultimately, however, App Search turned out to be the wrong decision for two reasons.

  • The hosted solution was very slow. We would see query times of around 1–2 seconds per query. This might be fine if we were doing a full-text search, but because this was a search-as-you-type approach, we couldn’t expect our customers to wait 1–2 seconds between every letter typed.
  • Maintenance and reliability were both poor. The hosted service suffered from more downtime than we were comfortable with. Even worse, we found that we couldn’t tune the performance of the application or troubleshoot the downtime.

That’s when we went looking for our third and most stable solution: the system we currently have in production.

Understanding what’s important

Given what we had learned from the previous solutions, we gathered up three requirements for the new approach.

  1. The solution had to be much faster than its predecessors, ideally <50ms per suggestion.
  2. We needed the solution to handle synonyms and typos.
  3. We wanted to host the solution ourselves, likely inside of our own kubernetes cluster, because:
  • We can use our own monitoring infrastructure for it.
  • We have full freedom to tune the application.
  • We can troubleshoot and fix any problems that occur, rather than relying on a sometimes unresponsive third party.

To meet these requirements, we considered three possible approaches:

  1. A custom-made in-memory trie implementation that would populate the data on start-up.
  2. An in-memory lucene approach that would use lucene as the query engine, which we would populate with data at start-up.
  3. A persistent elasticsearch approach that would be populated with data in real-time.

The in-memory approaches had the advantage of being very fast, with no extra network or other I/O calls needed to perform a lookup. Development would also be easy, as the whole solution would be contained on a single machine, with no need to share access to any data. The persistent solution had the advantage of being updated in real time as our inventory changed. Ultimately, we chose the elasticsearch approach, mostly due to the real-time updates (which we could have also added to the in-memory system), as well as the relative simplicity of the implementation. As elasticsearch gives us almost everything we need (as would lucene, admittedly), we felt like it would be the simplest solution for us to implement, while still fulfilling all our requirements.
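For the curious, the trie option we considered would have looked roughly like the following. This is a toy sketch, not code we shipped, and it ignores scoring, typos, and geography:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A minimal prefix trie: insert full suggestion strings at start-up,
// then walk the typed prefix and collect everything below it.
public class SuggestionTrie {
    private final Map<Character, SuggestionTrie> children = new HashMap<>();
    private boolean terminal; // true if a complete suggestion ends here

    public void insert(String suggestion) {
        SuggestionTrie node = this;
        for (char c : suggestion.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new SuggestionTrie());
        }
        node.terminal = true;
    }

    public List<String> suggest(String prefix, int limit) {
        SuggestionTrie node = this;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return List.of(); // nothing matches this prefix
            }
        }
        List<String> results = new ArrayList<>();
        collect(node, new StringBuilder(prefix), results, limit);
        return results;
    }

    private static void collect(SuggestionTrie node, StringBuilder prefix,
                                List<String> results, int limit) {
        if (results.size() >= limit) {
            return;
        }
        if (node.terminal) {
            results.add(prefix.toString());
        }
        for (Map.Entry<Character, SuggestionTrie> child : node.children.entrySet()) {
            prefix.append(child.getKey());
            collect(child.getValue(), prefix, results, limit);
            prefix.deleteCharAt(prefix.length() - 1);
        }
    }
}
```

A root SuggestionTrie would be populated from our inventory at start-up, and suggest("col", 5) would return up to five completions.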

A new solution with three sides

Our new suggester is a Java Spring application with three “sides” that work together to deliver fast, accurate, and up-to-date suggestions. One side interacts with elasticsearch by adding data to the indexes and querying the indexes in response to user interaction. Another side serves the REST API, handling user requests such as providing a suggestion for a typed string. The final side interacts with Google Pub/Sub to receive real-time inventory updates.

Google Pub/Sub

We run a microservices architecture with plenty of microservices. It’s sometimes useful for these microservices to know about the entities owned by other microservices, as well as any changes to those entities. As we’re not just going to give applications access to each others’ databases, we devised a system whereby changes to these entities (what we call entity events), such as creations, updates, and deletes, are sent asynchronously over Google Pub/Sub topics. Every other service that is interested in a particular entity can then choose to subscribe to the topic for that entity and use as much or as little of the entity as it wants.

In the case of our suggester, we subscribe to the entity events of the entities we want to be able to include in suggestions. These entities are sent out whole, but most consuming applications are only interested in parts of them; our suggester, for example, is primarily interested in names, ids, and geolocations. This lets us update our indexes easily, including removing things from the indexes if the entities have been deleted from the service that owns them. This is similar to the concept of change data capture, but at a level of abstraction just above the database, which makes it database agnostic and therefore easier for consumers.
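Consuming these entity events is a standard Pub/Sub subscription. A minimal sketch, with hypothetical project and subscription names and simplified payload handling:

```java
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

public class EntityEventSubscriber {
    public static void main(String[] args) {
        // Hypothetical names; in practice each entity type has its own
        // topic, and each consuming service its own subscription.
        ProjectSubscriptionName subscription =
                ProjectSubscriptionName.of("my-project", "provider-events-suggester");

        MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
            // The event carries the whole entity; the suggester only uses
            // the parts it needs (names, ids, geolocations) to create,
            // update, or delete index documents.
            String payload = message.getData().toStringUtf8();
            System.out.println("Received entity event: " + payload);
            consumer.ack();
        };

        Subscriber subscriber = Subscriber.newBuilder(subscription, receiver).build();
        subscriber.startAsync().awaitRunning();
        subscriber.awaitTerminated(); // block so the subscriber keeps receiving
    }
}
```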

REST API

The REST API allows our front-ends to query us for suggestions. It was important to us to make the new REST API backwards compatible with the old one. If we were going to swap out a service that many other things relied on, we needed to make sure that its users had to deal with as few modifications as possible. Because of this, we created a REST API that was virtually identical to the previous incarnation. This way, the users of the service could have a simple on/off switch for the new service, allowing us to turn it on progressively and turn it back off quickly if there were any problems.

The REST API itself is a GET endpoint that takes two types of arguments: the query to search for and a geolocation context that we use to filter the results. An example call might look like this:

GET /v3/suggest?query=col&latitude=40.7674029&longitude=-73.9711192

To which the API would respond with a list of matching hits:

API response

As we can see, multiple different types of hits matched the query. All the hits are fundamentally the same in that they show the type, id, and highlighted value of the hit. This lets us render them homogeneously in a list. Different types of hits can also return more metadata, such as the headshot image of the provider. This lets the front-end render these hits slightly differently, should it want to. The available metadata is indicated by the type in the response.
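Wired into Spring, the endpoint itself is small. A sketch of what such a controller could look like; the class names and the SuggestionService abstraction are illustrative, not our actual code:

```java
import java.util.List;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SuggestController {

    private final SuggestionService suggestionService;

    public SuggestController(SuggestionService suggestionService) {
        this.suggestionService = suggestionService;
    }

    // GET /v3/suggest?query=col&latitude=40.7674029&longitude=-73.9711192
    @GetMapping("/v3/suggest")
    public List<SuggestionHit> suggest(@RequestParam String query,
                                       @RequestParam double latitude,
                                       @RequestParam double longitude) {
        // The geolocation context is used to filter hits down to the user's area.
        return suggestionService.suggest(query, latitude, longitude);
    }
}

// Every hit carries at least a type, an id, and a highlighted display value;
// some types add extra metadata (e.g. a provider headshot URL).
record SuggestionHit(String type, String id, String highlightedValue) {}

interface SuggestionService {
    List<SuggestionHit> suggest(String query, double latitude, double longitude);
}
```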

Each hit in the suggester can be used to trigger a full search of our inventory. When a user chooses one of the entries in the list, we can use the metadata in the result to trigger a full search. For example, if the user chose the second entry in the list above, the front-end might load a URL like https://sesamecare.com/search?complaint=cold_feet, which would perform a search for all providers in our inventory offering services to treat cold feet.

Elasticsearch

For the elasticsearch deployment we rely on the elasticsearch operator. This makes deployment of elasticsearch clusters to our kubernetes cluster a breeze. Not only is it helpful for this project, but we can leverage it for our other elasticsearch clusters as well. Using an operator combines the best of a hosted and a self-hosted solution. You get the expertise of a hosted solution, because the people who normally host the solution wrote the operator. You also get the ability to tune everything yourself, because you have full control over how you parameterize and configure the operator.
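With the operator installed, standing up a cluster comes down to applying a small custom resource that the operator reconciles. A minimal sketch of such a manifest; the name, version, and sizing here are illustrative, not our production values:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: suggester
spec:
  version: 7.12.0
  nodeSets:
    - name: default
      count: 3
      config:
        # Avoids the need for mmap-related kernel settings on the nodes.
        node.store.allow_mmap: false
```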

Querying the index is done using the search_as_you_type field type. This special field type is native to elasticsearch, and designed for exactly the kind of problem we’re trying to solve. There’s another field type called completion that works similarly; during experimentation, however, we found that search_as_you_type gave us a better experience. When we search, we query this field to find matches for what the client typed, and then filter the results on the user’s geographic location. This ensures that we only show results that are relevant to the user where they are. People generally prefer healthcare that is available close to them; this matters less for virtual healthcare, but for things that need to be done in person, proximity wins.
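In query terms, this is a multi_match of type bool_prefix across the search_as_you_type field and its auto-generated subfields, combined with a geo-distance filter. A sketch using the Java high-level REST client of that era; the index name, field names, and distance are hypothetical:

```java
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class SuggestQueryBuilder {
    public static SearchRequest buildSuggestQuery(String typed, double lat, double lon) {
        // A search_as_you_type field "name" automatically gets ._2gram and
        // ._3gram subfields; querying all three with bool_prefix is the
        // canonical search-as-you-type pattern.
        MultiMatchQueryBuilder match = QueryBuilders
                .multiMatchQuery(typed, "name", "name._2gram", "name._3gram")
                .type(MultiMatchQueryBuilder.Type.BOOL_PREFIX);

        // Geography goes in filter context so it narrows the candidate set
        // without affecting the relevance scores.
        BoolQueryBuilder query = QueryBuilders.boolQuery()
                .must(match)
                .filter(QueryBuilders.geoDistanceQuery("location")
                        .point(lat, lon)
                        .distance("50km"));

        SearchRequest request = new SearchRequest("suggestions");
        request.source(new SearchSourceBuilder().query(query).size(5));
        return request;
    }
}
```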

When we get a set of results, we only want to show the most relevant ones to the user. As we limit the result set to 5 results, relevance becomes important. We are in the early stages of tuning relevance, but our current solution relies on the scores delivered by elasticsearch. Our current approach is fairly simple: we have data in categories, each with a priority. We start with the highest-priority category of data and sort it by the elasticsearch document score. If there’s more “room” in the result (i.e. we haven’t yet produced 5 results), we take the next-priority category of data and do the same thing. We repeat this until we have produced 5 results, or we run out of results.
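In code, that selection loop is short. A sketch of the idea; the types and the shape of the input are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ResultSelector {
    private static final int MAX_RESULTS = 5;

    // Hypothetical hit type: each hit carries its elasticsearch document score.
    record ScoredHit(String value, float score) {}

    // Walk categories from highest to lowest priority, sorting each by
    // score, until we have MAX_RESULTS hits or run out of candidates.
    static List<ScoredHit> select(List<List<ScoredHit>> categoriesByPriority) {
        List<ScoredHit> results = new ArrayList<>();
        for (List<ScoredHit> category : categoriesByPriority) {
            category.stream()
                    .sorted(Comparator.comparingDouble(ScoredHit::score).reversed())
                    .limit(MAX_RESULTS - results.size())
                    .forEach(results::add);
            if (results.size() >= MAX_RESULTS) {
                break;
            }
        }
        return results;
    }
}
```

The point of the priority walk is that a lower-scoring hit from an important category still beats a higher-scoring hit from a less important one.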

So, how did we do?

The new service has been running for a while now, and it consistently responds in under 30ms, which is a significant improvement over the previous service.

The speed difference is noticeable. Search results show up as you type, as opposed to a second or two later. We still have to deal with network latency, of course, which varies from user to user, but the time for each suggestion is no longer dominated by the service itself, but rather by the speed of the user’s internet connection.

What does the future hold for the Sesame suggester? In a word: relevance. Finding what you’re looking for is just the first step. Because we only return 5 results, we have to make sure that the ones we do return are not merely relevant, but the most relevant, and that they are ordered by relevance.
