OASIS: Zocdoc’s Search Ingestion System
Why is Search important?
Zocdoc recently celebrated its 12th birthday. Over those years, engineers at Zocdoc have developed our product into a feature-rich application to provide better experiences for both patients and providers.
In the process of developing our platform, we have implemented a wealth of features such as insurance card capture using deep learning and used data science to shorten patient wait times. We have also re-architected our services to scale along with our growing number of users, and as a result, transitioned our monolithic application to various microservices.
There is another feature that has changed substantially over this time period that we would like to highlight in this post: search. What is the use of the platform if users can’t independently search for doctors who suit their needs? This post will be the first in a series about search at Zocdoc. We will focus on how we ingest and index the information necessary to surface relevant search results.
Search is central to almost any application. Let’s say you are currently in Central Park and using an app to find places to eat. What do you expect to see when you type “quick cheap lunch nearby” into the app’s search bar? The query sounds deceptively simple, but it contains keywords that involve a fair amount of work for search algorithms:
- Among the hundreds of restaurants around your search location, the word “cheap” means results need to be filtered by price range.
- The word “lunch” should prune results that, of course, do not serve lunch. This could be done by checking each restaurant’s business hours or by adding a flag that indicates whether it serves lunch.
- Given the keyword “quick,” the algorithm should prioritize fast casual options over restaurants with full-course lunch menus or factor in “average waiting time.”
- The word “nearby” should narrow down the results using your current location, but not so naively that it includes every restaurant within the same straight-line radius. For instance, searching from Central Park, you could be closer as the crow flies to parts of New Jersey than to Union Square, but New Jersey is probably not what you meant.
In the end, any well-implemented search service needs to filter out irrelevant results from all the user-generated and retrieved data and then rank what remains by relevance. Otherwise, even though you’re in Central Park, an unsophisticated search feature could surface restaurants in New Jersey because, technically, they are closer (“nearby”) and less expensive (“cheap”).
At Zocdoc, the quality of search is important because it is often the first entry point for patients looking for a doctor. For both first-time and returning users, the quality of search results determines the quality of their Zocdoc experience: good results let them independently find and select the doctors who match the criteria they enter.
For example, if a patient wants to book an appointment with a dermatologist (“specialty”) within 3 days (“availability”) near Flatiron (“location”) who speaks Spanish (“language filter”) with an in-network insurance (“insurance coverage information”) with trustworthy service (“ratings,” “reviews,” etc.), that’s already six pieces of information that will inform the patient’s search results. The search team’s job is to optimize our algorithm and data retrieval to maximize relevancy for each user’s unique request.
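To make this concrete, here is a minimal TypeScript sketch of what such a multi-criteria request might look like. It is purely illustrative; the type and field names are assumptions for this post, not Zocdoc’s actual API.

```typescript
// Hypothetical shape of a patient search request; all names are illustrative.
interface DoctorSearchRequest {
  specialty: string;                       // e.g. "Dermatology"
  availableWithinDays: number;             // e.g. within 3 days
  location: { lat: number; lon: number };  // e.g. near Flatiron
  languages?: string[];                    // e.g. ["Spanish"]
  insurancePlanId?: string;                // used to keep results in-network
  minRating?: number;                      // derived from ratings and reviews
}

const exampleRequest: DoctorSearchRequest = {
  specialty: "Dermatology",
  availableWithinDays: 3,
  location: { lat: 40.7411, lon: -73.9897 },
  languages: ["Spanish"],
  insurancePlanId: "example-ppo-plan",
  minRating: 4,
};
```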
Indexing
So how does search work at Zocdoc? When a query is entered into the search bar, results should come back almost instantly, as if in real time. What makes this fast retrieval possible is “indexing,” which happens before the actual search begins.
An analogy will be helpful here. Think about how you organize your books. One inefficient option would be to simply stack them in one corner of your room, but you would have to dig through the entire pile every time you look for a particular book.
Another option is to come up with a few criteria with which to categorize your books. Those criteria might be:
- Book title (alphabetical)
- Author’s last name
- Genre
- Publisher
Let’s say someone asks you for science fiction novels published by Random House Books. Unless your books were categorized in advance, there would be some delay in retrieving books meeting those requirements. Likewise, the search team needs to categorize and shape the model of our data to quickly deliver accurate and relevant search results. This kind of preparation work that comes before search is called indexing.
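To put the analogy in code, here is a toy TypeScript sketch (illustrative only) of the difference between scanning an unorganized pile on every request and looking things up in an index built ahead of time:

```typescript
interface Book {
  title: string;
  author: string;
  genre: string;
  publisher: string;
}

// Without an index: scan the whole pile on every request.
function findSlow(books: Book[], genre: string, publisher: string): Book[] {
  return books.filter((b) => b.genre === genre && b.publisher === publisher);
}

// With an index: group the books once, up front, so lookups are cheap.
function buildIndex(books: Book[]): Map<string, Book[]> {
  const index = new Map<string, Book[]>();
  for (const book of books) {
    const key = `${book.genre}|${book.publisher}`;
    const bucket = index.get(key) ?? [];
    bucket.push(book);
    index.set(key, bucket);
  }
  return index;
}

function findFast(index: Map<string, Book[]>, genre: string, publisher: string): Book[] {
  return index.get(`${genre}|${publisher}`) ?? [];
}
```

Search engines do the same kind of up-front work, just with far richer structures than a hash map.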
Elasticsearch
At Zocdoc, we use Elasticsearch — a popular distributed search service used by a host of companies. Netflix heavily uses Elasticsearch for its data pipeline. When you are searching for articles in Wikipedia, it is Elasticsearch’s full-text search feature that allows you to see the results almost instantaneously.
Beyond the typical search functionality provided by other search engines and document stores, Elasticsearch offers many highly customizable features. For example, when we create an index, we can configure it to use an analyzer that allows for autocomplete. Elasticsearch can also surface related keywords as users enter their queries, letting them see the query terms most often associated with their keyword (based on other users’ inputs and selections) that could lead to more relevant search results.
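As a rough illustration of the autocomplete configuration (not our production setup; the index and field names are made up, and this assumes the official @elastic/elasticsearch Node.js client with a 7.x-style request body), creating an index with an edge n-gram analyzer might look like this:

```typescript
import { Client } from "@elastic/elasticsearch";

const client = new Client({ node: "http://localhost:9200" });

// Illustrative index with an edge n-gram analyzer so that a prefix like
// "derm" can match "dermatologist" as the user types.
async function createDoctorsIndex(): Promise<void> {
  await client.indices.create({
    index: "doctors-example",
    body: {
      settings: {
        analysis: {
          filter: {
            autocomplete_filter: { type: "edge_ngram", min_gram: 2, max_gram: 20 },
          },
          analyzer: {
            autocomplete: {
              type: "custom",
              tokenizer: "standard",
              filter: ["lowercase", "autocomplete_filter"],
            },
          },
        },
      },
      mappings: {
        properties: {
          name: { type: "text", analyzer: "autocomplete", search_analyzer: "standard" },
          specialty: { type: "keyword" },
          location: { type: "geo_point" },
        },
      },
    },
  });
}
```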
Because Elasticsearch is also a distributed document store, we use it to store data about doctors and their metadata. “Ingesting” this information into Elasticsearch, however, proved to be a task that involved consideration from various engineering perspectives.
Ingestion
Filtering and sorting doctors for search requires recognizing numerous attributes per doctor, such as availability, location, and insurance. These features can change continuously, so our Elasticsearch index needs to be updated for every change.
For example, a doctor who had one available slot left ten minutes ago might have been fully booked by the time a user decides to make an appointment. A psychiatrist whose clinic was located in West Village a year ago could move to Flatiron. If this information isn’t updated accordingly, a doctor could end up double booked or a patient could go to the wrong location.
These pieces of data are produced by other microservices around the company, which provide different types of information like booking status and doctor metadata. This data needs to be collected, transformed, and ingested into our Elasticsearch cluster to make it available to the search service.
First attempt: Batch ingestion
In our first attempt to ingest data into our Elasticsearch cluster, we used Apache Spark, a distributed processing system for big data, running on Amazon Elastic MapReduce (EMR), a managed environment for Spark clusters. The job first collected data from many sources, then ran the required transformations before finally indexing all of this data into the Elasticsearch cluster. The job ran on a schedule, indexing documents into a new index on each run. When a run completed, the job made the newly ingested index available to the search service.
While this approach allowed us to get the new platform off the ground quickly, there were several shortcomings:
- Not real-time: The data that search uses to sort doctors is changing constantly. As mentioned before, when time-sensitive information like a doctor’s availability changes, we want it to be reflected immediately in search results. Considering the amount of data we were ingesting with every run, the Spark job took a significant amount of time. By the time the ingestion was complete, the data was already out of date.
- Resource intensity and latency: Because we were updating all doctors in the index during this job, the final indexing step turned out to be quite taxing for the Elasticsearch cluster. While that step ran, substantial memory and CPU had to be devoted to indexing the new documents rather than to handling users’ search queries. At the time, we worked around this by indexing into an offline cluster and then using Elasticsearch’s snapshot-and-restore mechanism to move the data into the production cluster. This solved the performance issue, but added some delay before the data went live.
- Maintainability: Because of the distributed nature of the data aggregated by the Spark job, the code base for the job had too many distinct responsibilities. The data it was collecting came from numerous sources, and the job had to know how to deal with all of them.
- Clean-up: Because indexing in a batch manner requires indexing into a fresh index, each time the job ran we were creating new indices. This required regular cleanup of the cluster to remove old and outdated indices in order to free up space.
Second attempt: Streaming ingestion
After using batch ingestion, we quickly felt the need to move to a long-term solution. That solution is a platform we call OASIS (Online Active Search Ingestion Streaming). OASIS is an event-driven ingestion system that updates the properties of individual doctors in our index as they change. It consists of a set of AWS Lambda functions and SQS queues that are triggered by changes to doctor data coming from a variety of other services at the company. Unlike with the batch ingestion model, we are able to respond immediately when information about doctors changes. And because we are updating a single doctor at a time, ingestion operations are much less taxing on the cluster.
The OASIS platform consists of event producer lambdas which listen to streams from various doctor-related data sources. For example, availability-lambda listens for changes in availability for providers and raises an event upon changes, such as when a new slot opens or an available slot is booked. These events are all put onto a central Kinesis stream. This Kinesis stream allows us to control the rate at which events are coming into the system.
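Sketched below is what a producer such as availability-lambda might do: turn an upstream change into an event and put it onto the central Kinesis stream. This assumes the AWS SDK v3 Kinesis client; the stream name and event shape are hypothetical.

```typescript
import { KinesisClient, PutRecordCommand } from "@aws-sdk/client-kinesis";

const kinesis = new KinesisClient({});

// Hypothetical event shape shared by the producer lambdas.
interface DoctorEvent {
  topic: "availability" | "location" | "language" | "status";
  doctorId: string;
  payload: unknown;
  occurredAt: string;
}

// Illustrative producer: called when an availability change is detected upstream.
export async function publishAvailabilityChange(doctorId: string, payload: unknown): Promise<void> {
  const event: DoctorEvent = {
    topic: "availability",
    doctorId,
    payload,
    occurredAt: new Date().toISOString(),
  };

  await kinesis.send(
    new PutRecordCommand({
      StreamName: "doctor-events-example",      // hypothetical stream name
      PartitionKey: doctorId,                   // keeps a doctor's events ordered
      Data: Buffer.from(JSON.stringify(event)), // Kinesis records are binary
    })
  );
}
```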
The Kinesis stream has another lambda listening to it. This lambda reads events and assigns them priorities based on their topic. These priorities help us separate events which must be reflected in search immediately (such as a doctor being deactivated and therefore removed from search) from lower priority changes (like a doctor adding a new language spoken at the office) that can be processed later. As a result, availability changes are updated in search within seconds, while other changes might take up to fifteen minutes. The prioritized events are then put onto high and low priority queues.
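A rough sketch of that routing lambda follows. The queue URLs, topic names, and priority rules are illustrative assumptions; the point is simply reading records off the Kinesis stream and forwarding each one to a high or low priority SQS queue based on its topic.

```typescript
import type { KinesisStreamEvent } from "aws-lambda";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

// Hypothetical queue URLs and priority rules, for illustration only.
const HIGH_PRIORITY_QUEUE_URL = process.env.HIGH_PRIORITY_QUEUE_URL!;
const LOW_PRIORITY_QUEUE_URL = process.env.LOW_PRIORITY_QUEUE_URL!;
const HIGH_PRIORITY_TOPICS = new Set(["availability", "status"]);

export async function handler(event: KinesisStreamEvent): Promise<void> {
  for (const record of event.Records) {
    // Kinesis delivers the payload base64-encoded.
    const body = Buffer.from(record.kinesis.data, "base64").toString("utf8");
    const doctorEvent = JSON.parse(body) as { topic: string };

    const queueUrl = HIGH_PRIORITY_TOPICS.has(doctorEvent.topic)
      ? HIGH_PRIORITY_QUEUE_URL
      : LOW_PRIORITY_QUEUE_URL;

    await sqs.send(new SendMessageCommand({ QueueUrl: queueUrl, MessageBody: body }));
  }
}
```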
From there, a consumer lambda per prioritized queue listens for events and indexes them into Elasticsearch when they occur. An optimistic concurrency scheme prevents race conditions from producing inconsistent data.
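A consumer for one of those queues might look roughly like the sketch below (again with assumed index and field names): it reads SQS messages and applies each change as a partial update to the matching doctor document. The conflict handling itself is described in more detail below.

```typescript
import type { SQSEvent } from "aws-lambda";
import { Client } from "@elastic/elasticsearch";

const es = new Client({ node: process.env.ELASTICSEARCH_URL! });

// Illustrative consumer: apply each queued change to the matching doctor document.
export async function handler(event: SQSEvent): Promise<void> {
  for (const message of event.Records) {
    const { doctorId, payload } = JSON.parse(message.body) as {
      doctorId: string;
      payload: Record<string, unknown>;
    };

    await es.update({
      index: "doctors-example", // hypothetical index name
      id: doctorId,
      body: { doc: payload },   // merge only the fields that changed
    });
    // If this call throws (for example on a version conflict), the message is not
    // deleted from the queue and SQS redelivers it for a retry.
  }
}
```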
The streaming model allows us to solve the problems presented by the batch model:
- Changes in doctor features are reflected immediately in the cluster (with an acceptable delay for low priority events).
- Each indexing call consists of a small batch of recently updated doctors, meaning that we avoid any indexing-related performance impacts.
- Each lambda is concerned with only one data source, leading to well-scoped responsibilities. This in turn leads to more maintainable code.
- We are able to write into the same index which is actively being used for serving search queries, meaning there is no cleanup of older indices required.
After we index the documents into Elasticsearch, we need to check the response for errors. For example, errors can occur when two consumer lambdas try to update the name and the location of the same doctor at the same time. If the first consumer succeeds in changing the doctor’s name, the second consumer lambda will fail. However, its message won’t be deleted from the queue; it will be retried automatically.
This is how we use Elasticsearch’s optimistic locking to prevent conflicts between two lambdas. In the diagram below, Lambda B fetches the document just before Lambda A fetches more recent data and indexes it first. Lambda B then tries to overwrite the document with its stale data, gets rejected, and only succeeds after a retry.
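In code, the retry might look roughly like this sketch, which assumes Elasticsearch’s sequence-number-based concurrency control and the same hypothetical index as above: fetch the current document with its _seq_no and _primary_term, write back conditionally, and start over if the write is rejected with a version conflict.

```typescript
import { Client } from "@elastic/elasticsearch";

const es = new Client({ node: process.env.ELASTICSEARCH_URL! });

// Illustrative optimistic-concurrency update: re-read and retry on conflict.
async function updateDoctorWithRetry(
  doctorId: string,
  changes: Record<string, unknown>,
  maxAttempts = 3
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // 1. Fetch the current document along with its concurrency tokens.
    const { body: current } = await es.get({ index: "doctors-example", id: doctorId });

    try {
      // 2. Write the merged document back, but only if nobody else wrote in between.
      await es.index({
        index: "doctors-example",
        id: doctorId,
        if_seq_no: current._seq_no,
        if_primary_term: current._primary_term,
        body: { ...current._source, ...changes },
      });
      return;
    } catch (err: any) {
      // 3. A 409 means another writer (Lambda A in the diagram) got there first:
      //    loop back and retry with fresh data.
      if (err?.statusCode === 409 && attempt < maxAttempts) continue;
      throw err;
    }
  }
}
```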
Note that what we covered so far, the ingestion service, is only the initial step in how search works at Zocdoc. Stay tuned for future posts about our choice of language — TypeScript — and how we sort and score returned results to maximize patient relevancy by leveraging machine learning.
About the Author
Sheon Han is a Software Engineer for the Search Team at Zocdoc. He thinks functional programming is pretty cool and pasta should be cooked al dente. You can also find him on his website.
Originally published at www.zocdoc.com on April 12, 2019.