Searching for care with simple terms

How we built a more accessible search function to help people find the care they need

Last month, our members conducted tens of thousands of searches on our mobile and web applications, turning to Oscar for advice on which doctor or facility to visit or for information on prescription drugs. Members trust us to guide them to great medical care options, and part of that trust stems from us providing easy-to-use search tools.

Easy-to-use is not easy-to-make, though. Insurance companies have traditionally struggled with building search functions that are easy to use, in part because the underlying data feeding into search is filled with terminology that requires a medical degree to understand. We need to understand medical terminology, but we don’t think our members should have to in order to find quality care. We also have a lot of data that helps us make educated guesses as to what our members might be searching for. For example, when a member searches for “diabetes”, we know, in part from our claims data, that Diabetes Type 2 is more common than Diabetes Type 1. We can use this knowledge to inform the displayed search results.

In building out our search function, the goal was to deliver an experience more in line with what people expect from a tech company than their health insurer. As a result, we now have a simple, accessible, and performant search function.

What Reason for Visit search looks like

One of the most used types of search is Reason For Visit, which allows our users to search for care based on the health issue they’re experiencing or the procedure they’re looking into. Members can type layman’s terms like “tummy ache” into our search bar, and we guide them to the right care from there. As the member types “My tummy hurts”, we let them choose whether “Abdominal pain” or “Stomach disorders” is a better fit for the pain they’re experiencing.

Once the member chooses one of these Reason for Visit terms, we explain a few choices they have for care. In the case where a member selects “Abdominal pain”, the member could have a free Doctor on Call consultation by phone, see a primary care physician, or visit an urgent care center.

Once a member selects a type of care that best fits their particular issue, such as primary care doctor, we display a list of doctors they can see, ranked by how well each doctor meets the member’s needs based on geographical proximity, appointment availability, and a number of other compatibility factors.

The intuitive mobile UI sits on top of a powerful backend search service. This same Reason for Visit search backend enables a number of different search UIs, including iOS and Android applications, public and members-only web applications, and internal tools used by our Concierge teams that guide members towards the best possible care over a phone call or via a secure text message.

Tailoring our search terms to user needs

The first part of a great search experience is complete and well-structured underlying data. We store a set of medlines — conditions defined by the U.S. National Library of Medicine — with unique IDs, user-facing name strings, user-facing descriptions, and a large list of synonyms. We also store information such as how common each medline is (based on our data scientists’ analyses of Oscar insurance claims as well as third-party claims), and which types of specialists (e.g. psychiatrists) and facilities (e.g. urgent care centers) can treat each medline.

A number of Oscar employees, including on-staff physicians, product managers, and engineers, have worked together to augment public data and customize it to what we know Oscar members need to be able to find. Over time, we have expanded and modified this data, often as people bring their diverse viewpoints to the team, or as we audit our search logs and notice members have had trouble finding something.

For example, at one point we noticed that our members were trying to search for the term “Birth Control”, but we weren’t giving them the full set of results they might need. Today, you can search for “Birth Control” along with similar terms like “Mirena” and get a full list of birth control options to choose from.

Some time ago, we also scanned the search analytics and noticed that members were frequently searching for procedures such as “MRI” and “Mammography” that we didn’t support in our search bar. We added these as reasons for visit, and today “Mammography” is the 5th-most searched Reason for Visit, and “MRI Scan” is the 15th-most searched Reason for Visit.

Storing data in performant, fault-tolerant data stores: RDS and Elasticsearch

We permanently store all of our search data in Amazon RDS databases, which are regularly backed up for fault tolerance. As we make updates, we pull the data into temporary Elasticsearch indices. Our Elasticsearch Reason for Visit index holds JSON documents, one per medline, containing all of the medline’s associated data. The Python job that moves data from RDS to the Elasticsearch indices uses a few helpful libraries, including:

  • The Elasticsearch DSL library, which provides Pythonic syntax for defining and querying documents.
  • SQLAlchemy, which provides Pythonic syntax for defining and querying SQL tables.
  • The marshmallow library, which lets us convert complex objects to and from native Python datatypes for serialization.
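As a rough illustration of the "one JSON document per medline" shape, the sketch below turns an in-memory record into the action/source line pairs that the Elasticsearch bulk API accepts. It uses only the standard library. The `Medline` class, the index name, the description text, the `prior` value, and the choice to put synonyms under `bag_of_words` are all assumptions made for illustration; only the field names `name`, `bag_of_words`, and `prior` appear in the sample query later in this post.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Medline:
    """Hypothetical in-memory view of one medline row loaded from the database."""
    medline_id: str
    name: str
    description: str
    synonyms: List[str] = field(default_factory=list)
    prior: float = 0.0  # assumed: relative commonness derived from claims


def to_document(medline: Medline) -> dict:
    """Build the JSON-serializable document stored for one medline."""
    return {
        "name": medline.name,
        "description": medline.description,
        # Assumption: synonyms are indexed under the bag_of_words field.
        "bag_of_words": medline.synonyms,
        "prior": medline.prior,
    }


def bulk_lines(medlines: Iterable[Medline], index: str) -> Iterator[str]:
    """Yield newline-delimited JSON in the bulk API's action/source format."""
    for medline in medlines:
        yield json.dumps({"index": {"_index": index, "_id": medline.medline_id}})
        yield json.dumps(to_document(medline))


# Placeholder data: the description, synonyms, and prior are made up.
rows = [
    Medline("1608", "Sore Throat", "Pain or irritation in the throat.",
            ["pharyngitis", "scratchy throat"], prior=0.8),
]
for line in bulk_lines(rows, index="reason_for_visit_tmp"):
    print(line)
```

Each medline produces two lines: one naming the target index and document ID, and one carrying the document itself.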

When we define the Reason for Visit index, we tell Elasticsearch to expand the medline names and synonyms into various data representations that will help us provide a better and more performant search experience for users. In addition to storing a full Reason for Visit name like “diabetes type 2”, we also store tokenized names like “diabetes”, “type”, and “2”, as well as stemmed names like “diabet” (that will help us match to typed queries such as “diabetic”), and different sizes of edge n-grams, like “d”, “di”, “dia”, “diab”, etc. (that will help us match queries as a user types individual characters). For synonyms (such as “insulin”, “blood sugar”, and “mellitus”), we store only the full names and stems. We also store stopwords, or words to ignore within user queries, such as “for”, “always”, and “it”, so that we can focus on the most relevant parts of the query.
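To make these representations concrete, here is a small pure-Python sketch of tokenization, edge n-grams, and stopword removal. In practice Elasticsearch produces these through analyzers configured on the index, so this is only an illustration of the output, not the mechanism. The function names, the maximum n-gram length of 4, and the three-word stopword set are assumptions; stemming is omitted because it needs a real stemming algorithm.

```python
import re
from typing import List

# The three stopword examples from the text; a real list would be much longer.
STOPWORDS = {"for", "always", "it"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 4) -> List[str]:
    """Return the prefixes of token that are min_gram to max_gram characters long."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Drop tokens that carry no search signal."""
    return [token for token in tokens if token not in STOPWORDS]


print(tokenize("Diabetes Type 2"))                         # ['diabetes', 'type', '2']
print(edge_ngrams("diabetes"))                             # ['d', 'di', 'dia', 'diab']
print(remove_stopwords(tokenize("doctor for diabetes")))   # ['doctor', 'diabetes']
```

Because every prefix is stored at index time, a partially typed query such as "diab" can be matched with a plain term lookup instead of a slower prefix scan.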

Retrieving the right pieces of data

Besides data storage, the other major piece of a search backend is the retrieval process. We implemented a Reason for Visit search service to manage retrieval, which has two benefits:

  1. Query work is abstracted from our many search clients (logged-out web search, logged-in web search, iOS search, Android search, and a few types of internal search) into a single location: the service. Besides reducing the amount of code to be written, this also helps us with division of labor and specialization: frontend engineers can focus on building a beautiful and intuitive user interface, and our backend team can focus on constructing efficient and flexible query execution.
  2. A service also allows us to add a level of abstraction to the requests and returns. In other words, it’s an API. Because they have access to this service, clients are not dependent on the exact structure of the Elasticsearch index, giving us forward flexibility to make changes to the way data is stored without worrying about breaking clients.

Our service is implemented with Apache Thrift and the API is defined in a client- and service-shared .thrift file. Inside the service, which we implemented in Python, the primary goal is to translate API requests into Elasticsearch queries, and to translate Elasticsearch responses into API responses.

Our API has a few different request types, including simple lookups (given a medline ID, retrieve a medline name and description) and typeahead suggestion searches (given a user query, return a ranked list of possible medline names). There are also a number of different filters available (e.g. a maximum number of results to return, and an option to return member-invisible reasons for visit that only internal power users would want to see). As a result, the service has to be able to generate many different Elasticsearch queries. Here is a somewhat simplified client request to the service, the Elasticsearch JSON query generated from it, and the service response to the client:

// API Request
request = ReasonForVisitAutocomplete(
    query='Coughing'
)

// Elasticsearch query generated by reason for visit search service
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "fields": [
                  "name.autocomplete^2.0",
                  "bag_of_words.nocomplete^1.0",
                  "name.stemmer^3.0",
                  "bag_of_words.stemmer^2.0",
                  "name.nocomplete^2.0"
                ],
                "fuzziness": 1,
                "prefix_length": 3,
                "query": "Coughing"
              }
            }
          ]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "doc['prior'].value * 20 + 1"
          }
        }
      ],
      "boost_mode": "multiply"
    }
  },
  "min_score": 0.4,
  "from": 0,
  "size": 8
}

// API Response
response = [
    ReasonForVisitHit(
        rfv_id=u'1608', name_highlights=None, name=u'Sore Throat'
    ),
    ReasonForVisitHit(
        rfv_id=u'4748', name_highlights=None, name=u'Chronic Bronchitis'
    )
]
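A query body like the sample above can be assembled from request parameters. The sketch below does so with plain Python dictionaries rather than the Elasticsearch DSL library, to stay free of dependencies. The function name, its parameters, and the `FIELD_BOOSTS` constant are hypothetical; the field names, boosts, and script string are copied from the sample query.

```python
import json
from typing import Any, Dict

# Field boosts copied from the sample query; the constant name is hypothetical.
FIELD_BOOSTS = {
    "name.autocomplete": 2.0,
    "bag_of_words.nocomplete": 1.0,
    "name.stemmer": 3.0,
    "bag_of_words.stemmer": 2.0,
    "name.nocomplete": 2.0,
}


def build_autocomplete_query(text: str, size: int = 8, min_score: float = 0.4,
                             fuzziness: int = 1,
                             prefix_length: int = 3) -> Dict[str, Any]:
    """Assemble a typeahead query body as a JSON-serializable dictionary."""
    text_match = {
        "multi_match": {
            # "field^boost" is the standard per-field boost syntax.
            "fields": [f"{name}^{boost}" for name, boost in FIELD_BOOSTS.items()],
            "fuzziness": fuzziness,
            "prefix_length": prefix_length,
            "query": text,
        }
    }
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"should": [text_match]}},
                # Multiply the text score by a prior-based factor.
                "functions": [
                    {"script_score": {"script": "doc['prior'].value * 20 + 1"}}
                ],
                "boost_mode": "multiply",
            }
        },
        "min_score": min_score,
        "from": 0,
        "size": size,
    }


print(json.dumps(build_autocomplete_query("Coughing"), indent=2))
```

Keeping the boosts in one mapping means a tuning change touches a single constant rather than every place a query is built.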

Keeping it simple with special logic

In order to build the different Elasticsearch queries, we use the aforementioned Python Elasticsearch DSL library, which allows us to cleanly and methodically build queries as Python objects rather than writing raw Elasticsearch JSON queries. To translate the API requests for typeahead queries, we added special logic to pull good matches from the search index. Its basic function is to take what users type into the search bar and translate it to the corresponding medical condition in our database.

Part of that logic rewrites the user-inputted query: stopwords are removed so that user queries like “I have diabetes” are reduced to simply “diabetes”. We also apply a fuzziness factor to the query to allow a small edit distance between the input query and a medline name, within which we still consider the query and result to match. That is, the incorrectly spelled “diabetas” matches “Diabetes Type 2” just as the correctly spelled “diabetes” does.
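The two rewriting steps can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the service's code: the stopword set is a tiny hypothetical subset, the function names are invented, and plain Levenshtein distance stands in for Elasticsearch's own fuzzy matching, which by default also counts a transposition of adjacent characters as a single edit. The defaults of 1 for `fuzziness` and 3 for `prefix_length` mirror the sample query.

```python
STOPWORDS = {"i", "have", "for", "always", "it"}  # illustrative subset


def strip_stopwords(query: str) -> str:
    """Remove stopwords from a lowercased, whitespace-split query."""
    kept = [word for word in query.lower().split() if word not in STOPWORDS]
    return " ".join(kept)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query_term: str, indexed_term: str,
                fuzziness: int = 1, prefix_length: int = 3) -> bool:
    """True if the leading characters agree and the terms are within fuzziness edits."""
    if query_term[:prefix_length] != indexed_term[:prefix_length]:
        return False
    return edit_distance(query_term, indexed_term) <= fuzziness


print(strip_stopwords("I have diabetes"))     # diabetes
print(edit_distance("diabetas", "diabetes"))  # 1
print(fuzzy_match("diabetas", "diabetes"))    # True
print(fuzzy_match("biabetes", "diabetes"))    # False
```

Requiring an exact prefix keeps fuzzy matching cheap and avoids matching words that merely end alike.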

Additionally, we make sure our Elasticsearch query considers matches against all of the different data stored in the index, including the full medline name, different lengths of medline name character n-grams, medline name stems, full synonym names, and synonym stems. Another piece of our query is a boost for conditions that are more commonly found in claims. For example, we suggest the more general maladies like “Skin Infection” and “Ear Infection” ahead of “Giardia Infection” and “Salmonella Infection” when a user searches for “infection”, because we know from claims analysis that skin and ear infections are more common.
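The effect of the commonness boost can be reproduced outside Elasticsearch. The sketch below mirrors the `script_score` expression from the sample query (`doc['prior'].value * 20 + 1`, combined with `boost_mode` "multiply"). The text scores and prior values are made-up numbers chosen only to show the reordering, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def boosted_score(text_score: float, prior: float) -> float:
    """Multiply the text score by the prior-based factor (prior * 20 + 1)."""
    return text_score * (prior * 20 + 1)


def rank(text_scores: Dict[str, float],
         priors: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (name, boosted score) pairs, best first."""
    scored = [(name, boosted_score(score, priors.get(name, 0.0)))
              for name, score in text_scores.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up numbers: every candidate matches "infection" equally well on text alone.
text_scores = {
    "Giardia Infection": 1.0,
    "Skin Infection": 1.0,
    "Salmonella Infection": 1.0,
    "Ear Infection": 1.0,
}
# Made-up priors standing in for commonness in claims data.
priors = {
    "Skin Infection": 0.05,
    "Ear Infection": 0.04,
    "Salmonella Infection": 0.002,
    "Giardia Infection": 0.001,
}

for name, score in rank(text_scores, priors):
    print(f"{name}: {score:.2f}")
```

The `+ 1` in the factor means a condition with a prior of zero keeps its text score instead of being zeroed out.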

The last part of our typeahead Elasticsearch query is a minimum match threshold. Each of the different types of matches described above is scored using the TF-IDF statistic, multiplied by a custom-defined boost factor, and summed to produce a total score. Any documents scoring below our minimum match threshold are discarded to improve precision, so that we don’t return too many irrelevant results to users.

Finally, post-query, we check whether any results score very high, as defined by a custom cutoff score. If so, we raise the minimum match threshold even higher. A very high score indicates that we have a perfect or near-perfect match, and upping the minimum match threshold when we see a high-scoring result allows us to return exactly one medline suggestion to the user in certain cases.
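The two-stage thresholding can be sketched as a small post-processing function. Everything specific here is an assumption: the function name, the idea that the raised threshold is a fixed constant, and the scores in the example. The medline names are borrowed from the golden set sample later in this post.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (medline name, total score)


def apply_thresholds(hits: List[Hit], min_score: float,
                     cutoff_score: float, raised_min_score: float) -> List[Hit]:
    """Drop low scorers, then tighten the threshold if any hit clears cutoff_score."""
    kept = [hit for hit in hits if hit[1] >= min_score]
    if any(score >= cutoff_score for _, score in kept):
        # A near-perfect match exists, so only strong candidates survive.
        kept = [hit for hit in kept if hit[1] >= raised_min_score]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Made-up scores for illustration.
hits = [("Eye Diseases", 9.5), ("Eye Injuries", 4.0), ("Eye Infections", 0.3)]

# One hit clears the cutoff of 9.0, so the threshold rises to 5.0.
print(apply_thresholds(hits, min_score=0.4, cutoff_score=9.0, raised_min_score=5.0))
# No hit clears a cutoff of 20.0, so only the base threshold applies.
print(apply_thresholds(hits, min_score=0.4, cutoff_score=20.0, raised_min_score=5.0))
```

In the first call only "Eye Diseases" remains; in the second, the two results above the base threshold are both returned.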

Weighting the search results

For all of the different match types described above, we needed to determine a reasonable boost factor (how much to weight each match factor versus every other), and we also needed reasonable fuzziness factors, minimum match thresholds, and cutoff scores. These are a lot of variables to determine manually, so we explored a range of values for each, and compared the resulting mean average precision on a golden set of queries and responses. The golden set is a CSV containing queries followed by ranked expected responses, which we generated based partly on historically common searches and partly on our intuition of what types of queries might be challenging to match. Here is a sample snippet:

<query>, <expected result 1>, <expected result 2>, …
“Eye”, Eye Diseases, Eye Injuries, Eye Infections
“I need a wellness visit”, Health Checkup
“Chest pain”, Chest Pain, Heart Attack
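A file in this shape can be read with Python's standard `csv` module. This sketch assumes the file uses straight double quotes around the query (the curly quotes above are typographic) and has no header row; the function name and the in-memory sample are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO

SAMPLE = '''"Eye", Eye Diseases, Eye Injuries, Eye Infections
"I need a wellness visit", Health Checkup
"Chest pain", Chest Pain, Heart Attack
'''


def load_golden_set(handle: TextIO) -> Dict[str, List[str]]:
    """Map each query to its ranked list of expected results."""
    golden: Dict[str, List[str]] = {}
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(handle, skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        query, *expected = [cell.strip() for cell in row]
        golden[query] = expected
    return golden


golden = load_golden_set(io.StringIO(SAMPLE))
print(golden["Eye"])  # ['Eye Diseases', 'Eye Injuries', 'Eye Infections']
```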

Mean average precision is a commonly used search ranking metric. For each query, precision (the number of correct results seen so far divided by the total number of results seen so far) is computed at each rank where a correct result appears and then averaged; these per-query averages are in turn averaged across all queries in the golden set. We used a linear regression to learn the best values to maximize mean average precision, and then defined those values as constants in our query building source code. We discovered that different values work better for long query strings than for short query strings, so we also customize the weighting constants based on the length of the input query.
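For reference, here is a minimal sketch of the textbook mean average precision calculation, which may differ in detail from the evaluation we ran. It treats every expected result as equally relevant, so it ignores the order of the expected list, and it assumes each returned list has no duplicates. The function names are hypothetical, and "Pink Eye" is a made-up incorrect result used only to show a miss.

```python
from typing import Dict, List


def average_precision(expected: List[str], returned: List[str]) -> float:
    """Average the precision at each rank where an expected result appears."""
    relevant = set(expected)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, name in enumerate(returned, start=1):
        if name in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Dividing by all expected results penalizes ones never returned.
    return precision_sum / len(relevant)


def mean_average_precision(golden: Dict[str, List[str]],
                           results: Dict[str, List[str]]) -> float:
    """Average the per-query average precision over the golden set."""
    if not golden:
        return 0.0
    scores = [average_precision(expected, results.get(query, []))
              for query, expected in golden.items()]
    return sum(scores) / len(scores)


golden = {
    "Eye": ["Eye Diseases", "Eye Injuries", "Eye Infections"],
    "Chest pain": ["Chest Pain", "Heart Attack"],
}
results = {
    "Eye": ["Eye Diseases", "Pink Eye", "Eye Injuries"],  # one miss at rank 2
    "Chest pain": ["Chest Pain", "Heart Attack"],
}
print(round(mean_average_precision(golden, results), 4))  # 0.7778
```

For "Eye", the correct results sit at ranks 1 and 3, giving (1/1 + 2/3) / 3 = 5/9; "Chest pain" scores 1.0, so the mean is 7/9. A parameter search can call a function like this once per candidate set of weights and keep the best-scoring set.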

Looking ahead

We already have a couple of ideas for future improvements. First, now that we’ve collected a large number of historical user queries, we would like to semi- or fully-automate the use of this data to improve the relevancy of results shown to users. To do this, we plan to regularly query the most commonly searched reasons for visit, and boost these in our results. We also plan to set up logging for zero-result queries so that we can determine whether we need to add new reasons for visit data to serve these requests, or whether we need to better optimize our existing data or our query generation.

Next, we’re considering building a multi-entity search bar. Today, members must search for reasons for visit, specialties, and facilities in separate search bars, but at some point we would like for our members to be able to search across these entities in one search bar, and possibly to support more complex searches like “best orthopedist for rotator cuff tear at Mt. Sinai hospital.” This would require a query parsing layer in front of our search service that performs named entity recognition to help us figure out which Elasticsearch indices (i.e., condition index, doctor index, facility index) to query, and whether we need to auto-apply filters that today are only manually settable by users. It would also require a re-scoring algorithm to help us sort results returned from the different indices.

Even as our search function evolves and grows more powerful, we remain focused first and foremost on usability. Oscar is committed to maintaining a simple user experience that works seamlessly, built on top of powerful and performant backend technology that continually improves.

The assets in this post were created by Gabe Schindler.