Another Azure Search Workaround: Case Sensitivity

Maria Gullickson
CarMax Engineering Blog
Aug 5, 2024

I posted previously about some work we had to do to get searches working for missing data in Azure Search. This post is about another bit of functionality that didn’t work for us out of the box in Azure Search, what we considered, and how we got things working as we needed.

On the CarMax website and mobile apps, we tend to do a lot of fairly structured searches, rather than free-text searches. Customers are shown a collection of facets that list vehicle makes, drivetrains, features, etc. to choose from. These values are then used to filter down our current saleable inventory. For the most part this works as we want it to. But to make our API as usable as possible to internal customers, we wanted to make queries as forgiving as we could.

In particular, we wanted to allow clients of our API to search for Make=“Ford” or Make=“FORD” or Make=“ford” and get the same set of vehicles back. Unfortunately, Azure Search does not behave this way by default. A filter on a text field must match the stored value exactly, including case. Make=“Ford” would return the right results, but Make=“FORD” and Make=“ford” would return nothing at all.
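To make the problem concrete, here is a minimal Python sketch that mimics exact-match filtering on a tiny in-memory list. It does not call Azure Search: the `vehicles` list and the `filter_by_make` helper are hypothetical stand-ins, and the OData expression in the docstring only illustrates what an equivalent filter might look like.

```python
# Hypothetical in-memory stand-in for an index; this is not a call to Azure Search.
vehicles = [
    {"id": "1", "Make": "Ford"},
    {"id": "2", "Make": "Ford"},
    {"id": "3", "Make": "Mazda"},
]


def filter_by_make(docs, make):
    """Mimic an exact-match filter, e.g. the OData expression: Make eq 'Ford'."""
    return [d for d in docs if d["Make"] == make]


print(len(filter_by_make(vehicles, "Ford")))  # 2
print(len(filter_by_make(vehicles, "FORD")))  # 0
print(len(filter_by_make(vehicles, "ford")))  # 0
```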

In this post, I want to share the solutions we considered and the one we landed on, because this seems like an issue that is likely to impact other search experiences in other organizations.

The Simplest Solution: Lowercase All the Things

The most obvious way to work around this is to simply lowercase all of our data and all of our queries. This works great to filter down to the right set of inventory.

But we also want a pretty version of the data to display to users. We need this in two places. One is simply in showing data about individual cars, and the other is in the list of facet values we allow users to select from.

We can’t simply convert those lowercase values to Title Case (capitalizing the first letter of each token) when displaying them. That wouldn’t work for all of our data. For example, it would produce poor results for vehicle models like Honda CR-V, Toyota RAV4 or Mazda CX-5.
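A quick sketch shows why. It uses Python’s built-in `str.title()` as an approximation of “capitalize the first letter of each token”; the sample values are just the models and makes mentioned in this post.

```python
# Lowercased values as they would be stored under the "lowercase everything" approach.
stored = ["cr-v", "rav4", "cx-5", "bmw"]

# What we would actually want to show users.
wanted = ["CR-V", "RAV4", "CX-5", "BMW"]

for value, expected in zip(stored, wanted):
    # str.title() uppercases the first letter of each run of letters
    # and lowercases the rest, so the original casing cannot be recovered.
    print(f"{value!r} -> {value.title()!r} (wanted {expected!r})")
```

None of the four values comes back with its intended casing, so the display form has to be stored somewhere rather than derived.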

The Elegant Solution: Use Azure Search Normalization

Looking through the Azure documentation, I discovered normalizers. The documentation suggested they might solve this problem for us.

You can assign a normalizer to a text field that is filterable, sortable, or facetable. The normalizer defines how the field’s value is normalized for indexing. You can define a custom normalizer if you need to, but one of the pre-defined ones is “lowercase”. Using this, your document is still stored with the original casing (e.g. “Land Rover”), but the normalized version (e.g. “land rover”) is what gets indexed. When filtering on this field, your filter text is normalized in the same way.
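As a rough illustration, a field definition using the pre-defined normalizer might look like the following. This is a sketch written as a Python dict in the shape of the REST index definition as I understand it; the field name `Make` and the chosen attributes are assumptions, and the exact property names should be checked against the API version you are using.

```python
import json

# Assumed shape of one field in an index definition (REST-style JSON).
# "Make" is an illustrative field name, not our actual schema.
make_field = {
    "name": "Make",
    "type": "Edm.String",
    "filterable": True,
    "facetable": True,
    "normalizer": "lowercase",  # pre-defined normalizer mentioned above
}

print(json.dumps(make_field, indent=2))
```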

This successfully solves our filter problem. Now filtering for Make=“Mazda” or Make=“MAZDA” or Make=“mazda” will all return the same set of cars for us, the Mazdas. And because the primary document hasn’t changed, what’s returned on the vehicle to show the user still says “Mazda” like we want.

Unfortunately, it doesn’t solve the faceting portion of our use case. Those normalized values are also used when getting facet counts. The reasoning makes sense: it handles the situation where several documents hold the same value with different casing. For example, let’s say there are 100 “Mazda” vehicles, and one “MAZDA” that snuck into our data somehow. When I get facet counts for the Make property, I want to get a single Mazda count of 101. To do this, the counts are grouped by the normalized value of “mazda”, and so that’s what’s returned in my facets with a count of 101.
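The effect can be simulated in a few lines of Python. This only imitates the counting behavior described above with `collections.Counter`; it is not how Azure Search is implemented.

```python
from collections import Counter

# 100 correctly cased values plus one stray uppercase value.
makes = ["Mazda"] * 100 + ["MAZDA"]

# Counting the raw values splits the facet in two.
raw_counts = Counter(makes)

# Counting the normalized values merges them, but loses the display casing.
normalized_counts = Counter(m.lower() for m in makes)

print(dict(raw_counts))         # {'Mazda': 100, 'MAZDA': 1}
print(dict(normalized_counts))  # {'mazda': 101}
```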

This means the normalizers won’t work for us. When showing the set of facet values for customers to choose from, we want them to see “Toyota” and “BMW”, not “toyota” and “bmw”.

The Functional Solution: Double Down on Our Data

What we ended up doing is having two copies of every string field that we want to be able to filter on:

  • One field is used for filtering. This field is lowercased. It is created with IsFilterable=true, IsHidden=true.
  • One field is used in places where it might show to users. This field uses the original, user-friendly casing for the data. It is created with IsFacetable=true.
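Expressed as REST-style field definitions, the pair might look roughly like this. The names `MakeFilter` and `Make` are hypothetical, and mapping IsHidden=true to `"retrievable": False` is my reading of how the SDK attribute corresponds to the REST property; treat it as an assumption to verify.

```python
# Hypothetical field pair for a single logical property ("Make").
make_filter_field = {
    "name": "MakeFilter",   # hypothetical name for the lowercased copy
    "type": "Edm.String",
    "filterable": True,     # IsFilterable=true
    "retrievable": False,   # assumed equivalent of IsHidden=true
}

make_display_field = {
    "name": "Make",         # original, user-friendly casing
    "type": "Edm.String",
    "facetable": True,      # IsFacetable=true
}

fields = [make_filter_field, make_display_field]
print([f["name"] for f in fields])  # ['MakeFilter', 'Make']
```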

This changed how we work with these fields a bit:

  • When adding new data properties, we have to consider whether this is a case where we’ll need the double fields.
  • When ingesting data about new vehicles, we need to create two copies of the data for each of these fields.
  • When generating Azure queries, we need to consider which field to reference for searching, faceting, selecting, and sorting.
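The ingestion and query-building steps above can be sketched as follows. Everything here is illustrative: the `DOUBLED_FIELDS` set, the `Filter` suffix, and the helper names are assumptions rather than our production code. The only Azure-specific detail is the OData convention of escaping a single quote by doubling it.

```python
# Hypothetical set of properties that get a lowercased filter copy.
DOUBLED_FIELDS = {"Make", "Model"}


def filter_field(name):
    """Name of the hidden, lowercased copy of a field (assumed convention)."""
    return f"{name}Filter"


def to_index_document(vehicle):
    """Copy the vehicle and add a lowercased twin for each doubled field."""
    doc = dict(vehicle)
    for name in DOUBLED_FIELDS:
        value = vehicle.get(name)
        if isinstance(value, str):
            doc[filter_field(name)] = value.lower()
    return doc


def build_filter(name, value):
    """Build an OData equality filter, targeting the lowercased copy if one exists."""
    escaped = value.replace("'", "''")  # OData escapes ' by doubling it
    if name in DOUBLED_FIELDS:
        return f"{filter_field(name)} eq '{escaped.lower()}'"
    return f"{name} eq '{escaped}'"


doc = to_index_document({"id": "1", "Make": "Land Rover", "Model": "Range Rover"})
print(doc["Make"], "|", doc["MakeFilter"])  # Land Rover | land rover
print(build_filter("Make", "FORD"))         # MakeFilter eq 'ford'
```

Faceting and selecting would keep referencing the original field (`Make` here), so users still see the friendly casing.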

That said, now that the pattern is established and works pretty much the same for all fields, it hasn’t been hard to keep up with.
