Heating up word2vec: BlazingText for Real-time Search

Evan Harris · Building Ibotta · Feb 4, 2019

On the machine learning team at Ibotta, we try to strike the right balance between utilizing cutting-edge technology and chasing shiny objects. There is always a new framework or package being released that offers a new way to solve a familiar set of problems. Many of these can be distractions, but occasionally something comes along that really does check all of the boxes. For us, a recent example of that is running BlazingText on Amazon SageMaker for text document classification.

The following outlines our implementation, and how we utilized BlazingText to add value to a critical feature of Ibotta’s mobile app.

Searching for Relevancy

Search relevancy is a really interesting problem. We hadn’t spent much time thinking about it on the machine learning team until recently. Since then it has piqued our interest not only in applications and algorithms for machine learning in search relevancy, but also in the infrastructure for efficiently deploying and maintaining whatever solutions we come up with.

Here is one basic problem our search relevancy effort has faced. Classic techniques for search work great when users search with keywords and phrases that match the language used to name and describe content in our app. For instance:

Search results for “bread”

A search for “bread” is straightforward. We have highly relevant content that matches this search query using a simple text matching algorithm; we use Amazon Elasticsearch and its out-of-the-box features for this. But users often search for words or phrases that are closely related to content we have in the app, where the language used doesn’t match any of the descriptive text associated with content in our searchable data store. For instance:

Search results for “zinfandel”

Here is a search for “zinfandel” where the user is looking for a specific grape, but the wine offers in the app are typically for any variety of wine sold by the appropriate label. Our search engine knows that “zinfandel” is wine and is therefore able to return offers for any variety of wine. This experience is so much better than a dead end (“Sorry, no results”) for the user. The user can consider any of the wine labels we carry offers for and either buy an appropriate bottle of zinfandel or choose something related like another grape variety. This drives satisfaction for our users as well as more engagement for the content in our app.

So how do we make this happen? There are a few things to keep in mind about the problem space. First of all, our inventory turns over rapidly, so indexing this data (loading it into our searchable content store) needs to happen continuously. Additionally, we want the indexing process to be as automated as possible. And finally, we don’t want to index a ton of data along with each piece of content in our searchable data store. Here are some potential solutions:

  1. For every search keyword we can think of, hard code content that should be returned given a search containing that keyword
  2. For every piece of searchable content, hard code search keywords that we might expect users to search for that should return that content
  3. Utilize a synonyms file with hardcoded mappings of keywords used to associate potential user search keywords with content
  4. Categorize content at setup time, categorize search queries at query time, and match search query category predictions to content category predictions

It’s probably not surprising that (4) is what we implemented. Options (1) and (2) violate our requirements: they are too top-heavy and manual at content setup time, and would require heavyweight data structures to be indexed alongside our content in our searchable content store. Option (3) is a bit too rigid and deterministic when it comes to multi-term or multi-phrase search queries, and continuously ingesting a large synonyms file would be an additional challenge since our searchable content store is continuously updating.

Option (4) has an additional advantage: a categorization model can learn which keywords in a search query to focus on (effectively ignoring contextually inconsequential tokens), whereas a text-matching solution would need some top-down technique to sort relevant content based on the various terms and phrases a search query could contain. While “bread” and “zinfandel” are simple examples, users can type anything they want into the search bar. We need to infer their categorical intent regardless of whether their query is something simple that we’ve seen before.
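To make option (4) concrete, here is a toy sketch of the matching step, assuming the classifier described later supplies category probabilities for a query. The category names, scores, and threshold below are invented for illustration, not our production values.

```python
# Toy sketch of option (4): content is categorized at setup time, queries are
# categorized at query time, and results are matched on category.

def match_by_category(query_category_scores, content_index, threshold=0.5):
    """Return content whose categories overlap the query's confident predictions."""
    predicted = {cat for cat, p in query_category_scores.items() if p >= threshold}
    return [item for item in content_index if predicted & set(item["categories"])]

# A query like "zinfandel", scored by the real-time classifier described below.
query_scores = {"wine": 0.91, "beer": 0.04}
content = [
    {"name": "Any bottle of wine", "categories": {"wine"}},
    {"name": "Whole wheat bread", "categories": {"bread", "bakery"}},
]
print(match_by_category(query_scores, content))  # -> the wine offer only
```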

So bringing the conversation back to machine learning, this is where we land. We have a fairly straightforward task: categorize search queries in real-time.

BlazingText

So what kind of model or framework do we use? In 2019, the sky’s the limit for open source options for doing something like this. We have some tried-and-true techniques for solving this kind of problem, but since we work mostly with AWS infrastructure, we’d been eager to try out some of Amazon SageMaker’s functionality.

SageMaker is AWS’s framework-agnostic model training and deployment service. In their own words, “Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly.” It launched in late 2017 and we began experimenting a few months later. This led us to BlazingText.

Cutting to the chase: BlazingText is very cool. If you’ve spent some time working with word2vec, doc2vec, fastText, or similar tools and you work in an AWS-friendly environment, check it out. This family of models learns vector representations of words based on their linguistic context in the training data. These word vector representations can then be used as a preprocessing step in a supervised text document classifier.

AWS first released BlazingText in early 2018. It essentially scales Facebook’s fastText on specialized distributed infrastructure, provisioned on demand. In July 2018, AWS released additional features for BlazingText that set it up to become an ideal candidate for our search relevancy use case:

  1. Supervised mode — an additional multi-class or multi-label classification step after the embedding vectors are generated, turning BlazingText into a generalized text classification algorithm, boasting 100x training performance improvements over deep learning alternatives that achieve similar accuracy
  2. Real-time predictor — support for SageMaker’s real-time prediction API, which hosts a trained BlazingText model as a fully managed, auto-scaling, monitored REST endpoint for low-latency, high-throughput prediction as a service

Here is the basic architecture for supervised mode from SageMaker’s docs:

Supervised Mode for BlazingText
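To make the supervised-mode input concrete, here is roughly the fastText-style training file BlazingText consumes: one example per line, with labels prefixed by __label__ followed by the preprocessed text. The labels and product text below are invented for illustration.

```python
# Roughly the fastText-style input that BlazingText's supervised mode expects:
# one training example per line, each label prefixed with "__label__" and
# followed by whitespace-tokenized text. Labels and text here are invented.
training_lines = [
    "__label__wine zinfandel 750ml bottle",
    "__label__wine cabernet sauvignon red blend",
    "__label__bakery whole wheat bread loaf",
]
with open("train.txt", "w") as f:
    f.write("\n".join(training_lines) + "\n")
```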

This enabled Ibotta’s machine learning team to train a large scale multi-class classifier leveraging swaths of training data from internal product dictionaries and historical search data to produce a real-time machine learning API. This model leverages word embeddings learned from the semantic relationships between words in our product descriptions, as well as our internal categorization hierarchy used to categorize content in the app.

It really wasn’t that hard. With BlazingText, the exercise simply becomes properly obtaining, cleaning, and aggregating training data. We’ve spent a lot of time building classifiers to categorize text documents at Ibotta using various frameworks and infrastructure, and this was one of the easier, cheaper, and more performant techniques we’ve found for training a multi-class model with a large number of classes on a large amount of training data.
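A minimal sketch of that training call using the SageMaker Python SDK (names follow the current v2 SDK; the bucket, IAM role, instance type, and hyperparameter values are placeholders, not our production settings):

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
region = session.boto_region_name
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role ARN

# Built-in BlazingText container for the current region.
container = image_uris.retrieve("blazingtext", region)

bt = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.c5.4xlarge",
    output_path="s3://my-bucket/blazingtext/output",  # placeholder bucket
    sagemaker_session=session,
)

# Supervised (text classification) mode; values are illustrative, not tuned.
bt.set_hyperparameters(mode="supervised", epochs=10, vector_dim=100, min_count=2)

# Point the job at training data in S3 via the "train" channel.
bt.fit({"train": "s3://my-bucket/blazingtext/train.txt"})
```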

Not to mention the Bayesian hyperparameter optimization that comes out of the box with SageMaker. We use this to parameterize the BlazingText model, particularly to find the optimal word embedding dimension. SageMaker spins up the necessary resources on demand to train and score various hyperparameter sets in parallel. While our models looked good with sane defaults, this off-the-shelf routine for optimizing hyperparameters ensured we found the best model for our solution.
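A sketch of how that tuning can be wired up with the SDK’s HyperparameterTuner (Bayesian search is its default strategy). The objective metric name follows the BlazingText supervised-mode documentation; the ranges and job counts are illustrative.

```python
from sagemaker.tuner import (
    HyperparameterTuner,
    IntegerParameter,
    ContinuousParameter,
)

# Search over embedding dimension (and learning rate) using SageMaker's
# built-in Bayesian optimization.
tuner = HyperparameterTuner(
    estimator=bt,                                 # the Estimator sketched above
    objective_metric_name="validation:accuracy",  # BlazingText supervised metric
    hyperparameter_ranges={
        "vector_dim": IntegerParameter(32, 300),
        "learning_rate": ContinuousParameter(0.01, 0.1),
    },
    objective_type="Maximize",
    max_jobs=20,
    max_parallel_jobs=4,
)

# Tuning against accuracy requires a validation channel alongside train.
tuner.fit({
    "train": "s3://my-bucket/blazingtext/train.txt",
    "validation": "s3://my-bucket/blazingtext/validation.txt",
})
```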

We get to leverage as much of our data as we want to, and quickly train and optimize a high quality model that meets our needs. This is all with just a couple of API calls pointing to our training data stored in S3.

Custom Machine Learning APIs

For a very applied team that spends a lot of time integrating models and predictions into our larger engineering infrastructure, simple integration points are really exciting. As it turns out, one way to integrate machine learning into a microservices infrastructure is with another microservice.

Using SageMaker we were able to create a REST endpoint available for use by any other microservice in our infrastructure. Setting up an endpoint to serve predictions is one simple API call to SageMaker after model training and validation (you can also do this in the console with all clicks and no code).
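A sketch of those calls, assuming the estimator (or tuner) from the training sketch above; the instance type, example query, and printed output are placeholders:

```python
import json
import boto3

# One call stands up a fully managed HTTPS endpoint for real-time inference.
predictor = bt.deploy(initial_instance_count=1, instance_type="ml.m5.large")

# Any other microservice can then invoke it, e.g. via boto3.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"instances": ["zinfandel"]}),
)
print(json.loads(response["Body"].read()))
# e.g. [{"label": ["__label__wine"], "prob": [0.93]}]  (illustrative output)
```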

If you’ve spent a lot of time with large-scale batch predictions where the end product is shipping data (as we have), it’s really fun to do something different and ship a machine learning prediction API. Especially with a consumer-facing app: people all over the country press buttons on their smartphones, triggering calls to your service, and they see results based on the prediction your model makes at that instant.

You can watch it in real time as well. SageMaker provides monitoring through CloudWatch in the AWS console:

Prediction Latency for Inference Pipeline

This is just one example of the monitoring charts you can build: the average latency of the BlazingText endpoint over the last hour as of this writing. It includes the BlazingText prediction as well as custom pre- and post-processing steps (written in Python, using inference pipelines) that we tack on for a more custom experience. These inference pipelines let us build in text preprocessing like lowercasing, special character removal, and stemming at prediction time. Additionally, we apply post-processing consisting of probability score thresholding and a parent category lookup from an in-memory dictionary.
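This is not our production pipeline code, but a sketch of the kind of pre- and post-processing described above; the stemmer choice, threshold, and parent-category mapping are stand-ins.

```python
import re
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")

# Stand-in for the in-memory parent category lookup described above.
PARENT_CATEGORY = {"wine": "alcohol", "bread": "bakery"}

def preprocess(query):
    """Lowercase, strip special characters, and stem the query tokens."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", query.lower())
    return " ".join(stemmer.stem(token) for token in cleaned.split())

def postprocess(label, prob, threshold=0.5):
    """Drop low-confidence predictions and attach the parent category."""
    if prob < threshold:
        return None
    return {"category": label, "parent": PARENT_CATEGORY.get(label), "prob": prob}
```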

It’s also a fully managed service, so we don’t need to worry much about uptime and resiliency, though we do build failure modes into the services calling SageMaker in case it does go down.
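As a rough illustration of that failure mode, with invoke_blazingtext standing in (hypothetically) for the endpoint call sketched earlier:

```python
def categorize_query(query):
    """Call the categorization endpoint, degrading gracefully if it is down."""
    try:
        # invoke_blazingtext is a hypothetical wrapper around the
        # invoke_endpoint call shown earlier.
        return invoke_blazingtext(query)
    except Exception:
        # Callers treat None as "no category" and fall back to plain text
        # matching instead of failing the whole search request.
        return None
```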

Conclusion

The machine learning world abounds with new tools, models, and frameworks. While a year is a long time, much of SageMaker still feels very new, which comes with its own challenges for getting up to speed (there isn’t much on Stack Overflow). Starting out on day one, it’s definitely easier to train a regression on your laptop than it is to train a BlazingText model on SageMaker. But ultimately the scalability and access to cutting-edge models are worth it.

On the machine learning team at Ibotta we use what we can stand up and integrate quickly, we use what is fun to build, and we use what works. At the end of the day, using state-of-the-art machine learning frameworks to power the search experience for millions of users on their smartphones is a pretty fun thing to be a part of.

If these kinds of projects and challenges sound interesting to you, Ibotta is hiring! Check out our jobs page for more information. Take a look at some of our other work on the machine learning team here, here and here.
