Serving Cold-Start Retailer Recommendations in Real-Time

Serving Real-Time Recommendations for New Users to Improve Relevance and Discoverability with Amazon SageMaker

Brent Lemieux
Building Ibotta
Oct 21, 2020


Photo by Federico Beccari on Unsplash

On the Ibotta machine learning team, one of our core goals is to present users with the content that is most relevant to them at any given time. If we don’t show users items they want to buy, they’re unlikely to continue using our product. In improving content relevancy, we encounter many exciting challenges, some of which include:

  • We don’t know much, or anything at all, about the shopping preferences of our new users (a.k.a. the user cold-start problem).
  • User preferences shift over time, even within a single day. For instance, many users tend to buy coffee in the morning and groceries in the evening.
  • Users expect the content to be available when they want it. That means our recommendations need to be reliable and fast, as well as relevant, for our millions of users.

In this post, I’ll discuss how we built a recommender system that handles new users and dynamic preferences for Pay with Ibotta, a product where you pay through our app and earn instant cashback on your entire purchase amount. I’ll also share how we deployed this model using Amazon SageMaker to host a real-time inference endpoint — see the Deployment section if you’re interested in skipping ahead.

Our Approach

The Cold-Start Recommender Problem

The term derives from cars. When it’s really cold, the engine has problems with starting up, but once it reaches its optimal operating temperature, it will run smoothly. With recommendation engines, the “cold start” simply means that the circumstances are not yet optimal for the engine to provide the best possible results. (Huba Gaspar — The Cold Start Problem for Recommender Systems)

Providing new users with relevant recommendations is critical for delivering value and converting newcomers into loyal users. Quality recommendations allow for an immediate connection with the user, quickly demonstrating how our service can help them save money on their everyday purchases. How do you do this when you don’t know anything about them?

The most common approach is to use global popularity. In our case, that would mean simply ranking retailers by the total number of transactions across all users over the past x days. However, such a naive approach ignores the small amount of contextual information available. A better approach is to use what context we have to make the best possible recommendations.
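For reference, that naive baseline amounts to a few lines of pandas. This is only an illustrative sketch; the frame and column names (`retailer_id`, `purchased_at`) are hypothetical:

```python
import pandas as pd

def global_popularity(transactions: pd.DataFrame, days: int = 30) -> pd.Series:
    """Rank retailers by transaction count over a trailing window.

    `transactions` is a hypothetical frame with a `retailer_id` column and a
    tz-aware UTC `purchased_at` timestamp column.
    """
    cutoff = pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=days)
    recent = transactions[transactions["purchased_at"] >= cutoff]
    return recent["retailer_id"].value_counts()
```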

The Data

Two of the most prominent contextual features we have available are the current time and the user’s approximate location.

Many of our Pay with Ibotta retail partners offer “buy now, consume now” products, e.g., coffee shops and restaurants, making time-based features particularly important. This makes intuitive sense, as different types of retailers peak at different times of day and on different days of the week. For example, coffee shops tend to peak in the morning, whereas restaurants peak around mealtimes and home improvement stores peak on weekend afternoons.

Location information also provides valuable context. For starters, users are much more likely to visit retailers that are close by. Also, location features interact nicely with time features, enriching our model. For instance, traffic at a fast-casual restaurant chain is likely to spike at business district locations during the week and at neighborhood locations on weekends. Using location features also allows us to enhance our time-based features by converting from UTC to local time.

Time data can be represented with simple features like the hour, day of the week, and month, or with more advanced encodings that capture its cyclical nature. Location data can also be represented in multiple ways, including geo-coordinates, raw zip codes, or even encoded geohashes.
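To make the cyclical idea concrete, here is one common encoding (not necessarily our exact feature set): map the hour and day of week onto a circle with sine and cosine so the model sees 11 p.m. and midnight as neighbors. The `event_time` column name is a placeholder:

```python
import numpy as np
import pandas as pd

def add_time_features(df: pd.DataFrame, ts_col: str = "event_time") -> pd.DataFrame:
    """Add cyclical hour-of-day and day-of-week features.

    Assumes `ts_col` is a (local, timezone-aware) datetime column.
    """
    out = df.copy()
    hour = out[ts_col].dt.hour
    dow = out[ts_col].dt.dayofweek
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    out["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    out["dow_cos"] = np.cos(2 * np.pi * dow / 7)
    return out
```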

With the data provided, we can serve cold-start recommendations that are essentially popularity scores given the time and place — a significant improvement compared to global popularity.

The Model

We treated this problem as a multiclass classification problem. The features are the time and location of users during sessions that result in purchases. The target is the retailer where the purchase took place.

For the classifier, I used XGBoost. The model was tuned using SageMaker’s hyperparameter tuning, which supports Bayesian search. If you haven’t used Bayesian search for hyperparameter tuning before, I highly recommend it — it can result in a significant lift in offline model evaluation metrics in many cases!
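As a rough sketch of what that looks like with the SageMaker Python SDK (v2) and the built-in XGBoost algorithm; the role ARN, S3 paths, class count, and parameter ranges below are all placeholders:

```python
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# Built-in XGBoost container for the region.
xgb_image = image_uris.retrieve("xgboost", "us-east-1", version="1.2-1")

estimator = Estimator(
    image_uri=xgb_image,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/cold-start/output",  # placeholder
)
# Multiclass objective: one class per retailer (num_class is illustrative).
estimator.set_hyperparameters(
    objective="multi:softprob", num_class=50, num_round=200, eval_metric="mlogloss"
)

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:mlogloss",
    objective_type="Minimize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 8),
        "min_child_weight": ContinuousParameter(1, 10),
    },
    strategy="Bayesian",  # Bayesian search over the ranges above
    max_jobs=30,
    max_parallel_jobs=3,
)
# tuner.fit({"train": "s3://my-bucket/train", "validation": "s3://my-bucket/validation"})
```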

Deployment

Model deployment was an often ignored topic during the explosion in data science hype in the early and mid-2010s (which I was certainly a part of). However, undeployed models don’t add value to businesses. Accordingly, model deployment has become a much hotter topic in the past few years.

Fully managed machine learning solutions, like Amazon SageMaker, have made deploying models much more straightforward. SageMaker allows you to train and deploy models, with most of the configuration abstracted away.
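For example, once an estimator is trained (like the one in the tuning sketch above), standing up a real-time endpoint is a single call. The instance type, count, and endpoint name here are placeholders:

```python
# Deploy the trained model behind a managed HTTPS endpoint.
predictor = estimator.deploy(
    initial_instance_count=2,
    instance_type="ml.c5.large",
    endpoint_name="cold-start-retailer-recs",  # placeholder name
)

# The endpoint can then be invoked like any other web service;
# the payload format depends on the serving container.
# response = predictor.predict(payload)
```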

Production Deployment at Ibotta

The machine learning team at Ibotta is very applied. The team consists of machine learning engineers and software engineers focused on building and deploying models to enhance the user experience and drive key business metrics.

The team has historically delivered machine learning products via batch computation, making predicted values accessible in a low-latency datastore. More recently, the team has shifted to delivering most products as real-time services that sit behind an API. This has several benefits, including:

  • A single integration point for internal users of machine learning products.
  • The infrastructure needed to test two or more competing models simultaneously.
  • Cost savings from less reliance on expensive Spark jobs.
  • The ability to incorporate up-to-the-minute data in model predictions, making the app more personal for new users as soon as they start interacting with it.

The Service

We built this service using SageMaker Custom Containers (read more about how we use these here). Custom containers give us the ability to implement custom logic that includes data retrieval, preprocessing, prediction, and post-processing — all executed at runtime.
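A custom serving container only has to answer GET /ping (health check) and POST /invocations (inference) on port 8080. The sketch below is a stripped-down stand-in for a real container, assuming a JSON payload and a stub model; in practice the model is loaded from /opt/ml/model, where SageMaker unpacks the model artifact:

```python
import flask
import numpy as np

app = flask.Flask(__name__)

class StubModel:
    """Placeholder for the real model loaded from /opt/ml/model."""
    def predict_proba(self, features: np.ndarray) -> np.ndarray:
        return np.full((len(features), 3), 1 / 3)  # uniform scores over 3 retailers

model = StubModel()

@app.route("/ping", methods=["GET"])
def ping():
    # SageMaker polls this endpoint before routing traffic to the container.
    return flask.Response(status=200)

@app.route("/invocations", methods=["POST"])
def invocations():
    # Custom runtime logic lives here: parse the request, preprocess,
    # predict, and post-process into a ranked response.
    payload = flask.request.get_json()
    features = np.asarray(payload["features"], dtype=float)
    scores = model.predict_proba(features)
    rankings = scores.argsort(axis=1)[:, ::-1].tolist()  # retailer indices, best first
    return flask.jsonify({"rankings": rankings})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```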

This particular service consists of three separate models. Based on the location data included in the request, the service selects the appropriate model and preprocessing steps.

If a user’s geo-coordinates are available, preprocessing steps include using a single decision-tree classifier to estimate the user’s timezone and converting time features to local time. A single tree can learn timezone borders remarkably well (greater than 99.5% accuracy) and is extremely fast at prediction time (less than one millisecond).
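A toy version of that timezone tree with scikit-learn; the handful of training points below are made up, while in practice the tree is fit on a large sample of labeled coordinates:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data: geo-coordinates labeled with their timezone. The tree learns
# timezone borders as axis-aligned latitude/longitude splits.
coords = [
    [39.74, -104.99],  # Denver
    [40.71, -74.01],   # New York
    [34.05, -118.24],  # Los Angeles
    [41.88, -87.63],   # Chicago
]
timezones = ["America/Denver", "America/New_York", "America/Los_Angeles", "America/Chicago"]

tz_clf = DecisionTreeClassifier()
tz_clf.fit(coords, timezones)

# Prediction is a handful of comparisons, so it costs well under a millisecond.
print(tz_clf.predict([[39.0, -105.5]]))
```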

If no geo-coordinates are provided, we utilize the user’s registration zip code to look up the timezone and average geo-coordinates of the zip code from a dictionary held in memory.
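Putting the two paths together, request preprocessing might look roughly like this. The request keys and zip lookup values are placeholders, `tz_clf` is the tree from the previous sketch, and `zoneinfo` requires Python 3.9+:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical in-memory lookup built offline:
# registration zip code -> (timezone name, zip centroid latitude/longitude).
ZIP_LOOKUP = {
    "80202": ("America/Denver", 39.75, -104.99),
    "10001": ("America/New_York", 40.75, -73.99),
}

def resolve_context(request: dict) -> tuple[float, float, datetime]:
    """Pick the preprocessing path based on what location data the request carries."""
    utc_now = datetime.now(timezone.utc)
    if "lat" in request and "lon" in request:
        lat, lon = request["lat"], request["lon"]
        tz_name = tz_clf.predict([[lat, lon]])[0]  # decision tree from the sketch above
    else:
        tz_name, lat, lon = ZIP_LOOKUP[request["zip"]]
    local_time = utc_now.astimezone(ZoneInfo(tz_name))
    return lat, lon, local_time
```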

Deploying models as a service is not without its challenges. Our services need to be fast, scalable, and reliable in addition to providing quality outputs.

Fortunately, making the service reliable and scalable is simple with SageMaker. Most of the work here comes down to choosing the right instance size and configuring auto-scaling. Tools like Artillery help you load test your services to ensure they can handle the traffic you’ll be sending them.
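SageMaker endpoints scale through Application Auto Scaling. A sketch of the configuration with boto3, where the endpoint name, capacity limits, and target value are illustrative:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/cold-start-retailer-recs/variant/AllTraffic"  # placeholder

# Register the endpoint's production variant as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Track a target number of invocations per instance, scaling in and out as needed.
autoscaling.put_scaling_policy(
    PolicyName="invocations-per-instance",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 300.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```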

For this particular service, latency was not a huge issue since the model uses few features, making it fast out of the box. Most of our machine learning services have much more data available and use many more features. For those services, latency constraints and offline validation metrics are often at odds with each other. However, good enough predictions served with low latency may beat slightly better predictions served with high latency when looking at online metrics.

Some simple strategies for reducing latency include using fewer features and constraining various model hyperparameters (e.g., the number of trees and max depth for tree-based models). It’s also important to follow best practices for writing efficient code to squeeze every millisecond out of your service. Writing modular code also allows you to utilize tracing tools to measure how long each part of your code takes to run.
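Even without a full tracing tool, a tiny timing helper gives you a per-stage breakdown of a request. The sleeps below are stand-ins for real preprocessing and prediction work:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str, timings: dict):
    """Record how long a named stage takes, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000.0

timings: dict = {}
with timed("preprocess", timings):
    time.sleep(0.002)  # stand-in for feature preparation
with timed("predict", timings):
    time.sleep(0.001)  # stand-in for model inference
print(timings)
```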

Next

Using a real-time prediction paradigm, as opposed to batch precompute and lookup, we’re set up well to deliver cutting-edge machine learning products. We can build systems that use data as soon as it becomes available, e.g., in-session recommenders.

We can also develop systems that decide which recommenders to use throughout a user’s journey. For instance, as we learn more about a particular user, we can start to incorporate recommendations from a more personalized system. How heavily should we weight this session? What about previous sessions? Do we know enough about the user to move away entirely from the time-location model? All of these questions can be answered automatically using contextual bandits.

We’re Hiring

Ibotta is hiring, so if you’re interested in working on challenging problems like the one described in this article, give us a shout. Find Ibotta’s career page here.
