Feature Engineering at Ibotta

James Foley
Published in Building Ibotta
Apr 30, 2018

Predicting the Availability of Products

The Feature Engineering (FE) team at Ibotta is part of the larger Data Science team. This is our mission:

“To enhance our underlying data asset by leveraging scalable machine learning models to extract valuable information from raw data sources.”

We build and maintain our own production processes, which typically deliver updated data features to stakeholders on a daily basis. The work associated with building and maintaining our production tables involves training statistical models, applying model predictions to billions of data points each week, leveraging Spark to scale these jobs, and utilizing Apache Airflow to coordinate the daily execution of these jobs.
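As a rough illustration of the orchestration piece, a daily job might be wired up in Airflow like this (a minimal sketch; the DAG name, schedule, and spark-submit command are hypothetical placeholders, not our actual configuration):

```python
# Minimal Airflow sketch: schedule a daily Spark job that refreshes a feature table.
# The DAG id, script path, and cluster flags below are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "feature-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=15),
}

dag = DAG(
    dag_id="daily_feature_refresh",  # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2018, 4, 1),
    schedule_interval="@daily",
)

refresh_features = BashOperator(
    task_id="apply_model_predictions",
    # Hypothetical spark-submit call; the script name is a placeholder.
    bash_command="spark-submit --master yarn apply_availability_model.py",
    dag=dag,
)
```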

Some examples of these features and uses are:

  • Predicted behavioral shopper segments — used by Marketing to personalize messaging to our user base.
  • Product metadata generating framework — used by Analytics and Data Science for more detail into what Ibotta users are buying.
  • Brand propensity scores — used by Recommendation Systems to recommend relevant content to users in the app.

The remainder of this piece will focus on the finer details of one Feature Engineering project in particular: predicting the availability of products at stores.

The Ibotta mobile app presents users with a variety of products at a range of retailers; we call these “offers” within a “retailer module”, as seen in the screenshot below.

Offers in the Ibotta Gallery

Within the Ibotta app, some offers are presented within retailer modules where the advertised product isn’t sold at that retailer’s stores, or at that retailer at all. One example is a “Mobil 1 Motor Oil” offer that was presented within the Whole Foods module; another classic example is alcohol offers presented at Colorado grocery stores that don’t sell alcohol.

This problem is inevitable given the way offers are set up and associated with retailer categories, but the current state is unacceptable: these incompatible offer/retailer combinations fill up valuable gallery space in the app and mislead customers, resulting in a poor user experience. While the problem likely can’t be solved completely, like many problems the FE team addresses, we had some ideas for mitigating it.

The plan is to treat this problem with a supervised learning solution, meaning we build a dataset to train a model to predict the likelihood a given product is available at a particular store. In addition to product and store metadata and other engineered features likely indicative of availability status, the training dataset requires target labels (“available”/”unavailable”) representing truly (or likely truly) available and unavailable product/store combinations. We have many instances of products being purchased across many stores (the positive availability class), but it’s trickier to find certain instances of the negative class (unavailable) because a lack of purchase history of a product doesn’t necessarily imply unavailability.
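To make the labeling logic concrete, here’s a minimal PySpark sketch of how the two classes might be seeded (the column names are illustrative assumptions, not our actual schemas):

```python
# Sketch: seed training labels for the availability model.
# Column names (product_id, store_id) are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("availability_labels").getOrCreate()

txns = spark.table("clean_txns")

# Any product/store pair with at least one observed purchase is treated
# as a positive ("available") example.
positives = (
    txns.groupBy("product_id", "store_id")
        .agg(F.count("*").alias("n_purchases"))
        .withColumn("label", F.lit(1))
)

# The negative class is trickier: absence of purchases doesn't imply
# unavailability, so offer-not-found reports serve as one noisy signal
# for likely-unavailable pairs.
negatives = (
    spark.table("customer_offer_not_founds")
         .select("product_id", "store_id")
         .distinct()
         .withColumn("label", F.lit(0))
)
```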

It’s challenging enough to build a model to predict the availability of a product at a particular retailer. However, we want to personalize content down to the store level, so we want the model to predict the availability of a product at any given store. This is more challenging for a couple of reasons:

  • There are roughly a hundred thousand stores and hundreds of thousands of products, which makes for a lot of predictions, and
  • The landscape of product/store purchases is sparse across all product/store combinations.

The data sources used for this project are clean_txns and customer_offer_not_founds, both tables in the data lake.

  • clean_txns: a cleaned record of all receipt item purchases submitted by our customers, with added metadata such as product brand and categories, store zip code and DMA, etc.
  • customer_offer_not_founds: a collection of all products that users have reported as being unavailable at particular stores.

The latter is a potentially useful source for identifying more-likely-unavailable product/store pairs, but the data is sparse and the presence/absence of a product in the table should not absolutely determine unavailability/availability. The data in clean_txns is also sparse at the product/store level, therefore we’ll need to engineer some “features” to give the model more information to work with.

  • “feature”: an individual measurable property/characteristic of a phenomenon being observed.
  • “feature engineering”: the process of creating features which enable machine learning algorithms to work in practice (hence our team name).

For the model that predicts the availability of products at stores, examples of simple features are category_1 of the product (e.g., the category_1 for a gallon of milk would be “dairy”), brand of the product, zipcode of the store, retailer of the store, etc. An example of an engineered feature is a score representing the likelihood the brand of a product is available at a particular retailer.

In order to get engineered features such as the aforementioned brand/retailer score we need to train a model to predict the likelihood a brand is available at a retailer. This is an easier model to train because rolling transaction data up to the brand/retailer level (as opposed to the most granular product/store level) is not nearly as sparse, yielding a more robust training set.
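Here’s a rough sketch of that roll-up, again with illustrative column names rather than our actual schema:

```python
# Sketch: roll transactions up from product/store to brand/retailer before
# training the brand/retailer availability model. Column names are
# illustrative assumptions; the real clean_txns schema differs.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("brand_retailer_rollup").getOrCreate()
txns = spark.table("clean_txns")

brand_retailer = (
    txns.groupBy("brand", "retailer")
        .agg(
            F.count("*").alias("n_purchases"),
            F.countDistinct("store_id").alias("n_distinct_stores"),
        )
)
# This aggregated grid is far denser than the product/store grid, so a model
# trained at this level yields more stable availability scores.
```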

We built twenty-five of these models, each trained to predict the availability of a “thing” at a “place”, where “things” are product-level attributes (global product, brand, category1, category2, category3) and “places” are store-level attributes (store, zip code, DMA, retailer, DMA/retailer nested). We store the availability predictions for all possible “thing”/”place” combinations for each of the twenty-five models in tables which will be referenced when building out the full feature space for the final product/store model.
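Since the twenty-five models are just the cross product of the five product-level attributes and the five store-level attributes, the model grid itself is simple to enumerate (a sketch; the attribute names are shorthand):

```python
# The 25 (thing, place) model combinations: 5 product-level attributes
# crossed with 5 store-level attributes.
from itertools import product

THINGS = ["product", "brand", "category1", "category2", "category3"]
PLACES = ["store", "zip_code", "dma", "retailer", "dma_retailer"]

model_grid = list(product(THINGS, PLACES))
assert len(model_grid) == 25

for thing, place in model_grid:
    # Each pair gets its own availability model and its own score table,
    # e.g. the (brand, retailer) model backs the brand/retailer score feature.
    print(f"train availability model: {thing} x {place}")
```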

Below is an example of what the brand/retailer score table might look like:
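| brand         | retailer     | availability_score |
|---------------|--------------|--------------------|
| Ben & Jerry’s | King Soopers | 0.9486             |
| …             | …            | …                  |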

When we apply the final product/store model to predict the likelihood “Ben & Jerry’s Vanilla Ice Cream, 16 oz” (product) is available at “King Soopers 1950 Chestnut Pl Denver CO” (store), the brand/retailer score feature would be 0.9486 according to the example above.

Below is a list of the other input features used when predicting availability for the example above, followed by a sketch of how such a feature row might be assembled:

  • Observed # purchases
  • Days since the last purchase
  • Observed # offer-not-found reports
  • Days since the last offer-not-found report
  • % of observations that are purchases (as opposed to not-founds)
  • (Days since last purchase) minus (days since last not-found)
  • Prob.(frozen foods (cat1) available in Denver, CO (DMA))
  • Prob.(frozen novelties (cat2) available in 80202 (zip))
  • Prob.(Ben & Jerry’s (brand) available at King Soopers (retailer))

This stacked ensemble of models (the method of building a model that uses features generated by other models) provides additional signal for the final model we build. Since we don’t have dense enough data at the product/store level to build a good training set to train a product/store level model, engineering the additional score features at higher levels (e.g., brand, category, DMA, retailer, etc.) provides a richer feature set and therefore more signal for the product/store model to interpret.
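In scikit-learn terms, the stacking pattern looks roughly like the following sketch (the model choices and stand-in data are illustrative assumptions, not necessarily what we run in production):

```python
# Sketch of the stacking pattern: a base model trained at a denser aggregation
# level (e.g. brand/retailer) feeds its predictions in as a feature for the
# final product/store model. Data and model choices are stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.RandomState(0)
X_base = rng.rand(1000, 5)          # stand-in brand/retailer-level features
y_base = rng.randint(0, 2, 1000)    # stand-in availability labels

# 1. Train a base model at the denser brand/retailer level.
base_model = LogisticRegression().fit(X_base, y_base)

# 2. Its predicted probabilities become an engineered feature for the
#    sparser product/store level (here we simply reuse the stand-in rows).
X_final = np.column_stack([X_base, base_model.predict_proba(X_base)[:, 1]])

# 3. The final product/store model trains on raw features plus base scores.
final_model = GradientBoostingClassifier().fit(X_final, y_base)
```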

The following diagram depicts the design and interaction of models in the stacked ensemble.

The next step is testing this availability feature in the app, which involves removing offers that are likely unavailable at particular retailer locations. It’s difficult to evaluate the model’s performance across the full product/store spectrum because there is no source of truth for what is available at any given time, so we’ll test and evaluate various thresholds on the availability predictions to optimize the set of offers we determine to be available at a given retailer in a given region.
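A simple way to explore those thresholds is to sweep a few cutoffs over the predicted availability scores and inspect what survives each one (a sketch with stand-in numbers):

```python
# Sketch: sweep availability-score thresholds and count how many offer/store
# pairs survive each cutoff. Scores and thresholds are illustrative stand-ins.
import numpy as np

scores = np.array([0.12, 0.47, 0.63, 0.81, 0.95])  # predicted availability

for threshold in (0.5, 0.7, 0.9):
    keep = scores >= threshold
    print(f"threshold={threshold}: keep {keep.sum()} of {len(scores)} offers")
```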

Relating this project back to the FE team mission: we’re enhancing our underlying data asset by building a feature that enables accurate in-app offer filtering, and we accomplish this by developing a framework to train and apply machine learning models that extract valuable information from data sources in the data lake.
