Ranking Looks at Nordstrom using Machine Learning
As a member of our Nordstrom Customer & Product Data Science team, I get to help build algorithms that drive the shopper experience on Nordstrom.com. For the past two years we’ve been investing heavily in Nordstrom Looks, building an outfitting experience based on shopper preferences that we hope inspires customers and enables them to discover new products. As more Looks are created, it has become increasingly important to surface the Looks that resonate with customers.
Currently, the main way customers view Looks is on a product page, and the product they are viewing may have ten or more Looks. To give customers the best experience we can, we want to show them the Look they are most likely to take inspiration from and engage with. For our Looks ranking model we took inspiration from content-based recommender systems: we rank highest the Looks that are most similar to the Looks a user has previously engaged with.
Engagements currently are defined as FETCH, SWIPE, SELECT, SHUFFLE, and ATB (add-to-bag). Our model biases towards more recent engagements, and towards what we’ve assumed to be stronger engagements (i.e. SELECT and ATB).
Methodology
The basis for our ranking algorithm is a distance calculation between a shopper’s Look preferences and other Looks in an embedded space we’re calling “Look-space.” At runtime, we determine where in Look-space a user is based on their previous interactions with Looks, and then find the nearest Looks to them using Manhattan distance. The Look-space embedding is based on a series of engineered features and unsupervised clustering models run on various feature sets that provide a quantitative value for a Look in some dimension: colors, brands, product types, and text. Additional dimensions for gender, age, price, looktype (e.g. apparel), and activity (e.g. entertaining) are included in the embedding as well. In the end, Look-space is a high-dimensional blend of both continuous and categorical features, which is the reason Manhattan distance is used rather than Euclidean.
As a very simple example, let’s say the only feature we want to use to rank Looks is the title of each Look, and we only want to allow three different types (clusters) of Look titles (k=3). For the sake of argument, let’s say that Look_A has the embedded vector [1 0 0] after clustering on Look title while Look_B’s embedding is [0 1 0], where each digit refers to whether or not a Look belongs to a specific cluster. We now have a 3-dimensional space in which to describe Looks. At runtime, a record of user data is analyzed, which determines where in the Look-space the user is. In this case, let’s say that the user has engaged far more with Looks whose embedding is [0 1 0]. A distance metric between where the user is and the qualified Looks is then calculated and used to rank these Looks. For this particular user, we would assume they would be more partial towards Look_B since it is most closely related to Looks that they have previously engaged with.
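The toy scenario above can be sketched directly. The vectors and user location here are illustrative values, not production data:

```python
import numpy as np

# Toy 3-dimensional "Look-space": each digit marks cluster membership.
look_embeddings = {
    "Look_A": np.array([1, 0, 0]),
    "Look_B": np.array([0, 1, 0]),
    "Look_C": np.array([0, 0, 1]),
}

# Suppose the user's history places them mostly in the second cluster.
user_location = np.array([0.1, 0.8, 0.1])

# Rank Looks by ascending Manhattan (L1) distance from the user.
ranked = sorted(
    look_embeddings,
    key=lambda look_id: np.abs(look_embeddings[look_id] - user_location).sum(),
)
# ranked[0] is "Look_B", the Look closest to the user's preferences
```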
A diagram for the end-to-end process is presented below. Look and product data are fed into a Python model kicked off by a scheduled Jenkins job. The Look-space is derived for all published, non-deleted Looks. Then the data and ranker.py logic are packaged together and pushed to a data store for serving.
Training Details
Training Steps
1. Pull Looks data
2. Merge Looks with product data
3. Engineer Look-level features, embedding each Look in Look-space
4. Train a NearestNeighbors model
5. Publish the model as a Python package and push it to storage
The model is retrained every four hours from 5am to 5pm in order to be able to rank newly-published Looks.
Step 1 - Pull Looks Data
Looks data containing the Look title, Look description, looktype, and activity tags is pulled into the trainRanker class.
Every Look that has been published and has not been deleted is included in the Looks data. As you can see, we have to deal with quite a few null values in the data. In cases where values are null, they are imputed as 0s.
Step 2 - Merge With Product Data
The slots contained in each Look are joined with product data on style_id and colorcode, giving us the features of the items in each Look.
Step 3 - Look-level Feature Engineering
Color, Product Type, Brand and Text Clusters
Let’s say we’d like to describe the colors in a Look. How would we do that? Remember, the final output feature must be a single value that represents the colors in a Look.
There are a near-infinite number of color palettes and combinations that could appear in Looks, so the question becomes: how can we systematically reduce the number of possible color combinations to something manageable while automatically discovering what those combinations are? Our Look ranking model uses an unsupervised MiniBatchKMeans clustering method to describe these features. This way, we can choose how many color-combination clusters we’d like all the Looks to be divided into. An added benefit is that as the model is retrained, it learns new color combinations as they appear.
During training, we construct what we are calling color sentences for each Look. The predominant color of each product is extracted using Nordstrom’s internal color extraction API. These color names are then concatenated into a single sentence representing the Look. The example above becomes Black Black Black DarkGrey DarkGrey DarkGrey. We do this for every single Look.
After building our color sentences we run a TF-IDF fit_transform() on the corpus to build a sparse matrix of Looks and their weighted color vectors. MiniBatchKMeans.fit_predict() clustering at k=5, 10, and 20 is then run on the sparse matrix to classify each Look into clusters. This unsupervised method determines the class label of the Look’s colors at the three different granularities of 5, 10, and 20.
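The color-clustering step can be sketched like this. The color sentences are made up, and we use k=2 rather than the production k=5, 10, and 20 so the tiny corpus has enough samples per cluster:

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up color sentences, one per Look.
color_sentences = [
    "Black Black Black DarkGrey DarkGrey DarkGrey",
    "Black DarkGrey Black",
    "Beige Beige White Brown",
    "Red Yellow Red Orange",
]

# Sparse matrix of Looks x TF-IDF-weighted color terms.
tfidf = TfidfVectorizer()
weighted_colors = tfidf.fit_transform(color_sentences)

# Assign each Look a color-cluster label.
labels = MiniBatchKMeans(n_clusters=2, random_state=0).fit_predict(weighted_colors)
```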
Here we show an example of clustering 10,000 Looks into 5 different color clusters and then embedding those in 2D space for visualization.
The following images show random Looks from inside two different color clusters:
As you can see, the unsupervised clustering on predominant colors results in clusters of Looks that have similar patterns. You might end up with Looks that are mostly black or Looks that tend to have lots of beige. You could have color clusters that are mostly reds and yellows. These clusters will also change with new data.
Product type, text, and brand features are all derived using the same method of building unsupervised clusters at k=5, 10, and 20.
Look-level Price Features
We take the price of each item in a Look and derive the following metrics. Total price is calculated by adding up every item. Price range is the highest-priced item minus the lowest. Two flags indicate whether a Look has an item that makes up 50% or 75% of the Look’s total cost (hasItem?50%, hasItem?75%); the thinking is that this helps capture signal about Looks that are “high-low.” Finally, three features count the items that each make up more than 20, 30, or 40 percent of the Look’s total (count>20%, count>30%, count>40%).
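A sketch of these price features for one hypothetical Look (the prices are invented):

```python
import numpy as np

prices = np.array([250.0, 45.0, 30.0, 25.0])  # item prices in one Look

total_price = prices.sum()                    # total price
price_range = prices.max() - prices.min()     # price range

shares = prices / total_price                 # each item's share of the total
has_item_50 = bool((shares >= 0.50).any())    # a "high-low" dominating item?
has_item_75 = bool((shares >= 0.75).any())

# How many items each make up more than 20/30/40% of the Look's cost.
counts = {cut: int((shares > cut).sum()) for cut in (0.2, 0.3, 0.4)}
```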
Look-level Gender Features
For gender features, we simply calculate the proportion of items within a Look that are Male, Female, or Unisex.
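For example, given hypothetical item-level gender labels for one Look:

```python
from collections import Counter

item_genders = ["Female", "Female", "Unisex", "Female"]  # illustrative

counts = Counter(item_genders)
proportions = {
    gender: counts.get(gender, 0) / len(item_genders)
    for gender in ("Male", "Female", "Unisex")
}
# proportions: Male 0.0, Female 0.75, Unisex 0.25
```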
Look Type and Look Activity Features
For these two feature sets we dummify the available looktypes and activity tags, converting the categorical data into binary data. If the type of tag is present, the value is a 1, otherwise the value in the vector will be 0.
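One way to dummify these tags is pandas’ get_dummies; this is a sketch with invented tags, and the real pipeline may build its indicators differently:

```python
import pandas as pd

looks = pd.DataFrame({
    "look_id": [1, 2, 3],
    "looktype": ["apparel", "beauty", "apparel"],
    "activity": ["entertaining", "travel", "entertaining"],
})

# Each looktype/activity value becomes its own binary indicator column.
dummies = pd.get_dummies(looks, columns=["looktype", "activity"])
```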
Step 4 - Train a NearestNeighbors Model
Once all the Look-level features are calculated, they are concatenated together so that each Look has a vector representing the full multi-modal embedding. An sklearn.neighbors.NearestNeighbors model is trained on this embedding.
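A minimal sketch of this step, assuming the concatenated embedding is a plain NumPy matrix with one row per Look (values invented):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Each row is one Look's concatenated multi-modal vector.
look_space = np.array([
    [1.0, 0.0, 0.0, 0.8, 0.2],
    [0.0, 1.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 1.0, 0.5, 0.5],
])

# Manhattan distance suits the blend of continuous/categorical features.
nn = NearestNeighbors(metric="manhattan").fit(look_space)
```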
Step 5 - Package the Model and Push to Storage
Four different files are then uploaded to a datastore where they can be picked up for ranking:
1. data.csv is the matrix of numerical Look features for each Look. This file is required to determine where in Look-space the shopper is. It is also used for the rank() method, since the NearestNeighbors model won’t let you specify which look_ids to rank.
2. metadata.csv contains basic metadata for each Look (such as title, description, and looktype) and is mainly used for debugging and QA.
3. nn.pkl is the pre-trained NearestNeighbors model, and is used for the top_n() recommendation method.
4. rankingv2.tar.gz is the tarred and gzipped Python package that will serve the results. We output this file every time the model is trained so that the most up-to-date model-serving code is always packaged alongside the data.
Runtime: Rank and Recommend
Whether you are ranking Looks on a product page where only a few Looks qualify, or determining the top Looks for a customer out of the thousands of live Looks, you must first determine where in Look-space the shopper is currently located. The only thing we require from the shopper is a log of their recent Look interactions; all the other data needed for ranking is stored inside the Python model.
The user’s Look log data is fed into the initialized Ranker object, and the position of the user in Look-space is extrapolated using this decay_function():
After joining the Look-space data with the Look log data, the decay function determines the weight that each interaction should have on the final vector representation of the user’s location in Look-space. The weighting factors are found by plugging the time delta between now and each interaction into an exponential decay function. The factors are then multiplied by our predefined weights for each interaction type, and finally scaled by dividing by the maximum factor. See the example below.
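A sketch of this weighting scheme; the half-life and per-interaction weights below are invented for illustration, not Nordstrom’s actual values:

```python
import math

# Hypothetical per-interaction-type weights (stronger engagements higher).
INTERACTION_WEIGHTS = {
    "FETCH": 0.2, "SWIPE": 0.4, "SHUFFLE": 0.4, "SELECT": 0.8, "ATB": 1.0,
}

def decay_factors(interactions, half_life_hours=24.0):
    """interactions: list of (interaction_type, hours_ago) pairs."""
    rate = math.log(2) / half_life_hours
    # Exponential decay on recency, scaled by the interaction-type weight.
    factors = [
        math.exp(-rate * hours_ago) * INTERACTION_WEIGHTS[itype]
        for itype, hours_ago in interactions
    ]
    peak = max(factors)
    return [f / peak for f in factors]  # scale so the max factor is 1

factors = decay_factors([("ATB", 1.0), ("SWIPE", 1.0), ("FETCH", 48.0)])
```

A recent add-to-bag keeps a factor of 1, while an older, weaker FETCH is down-weighted heavily.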
Once the factors are found, they are multiplied by the Look-space data for their corresponding look_ids. The stronger or more recent the interaction, the closer its factor is to 1 and the more that Look’s features are preserved. The final look_space_location for the user is found by summing all interaction vectors row-wise.
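Sketched with invented numbers, the weighting and row-wise sum look like this:

```python
import numpy as np

# Decay factors for three past interactions (illustrative values).
factors = np.array([1.0, 0.4, 0.05])

# Look-space vectors of the Looks those interactions touched.
look_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
])

# Weight each interaction's Look vector, then sum row-wise.
look_space_location = (factors[:, None] * look_vectors).sum(axis=0)
```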
From here, the two methods, rank() and top_n(), diverge. For the rank() method, a pairwise distance metric is calculated between the user’s Look-space location and each of the qualified_looks we are trying to rank.
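The rank() step reduces to a pairwise distance computation like the following sketch (toy vectors):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

user_location = np.array([[1.05, 0.4, 0.0]])   # where the shopper sits

qualified_looks = np.array([                    # Looks eligible on this page
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Manhattan distance from the user to every qualified Look.
dists = pairwise_distances(user_location, qualified_looks, metric="manhattan")[0]
ranked_idx = np.argsort(dists)  # closest Look first
```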
To calculate the top Looks for a customer, the Look-space location is passed to our pre-trained NearestNeighbors model to find the Looks closest to what the user is currently interested in.
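Under the same kind of toy setup, top_n() boils down to a kneighbors query against the pre-trained model:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

look_space = np.array([          # all live Looks (toy vectors)
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])
nn = NearestNeighbors(metric="manhattan").fit(look_space)

user_location = np.array([[1.05, 0.4, 0.0]])
_, top_n_idx = nn.kneighbors(user_location, n_neighbors=2)
# top_n_idx[0] holds the row indices of the two closest Looks
```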
The image below shows the top_n Looks (bottom row) that would be suggested to a user who had previously interacted with the Looks shown in the top row. Notice how similar they all are to one another.
Knobs and Levers: Have it your way!
Built into our Look ranking model is the ability to tweak the feature sets and interactions that we hypothesize result in Looks that fall more in line with a user’s tastes. These are parameters that we can change and A/B test to fine-tune performance:
These parameters are fed into the Ranker upon initialization. You can tell the ranker to emphasize color features or age, for example. The activity weights are applied during the decay_function() calculation, allowing you to emphasize or de-emphasize certain interaction types.
Conclusion
Our Look ranking model uses unsupervised learning to derive Look-level features from both Look and product data. We use the features of each Look to define a convenient embedding in Look space that we can use to calculate Look similarity and Look distances. The methods that we’re using are highly scalable and performant. Since rolling out Look Ranking V2 we’ve seen increased customer engagement as our shoppers find great new products that fit their style.