Ranking Looks at Nordstrom using Machine Learning

Aaron Lichtner
10 min read · Mar 26, 2019


As a member of our Nordstrom Customer & Product Data Science team, I get to help build algorithms that drive the shopper experience on Nordstrom.com. For the past two years we’ve been investing heavily in Nordstrom Looks, building an outfitting experience, based on shopper preferences, that we hope inspires and enables customers to discover new products. As more Looks are created, it has become more important to surface Looks that resonate with customers.

Currently, the main way that customers view Looks is on a product page. It might be that the product they are viewing has ten or more Looks. We want to show customers the Look they are most likely to take inspiration from and engage with to give them the best experience we can. For our Looks ranking model we took inspiration from Content-Based Recommender systems, where we rank based on Looks that are similar to previous Looks users have engaged with.

Engagements currently are defined as FETCH, SWIPE, SELECT, SHUFFLE, and ATB (add-to-bag). Our model biases towards more recent engagements, and what we’ve assumed to be stronger engagements (i.e. SELECT and ATB).


The basis for our ranking algorithm is a distance calculation between a shopper’s Look preferences and other Looks in an embedded space we’re calling “Look-space.” At runtime, we determine where in “Look-space” a user is based on their previous interactions with Looks, and then find the nearest Looks to them using Manhattan distance. The Look-space embedding is based on a series of engineered features and unsupervised clustering models run on various feature sets that provide a quantitative value for a Look in some dimension: colors, brands, product types, and text. Additional dimensions for gender, age, price, looktype (e.g. apparel), and activity (e.g. entertaining) are included in the embedding as well. In the end, Look-space ends up as a high-dimensional blend of both continuous and categorical features, which is the reason Manhattan distance is used rather than Euclidean.

As a very simple example, let’s say the only feature we want to use to rank Looks is the title of each Look, and we only allow three different types (clusters) of Look titles (k=3). For the sake of argument, say that Look_A has the embedded vector [1 0 0] after clustering on Look title while Look_B’s embedding is [0 1 0], where each digit indicates whether or not a Look belongs to a specific cluster. We now have a 3-dimensional space in which to describe Looks. At runtime, a record of user data is analyzed to determine where in Look-space the user is. In this case, let’s say the user has engaged far more with Looks whose embedding is [0 1 0]. A distance metric between the user’s position and each qualified Look is then calculated and used to rank those Looks. For this particular user, we would expect them to be more partial to Look_B, since it is most closely related to Looks they have previously engaged with.
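The toy example above can be sketched in a few lines. The vectors and the `manhattan` helper are illustrative only, not the production code:

```python
import numpy as np

# One-hot title-cluster embeddings from the example above (k=3)
look_vectors = {
    "Look_A": np.array([1, 0, 0]),
    "Look_B": np.array([0, 1, 0]),
}

# Suppose the user's engagement history places them at [0, 1, 0]
user = np.array([0, 1, 0])

def manhattan(u, v):
    # L1 distance: sum of absolute per-dimension differences
    return int(np.abs(u - v).sum())

# Rank candidate Looks from closest to farthest
ranked = sorted(look_vectors, key=lambda k: manhattan(user, look_vectors[k]))
```

Here `ranked` comes back with Look_B first, since its distance to the user is 0 while Look_A’s is 2.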

A diagram for the end-to-end process is presented below. Look and product data are fed into a python model kicked off by a scheduled job, running with Jenkins. The Look-space is derived for all published, non-deleted Looks. Then the data and ranker.py logic are packaged together and pushed to a data store for serving.

Process flow for ranking Looks.

Training Details

Training Steps
1. Pull Looks data
2. Merge Looks with product data
3. Engineer Look-level features, embedding each Look in Look-space
4. Train a NearestNeighbors model
5. Publish the model as a python package and push to storage

The model is retrained every four hours from 5am to 5pm in order to be able to rank newly-published Looks.

Step 1 - Pull Looks Data

Looks data containing the Look title, Look description, looktype, and activity tags associated with each Look is pulled into the trainRanker class.

Sample Looks to be ranked.

Every Look that has been published and has not been deleted is included in the Looks data. As you can see, we have to deal with quite a few null values in the data. In cases where values are null, they are imputed as 0s.

Step 2 - Merge With Product Data

The slots contained in each Look are joined with product data on style_id and colorcode giving us the features of the items in each Look.

Example product data used to derive look space features.
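A sketch of this join with pandas, using made-up slot and product rows (column names other than style_id and colorcode are assumptions):

```python
import pandas as pd

# Hypothetical slot rows: one row per item slot in a Look
slots = pd.DataFrame({
    "look_id":   [10007, 10007, 10008],
    "style_id":  [111, 222, 111],
    "colorcode": ["001", "410", "001"],
})

# Hypothetical product features keyed by style_id + colorcode
products = pd.DataFrame({
    "style_id":  [111, 222],
    "colorcode": ["001", "410"],
    "brand":     ["BrandX", "BrandY"],
    "price":     [49.0, 129.0],
})

# Join each slot to its product features on the composite key
look_items = slots.merge(products, on=["style_id", "colorcode"], how="left")
```

A left join keeps every slot even when a product record is missing, which matches the need to impute nulls later.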

Step 3 - Look-level Feature Engineering

Color, Product Type, Brand and Text Clusters

Let’s say we’d like to describe the colors in a Look. How would we do that? Remember, the final output feature must be a single value that represents the colors in a Look.

Look 10007

There are a near-infinite number of color palettes and color combinations that could appear in Looks, so the question becomes: how can we systematically reduce the possible color combinations to a reasonable number while automatically discovering what those combinations are? Our Look ranking model uses an unsupervised MiniBatchKMeans clustering method to describe these features. In this way, we are able to choose how many color-combination clusters we’d like all the Looks to be divided into. An added benefit is that as the model is retrained, it learns new color combinations as they appear.

During training, we construct what we are calling color sentences for each Look. The predominant color of each product is extracted using Nordstrom’s internal color extraction API. These color names are then concatenated into a single sentence representing the Look. The example above becomes Black Black Black DarkGrey DarkGrey DarkGrey. We do this for every single Look.

Example Look color sentences.

After building our color sentences we run a TF-IDF fit_transform() on the corpus to build a sparse matrix of Looks and their weighted color vectors. MiniBatchKMeans.fit_predict() clustering at k=5, 10, and 20 is then run on the sparse matrix to classify each Look into clusters. This unsupervised method will determine the class label of the Look’s colors at the three different granularities of 5, 10, and 20.
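A minimal sketch of this pipeline on a toy corpus (the color sentences are made up, and k=2 here only because the corpus is tiny; the real model uses k=5, 10, and 20):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import MiniBatchKMeans

# Toy color sentences; real ones come from the color extraction API
color_sentences = [
    "Black Black Black DarkGrey DarkGrey DarkGrey",
    "Red Red Beige Beige White",
    "Pink Pink Beige White White",
    "Black DarkGrey Black",
]

# Sparse Looks x weighted-color-term matrix
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(color_sentences)

# One clustering per granularity; each Look gets one cluster label per k
labels = {}
for k in (2,):
    km = MiniBatchKMeans(n_clusters=k, random_state=0, n_init=10)
    labels[k] = km.fit_predict(X)
```

Each entry of `labels[k]` is the color-cluster label for the corresponding Look at that granularity.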

Here we show an example of clustering 10,000 Looks into 5 different color clusters and then embedding those in 2D space for visualization.

2D visualization of color clusters found in Looks when k=5. Notice that we can see each cluster seems to contain multiple subclusters. Visualization done using t-SNE.

The following images show random Looks from inside two different color clusters:

Color cluster 10_8: Note the algorithm picked up on reds with neutrals
Color cluster 20_15: Here the algorithm picked up on pinks with neutrals

As you can see, the unsupervised clustering on predominant colors results in clusters of Looks that have similar patterns. You might end up with Looks that are mostly black or Looks that tend to have lots of beige. You could have color clusters that are mostly reds and yellows. These clusters will also change with new data.

Product type, text, and brand features are all derived using the same method of building unsupervised clusters at k=5, 10, and 20.

Look-level Price Features

We take the price of each item in a Look and derive the following metrics. Total price is the sum of all item prices. Price range is the highest-priced item minus the lowest. Two binary features flag whether the Look contains a single item that makes up 50% or 75% of the Look’s total cost (hasItem?50%, hasItem?75%); the thinking is that these help capture signal about Looks that are “high-low.” Finally, three features count the items that each make up at least 20, 30, or 40 percent of the Look’s total (count>20%, count>30%, count>40%).
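The price features can be sketched as below. The exact thresholds (at-least vs. strictly-greater) and feature names are assumptions, not the production definitions:

```python
def price_features(prices):
    """Derive Look-level price features from a non-empty list of item prices."""
    total = sum(prices)
    return {
        "totalPrice": total,
        "priceRange": max(prices) - min(prices),
        # Does any single item dominate the Look's cost?
        "hasItem50pct": int(any(p / total >= 0.50 for p in prices)),
        "hasItem75pct": int(any(p / total >= 0.75 for p in prices)),
        # How many items each make up at least 20/30/40% of the total?
        "countGt20pct": sum(p / total >= 0.20 for p in prices),
        "countGt30pct": sum(p / total >= 0.30 for p in prices),
        "countGt40pct": sum(p / total >= 0.40 for p in prices),
    }

# A "high-low" Look: one expensive coat with two cheap accessories
f = price_features([150.0, 30.0, 20.0])
```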

Look-level Gender Features

For gender features, we simply calculate the proportion of items within a Look that are Male, Female, or Unisex.
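This proportion calculation is straightforward; a sketch with hypothetical item labels:

```python
from collections import Counter

def gender_features(item_genders):
    """Proportion of items in a Look that are Male, Female, or Unisex."""
    counts = Counter(item_genders)
    n = len(item_genders)
    return {g: counts.get(g, 0) / n for g in ("Male", "Female", "Unisex")}

g = gender_features(["Female", "Female", "Unisex", "Male"])
# Proportions sum to 1 across the three categories
```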

Look Type and Look Activity Features

For these two feature sets we dummify the available looktypes and activity tags, converting the categorical data into binary data. If the type of tag is present, the value is a 1, otherwise the value in the vector will be 0.
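With pandas this is a one-liner via get_dummies; the looktype values below are made up for illustration:

```python
import pandas as pd

looks = pd.DataFrame({
    "look_id":  [10007, 10008, 10009],
    "looktype": ["apparel", "apparel", "beauty"],
})

# One binary indicator column per looktype value
dummies = pd.get_dummies(looks["looktype"], prefix="looktype").astype(int)
looks = pd.concat([looks[["look_id"]], dummies], axis=1)
```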

Step 4 - Train a NearestNeighbors Model

Once all the Look-level features are calculated, they are all concatenated together so that each Look has a vector representing the full multi-modal embedding. An sklearn.neighbors.NearestNeighbors model is trained off of this embedding.
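In sklearn terms this step looks roughly like the following, using a random matrix as a stand-in for the real Look-space embedding:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Stand-in for the Look-space matrix: one row per Look
rng = np.random.default_rng(0)
embedding = rng.random((1000, 16))

# Manhattan (L1) distance, matching the mixed feature space described above
nn = NearestNeighbors(metric="manhattan")
nn.fit(embedding)
```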

Step 5 - Package the Model and Push to Storage

Four different files are then uploaded to a datastore where they can be picked up for ranking:

1. data.csv is the matrix of numerical Look features for each Look. This file is required to determine where in Look-space the shopper is. It is also used for the rank() method, since the NearestNeighbors model won’t let us specify which look_ids we want to rank.
2. metadata.csv contains basic metadata for each Look — such as title, description, and looktype — and it is mainly used for debugging and QA.
3. nn.pkl is the pre-trained NearestNeighbors model, and is used for the top_n() recommendation method.
4. rankingv2.tar.gz is the tarred and gzipped python package that will serve the results. We output this file every time the model is trained so the most up-to-date model-serving code is always packaged alongside the data.

Runtime: Rank and Recommend

Whether you are ranking Looks on a product page where only a few Looks are qualified, or determining the top Looks for a customer out of the thousands of live Looks, you must first determine where in Look-space a shopper is currently located. The only thing we require from a shopper is a log of their recent Look interactions. All the rest of the data needed for ranking is stored inside the python model.

Example log of user-Look interaction data. This is the input to our Look ranking and recommendation model.

The user’s Look log data is fed into the initialized Ranker object, and the position of the user in Look-space is extrapolated using a decay_function().

After joining the Look-space data with the Look log data, the decay function will determine the weight that each interaction should have on the final vector representation of user’s location in Look space. The weighting factors are found by plugging the time-delta between now and each interaction into an exponential decay function. The weighting factors are then altered again by multiplying them by our predefined weights for each interaction type, and then finally scaling all factors by dividing by the max factor. See the example below.

Depiction of how each interaction gets weighted for activity type and time decay.
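A minimal sketch of such a decay weighting. The interaction-type weights and the half-life parameterization are illustrative assumptions; the post only says that SELECT and ATB are treated as stronger signals:

```python
import math

# Assumed interaction-type weights (not the production values)
TYPE_WEIGHTS = {"FETCH": 0.2, "SWIPE": 0.4, "SHUFFLE": 0.4,
                "SELECT": 0.8, "ATB": 1.0}

def decay_factors(interactions, half_life_hours=24.0):
    """interactions: list of (interaction_type, hours_ago) tuples.
    Returns one weighting factor per interaction, scaled so the max is 1."""
    lam = math.log(2) / half_life_hours
    raw = [math.exp(-lam * hours) * TYPE_WEIGHTS[t]
           for t, hours in interactions]
    top = max(raw)
    return [r / top for r in raw]

# A recent add-to-bag, a recent swipe, and a two-day-old fetch
factors = decay_factors([("ATB", 1.0), ("SWIPE", 1.0), ("FETCH", 48.0)])
```

The most recent, strongest interaction ends up with a factor of 1.0, and weaker or older interactions decay toward 0.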

Once the factors are found, they are multiplied by the Look-space data for their corresponding look_ids. The stronger or more recent the interaction, the closer its factor is to 1, and the more that Look’s features remain unchanged. The final look_space_location for the user is found by summing all interaction vectors row-wise.

From here, the two methods, rank() and top_n(), diverge. For the rank() method, a pairwise distance metric is calculated between the user’s Look-space location and each of the qualified_looks we are trying to rank.
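The rank() step can be sketched with sklearn's pairwise distances; the location, candidate vectors, and look_ids below are hypothetical:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

user_location = np.array([[0.2, 0.9, 0.1]])   # hypothetical Look-space position
qualified = np.array([[0.1, 1.0, 0.0],        # one row per qualified Look
                      [0.9, 0.0, 0.5]])
look_ids = ["look_1", "look_2"]

# Manhattan distance from the user to each qualified Look, closest first
d = pairwise_distances(user_location, qualified, metric="manhattan")[0]
ranked_ids = [look_ids[i] for i in np.argsort(d)]
```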

To calculate the top Looks for a customer, the Look-space location is passed to our pre-trained NearestNeighbors model to find the top and closest Looks to what the user is currently interested in.
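A sketch of that query, again with a random stand-in for the real embedding (if the user’s location coincides with a Look already in the index, that Look comes back first at distance 0):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
embedding = rng.random((500, 8))                 # stand-in Look-space matrix
nn = NearestNeighbors(metric="manhattan").fit(embedding)

# Pretend the user's decayed location coincides with Look 42
user_location = embedding[42:43]
dist, idx = nn.kneighbors(user_location, n_neighbors=5)
top_look_indices = idx[0]                        # the 5 closest Looks
```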

The image below shows the top_n Looks (bottom row) that would be suggested to a user if they had previously interacted with the Looks shown in the top row. Notice how they are all quite similar to one another.

Knobs and Levers: Have it your way!

Built into our Look ranking model is the ability to tweak the feature sets and interaction weights that we hypothesize result in Looks more in line with a user’s tastes. These are parameters that we can change and A/B test to fine-tune performance.

These parameters are fed into the Ranker upon initialization. You tell the ranker that you want to emphasize color features or age. The activity weights are applied during the decay_function() calculation, allowing you to emphasize or de-emphasize certain interaction types.


Our Look ranking model uses unsupervised learning to derive Look-level features from both Look and product data. We use the features of each Look to define a convenient embedding in Look space that we can use to calculate Look similarity and Look distances. The methods that we’re using are highly scalable and performant. Since rolling out Look Ranking V2 we’ve seen increased customer engagement as our shoppers find great new products that fit their style.




I’m an LA-born Seattle transplant with a love for materials engineering and data science.