Going from 0 to 1 Modeling User Preferences for Personalized Recommendations

Emmanuel Fuentes
Whatnot Engineering
9 min read · Mar 30, 2022

Whatnot is a live shopping platform where users connect around communities to buy and sell products, from Funko Pop! to trading cards, sneakers, and much more! Part of what makes Whatnot so exciting is the entertaining, ephemeral nature of the product experience. Live streaming shows on Whatnot are scheduled ahead of time and last anywhere from 20 minutes to 2 hours, and in extreme cases multiple days! Helping users find the most relevant shows and hosts to enjoy is the Whatnot team’s never-ending goal.

“Recommended Shows” Home Feed Carousel

Part of the complexity in helping users find the most relevant content is the diversity of users, communities, products, and shows on the platform. Just in terms of shows, some users are looking to be entertained, some are looking for great deals, others are looking for specific products, while some rally around particular people. Once a user finds a relevant show, they can easily follow that host account to get notifications and dedicated updates when that host is going live again, or bookmark individual shows. We want to guide our users to a host they’ll enjoy as quickly as possible so they can spend more time engaging with the community and less time navigating and potentially missing great content. To do so, we are building personalized recommendations into the app experience to get users there with as little trial and error as possible.

Whatnot Live Shopping Experience

Livestream Host Recommendation Model

One simple yet effective way to recommend shows we think users will like is to look at their engagement history with each live host, find other users who have similar histories, and then look for the overlapping and non-overlapping patterns. When we have a set of live shows to present to the user, they can be ordered based on these learned patterns. This is commonly referred to as a form of collaborative filtering. In this scenario, we leverage user-to-livestream-host engagement interactions to quantify the relationship. These interactions are implicit signals between the user and live host, such as views, comments, bids, and purchases, in contrast to explicit interactions such as ratings. We use these implicit signals to model the relationships between users and sellers and leverage the learned representations for downstream recommendation tasks.
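To make this concrete, here is a minimal sketch of how implicit events might be rolled up into a sparse user-host interaction matrix. The event types and weights below are purely illustrative, not our production values:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical implicit events and illustrative confidence weights.
EVENT_WEIGHTS = {"view": 1.0, "comment": 2.0, "bid": 3.0, "purchase": 5.0}

# (user_id, host_id, event_type) tuples, e.g. from an event log.
events = [
    (0, 0, "view"), (0, 0, "purchase"), (0, 2, "comment"),
    (1, 1, "bid"),  (1, 2, "view"),     (2, 0, "view"),
]

rows, cols, vals = zip(*[(u, h, EVENT_WEIGHTS[e]) for u, h, e in events])
n_users, n_hosts = 3, 3

# Converting COO -> CSR sums duplicate (user, host) entries, so repeated
# engagement accumulates into a single confidence weight per pair.
interactions = coo_matrix((vals, (rows, cols)), shape=(n_users, n_hosts)).tocsr()
print(interactions.toarray())  # user 0 x host 0 = 6.0 (view + purchase)
```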

We built our initial livestream recommendations as a latent factor model using user and host implicit engagement data. These user and host representations are vectors where each element is a latent factor encapsulating some intrinsic property. This means a user or host vector is defined as the aggregation of factors that uniquely represent its relationship with the other. Pretty cool! These special kinds of vectors are also referred to as embeddings. One property of these embeddings is that they allow us to use linear algebra and basic arithmetic operations to quantify relationships that would otherwise be difficult to articulate. In our case, we can take a user’s embedding, evaluate dot products with every livestream host embedding, and order hosts by the resulting values to achieve a personalized ranking of livestreams by host. If this is done for each user, the system is left with a set of candidate livestream hosts per user, which can be used as a lookup for fast serving.
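Here is a hedged sketch of that scoring and lookup step, with random matrices standing in for the learned factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_hosts, n_factors = 1000, 200, 64

# Stand-ins for learned embeddings; in practice these come out of training.
user_factors = rng.normal(size=(n_users, n_factors))
host_factors = rng.normal(size=(n_hosts, n_factors))

def top_hosts(user_id: int, k: int = 8) -> np.ndarray:
    """Rank every host for one user by embedding dot product."""
    scores = host_factors @ user_factors[user_id]
    return np.argsort(-scores)[:k]  # indices of the k highest-scoring hosts

# Precompute per-user candidates so serving is a fast dictionary lookup.
candidates = {u: top_hosts(u) for u in range(n_users)}
```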

In combination with the desired representation, our other main modeling criterion was choosing an objective function that fit our use case. Due to the constantly changing nature of live streaming, we quickly discovered that models optimizing for offline performance on rank placement, such as Precision and Recall @ K, NDCG, Hit Rate, etc., were not very robust across consecutive daily runs. This was due to the high variability of host regularity on the platform and streaming schedules. Offline data quickly becomes less comparable with online data in fast-moving situations like ours, as it’s rare that the candidates available for ranking at training time are available in the same way at inference time. This contrasts with traditional e-commerce or content sites, where the catalog changes relatively slowly. We eventually decided models which optimize for offline ROC AUC performance would likely be a better proxy for online performance. In essence, models optimizing for ROC AUC evaluate whether hosts the user has already interacted with are ranked above hosts the user has yet to interact with, thus learning a representation correlated with user preferences. This contrastive [1] pairwise point of view is intuitively what matters when a list of items to rank is extremely long or changing often. Given that many hosts are not live simultaneously, it’s more important that the feed feels personalized than that the exact Top 8 slots are ordered perfectly while no content is consumable right away.
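For intuition, the per-user evaluation can be sketched as follows: ROC AUC is exactly the fraction of (interacted, not-yet-interacted) host pairs the model orders correctly. The helper below is illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def user_auc(scores: np.ndarray, interacted: np.ndarray) -> float:
    """AUC = probability that a randomly chosen interacted host
    outscores a randomly chosen non-interacted host for this user."""
    labels = np.zeros_like(scores)
    labels[interacted] = 1.0
    return roc_auc_score(labels, scores)

# Example: hosts 1 and 4 were interacted with and outscore the rest.
scores = np.array([0.2, 0.9, 0.1, 0.4, 0.8])
print(user_auc(scores, interacted=np.array([1, 4])))  # 1.0, perfectly ordered
```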

We investigated several methods to accomplish our objectives. Such approaches included various flavors of SVD including timeSVD++ [2], alternating least squares weighted regularized matrix factorization (ALS-WRMF) [3] as you would find in SparkML [4], variational autoencoders (VAE) [5], sequence models such as skip-gram negative sampling (SGNS), a.k.a. word2vec [6], as well as closed-form solutions such as EASE [7]. There is also a plethora of deep learning methods over graphs and sequences which we began to explore, but opted to come back to later. After the investigation, we landed on starting with matrix factorization approximations to represent the user-host embeddings using Bayesian Personalized Ranking (BPR) [8] and its variant Weighted Approximate-Rank Pairwise (WARP) [9]. BPR is a tried and tested approach with many readily available implementations [10][11]. It also directly optimizes a likelihood function correlated with our evaluation metric. The original BPR paper does a great job explaining how its loss is a proportional approximation of ROC AUC; I encourage you to read more there if interested. The difficulty then becomes how to construct the training dataset and how to handle the sparse nature of pairwise positive interactions compared to the large number of unknowns. This problem is slightly alleviated if you have a principled way to evaluate the value of positive-negative pairs, as is the case with WARP, and control the maximum number of negative samples to mine in each training iteration.
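As an illustration, here is a minimal training sketch using LightFM, one of the readily available open-source implementations; the interaction matrix and hyperparameters are stand-ins, not our production setup:

```python
from lightfm import LightFM
from scipy.sparse import random as sparse_random

# Stand-in for the real user-host implicit interaction matrix.
interactions = sparse_random(1000, 200, density=0.02, format="coo")
interactions.data[:] = 1.0  # binarize: any engagement counts as a positive

# loss="bpr" optimizes the BPR criterion; loss="warp" swaps in WARP's
# negative mining, with max_sampled bounding negatives mined per update.
model = LightFM(no_components=64, loss="warp", max_sampled=10)
model.fit(interactions, epochs=30, num_threads=4)

user_emb = model.user_embeddings  # shape (1000, 64)
host_emb = model.item_embeddings  # shape (200, 64)
```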

BPR Maximum Posterior Estimator for Personalized Ranking:

$$\text{BPR-OPT} = \ln p(\Theta \mid >_u) = \sum_{(u,i,j) \in D_S} \ln \sigma(\hat{x}_{uij}) - \lambda_\Theta \lVert \Theta \rVert^2$$

where $\hat{x}_{uij}$ is the model-estimated preference of user $u$ for item $i$ over item $j$, $\sigma$ is the logistic sigmoid, and $\lambda_\Theta$ regularizes the model parameters $\Theta$.

Evolving User Preferences

Given a user and their interaction signals, we can recommend hosts, and therefore shows, we believe that person should and would interact with. Sounds great, right?! While this works out of the box, it has its drawbacks. If you reflect on the recommender systems you interact with every day, you intuitively understand that taste and preferences change and evolve over time. We have all been in that situation where you are on an app and break from your normal pattern to view a random item or video clip. The next time you log onto that app, it recommends items similar to that frivolous thing nonstop, and you wish you had never clicked that link! This happens because most of these models infer your taste as a vanilla weighted aggregation of your past interactions.

To better understand these evolving preferences, we can look at the user’s ordered host interactions as a series of vector transformations, a.k.a. a trajectory. If we use a dimensionality reduction technique such as UMAP [12], we can visually represent these higher-dimensional vectors in a 2D plot.
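A minimal sketch of that projection step, with random matrices standing in for the learned embeddings:

```python
import numpy as np
import umap  # from the umap-learn package

rng = np.random.default_rng(0)
host_emb = rng.normal(size=(200, 64))  # stand-in for learned host embeddings
user_emb = rng.normal(size=(1, 64))    # stand-in for one user's embedding

# Project the 64-dimensional latent space down to 2D for plotting.
reducer = umap.UMAP(n_components=2, metric="cosine", random_state=42)
host_2d = reducer.fit_transform(host_emb)  # shape (200, 2)
user_2d = reducer.transform(user_emb)      # project the user into the same plot
```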

For the rest of the post, we have picked an anonymous user and their past purchases, as seen in the table below, to illustrate some of the core concepts.

Anonymous User Purchase History

We can draw a line in latent space and approximate the trajectory given these past sellers. The dark black dot is the user’s embedding projected into latent space for comparison.

As a side note, we see that the pairwise linear approximation of our user-host preferences is a fairly good representation based on the clustering and adjacencies. Each dot stands for a host, while the color associates a host with their primary livestream category. This rich representation is learned purely through implicit signals between users and hosts in livestreams.

User-Host Latent Space with User Embedding & Purchase Trajectory Overlay

Above is our user, who started out in sneakers but made their way to vintage clothing over the course of 4 purchases across 3 sellers.

Re-examining the user’s black dot relative to the purchase trajectory, one can see that our intuition, that a user is the weighted aggregation of their past interactions, is fair. Their representation is now closer to the vintage clothing cluster of sellers than the sneaker cluster where they started. When someone leaves the local neighborhood of their historical taste to explore, it pulls their representation in the direction of that new space. While this is sometimes a benefit, it can lead to a poor recommendation experience for collector communities on Whatnot with specific interests. Even though there will always be a trade-off between explore and exploit modes of operation, we can use these trajectories to provide guard rails for our recommendations by explicitly encouraging cross-category discovery or deep community experiences depending on the user and context.

Latent Space Guard Rails

There are many ways to leverage the rich information captured by these embedded latent interaction vectors. In this section we will describe just a few interesting ones as motivation. For the same user as before, here is a list of recommended hosts with their categories. We see that despite starting in sneakers, their recommendations are heavily weighted toward vintage clothing, which makes up the majority of their purchases. Feature? Bug? It really just depends.

User Sorted Hosts based on Embedding Dot Product

Another way to serve recommendations is to leverage the full implicit transaction history, T. One approach is to take each user interaction with a live host, then take that host’s representation and its K ordered nearest neighbors to generate T x K candidates. We can then use a merge-sort mechanism based on a voting consensus algorithm to derive a single sorted list. This is equivalent to averaging or weighted-averaging the host embeddings as an alternative to the learned user embedding. It has the nice property of being robust to outliers when a user wanders off their normal path; the sensitivity of this approach is baked into the voting mechanism/weights you choose.
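A hedged sketch of this candidate generation and voting merge; the similarity measure and rank discounting are illustrative choices:

```python
import numpy as np
from collections import Counter, defaultdict

def vote_merge(history, host_emb, k=10):
    """Merge the T x K nearest-neighbor candidates from a user's
    interaction history into one list via weighted voting."""
    counts = Counter(history)                 # interaction count per host
    votes = defaultdict(float)
    for h, w in counts.items():
        sims = host_emb @ host_emb[h]         # similarity of every host to h
        neighbors = np.argsort(-sims)[1:k+1]  # K nearest, skipping h itself
        for rank, n in enumerate(neighbors):
            votes[n] += w / (rank + 1)        # rank-discounted, count-weighted vote
    return sorted(votes, key=votes.get, reverse=True)

rng = np.random.default_rng(0)
host_emb = rng.normal(size=(200, 64))
# Duplicate entries = repeat purchases, so host 3's neighborhood dominates
# the vote while the one-off host 7 can't drag the whole list away.
print(vote_merge([3, 3, 7], host_emb)[:8])
```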

Another similar method is to take the same interactions, fit a (sp)line through the trajectory, and find other points closest to the line, as in the sketch below. If the line passes through a portion of the latent space that has smooth transitions, i.e., no major peaks or valleys, it can be a nice way to catch hosts users “should” have interacted with along their journey. This can be very useful for intra-category recommendations where users have anchored on one side or another of the category-host space. That being said, the smoothness assumption is very sensitive to the model, data, and training process.
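Here is an illustrative sketch that substitutes a densely sampled piecewise-linear path for the fitted spline, then surfaces hosts closest to any point along the trajectory:

```python
import numpy as np

def along_trajectory(traj_emb, host_emb, samples_per_leg=20, k=10):
    """Densely sample a piecewise-linear path through the user's ordered
    purchase embeddings (a stand-in for a fitted spline), then return the
    hosts closest to any point along that path."""
    legs = [np.linspace(a, b, samples_per_leg) for a, b in zip(traj_emb, traj_emb[1:])]
    path = np.vstack(legs)  # (n_samples, dim) points along the trajectory
    # Distance of every host to its nearest point on the path; hosts already
    # on the trajectory score 0 and can be filtered out before serving.
    dists = np.linalg.norm(host_emb[:, None, :] - path[None, :, :], axis=-1).min(axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
host_emb = rng.normal(size=(200, 64))
traj = host_emb[[3, 3, 7, 12]]  # embeddings of sequentially purchased hosts
print(along_trajectory(traj, host_emb))  # hosts the user "should" have met
```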

Lastly, one can leverage the interaction values themselves and treat the choice of which live host embedding to use as the reference point for nearest-neighbor search as a multi-armed bandit (MAB) Thompson sampling round. Each host is a bandit arm whose reward distribution is constructed from the historical interactions. This approach mostly selects hosts the user is highly engaged with, but bakes in some randomness based on how heavily concentrated those interactions are and how many hosts a user interacts with overall. It is exploration with principled boundaries, in the sense that even a low-probability host is guaranteed to be someone the user has interacted with. Below is an example with our same user, showing what happens if the sneaker host is randomly sampled versus the vintage seller, who was purchased from twice as often.
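A minimal sketch of one such sampling round, assuming a Beta posterior per interacted host shaped by its share of the user's interactions (the posterior choice is illustrative):

```python
import numpy as np

def thompson_pick(counts, rng):
    """Pick the reference host for nearest-neighbor search via one
    Thompson sampling round. Each interacted host is a bandit arm
    whose reward posterior reflects its share of the interactions."""
    total = sum(counts.values())
    hosts = list(counts)
    # Beta(engagements + 1, other engagements + 1): heavily engaged hosts
    # usually sample highest, but lighter ones stay in the running.
    samples = [rng.beta(counts[h] + 1, total - counts[h] + 1) for h in hosts]
    return hosts[int(np.argmax(samples))]

rng = np.random.default_rng(0)
# Vintage host 7 was purchased from twice as often as sneaker host 3.
counts = {7: 2, 3: 1}
picks = [thompson_pick(counts, rng) for _ in range(1000)]
print(picks.count(7) / 1000)  # mostly host 7, occasionally host 3
```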

(Left) Vintage Clothing Host Nearest Neighbors; (Right) Sneaker Host Nearest Neighbors

More to Come

Past interactions are just one factor contributing to a user engaging with a particular live host and their shows. Future work will represent more distinct entities and relationships so we can improve the relevance and timeliness of our recommendations.

If you are interested in solving problems like these and much more, please reach out, we are hiring!
