Playing with Losses For Search Rank Personalization

Rahul Kumar · Published in Nerd For Tech · Apr 23, 2023

The objective of a search algorithm varies with the platform at hand. Some platforms aim to increase website engagement (e.g. clicks and time spent on news articles being searched), others aim to maximize conversions (e.g. purchases of the goods or services being searched over), and in two-sided marketplaces we often need to optimize search results for both sides of the marketplace, i.e. sellers and buyers. Two-sided marketplaces have emerged as a viable business model in many real-world applications. Example industries include accommodation (Airbnb), ride sharing (Uber, Lyft), online shops (Etsy), etc. Arguably, content discovery and search ranking for these types of marketplaces need to satisfy both the supply and demand sides of the ecosystem in order to grow and prosper.

Let’s discuss the challenges and cases a solution needs to incorporate:

  1. The ranking needs to optimize for both host and guest preferences.
  2. In cases like Airbnb, a user rarely consumes the same item twice, and one listing can accept only one guest for a given set of dates.
  3. It should serve the user’s short-term and long-term interests.
  4. Training needs to be adapted for congregated search.
  5. Conversion (purchase) sessions should be used effectively.

Since users typically conduct multiple searches before a purchase or booking, we can use these in-session signals, i.e. clicks, contacts, etc., for real-time personalization, where the aim is to show the user more of the items similar to the ones we think they liked since starting the search session.

To calculate similarities between the items a user has interacted with and the items that need to be ranked, item embeddings are commonly used: low-dimensional vector representations learned from search sessions. We leverage these similarities to create personalization features for our search ranking model and to power similar-item recommendations.

Generally, recommendation happens in two phases:

  1. Candidate retrieval: built to narrow the search down to a few hundred items. We trade off precision for efficiency to quickly narrow the search space. You can consider this step fast but coarse; the artifacts needed are trained embeddings and an approximate nearest neighbour (ANN) index.
  2. Ranking: a slower, but more precise, step to score and rank the top candidates. Since we are processing fewer items (hundreds instead of millions), the artifact needed here is a trained ranking model.

Coming back to the specifics of this blog, we will be concerned with training embeddings for items, and with how we can alter losses and design the embedding space to address the challenges listed above.

Item Embeddings

Defining a session: it is an uninterrupted sequence of M item ids clicked by a user. A new session is started whenever there is a time gap of more than t minutes between two consecutive user clicks; t is normally 30 minutes or an hour.
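As a minimal sketch, this sessionization rule could be implemented as follows; the column names `user_id`, `item_id`, and `ts` are assumptions, not from the original pipeline:

```python
import pandas as pd

SESSION_GAP_MINUTES = 30  # the gap threshold t

def sessionize(clicks: pd.DataFrame) -> pd.DataFrame:
    """Assign a session id to each click: a new session starts whenever the
    gap between consecutive clicks of the same user exceeds t minutes."""
    clicks = clicks.sort_values(["user_id", "ts"]).copy()
    gap = clicks.groupby("user_id")["ts"].diff()
    new_session = gap.isna() | (gap > pd.Timedelta(minutes=SESSION_GAP_MINUTES))
    clicks["session_id"] = new_session.cumsum()
    return clicks

# Example usage: each session becomes a sequence of clicked item ids
# sessions = sessionize(clicks).groupby("session_id")["item_id"].apply(list)
```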

Let us assume we are given a set S of click sessions obtained from N users, where each session s = (I1, I2, …, IM) ∈ S is a sequence of items Ii clicked by the user, with i ∈ {1, 2, …, M}.

We can break down the click sessions set S into

1) Purchase sessions, i.e. click sessions that end with the user purchasing an item.

2) Exploratory sessions, i.e. click sessions that do not end with a purchase, i.e. the user was just browsing.

Given this data set, the aim is to learn a d-dimensional real-valued representation v_Ii ∈ R^d of each unique item Ii, such that similar items lie nearby in the embedding space.

Once you have item embeddings, you can generate item-to-item (I2I) recommendations using retrieval, as sketched below.
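Here is a minimal sketch of I2I retrieval via cosine similarity over the learned embeddings; brute-force search is used as a stand-in for a proper ANN index, and the embedding matrix is assumed to be given:

```python
import numpy as np

def top_k_similar(item_idx: int, embeddings: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k items most similar to item_idx by cosine similarity."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed @ normed[item_idx]   # cosine similarity to every item
    scores[item_idx] = -np.inf           # exclude the query item itself
    return np.argsort(-scores)[:k]

# In production the brute-force dot product would be replaced by an ANN index
# (e.g. FAISS or HNSW) built over the same embedding matrix.
```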

The skip-gram model is widely preferred for learning the embeddings due to its ability to capture co-occurrence information. Its objective is to predict context items given a central item.

The objective of the model is to learn item representations using the skip-gram model by maximizing the objective function L over the entire set S of search sessions, defined as follows:
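In the standard skip-gram form, with a context window of length m, this objective can be written as:

L = Σ_{s ∈ S} Σ_{Ii ∈ s} Σ_{−m ≤ j ≤ m, j ≠ 0} log P(Ii+j | Ii)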

The probability P(Ii+j | Ii) of observing item Ii+j in the contextual neighbourhood of clicked item Ii is defined by the softmax term below.
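Using v and v′ for the input (central) and output (context) vector representations of an item, the standard softmax parameterization is:

P(Ii+j | Ii) = exp(v_Ii · v′_Ii+j) / Σ_{I ∈ V} exp(v_Ii · v′_I)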

The softmax becomes a bottleneck for training if the size of V (the number of unique items) is large. To overcome this and reduce computational complexity, methods such as negative sampling, noise contrastive estimation, and hierarchical softmax have been proposed in the literature. Negative sampling is the most widely used approach, and we will be discussing losses based on it. I will cover all of the above approaches in detail in a separate blog.

Let’s define the basic skip-gram loss using negative sampling:

We generate a set Dp of positive pairs (I, c) of clicked items I and their contexts c (i.e., clicks on other items by the same user that happened before and after the click on item I, within a window of length m), and a set Dn of negative pairs (I, c) of clicked items and n items randomly sampled from the entire vocabulary V. The optimization objective then becomes:
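In the standard negative-sampling form this reads:

argmax_θ Σ_{(I,c) ∈ Dp} log 1/(1 + exp(−v′_c · v_I)) + Σ_{(I,c) ∈ Dn} log 1/(1 + exp(v′_c · v_I))

As a minimal sketch, the same objective written out in Python (numpy only; the pair lists and embedding matrices are assumed to be given):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def skipgram_neg_sampling_objective(v_in, v_out, pos_pairs, neg_pairs):
    """Negative-sampling skip-gram objective (to be maximized).
    v_in:  (|V|, d) input (central-item) embeddings
    v_out: (|V|, d) output (context) embeddings
    pos_pairs / neg_pairs: lists of (item_idx, context_idx) from Dp and Dn."""
    total = 0.0
    for i, c in pos_pairs:   # pairs observed together in sessions
        total += np.log(sigmoid(v_out[c] @ v_in[i]))
    for i, c in neg_pairs:   # randomly sampled negatives
        total += np.log(sigmoid(-(v_out[c] @ v_in[i])))
    return total
```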

Modifying the Loss to Incorporate Solutions to the Challenges

Now let’s talk specifically about how the skip-gram loss is modified to solve the above-mentioned challenges, one by one.

  1. Effectively using conversion sessions:

As we know, there are exploratory sessions and purchase sessions. While training the embedding network, both kinds of sessions add contextual information, but the fact that a session ended in a purchase carries valuable information of its own. It can be used to adapt the optimization such that at each step we predict not only the neighbouring clicked items but also the eventually purchased item. This adaptation is achieved by adding the purchased item as a global context, so that it is always predicted whether or not it falls within the context window. Below is the modification to the loss you need to make while training on purchase sessions; for exploratory sessions the general skip-gram objective is used.
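Writing the purchased item as Ip and treating it as global context for every central item I in a purchase session, one way to write the adapted objective is:

argmax_θ Σ_{(I,c) ∈ Dp} log 1/(1 + exp(−v′_c · v_I)) + Σ_{(I,c) ∈ Dn} log 1/(1 + exp(v′_c · v_I)) + log 1/(1 + exp(−v′_Ip · v_I))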

2. Adapting training for congregated search:

In applications like travel or hotel booking, people generally search within some nearby region (the same market), which causes an imbalance between the positive and negative corpora: Dp consists of items within the same market, while Dn, being randomly sampled, consists of items from across markets. This imbalance usually leads to poor within-market similarity learning. To address this issue, we add a set Dmn of random negatives sampled from the market of the central item I. The modified loss then shows up as below:
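With Dmn as the set of same-market negatives (and keeping the purchased-item global context when training on purchase sessions), one way to write the modified objective is:

argmax_θ Σ_{(I,c) ∈ Dp} log 1/(1 + exp(−v′_c · v_I)) + Σ_{(I,c) ∈ Dn} log 1/(1 + exp(v′_c · v_I)) + log 1/(1 + exp(−v′_Ip · v_I)) + Σ_{(I,mn) ∈ Dmn} log 1/(1 + exp(v′_mn · v_I))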

3. Serving the user’s short-term and long-term interests:

Click sessions are very good at finding similarities between items within the same market. As such, they are suitable for short-term, in-session personalization, where the aim is to show the user items similar to the ones they clicked during the current search session.

However, in addition to in-session personalization based on signals from the same session, it would be useful to personalize search based on signals from the user’s longer-term history. While some cross-market similarities are captured in item embeddings trained on clicks, a more principled way of learning such cross-market similarities is to learn from sessions constructed from the items a particular user purchased over time.

Specifically, let us assume we are given a set S(pur) of purchase sessions obtained from N users, where each purchase session s(pur) = (I1, I2, …, IM) is defined as the sequence of items purchased by user j, ordered in time. Attempting to learn embeddings for each item id using this type of data would be challenging in many ways:

  • First, purchase session data S(pur) is much smaller than click session data S because purchases are less frequent events.
  • Second, many users purchased only a single item or very few items in the past, and we cannot learn from a session of length 1.
  • Third, to learn a meaningful embedding for any entity from contextual information, at least 5-10 occurrences of that entity are needed in the data, and there are many item ids on the platform that were purchased fewer than 5-10 times.
  • Finally, long time intervals may pass between two consecutive purchases by the user, and in that time user preferences, such as price point, may change, e.g. due to a career change.

To address these very common marketplace problems, in practice we normally learn embeddings at the level of item types instead of item ids. To recommend items of these types to users, we need to place user embeddings and item type embeddings in the same space, and to handle changing user behaviour we similarly learn user type embeddings instead of user id embeddings.

In practice we need a mechanism to map user id -> user type and item id -> item type. This can be done with clustering on some critical fields such as prices, spending, and purchased item details, with the help of domain knowledge. Once these mappings are ready, we convert the purchase sessions into sequences of user types and item types and use a similar objective to train the skip-gram model. Keep in mind that in this case we embed both user types and item types in the same embedding space, so we frame the input sequence as tuples of (user type, item type), and the model vocabulary consists of all unique item types as well as user types.
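As a simple illustration, one could bucket users and items into types by quantiles of a numeric attribute; the fields and thresholds here are hypothetical stand-ins for the clustering and domain rules mentioned above:

```python
import pandas as pd

def to_type(prefix: str, values: pd.Series, n_buckets: int = 5) -> pd.Series:
    """Bucket a numeric attribute into quantile-based types, e.g. 'item_q3'."""
    buckets = pd.qcut(values, q=n_buckets, labels=False, duplicates="drop")
    return prefix + "_q" + buckets.astype(str)

# Hypothetical usage:
# items["item_type"] = to_type("item", items["price"])
# users["user_type"] = to_type("user", users["avg_spend"])
# Purchase sessions then become sequences of (user_type, item_type) tuples.
```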

The session data then looks like:

s(pur) = (utype1 Itype1, …, utypeM ItypeM) ∈ S(pur)

Here s(pur) denotes an individual session with purchases associated with it. Now let’s look at the loss function used to update the user type and item type embeddings.
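In the same negative-sampling form as before, for a central user type u_t whose context c can be either a user type or an item type from the same purchase session (with D(pur) and D(neg) the positive and sampled negative pairs from these sessions), the update can be written as:

argmax_θ Σ_{(u_t,c) ∈ D(pur)} log 1/(1 + exp(−v′_c · v_ut)) + Σ_{(u_t,c) ∈ D(neg)} log 1/(1 + exp(v′_c · v_ut))

A symmetric expression is used when the central token is an item type.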

4. Optimizing for host and guest preferences:

In a two-sided marketplace you need to generate recommendations not only from the users’ perspective but from the hosts’ perspective too. To capture host behaviour, we can use their explicit negatives, i.e. rejections of users, as a feedback signal. Host rejections can be utilized during training to encode the host preference signal in the embedding space, in addition to the user preference signal. More precisely, in addition to the sets D(pur) and D(neg), we generate a set D(rej) of pairs (u_t, I_t) of user types and item types that were involved in a rejection event.

Below is the loss formulation on top of the skip-gram model.
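One way to write the user type update with rejection pairs treated as explicit negatives is:

argmax_θ Σ_{(u_t,c) ∈ D(pur)} log 1/(1 + exp(−v′_c · v_ut)) + Σ_{(u_t,c) ∈ D(neg)} log 1/(1 + exp(v′_c · v_ut)) + Σ_{(u_t,I_t) ∈ D(rej)} log 1/(1 + exp(v′_It · v_ut))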

A similar update can be performed when the central token is an item type.

This is how a network embedding both user types and item types would look:

Skip Gram Model For User and Item Type Embedding

Conclusion:

We have discussed how we can adapt the loss function of the skip-gram model with negative sampling to the task at hand, in order to obtain better and more effective embeddings for recommendations.

Thanks
