Personalized buyer listings at CARS24 — an overview

Published in CARS24 Data Science Blog

8 min read · Oct 30, 2023

Introduction

CARS24, India’s leading online pre-owned car buying and selling platform, is constantly innovating to enhance user experiences. Every month, millions of users visit our platform in search of their dream cars. In this post, we delve into our journey of refining our personalized car ranking models for search & listings through the power of machine learning.

At any point in time, more than 10k cars are listed pan-India on the CARS24 platform.

Problem Statement

As CARS24 continues its rapid expansion, we place a high value on elevating the customer experience by matching cars with individual users more precisely & faster. We employ machine learning-powered search & recommendation systems to craft personalized car discovery experiences, increasing the likelihood of users discovering their ideal cars.

To achieve this, we are committed to evolving our strategies and embracing a more granular approach that caters to the diverse array of user preferences. Recognizing the limitations of our current cohort-based personalization model, we have initiated a shift towards more advanced algorithms. These sophisticated algorithms consider a comprehensive spectrum of consumer activities, including search history and detailed filter usage. This, in turn, empowers us to deliver highly personalized recommendations that align with each user’s unique requirements and preferences.

Understanding User Preferences

Until now, our buyer-facing personalization stack has been powered by our cohort personalization model, which was first shipped about 3 years ago and has been iterated on several times since. It applies content-based filtering on top of unsupervised clustering of users.

The overall framework has two prominent pieces. The first is unsupervised clustering, which uses a wide range of user clickstream data from the platform (car impressions, clicks, searches and filters, wishlist activity, image gallery views, inspection report views and so on) to assign each user to one of many demand clusters. There are about 200 of these demand clusters, concentrated over different geographical regions across India. The second is an array of supervised ranking models that rank the listed cars for each demand cluster. Working in tandem, these two pieces optimize the ranking of listed cars to maximize the likelihood of a user making a booking.

However, in this framework there is only one catalog for all users within the same demand cohort. We realized it was time to take this experience a notch higher and serve our users a more individualized experience.

To address this need, we’ve introduced a user level personalization framework. This framework involves creating a user profile for each customer to capture their personalized interests in cars. User preferences within these profiles are primarily inferred from both implicit signals, such as past clicks and impressions, and explicit signals, like searches, filter usage, and wishlist activity provided by users in their previous sessions. We leverage this information to curate a tailored list of personalized car options for each user.
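
The idea of a user profile can be illustrated with a minimal sketch. The signal names, the weights, and the `build_user_profile` helper below are assumptions made for illustration only; they are not the production schema or the actual weighting used at CARS24.

```python
from collections import defaultdict

# Hypothetical weights: explicit signals count for more than implicit ones.
SIGNAL_WEIGHTS = {
    "impression": 0.1,  # implicit
    "click": 1.0,       # implicit
    "search": 2.0,      # explicit
    "filter": 2.0,      # explicit
    "wishlist": 3.0,    # explicit
}

def build_user_profile(events):
    """Aggregate (signal, attribute, value) events into normalized preference weights."""
    raw = defaultdict(lambda: defaultdict(float))
    for signal, attribute, value in events:
        raw[attribute][value] += SIGNAL_WEIGHTS.get(signal, 0.0)

    profile = {}
    for attribute, weights in raw.items():
        total = sum(weights.values())
        if total > 0:
            # Normalize so the weights for each attribute sum to 1.
            profile[attribute] = {v: w / total for v, w in weights.items()}
    return profile

# Made-up events from one user's previous sessions.
events = [
    ("click", "body_type", "SUV"),
    ("click", "body_type", "SUV"),
    ("filter", "body_type", "SUV"),
    ("impression", "body_type", "Hatchback"),
    ("wishlist", "fuel", "Diesel"),
]
profile = build_user_profile(events)
```

Here `profile["body_type"]` leans heavily towards SUVs because clicks and a filter outweigh a single impression. A ranking model could then take such per-attribute weights as input features.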

Exploring User Preference Differences

Let’s examine the experiences of two hypothetical CARS24 users, Arjun and Priya, who have embarked on their car search journey.

Illustration: users within the same ‘cohort’ (mid-segment SUVs) but with clearly different preferences

Both users launch our app and click on the “VIEW ALL CARS” option, which grants them access to our extensive catalogue of listed cars. They are presented with the cohort-based, sorted car listings, and this is what both users encounter.

CARS24 Listing Page for mid segment SUV cohort

As they scroll through the seemingly endless list, they have the opportunity to explore different cars and view detailed information about each one on the car detail page. However, they often find themselves spending a significant amount of time scrolling and may ultimately exit the app without making a decision on which car to purchase. This abundance of choices can be overwhelming, and the cognitive effort required to select the right car can prove to be quite a challenge.

In our pursuit of enhancing this user experience, we have taken steps to develop a system that personalizes the catalogue, allowing users to view cars that align with their preferences and signals they have provided.

How? Both Arjun and Priya have browsed, viewed and made searches with us in the past, allowing us to gather their individual preferences.

Leveraging Advanced Machine Learning Solutions

Our approach to tackling this challenge involves the implementation of an advanced machine learning system. At the heart of this system lies the Learning to Rank algorithm, a powerful tool for accurately ranking and presenting car options tailored to the distinct preferences of each user. This approach is not just about enhancing user satisfaction but also about driving increased user engagement, ultimately leading to increased user retention and heightened platform activity.

The Model

Learning to Rank (LTR) stands as the core of our solution. LTR is a machine learning technique that specializes in training a model capable of ranking a list of items based on their relevance to a specific user. This technique is instrumental in creating personalized recommendations by learning from user interactions and their preferences. The primary objective is to fine-tune the order in which items are presented, thereby enhancing the relevance and accuracy of the recommendations provided to each user.

Exploring Different Methods

Pointwise Method: In the pointwise method, each item within the training dataset is considered independently. The primary target variable is the relevance score for each item. The Learning to Rank (LTR) model learns to predict this relevance score directly for each item. This approach focuses on assessing the relevance of individual items in isolation.

Pairwise Method: The pairwise method takes a different approach by examining pairs of items and concentrating on their relative ordering. The LTR model is trained to compare these item pairs and assign higher scores to the more relevant item within the pair. It essentially seeks to establish which item within a pair is more preferable.

Listwise Method: The listwise method treats the entire list of recommended items as a single training example. The model directly optimizes the ranking of the entire list, taking into account all interactions and their respective positions. This approach is designed to acknowledge the inherent dependencies among items within a list and aims to generate rankings that are more accurate and contextually relevant.

Evaluation Metrics

Normalized Discounted Cumulative Gain (NDCG): NDCG is a commonly employed evaluation metric in Learning to Rank algorithms. It evaluates the quality of the ranking by considering not only the relevance of the recommended items but also their positions within the list. It assigns more weight to highly ranked relevant items, discounting the cumulative gain based on their positions. NDCG normalizes the gain by comparing it to the ideal ranking, making it a robust metric that accounts for varying list lengths and user preferences.
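
A short sketch of the computation is given here. It uses the common `2^rel - 1` gain with a `log2` position discount; other gain variants exist, and this post does not depend on a particular one, so treat that choice as an assumption.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain over the first k items (all items if k is None)."""
    items = relevances if k is None else relevances[:k]
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(items))

def ndcg(relevances, k=None):
    """DCG of the given order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1, 0])   # already in ideal order
reversed_ = ndcg([0, 1, 2, 3]) # most relevant car shown last
```

A list that is already in ideal order scores exactly 1.0, and pushing relevant cars further down the list lowers the score.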

The Choice of Method

In the context of our ranking problem, the listwise method is a better fit than the pointwise method. What matters is getting the order of cars right, not estimating a relevance score for each car independently. Our relevance labels are only markers of how much a user prefers one car relative to others, so predicting their absolute values, as the pointwise method does, adds little. To address this ranking problem, we implemented YetiRank using CatBoost, a strong choice for optimizing the ranking of car listings. This ensures that the order in which cars are presented aligns with users’ preferences and expectations.
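
For readers who want to try this, a minimal sketch of training a YetiRank model with CatBoost follows. The feature vectors, labels, user ids, and helper names are hypothetical, and the hyperparameters are placeholders rather than our production settings. The `catboost` import sits inside the training function so that the data preparation step can run without the library installed.

```python
def build_ranking_rows(interactions):
    """Turn (user_id, feature_vector, relevance) tuples into CatBoost-ready columns.

    CatBoost expects the rows of each group to be contiguous, so rows are
    sorted by user id (the sort is stable, so within-user order is kept).
    """
    rows = sorted(interactions, key=lambda row: row[0])
    features = [list(row[1]) for row in rows]
    labels = [row[2] for row in rows]
    group_ids = [row[0] for row in rows]
    return features, labels, group_ids

def train_yetirank(features, labels, group_ids, iterations=200):
    """Fit a CatBoostRanker with the YetiRank loss (needs `pip install catboost`)."""
    from catboost import CatBoostRanker, Pool  # third-party, imported lazily

    pool = Pool(data=features, label=labels, group_id=group_ids)
    model = CatBoostRanker(loss_function="YetiRank", iterations=iterations, verbose=False)
    model.fit(pool)
    return model

# Made-up interactions: user 1 and user 2, two candidate cars each.
interactions = [
    (2, [0.2, 1.0], 0),
    (1, [0.9, 0.0], 2),
    (1, [0.1, 1.0], 0),
    (2, [0.8, 0.0], 1),
]
features, labels, group_ids = build_ranking_rows(interactions)
# model = train_yetirank(features, labels, group_ids)  # uncomment with catboost installed
```

Each user forms one group, so the loss compares cars only within the same user's list, never across users.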

Now let’s go back to our beloved customers — Arjun and Priya.

The next time Arjun or Priya opens the CARS24 app in their quest to find the perfect car, they will be pleasantly surprised to discover that the results they encounter are meticulously tailored to their unique preferences. This transformation is made possible by our new model, which takes into account a multitude of factors. These include the frequency of their clicks and the depth of their engagement, such as examining images, inspection reports, and car features for specific criteria like body type, model, price range, age range, and more. Our model goes even further, factoring in their preferred choices, searches, and filters, as well as their interactions on the car detail pages. As a result, the app will present cars that align with their preferences and are most likely to be clicked on or purchased. This personalized approach significantly enhances their user experience, making it easier for them to find their ideal car within the vast array of options available on the platform.

The difference in the catalogue listings that Arjun and Priya see once learning to rank is used to re-rank the results is shown below.

The Technical Architecture

As we shifted from cohort-based personalization to user-level, near real-time personalized catalog rendering, the DS/ML engineering teams collaborated deeply with the Tech team to make the new solution see the light of day!

We used queue-based score publishing via pub/sub: scores were pushed at the user-catalogue level and consumed immediately by the listing page.
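
The flow can be sketched in a few lines. This is an in-process illustration only: Python's `queue.Queue` stands in for the pub/sub topic, a dictionary stands in for whatever store the listing service reads from, and the message shape and function names are assumptions.

```python
import json
import queue

score_topic = queue.Queue()  # in-process stand-in for a pub/sub topic
score_store = {}             # stand-in for the store the listing service reads

def publish_scores(user_id, car_scores):
    """Producer: push one message holding a user's per-car scores."""
    score_topic.put(json.dumps({"user_id": user_id, "scores": car_scores}))

def consume_scores():
    """Consumer: drain the topic and keep the latest scores for each user."""
    while not score_topic.empty():
        message = json.loads(score_topic.get())
        score_store[message["user_id"]] = message["scores"]

def render_listing(user_id, car_ids):
    """Order cars by the user's scores; unscored cars keep their order at the end."""
    scores = score_store.get(user_id, {})
    return sorted(car_ids, key=lambda car: -scores.get(car, float("-inf")))

publish_scores("arjun", {"car_a": 0.2, "car_b": 0.9})
consume_scores()
listing = render_listing("arjun", ["car_a", "car_b", "car_c"])
# listing == ["car_b", "car_a", "car_c"]
```

A user with no published scores simply gets the incoming order back, which gives the listing page a natural fallback.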

Results

We ran an A/B test using this formulation of learning to rank and observed a relative lift of ~11% in overall user conversion. We also tracked the movement in I2V and I2BI; the results are described below.

I2V: I2V is the number of cars viewed divided by the number of cars seen. The exhibit below shows that %I2V in the first X ranks is higher for the test group than for the control group.

I2BI: I2BI is the number of cars for which a booking was initiated divided by the number of cars seen. In the exhibit below, we compare %I2BI for test versus control by decile of catalogue position. The test group shows a higher I2BI, with a larger share of booking initiations coming from the first decile.
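
As a rough sketch of how these two ratios can be computed by decile of catalogue position, consider the helper below. The row format, the function name, and the toy data are assumptions for illustration; they are not our logging schema or actual results.

```python
def funnel_rates_by_decile(rows, catalogue_size):
    """Compute I2V and I2BI per decile of catalogue position.

    rows: (position, viewed, booking_initiated) per car seen, position is 0-based.
    """
    buckets = [{"seen": 0, "viewed": 0, "bi": 0} for _ in range(10)]
    for position, viewed, booking_initiated in rows:
        decile = min(position * 10 // catalogue_size, 9)
        bucket = buckets[decile]
        bucket["seen"] += 1
        bucket["viewed"] += int(viewed)
        bucket["bi"] += int(booking_initiated)

    return [
        {
            "decile": index + 1,
            # None marks deciles in which no car was seen.
            "i2v": b["viewed"] / b["seen"] if b["seen"] else None,
            "i2bi": b["bi"] / b["seen"] if b["seen"] else None,
        }
        for index, b in enumerate(buckets)
    ]

# Made-up impressions from a catalogue of 20 cars.
rows = [
    (0, True, True),
    (1, True, False),
    (5, False, False),
    (19, False, False),
]
rates = funnel_rates_by_decile(rows, catalogue_size=20)
```

Running the same function on the test and control groups separately gives the per-decile comparison described above.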

Next Steps

The journey towards enhancing our models is ongoing, and we are far from the end of it. In our upcoming iterations, we plan to implement real-time personalization by building an online scoring infrastructure. This will enable us to capture the in-session intent of our customers, helping them discover the right cars faster while providing a highly personalized experience. We will also be taking this solution live across our international geographies (UAE, Thailand and Australia) in the coming weeks.

Acknowledgement

A big thanks to Prashant Chandel, who was instrumental in making this project a reality. I would also like to express my gratitude to our Principal Data Scientist Shashank Kumar & Lead DS Paridhi for all the brainstorming and to the engineering & product counterparts Mayank Sood, Aviral & Ritwik who collaborated extensively from the ‘other end’.

We are always on the lookout for talented data scientists & ML engineers who are keen to work on cutting-edge, industry-shaping solutions. Please reach out if you are interested in exploring opportunities with us.