Recommended — Just For You: How We Handle Personalisation at Scale

Tech @ ShareChat
Published in ShareChat TechByte
Jan 17, 2022

Written by Srijan Saket, Aravindh R, Subham Todi, Brihati Jain

Our Mission

Ever wondered how you are able to find relevant and entertaining content every time you go to the Moj or ShareChat “Home” page? Machine Learning is at the core of this complex machinery which generates recommendations for millions of users and connects creators with their beloved fans. In this multi-part series, we’ll walk you through the nuts and bolts of this machinery and the lessons learnt in the process. Our mission is to continually enhance the user experience and to give more enriching forms of expression and enjoyment to all the different types of users.

What Do Users Want

Platforms that truly stand out and win the love of their users are the ones that give (the best possible user experience) before they ask (anything from the user). They do so by building the right levels of user empathy, technological savviness, and intelligence (both the artificial & natural kind). The Moj & ShareChat home page is a prime example of this. In order to achieve this, understanding our users and their emergent behaviors plays an important role. The following illustration provides more context around how user preferences towards information consumption have evolved over time.

As we are living in the era of recommendations, it is important to understand the following nuances:

  • Personalization — Personalization represents the core of any recommender system. More than being a destination, personalization is an enriching journey involving stages. The starting point could be more explorative because we don’t have a lot of information regarding users’ preferences on the platform. In a short span of time, the system needs to understand not just the user’s interests, but even their countless contextual preferences. The same is true for posts, where we need to understand a post very early in its journey for effective targeting. The recommender system should be able to account for these factors. On top of that, modeling “multiple” user preferences for millions of users with varied interests comes with its own set of interesting but complex challenges.
  • “Long Tail Problem” to “Long Tail Competitive Advantage” — As an example, any e-commerce store faces the challenge of identifying the unique tastes of each customer. Unless this is done properly, it’s likely to get bogged down under the long tail nature of customer preferences. Conversely, imagine an e-commerce store that not just has every item under the sun, but also magically recommends the exact item that is aligned to the customer’s need — that’s a huge advantage that recommender systems are able to unlock.
  • What More? — Does the user always know what they want? Does the user always have enough time to browse or search for the next ideal thing? Probably not! That’s another reason why recommender systems become essential. Thinking a step further, even when users see only relevant content, they get bored if there is no novelty, so enabling users to do structured exploration of different content categories is also important. In the pursuit of a “nearly” correct balance between exploration and exploitation, we need to make sure the recommender system doesn’t pick up noise in user feedback.
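The balance between exploration and exploitation mentioned in the last point can be illustrated with an epsilon-greedy rule. This is only a minimal sketch under stated assumptions, not the method used at Moj or ShareChat: the category names, the reward values, the `epsilon` setting, and the `pick_category` helper are all hypothetical.

```python
import random

def pick_category(avg_reward, epsilon, rng):
    """Pick a content category with an epsilon-greedy rule.

    avg_reward: dict mapping category -> observed average reward (hypothetical data).
    epsilon: probability of exploring a random category instead of the best one.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    if rng.random() < epsilon:
        # Explore: choose any category uniformly at random.
        return rng.choice(sorted(avg_reward))
    # Exploit: choose the category with the highest observed reward.
    return max(avg_reward, key=avg_reward.get)

# Hypothetical per-category average rewards for one user.
rewards = {"comedy": 0.62, "cricket": 0.48, "cooking": 0.15}
rng = random.Random(0)
picks = [pick_category(rewards, epsilon=0.1, rng=rng) for _ in range(1000)]
share_best = picks.count("comedy") / len(picks)
```

With a small `epsilon`, most picks go to the best-known category while a steady trickle of random picks keeps testing the others. A larger `epsilon` learns about new categories faster but shows the user more content that may not interest them.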

And now take a step back and imagine doing all of this for all users across India, where the language, culture, and the entire ethos change every few kilometres. That’s the challenge that we, at Moj & ShareChat, have been solving since Day 1!

How Do We Delight Our Users

The answer is simple: by giving them what they want and by helping them discover new avenues which might excite or educate them. (One way of looking at it is through the lens of the apocryphal Ford story: give them a faster horse AND this newfangled thing called an automobile.) While these aspects might sound complicated, let’s simplify them into the core first principles and the flywheels involved.

Understand the User — We do this by listening to the user’s feedback (positive signals such as watch-time and likes; negative signals such as skips and “not interested”). The user doesn’t need to make any special effort: they can just sit back, relax and consume the content they love. Every single signal they provide helps us gain a better understanding of them and the kind of content they like or dislike.

It is imperative to optimally understand these signals in each of the following situations:
1. Cold Start (initial phase) of the user
2. Long-standing baseline interests of the user
3. Contextual & intent-based differences in user preferences

Understand the Content — We need to understand the content through machine learning algorithms while minimizing the need for manual labeling. This broadly involves two approaches. One is to use multimodal learning techniques to understand the inherent aspects of the content, such as the visual features, speech, and text. The other is to show the content to appropriate sets of users and learn from their interactions. Each of these approaches has its own advantages, and they are employed to varying degrees in the following phases that a piece of content goes through:

a) Understand the content during its cold start (post cold start)
b) Map the content accurately to the users who would absolutely love it
c) Understand the time-point when this content is no longer fresh/relevant

Guiding Principle for Algorithms — Let’s just follow the crowd and build a pure follow graph? Hmm, let’s back up a sec. What is our end goal? ‘Matching the user niches to the content niches they are looking for as well as matching the Creators to the appropriate audiences’. This is pretty much an ‘Interest Graph’. The traditional way followed by others has been to create a follow graph first (remember the ‘gentle’ prods you get whenever you go to any older social network, to ‘befriend’ / ‘follow’ people?), and then use it to weakly approximate an Interest Graph. There are many disadvantages to this approximation-based approach. Consumers and Creators have to spend painstaking time and effort constructing their follow / follower graph, and even then, there are problems. Can you think of many creators whose entire set of creations you like? Can this system ensure adequate opportunities for new creators? So, is there no better way to construct an ‘Interest Graph’? Turns out there is, and we have constructed it!

How it Comes Together — Multi-Stage Ranking System — If we frame this problem as finding the affinity of a user with each post (not just the posts from the creators or the genre the user is following), the multi-stage ranking system is able to smartly condense this problem and lets us balance multiple conflicting but important objectives, such as precision, recall, systematic exploration across multiple content types, surfacing trending content, and exploiting known preferences. It also ensures that new posts and young creators are able to compete fairly in the system by ensuring that there is no ‘system’ bias in the lifecycle of a user or a post (this will be covered in detail in future blogs).
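To make the idea of a multi-stage system concrete, here is a toy two-stage funnel: a cheap retrieval step narrows the full corpus to a shortlist, and a costlier ranking step orders only that shortlist. It is a sketch under stated assumptions rather than the production design: the embedding vectors, the `freshness` values, the 0.5 weight, and the `retrieve` and `rank` helpers are all hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(user_vec, post_vecs, k):
    """Stage 1: cheap affinity score over every post, keep the top k."""
    by_affinity = sorted(
        post_vecs, key=lambda pid: dot(user_vec, post_vecs[pid]), reverse=True
    )
    return by_affinity[:k]

def rank(user_vec, post_vecs, freshness, candidates, n):
    """Stage 2: richer score (affinity plus a freshness bonus) on the shortlist only."""
    def score(pid):
        return dot(user_vec, post_vecs[pid]) + 0.5 * freshness[pid]
    return sorted(candidates, key=score, reverse=True)[:n]

# Hypothetical embeddings and freshness values.
user = [1.0, 0.0]
posts = {"p1": [0.9, 0.1], "p2": [0.8, 0.2], "p3": [0.1, 0.9], "p4": [0.7, 0.3]}
fresh = {"p1": 0.0, "p2": 0.5, "p3": 1.0, "p4": 0.0}

shortlist = retrieve(user, posts, k=3)
feed = rank(user, posts, fresh, shortlist, n=2)
```

The first stage can afford to look at every post because its score is cheap, while the second stage spends more computation per post but only on a few candidates. In this toy data, the freshness bonus lets `p2` overtake `p1` in the final ordering.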

Multiple user actions are modelled during recommendation: explicit and implicit, positive and negative. These actions are combined into a final score using a fine sort layer. Additionally, the feedback loop, which is maintained in real time, helps in continuously improving the experience on the platform. Whenever a user opens their ‘Home’ page, it sets this complex machinery in motion.
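One simple way to combine several predicted actions into a single score is a weighted sum, with negative weights for negative actions. The sketch below assumes that form purely for illustration; the post does not specify how the fine sort layer works, and the action names, weights, and `final_score` helper are hypothetical.

```python
# Hypothetical weights: positive actions add to the score, negative ones subtract.
WEIGHTS = {
    "like": 1.0,
    "watch_complete": 1.5,
    "skip": -1.0,
    "not_interested": -3.0,
}

def final_score(predicted, weights=WEIGHTS):
    """Combine per-action predicted probabilities into one ranking score.

    predicted: dict mapping action name -> predicted probability for a
    (user, post) pair. Actions missing from the dict count as 0.
    """
    return sum(w * predicted.get(action, 0.0) for action, w in weights.items())

# Hypothetical model outputs for two posts shown to the same user.
post_a = {"like": 0.3, "watch_complete": 0.6, "skip": 0.1, "not_interested": 0.01}
post_b = {"like": 0.4, "watch_complete": 0.2, "skip": 0.5, "not_interested": 0.05}

ranked = sorted(
    {"post_a": post_a, "post_b": post_b}.items(),
    key=lambda item: final_score(item[1]),
    reverse=True,
)
```

Here `post_b` has the higher predicted like probability, but its high skip probability pulls its combined score below that of `post_a`. This is the kind of trade-off that scoring several actions together is meant to capture.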

As soon as the app is opened, the client sends a request to the backend system, where the feed aggregator collects posts from different candidate generators. After the set of potential posts is gathered, their features are fetched from the feature store and passed to the predictor. The predictor scores the posts and ranks them based on user signals, post signals, and multiple contextual features. All the features are also logged and then joined with impressions to create a feedback loop for our ranking model.
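The last step, joining logged features with impressions, can be sketched as a simple keyed join that produces labelled training rows. This is an illustration only: the record layout, the field names (`request_id`, `post_id`, `liked`), and the `build_training_rows` helper are assumptions, not the actual schema.

```python
def build_training_rows(feature_log, impressions):
    """Join features logged at serving time with impression outcomes.

    feature_log: dict keyed by (request_id, post_id) -> feature dict.
    impressions: list of dicts describing posts the user actually saw,
    each carrying the observed outcome.
    """
    rows = []
    for imp in impressions:
        key = (imp["request_id"], imp["post_id"])
        features = feature_log.get(key)
        if features is None:
            # No features were logged for this impression, so it cannot be used.
            continue
        rows.append({**features, "label": imp["liked"]})
    return rows

# Hypothetical logs: two posts were ranked for request r1, but only p1 was seen.
feature_log = {
    ("r1", "p1"): {"user_affinity": 0.8, "post_age_hours": 2},
    ("r1", "p2"): {"user_affinity": 0.4, "post_age_hours": 30},
}
impressions = [
    {"request_id": "r1", "post_id": "p1", "liked": 1},
    {"request_id": "r1", "post_id": "p3", "liked": 0},
]

rows = build_training_rows(feature_log, impressions)
```

Logging the exact feature values used at ranking time means the model is later trained on what it actually saw when it made the prediction, rather than on values recomputed afterwards. Posts that were ranked but never shown (`p2` here) produce no row, since there is no outcome to learn from.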

And, voila, that’s how the magic of the ‘Home’ page is achieved! We at Moj & ShareChat see this as an ongoing journey in which we learn and improve every day. As we have grown to become India’s number one short-video platform, our users have gained more confidence to explore newer, more niche and nuanced categories on our platform. This in turn energizes us to take our game to the next level. Moreover, as we continue to get better at personalization, we are cognizant of and proactive about next-level challenges such as ‘Prevention of Filter Bubbles’ and ‘Enabling Serendipity’, and are setting up the right mechanisms to address them.

Ensuring the best experience! Experimentation Frameworks

Business metrics such as quality time spent on the app and user engagement help us validate hypotheses and incrementally improve the recommendation models. There are two ways of doing this evaluation: offline and online. We have a robust system for offline evaluation which leverages the large volume of data stored in our data lake. For online evaluation, we generally use an A/B testing framework which is monitored with respect to the business metrics. Performance metrics and statistical significance are among the things that help us select one logic over another.
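As an illustration of a statistical significance check in an A/B test, the sketch below runs a two-proportion z-test on an engagement rate. It is not a description of the in-house framework: the choice of test, the metric, the user counts, and the `two_proportion_z_test` helper are all assumptions made for the example.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value), where z > 0 means variant B has the higher rate.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants are equal.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: users who engaged at least once, per variant.
z, p_value = two_proportion_z_test(success_a=1000, n_a=10000,
                                   success_b=1100, n_b=10000)
significant = p_value < 0.05
```

A test like this only says whether the observed difference is unlikely to be chance. Whether the difference is large enough to matter for the business metric is a separate judgement.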

What’s Next?

More exciting blog posts! Want to learn more about what makes the proverbial viral video, our experimentation framework, infrastructure challenges, how the feed is generated, and how different kinds of user feedback are modelled? Stay tuned while we walk you through some of the toughest challenges we faced in the process of building India’s largest homegrown social media platform and how we are overcoming these challenges to serve the interests of the next billion users. For more, come join us, at least for a cup of coffee or a pitcher of beer (virtual works too!). You never know which great journey might start from these conversations!

Cover illustration by Ritesh Waingankar
