Behind VRV’s Recommendation Engine

Authored by Gizem Erdogan & Bharadhwaj Narayanan

Crunchyroll
Mar 18, 2020

At VRV we use personalized recommendations extensively to help our users discover new favorites among the thousands of movies and series in our catalog. We believe a personalized homepage tailored to each user’s likes and interests is a big part of super-serving our fans, and it also paves the way for users to explore the rest of our catalog. That is why, when users log in to the VRV homepage, they are welcomed by personalized feeds with their recommendations, watchlist, and continue-watching panels, along with generic content like new releases and fan favorites. Our two main types of personalized recommendations are what we call Top Picks for You and Because You Watched.

Top Picks For You

In Top Picks for You, we use user-based collaborative filtering with an Alternating Least Squares (ALS) model to generate recommendations based on a user’s watch history and the similar decisions of other viewers. We derive a rating for each show a user has watched based on our estimate of how much the user likes the show, and build a user-media association matrix from these ratings. The missing entries in this association matrix are filled in by ALS matrix factorization, which describes users and media by sets of latent factors that, when multiplied back together, predict the missing entries.

In our model, we use implicit user feedback derived from a user’s viewership patterns, rather than an explicit grade given to the show by the user. The main reason for this choice is that explicit feedback such as grades or watchlist saves comes from less frequent actions that are not taken uniformly by all users. Using viewership-based events and estimating implicit ratings lets us capture information from all our active users. The estimated ratings represent the strength of the observed interactions in a user’s watch history. Hence, instead of modeling the ratings directly, the ALS model tries to find latent features that explain a user’s preference for a show.
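As a rough illustration of how such an implicit rating could be derived, here is a minimal PySpark sketch; the event schema, column names, and the "sum of episode completion fractions" formula are hypothetical placeholders rather than our exact signal.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("implicit-ratings").getOrCreate()

# Hypothetical playback events: one row per (user, show, episode) play.
events = spark.createDataFrame(
    [("u1", "s1", "e1", 0.95), ("u1", "s1", "e2", 0.80), ("u2", "s1", "e1", 0.10)],
    ["user_id", "media_id", "episode_id", "fraction_watched"],
)

# One possible implicit rating: total fraction of episodes completed per show.
# A real signal could also weight recency, rewatches, watchlist saves, etc.
ratings = (
    events.groupBy("user_id", "media_id")
    .agg(F.sum("fraction_watched").alias("rating"))
)
ratings.show()
```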

We built our recommendation model using the spark.ml implementation of ALS, which lets us specify implicit feedback and takes care of finding the latent factors via its tunable hyperparameters. We selected the most relevant set of parameters through multiple iterations of model training and cross-validation. The number of latent factors in the model (the rank) was the most decisive parameter for our dataset, and setting it reasonably high yielded the best results for our business requirements.
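Below is a minimal sketch of what training with the spark.ml ALS estimator looks like under these assumptions; the toy data, the rank of 64, and the other hyperparameter values are placeholders, not our production settings.

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("als-top-picks").getOrCreate()

# Toy implicit ratings with integer-indexed users and shows.
indexed_ratings = spark.createDataFrame(
    [(0, 0, 1.75), (0, 1, 0.30), (1, 0, 0.10), (1, 2, 2.00)],
    ["user_idx", "media_idx", "rating"],
)

als = ALS(
    userCol="user_idx",
    itemCol="media_idx",
    ratingCol="rating",
    implicitPrefs=True,    # treat ratings as confidence values, not explicit grades
    rank=64,               # number of latent factors; placeholder, tuned via cross-validation
    regParam=0.1,
    coldStartStrategy="drop",
)
model = als.fit(indexed_ratings)
top_picks = model.recommendForAllUsers(20)   # top-20 candidate shows per user
top_picks.show(truncate=False)
```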

Because You Watched

In Because You Watched, we use a hybrid of item-based and user-based collaborative filtering, where we cluster together users with similar likes and shows with similar features to generate a weighted similarity score.

We use cosine similarity to group shows based on features derived from each show’s metadata, and combine it with a collaborative score derived from the combined viewership of a show. In this way, we produce a similarity score for every pair of shows in our catalog. When we identify a user-media association, where a user likes a certain show, we present them with a list of “Because You Watched” shows catered to their taste. We display multiple such panels, sorted by how recently each show was played. For example, if “Gary and His Demons” was the most recent series a user showed an interest in, its panel appears first.
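As an illustration, a simple way to compute such a blended similarity might look like the sketch below; the multi-hot metadata features, the co-viewership score, and the 50/50 blend weights are all hypothetical.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical metadata features (e.g. multi-hot genres/tags) for two shows.
show_a = np.array([1, 0, 1, 1, 0], dtype=float)
show_b = np.array([1, 1, 1, 0, 0], dtype=float)

metadata_score = cosine_similarity(show_a, show_b)

# Hypothetical collaborative score, e.g. derived from co-viewership of the pair.
collaborative_score = 0.42

# Weighted blend of the two signals; the 0.5/0.5 split is a placeholder.
similarity = 0.5 * metadata_score + 0.5 * collaborative_score
print(round(similarity, 3))
```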

Ensuring Quality in Recommendations

An important step in serving recommendations is ranking the candidate media. Recommender systems are prone to popularity bias and suffer from cold-start problems. In other cases, some candidate media may have already been partially watched by the user but not liked enough to be picked up again. We tackle these issues either by setting weights for features before feeding them into the model or by updating the model output according to our predefined business rules. To enable exploration, we also insert a small percentage of “undiscovered” media into a user’s recommendations. This explorative content is not generated by the model, but rather randomly selected from a pool of shows being introduced to the user for the first time.
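A simplified sketch of this post-model re-ranking step is shown below; the specific rules (dropping abandoned shows, injecting roughly 10% unseen titles) are placeholders standing in for our actual business rules.

```python
import random

def rerank(candidates, watched_and_dropped, undiscovered_pool,
           explore_fraction=0.1, n=20):
    """Apply simple post-model rules: drop abandoned shows and mix in
    a small slice of never-seen titles for exploration.

    `candidates` is assumed to be sorted by model score (best first);
    the rule set and the 10% exploration share are placeholders.
    """
    filtered = [m for m in candidates if m not in watched_and_dropped]
    n_explore = max(1, int(n * explore_fraction))
    exploration = random.sample(undiscovered_pool,
                                min(n_explore, len(undiscovered_pool)))
    return filtered[: n - len(exploration)] + exploration

print(rerank(["a", "b", "c", "d"], {"b"}, ["x", "y", "z"], n=4))
```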

In order to validate the model results, we defined a number of validation metrics, a few of which are sketched below:

  • The effective catalog size, i.e. the percentage of the catalog that appears in recommendations
  • The frequency of occurrence of each show within recommendations
  • The distribution of common recommendation counts between random pairs of users
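As a rough sketch, the first two of these metrics could be computed along the following lines; the data structures here are hypothetical.

```python
from collections import Counter

def effective_catalog_size(recommendations, catalog_size):
    """Percentage of the catalog that appears in at least one user's recommendations.

    `recommendations` maps user_id -> list of recommended media_ids.
    """
    recommended = {m for recs in recommendations.values() for m in recs}
    return 100.0 * len(recommended) / catalog_size

def show_frequencies(recommendations):
    """How often each show occurs across all users' recommendations."""
    return Counter(m for recs in recommendations.values() for m in recs)

recs = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "b"]}
print(effective_catalog_size(recs, catalog_size=10))  # 30.0
print(show_frequencies(recs).most_common(3))          # [('a', 3), ('b', 2), ('c', 1)]
```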

Most Popular

Another product of the recommendation engine is our Most Popular panel. To identify our most popular shows we created a weighted score consisting of two parts: a recency score, which measures how many users watched a show in a given timeframe, and a keen score, which measures the demand for a show immediately after it airs.
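The exact formula is beyond the scope of this post, but a toy version of such a weighted score might look like this; the normalization and the 0.7/0.3 weights are placeholders, not the production values.

```python
def popularity_score(recent_viewers, premiere_viewers, total_active_users,
                     recency_weight=0.7, keen_weight=0.3):
    """Weighted popularity score for a show.

    recency: share of active users who watched the show in the recent window.
    keen:    share of active users who watched right after an episode aired.
    """
    recency = recent_viewers / total_active_users
    keen = premiere_viewers / total_active_users
    return recency_weight * recency + keen_weight * keen

print(popularity_score(recent_viewers=12000, premiere_viewers=4000,
                       total_active_users=50000))
```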

Technical Stack and A/B Testing

We implemented our recommendation pipeline using a comprehensive tech stack. To manage our workflow orchestration, we opted for Apache Airflow. For handling data ingestion and processing in near real time we use Spark Streaming. For our feature stores we use Amazon DynamoDB and the AWS Elasticsearch service. We update our model daily with batch data using PySpark on AWS EMR, and deploy the model to production using MLeap. For tracking model performance and experiment metrics, we utilize MLflow. Here is an overview of our design:
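As a rough illustration of the orchestration piece only, here is a hypothetical Airflow DAG for the daily batch update; the DAG ID, task names, schedule, and scripts are invented for this sketch (Airflow 2-style imports).

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG; task names, schedule, and scripts are illustrative only.
with DAG(
    dag_id="daily_recommendations_update",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=10)},
) as dag:
    build_ratings = BashOperator(
        task_id="build_implicit_ratings",
        bash_command="spark-submit build_ratings.py",
    )
    train_als = BashOperator(
        task_id="train_als_model",
        bash_command="spark-submit train_als.py",
    )
    export_model = BashOperator(
        task_id="export_mleap_bundle",
        bash_command="spark-submit export_mleap.py",
    )

    build_ratings >> train_als >> export_model
```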

We went through multiple iterations of our recommendation algorithm as well as the underlying infrastructure. For model updates, we used A/B testing to ensure that each model was statistically significantly better than its predecessor. Our infrastructure updates over time required significant changes, such as switching user-profile storage from AWS Elasticsearch to DynamoDB and pre-calculating and storing sets of recommendations instead of generating them on the fly. With every infrastructure upgrade, we made sure there was no statistically significant difference between the A/B test groups.
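As an illustration of how such a comparison can be made, here is a minimal sketch of a two-proportion z-test on a hypothetical conversion metric; the metric, the numbers, and the choice of test are assumptions, not our exact evaluation procedure.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical A/B results: users who played something from a recommendation panel.
conversions = [4_120, 4_380]   # control, variant
exposures = [50_000, 50_000]

stat, p_value = proportions_ztest(conversions, exposures)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# A small p-value would indicate a statistically significant difference
# between the two model versions on this metric.
```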

Next Steps

In our never-ending quest to super-serve our fans, of course we won’t stop here! With the implementation of new event collectors and near-real-time data ingestion pipelines, we have access to more diverse and direct feedback from our users. We are in the process of implementing a new deep learning-based recommendation model, which can learn user interactions in a more meaningful manner for an even more personalized experience. Stay tuned!
