Predicting your favourite TV Programme

Using a Classification Algorithm to Build a Recommender System

tech@TVB · Feb 21, 2019

by Benny Tang and Jason Tsang

Thousands of hours of TV content sit in our content library, and some of it is not easily discovered by users. On-platform recommendations are an ideal way to deliver personalized content and encourage users to enjoy our service. In this article, we show how we use a classification algorithm to build a recommender system.

Viewing sequence as features

When it comes to recommender systems, there is no shortage of algorithms to choose from. Collaborative filtering is a great way to start: create a user-to-item matrix and use SVD (singular value decomposition) or NMF (non-negative matrix factorization) to find similar users or similar items. While our team had achieved reasonable predictions with collaborative filtering methods, we realized that conventional CF (collaborative filtering) was not able to predict accurately when it came to viewing sequence.

Viewing sequence is an important feature in video streaming services. Most users watch season 2 after season 1, that is, in sequence. For example, if a user watched the TV programme “The File of Justice I” (壹號皇庭), a conventional collaborative filtering model may rate “The File of Justice II” and “The File of Justice III” as equally likely. However, we all know that the user will watch “The File of Justice II” before “The File of Justice III”. This requires our model to learn the sequence of users’ watch behaviour.

Methodology

The general structure of the recommendation engine

We use a two-step approach, referenced from this paper, to prepare a personalized video recommendation list. First, in the candidate generation stage, the thousands of available videos are narrowed down to a couple of hundred. Next, in the ranking stage, the surviving videos are ranked by a classification neural network. Finally, the top N of the ranked list are served to users.

One key benefit of a two-step approach is that we can use the candidate generation stage to fulfil business requirements that are unrelated to the machine learning algorithm. A few examples of such requirements are recommendations based on specific genres, special filtering rules for kids’ programmes, or excluding older videos. The candidate generation part can be rule-based, or it can be based on other machine learning algorithms, as long as it narrows down the scope of calculation for the ranking stage. In the interest of time, we won’t delve into the candidate generation stage.
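To make the idea concrete, here is a rough, rule-based candidate generation sketch. The field names and rules below are hypothetical placeholders for illustration, not our production logic:

```python
# Hypothetical rule-based candidate generation (illustrative sketch only).
from datetime import datetime, timedelta

def generate_candidates(catalogue, user_profile, max_candidates=300):
    """Narrow thousands of programmes down to a few hundred candidates."""
    candidates = []
    for programme in catalogue:
        # Business rule: exclude videos older than five years.
        if programme["release_date"] < datetime.now() - timedelta(days=5 * 365):
            continue
        # Business rule: special filtering for kids profiles.
        if user_profile["is_kid"] and programme["rating"] != "kids":
            continue
        # Business rule: restrict to genres relevant to this user.
        if programme["genre"] not in user_profile["preferred_genres"]:
            continue
        candidates.append(programme)
    return candidates[:max_candidates]
```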

Step 1: Embed programmes into a vector representation

Before we feed a sequence of videos into the model, we must first transform it into a numerical representation, i.e. an embedding. For example, “Game of Thrones” might be transformed into [0.1, 0.3, -0.6].

Embedding can be done as simply as one-hot encoding. However, to achieve a better result by assigning a meaningful representation to each video, we borrowed the word embedding technique from NLP (Natural Language Processing): we treat user view histories as sentences and programmes as the tokens.
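As a minimal sketch, this can be done with gensim’s Word2Vec; the sequences and hyperparameters below are illustrative assumptions, not our production settings:

```python
# Learn programme embeddings with word2vec: programme IDs are the
# "tokens", and each user's ordered view history is a "sentence".
from gensim.models import Word2Vec

view_histories = [
    ["P1", "P2", "P3", "P4"],   # user 1's viewing sequence
    ["P2", "P3", "P5"],         # user 2's viewing sequence
    # ... one list per user, in viewing order
]

model = Word2Vec(
    sentences=view_histories,
    vector_size=64,   # dimensionality of the programme embedding
    window=5,         # context window over the viewing sequence
    min_count=1,      # keep rarely watched programmes too
    sg=1,             # skip-gram tends to work well for item sequences
)

embedding = model.wv["P1"]             # 64-dimensional vector for P1
similar = model.wv.most_similar("P1")  # nearest programmes in embedding space
```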

A benefit of the word embedding technique is that it groups similar videos together without us having to specify or label anything. The embedding quality can be examined and evaluated both quantitatively and qualitatively; for example, we can check whether programmes of similar genres end up close together in the embedding space.

The result looks like this after visualizing it with t-SNE (t-Distributed Stochastic Neighbour Embedding).
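A plot like that can be produced roughly as follows, assuming `model` is the Word2Vec model trained in the previous sketch:

```python
# Project the programme embeddings down to 2-D with t-SNE and plot them.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

programme_ids = model.wv.index_to_key
vectors = model.wv[programme_ids]

# perplexity must stay below the number of programmes in the vocabulary.
points = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(vectors)

plt.scatter(points[:, 0], points[:, 1], s=5)
for pid, (x, y) in zip(programme_ids, points):
    plt.annotate(pid, (x, y), fontsize=6)
plt.title("Programme embeddings projected with t-SNE")
plt.show()
```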

A few things to note about video embeddings

Since we are applying a word embedding technique to calculate video embeddings, there were a few things we found unique to our situation.

  • Handling of videos watched in separate sessions: when constructing the video “sentences”, we noticed that many of our users complete a video over multiple sessions. How we treat these cases can have a lasting impact on the embedding outcome.
  • Handling of new videos that do not have any watch history: when the platform operators put a new video on-shelf during the day, that video does not have a vector yet. In an industrial machine learning setting, this must be handled appropriately; otherwise the video may not appear in recommendations until much later, which in a business context can be a serious flaw in the recommendation product. One way to solve this is to use an average of the embeddings from similar genres, as sketched just after this list.
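Here is a minimal sketch of that cold-start strategy, assuming the Word2Vec `model` from above and a hypothetical `genre_of` mapping from programme ID to genre:

```python
# Initialise a new video's vector as the mean embedding of the
# already-watched videos in the same genre.
import numpy as np

def cold_start_embedding(new_genre, model, genre_of):
    peers = [pid for pid in model.wv.index_to_key if genre_of[pid] == new_genre]
    if not peers:
        # No peers yet: fall back to the mean of the whole catalogue.
        return np.mean(model.wv.vectors, axis=0)
    return np.mean(model.wv[peers], axis=0)
```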

Step 2: Prepare the dataset

We cast the recommendation problem as a classification problem. The idea is to train a model to predict whether a programme is good to recommend, given the view history of a user. Each of the P1, P2, … below is the vector we created in the last step. You may simply flatten the list of vectors, or pass it through an RNN/LSTM layer, before feeding it to a classification model of your choice. Here, we will skip those details and focus on preparing the training data set.

@Training time

Let’s assume there are altogether 10 different programmes, namely

P1, P2, … , P10

And we have a view history like this (in viewing order):

User 1: P1, P2, P3, P4, P5, P6, P7

First, we create our positive training samples (we label them as the positive class because these are our actual observations):

Note that negative samples are also needed; otherwise the model will simply treat everything as positive. So, secondly, we create the negative samples by replacing the last item (the candidate) with a random programme:

The idea is to tell the model that if it sees P1, P2, P3, it is good to recommend P4.

For simplicity, we only classify the recommendation into two classes: good and bad. In principle, there can be more classes, e.g. good for kids, good for adults.
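Putting the above together, a minimal sketch of the training-set construction might look like this; the window length and candidate pool are illustrative assumptions:

```python
# Build (history, candidate, label) samples with a sliding window over
# each user's viewing sequence.
import random

ALL_PROGRAMMES = [f"P{i}" for i in range(1, 11)]  # P1 ... P10
WINDOW = 3  # number of history items preceding the candidate

def build_samples(view_history):
    samples = []
    for i in range(len(view_history) - WINDOW):
        history = view_history[i : i + WINDOW]
        positive = view_history[i + WINDOW]      # the programme actually watched next
        samples.append((history, positive, 1))   # label 1 = good recommendation
        # Negative sample: replace the candidate with a random other programme.
        negative = random.choice([p for p in ALL_PROGRAMMES
                                  if p != positive and p not in history])
        samples.append((history, negative, 0))   # label 0 = bad recommendation
    return samples

samples = build_samples(["P1", "P2", "P3", "P4", "P5", "P6", "P7"])
# e.g. (["P1", "P2", "P3"], "P4", 1) and (["P1", "P2", "P3"], "P8", 0)
```

Before training, each programme ID is replaced by its embedding vector from Step 1; the history vectors can then be flattened or passed through an RNN/LSTM as mentioned earlier.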

@Inference time

Assume a user who has viewed P8, P9, P10 comes in and asks for recommendations.

After going through the pre-selection stage, let’s say P1, P3, P5, P7 and P9 are the possible candidates. Just to recap: thousands of programmes are passed into a pre-selection stage, where we select a few hundred of them. If the total number of programmes is not very large, you may also skip the pre-selection stage and treat all programmes as candidates.

Next, we create a matrix of model inputs in this way:
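A sketch of that input matrix and the scoring step, assuming the `embed` lookup and trained classifier `clf` from the training sketch in Step 3 below:

```python
# Pair the same fixed view history with every surviving candidate and
# score all pairs in one batch.
import numpy as np

history = ["P8", "P9", "P10"]
candidates = ["P1", "P3", "P5", "P7", "P9"]

# One row per (history, candidate) pair: flattened history + candidate.
X = np.stack([np.concatenate([embed(p) for p in history] + [embed(c)])
              for c in candidates])

scores = clf.predict_proba(X)[:, 1]         # probability of the "good" class
ranking = sorted(zip(candidates, scores), key=lambda kv: -kv[1])
top_n = [pid for pid, _ in ranking[:3]]     # serve the top N to the user
```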

From the result, we can say P9 is the best recommendation, with P7 and P3 as the runners-up.

Step 3: Train model

You can use a classification model of your choice; there are many common algorithms to pick from, such as logistic regression, gradient-boosted trees, and feed-forward neural networks.
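As one hedged example, here is a minimal setup with scikit-learn’s logistic regression; it is a stand-in choice for illustration, not necessarily the model we use in production. It assumes the `samples` built in Step 2 and the Word2Vec `model` from Step 1:

```python
# Train a binary classifier on flattened embedding vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def embed(programme_id):
    return model.wv[programme_id]  # the vector from Step 1

def to_row(history, candidate):
    # Flatten: concatenate the history vectors with the candidate vector.
    return np.concatenate([embed(p) for p in history] + [embed(candidate)])

X = np.stack([to_row(h, c) for h, c, _ in samples])
y = np.array([label for _, _, label in samples])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```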

Step 4: (Optional) Tweak the model

While we all feel uncomfortable adding additives to our model, it is good to have a tuning mechanism that makes the recommendation results look more like the ones the business owners have in mind.

Let’s get back to the preparation of the training samples. Remember that we created this set of training samples from user view histories.

Now, let’s assume P3b, P4b and P5b are the behind-the-scenes companions of P3, P4 and P5 respectively, and management wants them to be recommended too. To tweak the model, we can introduce some additives in this way:

These samples will tell the model that it is good to recommend P3b when it sees P3 in the last position of the view history.
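A minimal sketch of injecting such additive samples; `BEHIND_THE_SCENES` is a hypothetical mapping used purely for illustration:

```python
# For each positive sample whose last history item has a behind-the-scenes
# companion, inject an extra positive sample recommending that companion.
BEHIND_THE_SCENES = {"P3": "P3b", "P4": "P4b", "P5": "P5b"}

def add_additives(samples):
    extras = []
    for history, candidate, label in samples:
        companion = BEHIND_THE_SCENES.get(history[-1])
        if label == 1 and companion:
            # Same history, with the companion video as the recommended item.
            extras.append((history, companion, 1))
    return samples + extras
```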

If you enjoyed reading this post and feel you would enjoy working on recommender systems, we are looking for talented people to join the team. Please contact us at tech@tvb.com.
