Photo by Christian Wiediger on Unsplash

Deep Neural Networks for YouTube Recommendations

Paul Covington, Jay Adams, Emre Sargin

Yoav Navon
2 min read · Sep 7, 2019


This paper presents the recommendation system used at YouTube, which is based on deep neural networks. The model consists of two parts. First, a candidate generation network narrows the possible videos from millions down to a few hundred, using the user's watch and search history as well as some context information. Then a ranking network computes the predicted watch time of every candidate in order to extract the subset of videos to show. The ranking network has access to more detailed information about each video. Both networks are differentiable and are trained using a held-out method, in which part of a user's history is presented and the “future” watch is held out.
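To make the two-stage funnel concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the random embeddings, the dot-product scoring, the `video_quality` table and the `toy_watch_time` function are all assumptions standing in for the trained networks.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(user_vec, video_vecs, k):
    """Stage 1: keep the k videos whose embeddings score highest for this user."""
    by_score = sorted(video_vecs, key=lambda vid: dot(user_vec, video_vecs[vid]), reverse=True)
    return by_score[:k]

def rank(candidates, predicted_watch_time, n):
    """Stage 2: order the candidates by predicted watch time and keep the top n."""
    return sorted(candidates, key=predicted_watch_time, reverse=True)[:n]

random.seed(0)
DIM = 8
# Hypothetical corpus: random vectors instead of learned video embeddings.
video_vecs = {f"video_{i}": [random.gauss(0, 1) for _ in range(DIM)] for i in range(10_000)}
# Hypothetical "detailed" per-video feature that only the ranking stage sees.
video_quality = {vid: random.random() for vid in video_vecs}
user_vec = [random.gauss(0, 1) for _ in range(DIM)]

def toy_watch_time(video_id):
    # Stand-in for the ranking network: predicted minutes watched.
    return 10.0 * video_quality[video_id]

candidates = generate_candidates(user_vec, video_vecs, k=200)
top = rank(candidates, toy_watch_time, n=10)
print(len(video_vecs), "->", len(candidates), "->", len(top))
```

The point of the split is cost: the cheap first stage touches every video, while the more expensive second stage only has to score a few hundred.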

The model performs extreme multiclass classification over millions of videos. I wonder whether it would be possible to first predict the topic of the video and then perform the multiclass classification within that topic, following a top-down approach. This method, however, would need either a separate model to predict the topic or the use of tags.
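The top-down idea can be sketched as a two-level softmax, where the probability of a video is the probability of its topic times the probability of the video within that topic. This is only an illustration of the suggestion above, not something taken from the paper; the topic names and all scores are made up.

```python
import math

def softmax(scores):
    """Turn a dict of raw scores into a dict of probabilities."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def top_down_predict(topic_scores, video_scores_by_topic, n_topics=1):
    """Pick the n_topics most likely topics, then classify videos inside them only."""
    topic_probs = softmax(topic_scores)
    best_topics = sorted(topic_probs, key=topic_probs.get, reverse=True)[:n_topics]
    joint = {}
    for topic in best_topics:
        for video, p in softmax(video_scores_by_topic[topic]).items():
            # P(video) = P(topic) * P(video | topic)
            joint[video] = topic_probs[topic] * p
    return joint

# Hypothetical scores, as if produced by a topic model and per-topic video models.
topic_scores = {"music": 2.0, "cooking": 0.5, "sports": -1.0}
video_scores_by_topic = {
    "music": {"m1": 1.0, "m2": 0.2},
    "cooking": {"c1": 0.3, "c2": 0.1, "c3": -0.5},
    "sports": {"s1": 0.0},
}

pruned = top_down_predict(topic_scores, video_scores_by_topic, n_topics=1)
full = top_down_predict(topic_scores, video_scores_by_topic, n_topics=3)
```

With `n_topics=1` only the videos of the most likely topic are ever scored, which is where the savings would come from; the risk is that a wrong topic prediction discards the right video before it is considered.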

The authors say that they feed the age of the training example as a feature during training, but which age is this? Since a training example consists of multiple videos that the user has watched, is the age an average over them, or is it computed in some other way?

The ranking network is supposed to predict watch time, but I think it is necessary to normalize it by some factor, such as the length of the video. Watching 1 minute of a 1.5-minute video should count as a hit, but watching 1 minute of a 30-minute video should be categorized as a negative sample, because the user probably didn’t like the video.
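A small sketch of that normalization, using the two cases above. The function names and the 0.5 threshold are assumptions chosen for illustration; the paper does not describe labels built this way.

```python
def watch_fraction(watched_seconds, video_seconds):
    """Fraction of the video that was watched, capped at 1.0."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    return min(watched_seconds / video_seconds, 1.0)

def is_positive(watched_seconds, video_seconds, threshold=0.5):
    """Hypothetical labeling rule: positive if at least `threshold` of the video was watched."""
    return watch_fraction(watched_seconds, video_seconds) >= threshold

short_video = watch_fraction(60, 90)    # 1 minute of a 1.5-minute video, about 0.67
long_video = watch_fraction(60, 1800)   # 1 minute of a 30-minute video, about 0.03

print(is_positive(60, 90), is_positive(60, 1800))
```

Both impressions have the same raw watch time of 60 seconds, yet the normalized values differ by a factor of 20, which is exactly the distinction the raw target cannot express.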

The authors say that feature engineering is still necessary, and that the main challenge lies in representing a temporal sequence of user actions and how these actions relate to the user. This problem could call for the use of an RNN to encode the sequential watch history of each user.
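As a rough illustration of what an RNN would add, the sketch below encodes a watch history with a simple recurrent cell and compares it with an order-insensitive average of the same embeddings. Everything here is an assumption: the weights are random and untrained, the sizes are tiny, and the cell is a plain tanh recurrence rather than anything from the paper.

```python
import math
import random

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_encode(sequence, w_in, w_rec):
    """Simple recurrent encoder: h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return hidden

def mean_encode(sequence):
    """Order-insensitive baseline: element-wise average of the embeddings."""
    n = len(sequence)
    return [sum(column) / n for column in zip(*sequence)]

random.seed(0)
EMB, HID = 4, 3
# Untrained, randomly initialized weights (illustration only).
w_in = [[random.gauss(0, 0.5) for _ in range(EMB)] for _ in range(HID)]
w_rec = [[random.gauss(0, 0.5) for _ in range(HID)] for _ in range(HID)]
# Hypothetical embeddings of three watched videos, oldest first.
history = [[random.gauss(0, 1) for _ in range(EMB)] for _ in range(3)]

forward = rnn_encode(history, w_in, w_rec)
backward = rnn_encode(history[::-1], w_in, w_rec)
```

Reversing the history changes the recurrent encoding but leaves the average unchanged, so only the recurrent encoder can tell "watched A, then B" apart from "watched B, then A".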
