Cracking The YouTube Algorithm

Check out this article to know more about how the YouTube recommendation system works!

Swaathi Sundaramurugan
SFU Professional Computer Science
11 min read · Feb 12, 2022

Authors: Farhan Hussain, Smitha Kolan, Swaathi Sundaramurugan

This blog is written and maintained by students in the Master of Science in Professional Computer Science Program at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/mpcs.

Have you ever wondered how YouTube always lines up the videos that you want to watch? You might have just discussed a video with your friend, then boom! It’s on your YouTube homepage. Sometimes you discover an awesome YouTube channel and you binge-watch it. Whenever that creator releases a new video, YouTube is like, ‘Here you go, you might want to watch this too’. When you are watching a video, YouTube displays related content as suggestions so you can keep watching with ease. As video consumption has grown, YouTube has become one of the largest online video platforms, with over two billion users. Even with more than 500 hours of video uploaded every minute, the platform rarely fails to fascinate us with its recommendations. So, how does it handle this huge volume of videos and users? YouTube has a sophisticated recommendation system that focuses on satisfying users with the content they prefer to watch.

So What Is A Recommendation System?

When you want to watch something, you ask your friends for suggestions. Your friends know your preferences in genres and content, so they suggest something good for you to watch. You might like their recommendation and keep coming back for more. A recommendation system does exactly the same thing, but in a more structured way. You might wonder how this system knows your preferences even though you have never explicitly told it. It learns from your behavior on YouTube and makes a note of it. The system analyzes the videos you have watched, the searches you have made, your channel subscriptions, shares, likes, dislikes, and many more factors to predict what you might like to watch next. So let’s find out how this works.

General Recommendation System Overview

There are a lot of recommendation systems out there. In general, they filter a small amount of content out of the overall database to satisfy the user’s needs. Many recommendation systems use a technique called content-based filtering, which picks items based on the content the user has previously viewed. Others use collaborative filtering, which recommends content based on the preferences of a group of similar users. Unlike these traditional systems, YouTube does not rely solely on a single method because of the enormous volume of videos involved. A video’s viewership depends considerably more on recommendations than on channel subscriptions or manual search. Hence, the platform employs multiple techniques to carefully balance new content against existing popular videos and give content creators a level playing field. YouTube’s recommendation system consists of two sub-systems that perform candidate generation and ranking.
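To make the idea of collaborative filtering concrete, here is a minimal sketch in Python. It is not YouTube’s implementation. The `watched` data, the user names, and the `recommend` helper are all made up for illustration, and cosine similarity over watch sets is just one common way to measure how alike two users are.

```python
import math

# Hypothetical watch data: user -> set of watched video IDs.
watched = {
    "ana": {"v1", "v2", "v3"},
    "ben": {"v1", "v2", "v4"},
    "cat": {"v5", "v6"},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(user, watched, top_n=3):
    """Score unseen videos by the similarity of the users who watched them."""
    seen = watched[user]
    scores = {}
    for other, videos in watched.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, videos)
        for video in videos - seen:
            scores[video] = scores.get(video, 0.0) + sim
    # Highest score first; ties are broken by video ID for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, score in ranked[:top_n] if score > 0]

print(recommend("ana", watched))  # ['v4']
```

Here "ana" and "ben" share two videos, so the one video only "ben" has watched is suggested to "ana". "cat" shares nothing with anyone, so this toy recommender has nothing to suggest to her, which hints at why a real system cannot rely on a single technique.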

Recommendation system architecture from Whitepaper Deep Neural Networks for YouTube Recommendations

Candidate Generation

YouTube aims to personalize the video suggestions by employing the candidate generation framework. This framework selects a few hundred videos from the entire database for further processing based on several elements that revolve around the users. This sub-system not only recommends content related to user preferences but also considers that the user might want to watch viral videos. Before diving into factors that determine candidate generation, it is crucial to understand where the recommendations show up.

Home, Suggested & Search

There are three different places where a video will show up on YouTube. The majority of videos consumed by users are through the Homepage and Suggested. This is where the YouTube recommendation system shows you exactly what to watch (instead of you searching for it).

The third place where people get recommendations is search. The YouTube recommendation engine works slightly differently when it comes to search. Since Google owns YouTube, some of the search-based rankings are taken from the Google search engine. Here the YouTube algorithm also takes into account other factors, such as the title and description of the video, while matching it against the keyword(s) being searched.

The number of candidates and the factors contributing to them are different for the Homepage and the Suggested section of a page. The landing page receives the majority of the traffic, and several factors are taken into consideration to deliver a wide range of content for the Homepage. Hence, the candidate pool generated for the Homepage is comparatively bigger than the one for the Suggested section. The candidates are nominated based on the user’s history and the preferences of similar users. The framework uses collaborative filtering to find the group of homogeneous users. For the Suggested section, the candidate pool is refined based on what the viewer is currently watching.

Factors

YouTube’s candidate generation sub-system relies mainly on the following attributes:

Watch history: The IDs of the watched videos over time.

Search history: The keywords (tokens) used for searching.

Demographic features: The age, location, gender, type of logged-in device, and so on.

Channels: The subscribed channels of the user.

A YouTube Homepage

Ranking

YouTube’s ranking algorithm is critical to the overall health of the YouTube ecosystem. ‘The Algorithm’, as it’s sometimes called, is actually a misnomer. It is in fact a multitude of different systems working together to form the overall recommendation engine.

Over the years the recommendation system has changed numerous times. From 2005 to 2011, the recommendation system was optimized for clicks and views. This led to exploitation of the system and gave rise to the now-famous word ‘clickbait’: a technique, sometimes also called the shock factor, that makes a person click on a video purely because of its thumbnail and title. The result was low-grade videos that people would click on and then stop watching almost immediately.

From 2012 onwards, YouTube’s ranking system focused on a different metric known as ‘watch-time’. Here the system looks at how long people have watched a video and uses that to decide whether to recommend it to others. This was based on the assumption that a longer watch time represents a more enjoyable experience. Watch-time was the main deciding factor in the recommendation engine until 2015.

The present-day ranking system applies a multitude of different parameters that optimize for satisfaction and personalization. Watch-time still plays a critical role in ranking but YouTube believes ‘not all watch-time is considered equal’.

Factors

YouTube incorporates a large number of factors in its ranking system for videos. Each video gets a certain rank based on the following important factors:

Watch-time: The amount of time a video is watched.

Average view duration: Average watch-time per view.

Audience retention: The average percentage of a video that people watch.

Impression Click-Through-Rate (CTR): How often viewers watch a video after being shown an impression of it, such as its thumbnail.

Watch together: Videos that are generally watched together.

Survey: How a user feels about a certain video. This input is generally an adjective.

Shares: How many times the video has been shared to different platforms.

Likes & dislikes: How many likes and dislikes a video receives. This parameter might have a lesser impact now since the removal of the dislike count.

Other factors, which come from the personalization prediction model, can also influence a video’s ranking. These include location, the user’s watch history, clicking ‘not interested’ on a video, and more.
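The first few metrics in the list above are simple ratios, and a short Python sketch shows how they relate to each other. The function name, its parameters, and the sample numbers are hypothetical, and treating every view as coming from an impression is a simplification.

```python
def engagement_metrics(watch_seconds, views, impressions, video_length_seconds):
    """Derive per-video engagement ratios from raw counts (illustrative only)."""
    # Average view duration: total watch-time spread over all views.
    average_view_duration = watch_seconds / views if views else 0.0
    # Audience retention: the share of the video an average viewer watches.
    audience_retention = (
        average_view_duration / video_length_seconds
        if video_length_seconds
        else 0.0
    )
    # Impression CTR: views per impression shown.
    impression_ctr = views / impressions if impressions else 0.0
    return {
        "average_view_duration": average_view_duration,
        "audience_retention": audience_retention,
        "impression_ctr": impression_ctr,
    }

# A made-up 5-minute video: 2,000 impressions, 100 views, 12,000 s watched.
metrics = engagement_metrics(
    watch_seconds=12_000, views=100, impressions=2_000, video_length_seconds=300
)
print(metrics)
```

For these sample numbers the average view lasts 120 seconds, which is 40% of the video, and 5% of impressions turn into views.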

Model Architecture

The candidate generation subsystem frames recommendation as an extreme multiclass classification problem with millions of classes, one per video. It uses a deep feedforward neural network that accepts low-dimensional dense vectors as input. A feedforward neural network consists of densely connected layers in which each layer’s outputs, combined with that layer’s weights, feed the next layer. The features of the model, such as the video IDs in the user’s watch history and the words in the search history, are high-dimensional inputs stored as sparse vectors. These sparse vectors cannot be fed into the network directly, so they are embedded into lower-dimensional dense vectors.
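The embedding step can be sketched in a few lines of Python. This is only an illustration: the table contents are random, the IDs and dimensions are invented, and averaging the embeddings of a variable-length history into one vector is an assumption taken from the whitepaper’s description rather than something spelled out above.

```python
import random

EMBEDDING_DIM = 4  # tiny on purpose; real embeddings are much wider

def make_embedding_table(ids, dim, seed=0):
    """Assign each ID a random dense vector (a trained model would learn these)."""
    rng = random.Random(seed)
    return {i: [rng.uniform(-1, 1) for _ in range(dim)] for i in ids}

def average_embedding(ids, table, dim):
    """Map a variable-length list of IDs to one fixed-size dense vector."""
    vectors = [table[i] for i in ids if i in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical vocabularies of video IDs and search tokens.
video_table = make_embedding_table(["v1", "v2", "v3", "v4"], EMBEDDING_DIM)
token_table = make_embedding_table(["sql", "interview", "cats"], EMBEDDING_DIM, seed=1)

watch_vector = average_embedding(["v1", "v3"], video_table, EMBEDDING_DIM)
search_vector = average_embedding(["sql", "interview"], token_table, EMBEDDING_DIM)

# Concatenate the two dense vectors into a single network input.
user_input = watch_vector + search_vector
print(len(user_input))  # 8
```

However long the watch or search history is, the network always receives a vector of the same size, which is what makes a fixed feedforward architecture usable here.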

Model Architecture of Candidate Generation from Whitepaper Deep Neural Networks for YouTube Recommendations

Before 2015, YouTube employed a matrix factorization approach that trained its model on users’ watch history only. The current system generalizes this approach to accept both continuous and categorical features. The viewer’s demographics, such as gender, age, and logged-in state, are a few examples of continuous features, while the embedded search and watch histories are categorical features. These inputs are passed through four densely connected ReLU layers. During training, the output of the last layer is forwarded to a softmax layer that predicts which video the user will watch, with each video treated as a separate class.
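The forward pass described here, a stack of ReLU layers followed by a softmax over videos, can be sketched in plain Python. The layer sizes, the random weights, and the five-video "catalogue" are placeholders chosen for readability. A real model would learn its weights and would have millions of output classes.

```python
import math
import random

def relu(vector):
    return [max(0.0, x) for x in vector]

def dense(vector, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (stand-ins for trained parameters)."""
    weights = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict_video_probabilities(features, hidden_layers, output_layer):
    activation = features
    for weights, biases in hidden_layers:
        activation = relu(dense(activation, weights, biases))
    return softmax(dense(activation, *output_layer))

rng = random.Random(0)
sizes = [8, 16, 8, 8, 4]  # 8 input features, then four hidden ReLU layers
hidden = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = init_layer(sizes[-1], 5, rng)  # 5 hypothetical videos = 5 classes

probs = predict_video_probabilities([0.1] * 8, hidden, output)
print(len(probs), round(sum(probs), 6))
```

The output is one probability per video, and the probabilities sum to one, which is what lets the problem be treated as classification.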

Apart from the regular user-related features, the system also integrates several other aspects into features and training sets based on certain cases.

  • The user’s behavior changes from time to time. A person may be interested in a particular topic (say SQL interview questions) while they are preparing for an interview. But once they bag an offer, they might no longer want to watch interview-related videos and might prefer funny videos to unwind. YouTube understands how much this matters and thus feeds the age of the watch history as a feature.
  • YouTube not only includes the watch history from its own platform but also adds training examples from other sites that have embedded videos. This approach improves the probability of newer content being discovered.
  • The subsystem also generates only a fixed number of training examples per user. As a result, the loss function weights all users evenly, so a small group of very active users does not dominate training. This is done to provide an equal opportunity for all content creators’ videos, despite some already being more popular.
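The last point in the list above, limiting how many training examples each user contributes, might look something like the following Python sketch. The function name, the `(user_id, video_id)` tuple format, and the random sampling strategy are assumptions for illustration. Note that this version caps the number of examples rather than forcing every user to exactly the same count.

```python
import random
from collections import defaultdict

def cap_examples_per_user(examples, max_per_user, seed=0):
    """Keep at most max_per_user (user_id, video_id) examples for each user."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user_id, video_id in examples:
        by_user[user_id].append((user_id, video_id))

    capped = []
    for rows in by_user.values():
        if len(rows) > max_per_user:
            # Sample without replacement so heavy users are thinned out.
            rows = rng.sample(rows, max_per_user)
        capped.extend(rows)
    return capped

# One very active user and one casual user (made-up data).
examples = [("heavy", f"v{i}") for i in range(100)] + [("light", "v1"), ("light", "v2")]
training_set = cap_examples_per_user(examples, max_per_user=3)
print(len(training_set))  # 5
```

Without the cap, the active user would supply 100 of the 102 examples. With it, they supply 3 of 5, so their tastes no longer swamp everyone else’s during training.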

At serving time, the subsystem generates a pool of candidates that is fed into the ranking subsystem. The ranking model is similar to the candidate generation model. It is also a deep feedforward neural network that accepts both embedded categorical features and normalized continuous features.

Model Architecture of Ranking from Whitepaper Deep Neural Networks for YouTube Recommendations

Candidate generation provides a list of videos that cannot be compared effectively using its limited set of features alone. Hence, the ranking algorithm brings in hundreds more features to rank the candidates according to the user’s preferences.

  • The language preference of the user and the language of the videos are tokenized in a way inspired by the Bag of Words model from Natural Language Processing (NLP). These tokens are then embedded into lower dimensions and used as a categorical feature.
  • The subsystem includes the IDs of videos that have been shown to the user as impressions. These impression video IDs, combined with the video IDs from the user’s watch history, are embedded as a categorical feature and fed to the system.
  • As in candidate generation, the age of the watch history is considered a feature. YouTube also keeps track of the impression history, so the age of impressions is considered for the ranking as well. These continuous features are normalized before training.

These inputs are passed through three fully connected ReLU layers. During training, the output of the last layer is fed into a weighted logistic regression that is trained with cross-entropy loss.
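As a rough sketch of what "weighted logistic regression under cross-entropy loss" means, the Python below computes a cross-entropy loss in which each example has its own weight. The logits, labels, and weights are invented. Weighting clicked impressions by watch time and giving unclicked ones a weight of 1 follows the whitepaper’s description and is an assumption here, not something stated above.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def weighted_cross_entropy(logits, labels, weights):
    """Average cross-entropy in which each example counts `weight` times."""
    total = 0.0
    for z, y, w in zip(logits, labels, weights):
        p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)

# Three made-up impressions: two were clicked (label 1), one was not (label 0).
logits = [2.0, -1.0, 0.5]      # raw scores from the last network layer
labels = [1, 1, 0]
weights = [240.0, 15.0, 1.0]   # assumed: watch seconds for clicks, 1 otherwise

print(round(weighted_cross_entropy(logits, labels, weights), 4))
```

Because the first example carries a weight of 240, getting it right matters far more to the loss than the short 15-second watch, which is one way to express the idea that not all watch-time is equal.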

One Algorithm To Rule Them All

By now you might have an idea of how complex the entire recommendation system is, and rightly so, considering that about 30,000 hours of video content are uploaded every hour. Ranking at such a scale is no small feat. Having said that, understanding the YouTube algorithm is the key to success or, to put it more frankly, to a money-printing machine. Wherever the eye of Sauron… I mean, wherever the YouTube algorithm points, that video does tremendously well and generates large amounts of money. But YouTube and content creators have a lot in common.

Almost every content creator wants their content to be viewed by as many people as possible, and at the same time, YouTube wants to show the best content to as many people as possible. The goal for both of them is the same: to have the best content out there. Let’s take a look at how both content creators and YouTube itself can improve to create a richer experience for all.

Content Creators

Although there is no one-size-fits-all model for creating viral videos on YouTube, there are some factors we can take into account when deciding what kind of trajectory to follow for long-term success. Here are some of the steps that a content creator can take.

  • Search for keywords that have high search volume with low competition.
  • Focus on videos that have enticing titles and thumbnails, thus increasing the CTR.
  • Try to make videos that other people are talking about, so the video gets recommended in the Suggested section.
  • Improve watch time and user retention rate by making the content more engaging.
  • Add keywords in the title and description for a higher rank in search.

Figure showing different retention types. Greens are positive indicating higher retention resulting in more views.

YouTube Recommendation

One of the biggest complaints regarding the YouTube recommendation system is that it focuses too heavily on getting people to stay on the platform, instead of leaving with a richer experience. Although YouTube has mentioned that they are focusing more on satisfaction, it is nowhere near what it should be.

Let’s take a look at some of the factors that can have a major impact.

  • The watch-time parameter should carry less weight when recommending videos on the Homepage and in the Suggested section.
  • Misinformation, especially news from low-authority sources, should be reduced. This could be done by giving more authoritative sources, such as world-renowned journalists, a boost in ranking. This should only be applied to news-related content.
  • A new section could be created to showcase content from new channels, which would encourage new content creators.
  • Give creators a better understanding of how their videos perform, beyond what YouTube analytics offers today.

To some extent, YouTube is already implementing these factors but they are not as prominent as they should be.

Conclusion

Recommendation systems, in general, are pretty complex, and at the scale of YouTube you soon realize how enormous the task at hand is. That is why YouTube employs numerous deep learning models to deliver curated recommendations. YouTube provides incredible insight in the form of analytics, not just for itself but also for the millions of content creators who rely on YouTube for a living. From personalized insights to cat videos being recommended, YouTube pretty much has it all. No wonder YouTube is the world’s largest video content creation platform, and it shows no sign of slowing down.

Swaathi Sundaramurugan
SFU Professional Computer Science

Data Engineer Intern | Graduate Student at Simon Fraser University | Full Stack Developer | Writer