YouTube’s Recommendation System and the Impact of Confirmation Bias

Andrew Gough
Jan 18 · 6 min read

YouTube is home to one of the largest and most sophisticated industrial recommendation systems in existence. Whether we like it or not, most of our experience on the web is shaped by these powerful recommendation systems, which provide us with content that we are predicted to engage with and consume. But exactly how different is each user’s interaction with, and view of, the internet because of these personalized algorithms? Two features of YouTube’s recommendation system stand out: the content shown to each user is selected algorithmically, and the system is constantly learning. Both of these features can lead to confirmation bias in users. This post dives into the technology underlying the system and points out where interesting problems may arise in its design. I will examine these problem points through the lens of confirmation bias.

What is YouTube?

Aside from hosting the largest collection of cat videos on the internet, YouTube began as the first widely adopted video uploading and streaming service.

YouTube Fun Facts

  • Purchased by Google for $1.65 billion in November 2006
  • 300 hours of video are uploaded every minute
  • 5 billion videos watched per day
  • 108 million hours of video watched per day

Recommendation Systems

Industrial recommendation systems are a step up from standard recommendation systems and sit at the intersection of three big fields.

Big Data + Machine Learning + Human Computer Interaction

YouTube’s industrial recommendation system deals with one of the largest and fastest-growing datasets in the world. The system’s goal is to present the user with a few videos, selected from an enormous corpus, that the user has the highest probability of clicking on. To predict these videos, the system needs to account for historical user data and, more generally, to model how humans interact with its services.

Recommendation Systems and Human Computer Interaction

  • Amazon — Product Recommendations
  • Google — Web Page, Image, News, etc…
  • Netflix — Movie & TV Show Recommendations
  • Yelp — Restaurant Recommendations
  • YouTube — Video Recommendations

The human-computer interaction lens is necessary to study how these algorithms dictate your online experience and perception. Take Amazon, for example. In a normal grocery store, the products you pass as you walk through the aisles are placed on the assumption that most of the general public needs, wants, or will buy them. Amazon is similar to a normal grocery store, except that each aisle and shelf is customized per user. Industrial recommendation systems stock the shelves of each aisle you walk down dynamically, placing each item according to the estimated probability that you will purchase it. The result is curated to your exact taste, based on a large amount of demographic, personal, and historical data collected on you.

YouTube Recommendations: System Overview

Deep Neural Networks for YouTube Recommendations

Google treats recommendation as an extreme classification problem. The system consists of two deep neural networks. The first network performs candidate generation, which narrows the video corpus down to a few hundred candidates. The second network ranks the output piped to it from the candidate generation network.
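To make "extreme classification" concrete, here is a minimal sketch in which every video is treated as its own class and a softmax turns per-video scores into probabilities. The embeddings, the video names, and the use of a dot product as the score are illustrative assumptions, not details taken from YouTube's system.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 2-dimensional embeddings; a real system would learn these.
video_embeddings = {
    "cat_video": [0.9, 0.1],
    "cooking_tutorial": [0.2, 0.8],
    "news_clip": [-0.5, 0.3],
}
user_embedding = [1.0, 0.2]

# One score per video, i.e. one "class" per video in the corpus.
video_ids = list(video_embeddings)
scores = [dot(user_embedding, video_embeddings[v]) for v in video_ids]
watch_probability = dict(zip(video_ids, softmax(scores)))
most_likely = max(watch_probability, key=watch_probability.get)
```

With three videos this is an ordinary classifier. The "extreme" part comes from the number of classes growing to the size of the whole video corpus.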

The candidate generation network takes as input YouTube’s video corpus (on the order of hundreds of millions of videos) along with the user’s history and context. This network filters the video corpus down to just hundreds of potential videos to recommend.
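The filtering step can be sketched as a top-k selection over the corpus. This is a toy stand-in under stated assumptions: the corpus is random, the vectors have 8 dimensions, the score is a plain dot product, and `generate_candidates` is a hypothetical helper rather than anything from YouTube's code.

```python
import heapq
import random

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(user_vector, corpus, k):
    """Return the ids of the k videos whose vectors score highest for the user."""
    return heapq.nlargest(k, corpus, key=lambda vid: dot(user_vector, corpus[vid]))

# Synthetic corpus: 10,000 videos with random 8-dimensional vectors.
rng = random.Random(0)
corpus = {f"video_{i}": [rng.uniform(-1, 1) for _ in range(8)] for i in range(10_000)}
user_vector = [rng.uniform(-1, 1) for _ in range(8)]

# Narrow the whole corpus down to a few hundred candidates (here: 100).
candidates = generate_candidates(user_vector, corpus, k=100)
```

Scanning every video like this would be far too slow at YouTube's scale, but it shows the shape of the step: a huge corpus goes in and a short candidate list comes out.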

The ranking network has three input sources: the output of the candidate generation network, deeper video features about those candidates, and other candidate sources. For each candidate it outputs a normalized score from 0 to 1, which represents the predicted likelihood that the user will click on the video in question, and the candidates are sorted by this score from highest to lowest. The output is on the order of dozens of videos, and the top N of these will be recommended to the user, depending on where they are on the website.
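The score-sort-truncate behavior described above can be sketched in a few lines. The raw scores and video names are made up, and the sigmoid is only one assumed way to squash a score into the 0 to 1 range; the real ranking network is a deep model with many more inputs.

```python
import math

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def rank(candidates, top_n):
    """Score each candidate between 0 and 1, sort descending, keep the top N."""
    scored = [(video_id, sigmoid(raw)) for video_id, raw in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical raw model outputs for four candidate videos.
raw_scores = {"video_a": 2.0, "video_b": -1.0, "video_c": 0.5, "video_d": 0.0}

# A page with two recommendation slots gets the two highest-scoring videos.
homepage = rank(raw_scores, top_n=2)
```

The value of `top_n` would differ by surface: a sidebar and a home page have different numbers of slots, which is why N depends on where the user is on the site.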

Towards the end of the blog post, we will revisit the candidate generation network architecture and examine it for potential vectors where confirmation bias can be introduced at a design level. In the next section I will introduce the concept of confirmation bias.

Confirmation Bias

Picture a Venn diagram with two sets: one is “What the facts say” and the other is “What confirms your belief”. Confirmation bias lives in the intersection of these two sets. Information in this intersection tends to be overvalued relative to what it is worth, whereas information in the first set that does not confirm your beliefs tends to be undervalued. Finally, information that confirms your beliefs but does not intersect with fact is a purely foolish basis for a belief.
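The three regions can be written out with ordinary set operations. The claim names are placeholders invented for illustration.

```python
# Placeholder claims; the names carry no meaning beyond set membership.
facts = {"claim_a", "claim_b", "claim_c"}
confirms_belief = {"claim_c", "claim_d"}

overvalued = facts & confirms_belief   # factual and belief-confirming
undervalued = facts - confirms_belief  # factual but not belief-confirming
unfounded = confirms_belief - facts    # belief-confirming but not factual
```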

Confirmation Bias in YouTube

The candidate generation system is designed to take as inputs the user’s watch vector, search vector, geographic info, age, gender, and a host of other features denoted by ‘…’ in the diagram below. These inputs already filter the potential recommendations down to a subset of all videos, and this subset will conform to the user’s preferences and will likely not challenge their views on anything. The whole point of these recommendations is to output videos that the user is likely to click on, ultimately to serve them more advertisements. The incentive to serve more ads puts these recommendation systems in an ethically murky area where they can exploit confirmation bias in users, encouraging them to engage with videos that agree with their current point of view but are only somewhat factual.

YouTube’s Candidate Generation Network Architecture
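As a rough sketch of how the inputs named above might be combined into a single vector for the network, the example below averages the watch and search vectors and concatenates them with the remaining features. The averaging step, the vector sizes, the numeric encodings, and the `build_input` helper are all assumptions made for illustration.

```python
def average(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def build_input(watch_vectors, search_vectors, geo_vector, age, gender):
    """Concatenate the user's signals into one flat input vector."""
    return (
        average(watch_vectors)
        + average(search_vectors)
        + geo_vector
        + [age, gender]
    )

features = build_input(
    watch_vectors=[[1.0, 0.0], [0.0, 1.0]],  # two watched videos
    search_vectors=[[0.5, 0.5]],             # one search query
    geo_vector=[0.1, 0.9],
    age=0.3,     # hypothetical normalized age
    gender=1.0,  # hypothetical numeric encoding
)
```

Every entry in this vector describes what the user has already watched, searched for, or is. Nothing in it represents content the user has not yet been exposed to, which is where the bias discussed above can enter at the design level.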

Consequences of Confirmation Bias

Conclusion & Discussion

Resources

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…


Andrew Gough

Written by

Eclectic Software Engineer

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
