YouTube is home to one of the largest and most sophisticated industrial recommendation systems in existence. Whether we like it or not, most of our experience on the web is shaped by these powerful recommendation systems, which serve us the content we are predicted to engage with and consume. But exactly how different is each user's view of the internet because of these personalized algorithms? Two defining features of YouTube's recommendation system are that its content selection is generated algorithmically for each user and that it is constantly learning. Both features can foster confirmation bias in users. This post dives into the technology underlying the system and the points in its design where interesting problems may arise, examined through the lens of confirmation bias.
What is YouTube?
Aside from hosting the largest collection of cat videos on the internet, YouTube originated as the first widely adopted video uploading and streaming service on the web.
YouTube Fun Facts
- First video uploaded April 23, 2005
- Purchased for $1.65 B in November of 2006
- 300 hours of video are uploaded a minute
- 5 billion videos watched per day
- 108 million hours of video watched per day
Stanford InfoLab defines recommendation systems as “a class of web applications that involve predicting user responses to options”. This sounds straightforward, but when you are running on the scale that YouTube is operating on with millions of concurrent users and 300 hours of video uploaded a minute, the recommendation problem becomes a little trickier.
Industrial recommendation systems are a step up from standard recommendation systems and sit at the intersection of three big fields.
Big Data + Machine Learning + Human Computer Interaction
YouTube’s industrial recommendation system deals with one of the largest and fastest-growing datasets in the world. The system’s goal is to present the user with a few videos (selected out of billions) that they have the highest probability of clicking on. To predict these videos, the system must account for historical user data and, more generally, understand how humans interact with its services.
Recommendation Systems and Human Computer Interaction
Over time, industrial recommendation systems (IRSs) have grown more robust and more personalized. This personalization dictates the content shown to you on services that affect your everyday life. Listed below are examples of commonly used websites that utilize an IRS:
- Amazon — Product Recommendations
- Google — Web Page, Image, News, etc…
- Netflix — Movie & TV Show Recommendations
- Yelp — Restaurant Recommendations
- YouTube — Video Recommendations
The human-computer interaction lens is necessary to study how these algorithms dictate your online experience and perception. Take Amazon, for example. In a normal grocery store, products are placed on shelves based on what most of the general public needs, wants, or will buy. Amazon is similar, except each aisle and shelf is customized per user: the IRS stocks every shelf dynamically, placing each item according to the estimated probability that you will purchase it, curated to your exact taste from the large amount of demographic, personal, and historical data collected about you.
YouTube Recommendations: System Overview
Below is a diagram from Google’s white paper on Deep Neural Networks for YouTube Recommendations which describes their recommendation system at a high level and focuses on the improvements brought about by deep learning.
Google treats recommendation as an extreme multiclass classification problem. The system consists of two deep neural networks: the first performs candidate generation, and the second ranks the output piped to it from the candidate generation network.
The candidate generation network takes as input YouTube’s video corpus (on the order of hundreds of millions of videos) along with user history and context. It filters the corpus down to just hundreds of potential videos to recommend.
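At serving time, candidate generation can be treated as a nearest-neighbor lookup: the user's history and context are mapped to an embedding, and the videos whose embeddings score highest against it form the candidate set. Below is a minimal sketch of that idea; the corpus size, embedding dimension, and function names are illustrative stand-ins, not values from Google's paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: in production the corpus holds hundreds of millions of
# videos, and embeddings come from a trained deep network.
corpus_embeddings = rng.normal(size=(10_000, 64))  # one row per video
user_embedding = rng.normal(size=64)               # from watch/search history + context

def generate_candidates(user_vec, video_matrix, k=200):
    """Return indices of the k videos with the highest dot-product score."""
    scores = video_matrix @ user_vec
    # argpartition finds the top k without fully sorting the whole corpus
    top_k = np.argpartition(scores, -k)[-k:]
    return top_k[np.argsort(scores[top_k])[::-1]]  # sort just the k winners

candidates = generate_candidates(user_embedding, corpus_embeddings)
print(len(candidates))  # 200 candidates out of a 10,000-video corpus
```

The `argpartition` step matters at scale: a full sort over hundreds of millions of scores would dominate serving latency, while a partial selection is linear in the corpus size.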
The ranking network has three input sources: the output of the candidate generation network, deeper video features for those candidates, and other candidate sources. For each candidate it produces a normalized score from 0 to 1 that predicts how likely the user is to click on that video, and the candidates are sorted from highest to lowest score. The output is on the order of dozens of videos, and the top N of these are recommended to the user depending on where they are on the site.
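The final step described above, scoring each candidate between 0 and 1, sorting, and surfacing the top N, can be sketched as follows. The `rank_score` function here is a hypothetical stand-in for the ranking network, just squashing a raw logit through a sigmoid:

```python
import math
import random

random.seed(1)

def rank_score(candidate_features):
    """Stand-in for the ranking network: squash a raw logit into (0, 1)."""
    logit = sum(candidate_features)
    return 1.0 / (1.0 + math.exp(-logit))

# Hundreds of candidates come in, each with some dense feature vector.
candidates = {f"video_{i}": [random.gauss(0, 1) for _ in range(4)]
              for i in range(300)}

# Score every candidate, sort from highest to lowest predicted click probability.
scored = sorted(((rank_score(feats), vid) for vid, feats in candidates.items()),
                reverse=True)

# Keep the top N; the real N varies with where the user is on the site.
top_n = [vid for _, vid in scored[:12]]
print(top_n[:3])
```

A real ranking model scores each video from hundreds of features, but the shape of the serving step, score, sort, truncate to N, is the same.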
Towards the end of the blog post, we will revisit the candidate generation network architecture and examine it for potential vectors where confirmation bias can be introduced at a design level. In the next section I will introduce the concept of confirmation bias.
“Confirmation bias occurs from the direct influence of desire on beliefs” — Psychology Today. The image below elegantly summarizes confirmation bias:
Above we have two sets: “What the facts say” and “What confirms your belief”. Confirmation bias lives in their intersection. As the diagram shows, information in this intersection tends to be overvalued relative to its worth, while factual information that does not confirm your beliefs tends to be undervalued. Finally, information that confirms your beliefs but does not intersect with fact is a foolish basis for a belief.
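The Venn-diagram view above maps directly onto set operations. A toy illustration, with made-up claim names standing in for pieces of information:

```python
facts = {"claim_a", "claim_b", "claim_c", "claim_d"}
confirms_belief = {"claim_c", "claim_d", "claim_e"}

overvalued = facts & confirms_belief   # factual AND belief-confirming
undervalued = facts - confirms_belief  # factual but ignored
foolish = confirms_belief - facts      # belief-confirming but not factual

print(sorted(overvalued))   # ['claim_c', 'claim_d']
print(sorted(undervalued))  # ['claim_a', 'claim_b']
print(sorted(foolish))      # ['claim_e']
```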
Confirmation Bias in YouTube
Digesting Google’s white paper on YouTube’s recommendation system reveals that the candidate generation network is a component that can introduce confirmation bias into the set of videos before they are even ranked.
The candidate generation system is designed to take as inputs the user’s watch vector, search vector, geographic info, age, gender, and a host of other features denoted by ‘…’ in the diagram below. This already filters the pool of potential recommendations down to a subset of all videos, and that subset will conform to the user’s preferences and will rarely challenge their views on anything. The whole point of these recommendations is to output videos the user is likely to click on, ultimately to serve them more advertisements. This incentive puts recommendation systems in an ethically murky area where they can exploit confirmation bias in users, encouraging them to engage with videos that agree with their current point of view and are only somewhat factual.
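To make the inputs concrete: per the paper, variable-length histories (videos watched, search tokens) are turned into fixed-size vectors by averaging their embeddings, then concatenated with the other features into one input vector for the network. A sketch of that preprocessing, where the table sizes, dimensions, and normalized feature values are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
EMBED_DIM = 32  # illustrative; the paper uses different dimensions

# Hypothetical learned embedding tables for videos and search tokens.
video_table = rng.normal(size=(1000, EMBED_DIM))
token_table = rng.normal(size=(500, EMBED_DIM))

def build_user_input(watched_ids, search_token_ids, geo_vec, age, gender):
    """Average the history embeddings, then concatenate all features."""
    watch_vec = video_table[watched_ids].mean(axis=0)    # watch vector
    search_vec = token_table[search_token_ids].mean(axis=0)  # search vector
    return np.concatenate([watch_vec, search_vec, geo_vec, [age, gender]])

x = build_user_input(
    watched_ids=[3, 17, 256],
    search_token_ids=[42, 7],
    geo_vec=np.array([0.2, 0.8]),
    age=0.31,    # continuous features are typically normalized first
    gender=1.0,
)
print(x.shape)  # (68,) = 32 + 32 + 2 + 2
```

Averaging makes the input size independent of how many videos a user has watched, which is what lets one network serve every user.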
Consequences of Confirmation Bias
Confirmation bias can lead to echo chambers and extreme polarization of social groups, as seen below. The Pew Research Center conducts a study of American political polarization roughly every ten years, asking ten questions to gauge each respondent’s political values. From 1994 to 2004, the median Democrat and median Republican stayed in roughly the same positions on the spectrum. By 2014, however, their median views had shifted further apart, indicating a polarized political climate.
Conclusion & Discussion
As software services become more integral to our daily lives, we will have to consider how the personalization and confirmation bias built into these systems affect our exposure to information. Optimistically, I hope the companies behind these services can work to make recommendation systems more holistic while still aligning with their business incentives. Web HCI is an exciting field, and I look forward to more in-depth studies that help us understand the human experience of the ever-changing web.
References
- Deep Neural Networks for YouTube Recommendations — Google Research, 2016
- Recommendation Systems Chapter — Stanford InfoLab Text
- What is Confirmation Bias? — Shahram Heshmat Ph.D. blog on confirmation bias
- USC Bias Lecture — USC Marshall School of Business Lecture on Biases
- Reducing Confirmation Bias and Evaluation Bias — Schwing, Buder, 2012