Intuitive Understanding of the YouTube Recommendation System

Nishesh Gogia
Published in Analytics Vidhya
7 min read · Nov 30, 2021

These days, the recommendation system has become a vital part of almost every company. Every company wants to give its users a personalised experience, and for that, recommendation systems are the best choice.

LET’S UNDERSTAND WHAT A RECOMMENDATION SYSTEM IS…

Let’s say you want to buy a T-shirt from Amazon. You go to the website and type “black t-shirt”.

You will see some black T-shirts on your screen. Simple, right?

Now let’s say you like some T-shirts on the first page and click through to see them. Say you select the third T-shirt from the left (the Black Panther one) and check its reviews, ratings, etc.

Then you come back to the first page and select some different black T-shirts, say ones with a round collar, or ones from a particular brand.

If you pay attention, Amazon is collecting every piece of information, every click of yours. Whenever you go to a particular brand or a particular pattern, Amazon learns a little more about your likes and dislikes.

It is like going to the nearest market to shop for a T-shirt with a new friend. The new friend does not know anything about your likes or dislikes, so he just observes you and notices your every action:

What patterns are you choosing? What brands are you choosing? What color are you opting for?

And so on…

Amazon is that unnecessary friend who keeps a watch on you every time you buy something on its website.

Now the question is: why is Amazon collecting every bit of information about you?

The answer is very simple. Amazon wants to recommend products to you based on what you may like or what you may buy from its website. It’s a very beautiful idea if you think about it.

Let’s say you plan to buy a T-shirt, as we saw earlier. While searching, you really liked one black T-shirt and opened its page to see its price, reviews, etc. You do not intend to buy that T-shirt yet; you are just looking at it.

On that black T-shirt’s page, you see a section titled PRODUCTS RELATED TO THIS ITEM.

If you like the black T-shirt with panther stripes, there is a good chance that you will like these T-shirts as well.

What Amazon is doing internally is finding the 10 most similar T-shirts to the one you are looking at, based on a simple assumption:

“If you like this T-shirt, then there is a good chance that you will like similar T-shirts.”

That’s how Amazon is selling its products to us.
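To make the "10 most similar" idea concrete, here is a minimal sketch of picking the top-k items once a similarity score is available. The item names and scores are made up for illustration, and this is not a description of Amazon's actual system; it only shows the selection step.

```python
import heapq

# Hypothetical similarity scores between the T-shirt being viewed and other
# catalogue items (higher = more similar). Names and numbers are invented.
similarity_to_viewed = {
    "black tiger-stripe tee": 0.92,
    "black round-neck tee": 0.81,
    "grey panther-print tee": 0.77,
    "white polo shirt": 0.35,
    "blue denim jeans": 0.08,
}


def most_similar(scores, k=10):
    """Return the names of the k highest-scoring items, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


print(most_similar(similarity_to_viewed, k=3))
```

With a real catalogue, k would be 10 as in the text; the interesting part is how the similarity scores themselves are computed, which is what the rest of this article is about.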

From its recommendation system alone, Amazon got 40 billion dollars of business. That is a big, big number, and it is the reason Amazon keeps enhancing its recommendation system day by day with new technologies.

Let’s build a simple YouTube Recommendation System…

Before starting, let me make clear that the actual YouTube recommendation system is much, much more complex than what we are going to discuss here. My intention is to give you a flavour of how a recommendation system works internally.

Let’s get started…

Now the question is: what is YouTube recommending to us, and why does YouTube need a recommendation system?

YouTube earns money by showing us ads in between videos, so the longer a user stays on YouTube, the more money the company earns. Basically, our time is their money.

So they want to show us what we like to see, and that is why they use a recommendation system.

As we saw with Amazon, the important thing is our data. So how is YouTube collecting our data?

Our YouTube history, our location, our name, our email ID, even our Google search history: all of this is available to YouTube, and that is a lot of data.

For simplicity, let’s say YouTube only has our watch history, so it knows only what we have watched on YouTube.

Let’s look at a concrete example so that you can picture things better.

We have a data matrix in which user1, user2, user3, user4, and user5 are YouTube users, and vid1, vid2, vid3, vid4, and vid5 are the videos available on YouTube. We are assuming here that YouTube has only 5 users and just 5 videos. Wherever the matrix has a 1, the user has seen that video, and wherever it has a 0, the user has not seen it. For example:

USER-1 HAS SEEN VIDEO-1, VIDEO-2, VIDEO-4, AND VIDEO-5 BUT DID NOT SEE VIDEO-3.

SIMILARLY, USER-5 HAS SEEN VIDEO-1 AND VIDEO-4 AND DID NOT SEE VIDEO-2, VIDEO-3, AND VIDEO-5.

Now let’s understand the problem statement,

  • Let’s say we have 5 videos: v1 (cricket), v2 (cooking), v3 (workout), v4 (cricket), and v5 (cricket).
  • Let’s say user-1 is named ‘Rohan’. Out of the 5 videos, he watched v1 (cricket), v2 (cooking), and v5 (cricket) last week. The task is to recommend to Rohan some videos that are similar to v1, v2, and v5.
  • So out of v3 (workout) and v4 (cricket), the recommendation system should pick v4, because it is the most similar to v1 and v5.
  • So if we can find the ‘similarity’ between videos based on the data we have, our problem will be solved.
  • The concept is very simple: “if Rohan has watched cricket videos multiple times in the past, it is more likely that he will watch a cricket video in the future”.
  • Since v4 is a cricket video, v1, v5, and v4 should be similar, and many users will probably have watched them together. We will use this idea to measure similarity.
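The problem statement above can be written down as plain Python data. This is only a sketch; the variable names are my own, and the values are the ones from the bullets.

```python
# Videos and their topics, as listed in the problem statement.
videos = {
    "v1": "cricket",
    "v2": "cooking",
    "v3": "workout",
    "v4": "cricket",
    "v5": "cricket",
}

# Videos Rohan (user-1) watched last week.
rohan_watched = {"v1", "v2", "v5"}

# Candidates for recommendation are the videos he has not watched yet.
candidates = sorted(v for v in videos if v not in rohan_watched)

print(candidates)  # ['v3', 'v4']
```

The recommendation system's job is then to rank these candidates, ideally using only the watch data and not the topic labels.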

If we can find a way to measure the similarity between videos, our work is done. But we need to make sure that the similarity comes only from the data matrix we have.

So let’s use the simple concept of intersection, which we studied in class 10. The intersection of two sets contains the elements that belong to both sets, while the union contains the elements that belong to either one. Refer to the image below to understand union and intersection.

Using intersection, we define the similarity between vi (video i) and vj (video j) as follows: take the set of users who have watched vi, intersect it with the set of users who have watched vj, and count the users in the result. In other words, the similarity is the number of users who have watched both videos.
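Written as a formula, this definition looks as follows. The notation is assumed here for illustration: U_i is the set of users who watched video v_i, and M_{u,i} is the data-matrix entry for user u and video v_i (1 if watched, 0 otherwise).

```latex
\mathrm{similarity}(v_i, v_j) \;=\; \lvert U_i \cap U_j \rvert \;=\; \sum_{u} M_{u,i}\, M_{u,j}
```

The two forms agree because the product M_{u,i} M_{u,j} equals 1 only when user u has watched both videos, so the sum simply counts those users.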

This is a very simple way to define the similarity function. There are much better ways to define it, but in this article we are simplifying things to understand recommendation systems better.
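Here is a minimal sketch of this similarity function in Python. The watch matrix below is hypothetical, since the original figure is not reproduced in this text: user 1's row follows Rohan's history from the problem statement (v1, v2, v5), user 5's row follows the example above (v1 and v4), and the remaining rows are assumptions. The function names are also my own.

```python
# Hypothetical watch matrix (NOT the article's original figure).
# Rows are users u1..u5, columns are videos v1..v5; 1 = watched, 0 = not watched.
WATCHED = [
    [1, 1, 0, 0, 1],  # u1 (Rohan): v1, v2, v5
    [1, 0, 0, 1, 1],  # u2: assumed
    [1, 0, 0, 1, 0],  # u3: assumed
    [1, 0, 1, 0, 1],  # u4: assumed
    [1, 0, 0, 1, 0],  # u5: v1, v4
]


def co_watchers(matrix, i, j):
    """Return the 1-based ids of users who watched both video i and video j."""
    return {u + 1 for u, row in enumerate(matrix) if row[i - 1] and row[j - 1]}


def similarity(matrix, i, j):
    """Similarity = number of users who watched both videos (intersection size)."""
    return len(co_watchers(matrix, i, j))


print(sorted(co_watchers(WATCHED, 1, 5)))  # [1, 2, 4]
print(similarity(WATCHED, 1, 5))           # 3
```

Videos are addressed by their 1-based number here so that `similarity(WATCHED, 1, 5)` reads like similarity(v1, v5) in the text.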

Now, if we want to find the similarity between video-1 and video-5 (refer to the example), we can see that 3 users out of 5 have seen both videos (u1, u2, and u4), which makes sense as both are cricket videos.

similarity(v1,v5)={u1,u2,u4}, size is 3, similarity score is 3

similarity(v1,v4)={u2,u3,u5}, size is 3, similarity score is 3

similarity(v1,v2)={u1}, size is 1, similarity score is 1

similarity(v1,v3)={u4}, size is 1, similarity score is 1

similarity(v5,v4)={u1,u2}, size is 2, similarity score is 2

similarity(v5,v3)={u4}, size is 1, similarity score is 1

similarity(v2,v4)={NULL}, size is 0, similarity score is 0 (no user has watched v2 and v4 together)

similarity(v2,v3)={NULL}, size is 0, similarity score is 0 (no user has watched v2 and v3 together)

From this we can say that the similarity between v1 and v5 is 3, which is the joint-highest score in the list (tied with v1 and v4). We also know from the problem statement that v1 and v5 are both cricket videos, so it makes sense that their similarity score is high.

So Rohan has not seen v3 (workout) or v4 (cricket), and the recommendation system has to decide which of these 2 videos to recommend to him. How will it decide?

It will look at the similarity scores between each candidate video and the videos Rohan has watched. For example:

Rohan has watched v1, so it checks the similarity between v1 and v4, which is 3, and then the similarity between v1 and v3, which is 1.

Rohan has also watched v5, so it checks the similarity between v5 and v4, which is 2, and then the similarity between v5 and v3, which is 1.

Lastly, Rohan has watched v2, so it checks the similarity between v2 and v4, which is 0, and then the similarity between v2 and v3, which is 0.

So it is very clear that out of v3 and v4, the recommendation system will choose v4 for Rohan, as its similarity scores with the videos Rohan has watched (3, 2, and 0) are higher than those of v3 (1, 1, and 0).
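The decision above can be sketched in a few lines of Python. The similarity scores are the ones listed in this article. Adding up a candidate's scores across all watched videos is my own assumption about how to combine them (the article only says v4's scores are higher), and the function names are illustrative.

```python
# Similarity scores from the article, keyed by unordered pair of videos.
SIMILARITY = {
    frozenset({"v1", "v4"}): 3,
    frozenset({"v1", "v3"}): 1,
    frozenset({"v5", "v4"}): 2,
    frozenset({"v5", "v3"}): 1,
    frozenset({"v2", "v4"}): 0,
    frozenset({"v2", "v3"}): 0,
}


def score(candidate, watched):
    """Sum the candidate's similarity to every watched video (assumed rule)."""
    return sum(SIMILARITY[frozenset({candidate, w})] for w in watched)


def recommend(candidates, watched):
    """Pick the candidate with the highest total similarity score."""
    return max(candidates, key=lambda c: score(c, watched))


rohan_watched = ["v1", "v5", "v2"]
candidates = ["v3", "v4"]

for c in candidates:
    print(c, score(c, rohan_watched))  # v3 2, then v4 5

print(recommend(candidates, rohan_watched))  # v4
```

Under this rule v4 totals 3 + 2 + 0 = 5 and v3 totals 1 + 1 + 0 = 2, so v4 is recommended, matching the conclusion above.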

I hope you got a flavour of how a recommendation system works. I want to clarify one thing again: this is a very basic recommendation system, and actual systems are much more complex.

Thank you for reading…

Nishesh Gogia
