Knowledge Graphs and YouTube

Berkay Ataman
Oct 17, 2018

For video recommendations, YouTube uses collaborative filtering. At its core this is a matrix with one row per user and one column per video. Each cell is empty if the user hasn’t watched that video yet, or holds the user’s like/dislike for it. YouTube looks at my watch history, finds in the matrix other users who liked the same videos as I did, and recommends videos that they liked but I haven’t watched yet.
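To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. The users, videos, ratings, and the choice of cosine similarity are all assumptions made for illustration; this is not YouTube's actual implementation.

```python
from math import sqrt

# Hypothetical ratings: +1 = like, -1 = dislike.
# A missing key is an empty cell ("not watched yet").
ratings = {
    "me":    {"v1": 1, "v2": 1},
    "alice": {"v1": 1, "v2": 1, "v3": 1},
    "bob":   {"v1": -1, "v2": -1, "v4": 1},
}


def similarity(a, b):
    """Cosine similarity over the videos that both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[v] * b[v] for v in shared)
    norm_a = sqrt(sum(a[v] ** 2 for v in shared))
    norm_b = sqrt(sum(b[v] ** 2 for v in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank unseen videos by the similarity-weighted ratings of other users."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = similarity(mine, theirs)
        if sim <= 0:
            continue  # skip users with no overlap or with opposite taste
        for video, value in theirs.items():
            if video not in mine:
                scores[video] = scores.get(video, 0.0) + sim * value
    liked = [v for v in scores if scores[v] > 0]
    return sorted(liked, key=lambda v: -scores[v])


print(recommend("me", ratings))
```

In this toy data "alice" agrees with me on both videos we share, so her other liked video gets recommended, while "bob" disagrees with me and is ignored.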

However, this matrix has some issues. Because it is so large, it is very sparse: each user has watched only a tiny fraction of all videos, so almost every cell is empty. Another issue is the cold-start problem: when a user or an uploaded video is new, there are not enough signals to base a recommendation on.
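Both issues are easy to see on a toy matrix. The sketch below uses made-up users and videos, and the helper names `sparsity` and `cold_start` are hypothetical; it only measures how empty the matrix is and lists the users and videos that have no signal at all.

```python
# Hypothetical data: user -> {video: rating}; absent keys are empty cells.
ratings = {
    "u1": {"v1": 1},
    "u2": {"v2": -1, "v3": 1},
    "u3": {},  # a new user who has not watched anything yet
}
videos = ["v1", "v2", "v3", "v4"]  # "v4" was just uploaded


def sparsity(ratings, videos):
    """Fraction of user-video cells that are empty."""
    total = len(ratings) * len(videos)
    filled = sum(len(r) for r in ratings.values())
    return 1 - filled / total


def cold_start(ratings, videos):
    """Users and videos that have no like/dislike signal at all."""
    rated = {v for r in ratings.values() for v in r}
    new_users = [u for u, r in ratings.items() if not r]
    new_videos = [v for v in videos if v not in rated]
    return new_users, new_videos


print(sparsity(ratings, videos))
print(cold_start(ratings, videos))
```

Even in this tiny example three quarters of the cells are empty, and collaborative filtering has nothing to say about "u3" or "v4".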

To overcome these problems, YouTube falls back on suggesting videos with similar topics.

How can we understand and classify the content of every YouTube video?

YouTube uses a deep learning based topic classifier that analyzes the images and sound of a video and tries to work out its topic. Although it is fairly good at this, it is not perfect, and it is very expensive in terms of computing resources.

Since most of the information can be found in the title and description of a video, YouTube also classifies a video’s topic from that text. To do so it uses a knowledge graph as its vocabulary. With millions of videos covering every kind of topic, a huge source of concepts is needed, and because the graph captures meaning, it also lets us recommend related topics.

What is a Knowledge Graph?

The main motto of a knowledge graph is “things, not strings”. It is basically a huge graph that connects things (entities) through their relations. Because the data is structured, machines can read it as well.
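A common way to store such a graph is as (subject, predicate, object) triples. The sketch below is a tiny in-memory version; the `KnowledgeGraph` class and the handful of facts in it are hand-written for illustration and are not copied from any real knowledge base.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal triple store: subject -> list of (predicate, object)."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._out[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked to `subject` through `predicate`."""
        return [o for p, o in self._out.get(subject, []) if p == predicate]

    def neighbours(self, subject):
        """Every entity directly connected to `subject`."""
        return {o for _, o in self._out.get(subject, [])}


kg = KnowledgeGraph()
# Illustrative facts only.
kg.add("Wingsuit flying", "subclass of", "Skydiving")
kg.add("Skydiving", "instance of", "Sport")
kg.add("Dubai", "country", "United Arab Emirates")

print(kg.objects("Wingsuit flying", "subclass of"))
print(kg.neighbours("Dubai"))
```

Here "Dubai" is not just a string in a description: it is a node with typed links to other nodes, which is what lets a machine follow relations and find related topics.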

By the way, Wikidata (a free, open knowledge base run by the Wikimedia Foundation, the organization behind Wikipedia) is a great example of a knowledge graph.

A knowledge graph connects all entities together, which gives us context, and it also lets us ask advanced questions with a query language such as SPARQL. For example, asking Wikidata for “the location of Bob Dylan’s birthplace” takes just one query.
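As a rough sketch, that question could look like the query below, wrapped in Python so it can be sent to the public Wikidata SPARQL endpoint. The identifiers are my assumptions and worth double-checking on Wikidata: `Q392` for Bob Dylan, `P19` for "place of birth", and `P625` for "coordinate location". The helper `build_request_url` is a hypothetical name.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Two hops in one query: person -> birthplace -> coordinates.
# Q392, P19 and P625 are assumed identifiers; verify them on Wikidata.
QUERY = """
SELECT ?placeLabel ?coord WHERE {
  wd:Q392 wdt:P19 ?place .   # Bob Dylan -> place of birth
  ?place wdt:P625 ?coord .   # place -> coordinate location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


print(build_request_url(QUERY))
```

The resulting URL can be fetched with any HTTP client. The point is that the two hops (person to birthplace, birthplace to location) are followed inside the graph, in a single query.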

Back to the problem

Picking up where we left off: the title and description mention many topics. Each of them has an entity in the knowledge graph, and through the graph we can reach their related topics too. The problem is telling the central topic apart from the others, and the knowledge graph helps with that as well.

Take, for example, a video of a skydiver in Dubai. Its description contains many concepts, but we need to find the central one, and as viewers we can tell that the central concept is skydiving, not Dubai.

Here YouTube checks how often these entities co-occur on Wikipedia, and the entities vote for each other. Wingsuit flying does not co-occur with Dubai nearly as often as it does with Skydiving. The entity that collects the most votes is taken as the central topic of the video.
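The voting idea can be sketched in a few lines. Everything specific here is an assumption: the co-occurrence counts are invented, and the scoring rule (each entity gives every other entity a vote equal to how often the two co-occur) is just one simple way to read "entities vote for each other", not necessarily what YouTube does.

```python
from itertools import permutations

# Hypothetical co-occurrence counts between pairs of entities.
cooccurrence = {
    frozenset({"Skydiving", "Wingsuit flying"}): 120,
    frozenset({"Skydiving", "Parachute"}): 200,
    frozenset({"Wingsuit flying", "Parachute"}): 60,
    frozenset({"Dubai", "Skydiving"}): 15,
    frozenset({"Dubai", "Wingsuit flying"}): 3,
}


def central_topic(entities, cooccurrence):
    """Let every entity vote for every other one, weighted by co-occurrence."""
    votes = {e: 0 for e in entities}
    for voter, candidate in permutations(entities, 2):
        votes[candidate] += cooccurrence.get(frozenset({voter, candidate}), 0)
    return max(votes, key=votes.get), votes


# Entities found in the title and description of the example video.
entities = ["Dubai", "Skydiving", "Wingsuit flying", "Parachute"]
topic, votes = central_topic(entities, cooccurrence)
print(topic, votes)
```

With these made-up counts Skydiving collects the most votes because it co-occurs strongly with the other sport-related entities, while Dubai is only weakly connected to them and ends up last.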

References

  1. https://www.youtube.com/watch?v=D-bTGefJj0A
  2. https://semantic-web.com/2018/08/23/knowledge-graphs-connecting-dots-increasingly-complex-world/
