Relevant is open for beta!
Today marks the initial beta launch for Relevant.watch (or Relevant for short)! After many years of work, I am proud to finally share a new place for YouTube fanatics to explore the depths of YouTube without relying on the algorithm. Relevant is for anyone hoping to find a new hobby to hyperfixate on, and also for more casual viewers who just want to find more of what they already watch. We don’t rely on Google’s “algorithm” for recommendations; instead, our categorisation of YouTube comes from the crowdsourced efforts of the Relevant community members.
The inspiration
YouTube has been my main source of entertainment for the past 5 years. It’s my one and only subscription, and I’m more than happy to pay to support all of the incredible content that YouTubers work so hard to create. However, as much as I enjoy watching my usual cast of YouTube characters, I sometimes wish to branch out from the niche interests that I’ve built into my subscriptions over the years. When I browse YouTube’s explore pages, it feels like Google is either actively pushing me towards sponsored channels that may be irrelevant to my interests, or just recycling the same types of content that I have already watched.
Relevant was created out of my own desire to branch out into new corners of YouTube without relying on YouTube’s opaque algorithm to organically surface them to me.
The one big roadblock to creating such a recommendation system is that there is no large data set that classifies all YouTube content. I’ll go into more detail about what options exist today, but needless to say, none of them is specific enough to do justice to how complex and varied the YouTube landscape has become. Because of the sheer amount of content on YouTube, it simply isn’t possible for one person (or even a large group of people) to manually categorise everything. That is why Relevant is based around a crowdsourced data model: it relies on the true experts of YouTube, the viewers, to share their domain knowledge.
Where are we going to get data?
The first question that came to my mind when I embarked on building a YouTube catalogue was: “Surely someone has already done this? Surely Google already has this information? Can we not just reuse that?”
From cursory research, it appears that YouTube already does some amount of classification. For example, when content creators upload a new video, YouTube requires them to categorise it under one of a set of “video categories”. This set of video categories is strange (to say the least): it contains some normal categories like “Sports”, “Gaming”, and “Comedy”, but then also has categories for “Comedy” (yes, twice), “Family”, and “How To and Style”. I couldn’t find any documentation explaining why “Comedy” appears twice, or why “How To” and “Style” are inherently linked. Regardless of the weird grouping, these categories are too high-level to narrow down to any reasonable selection of channels with similar content.
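To check the duplicate “Comedy” entry yourself, you can request the category list from the YouTube Data API v3 `videoCategories.list` endpoint (with `part=snippet` and a `regionCode` such as `US`). The sketch below assumes you have already fetched and parsed that JSON; `SAMPLE_RESPONSE` is a trimmed, illustrative excerpt in the same shape rather than the full live list, and `find_duplicate_titles` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative, trimmed excerpt shaped like a videoCategories.list response.
# Assumption: IDs and titles are for demonstration; fetch the live list for real data.
SAMPLE_RESPONSE = {
    "kind": "youtube#videoCategoryListResponse",
    "items": [
        {"id": "17", "snippet": {"title": "Sports"}},
        {"id": "20", "snippet": {"title": "Gaming"}},
        {"id": "23", "snippet": {"title": "Comedy"}},
        {"id": "26", "snippet": {"title": "Howto & Style"}},
        {"id": "34", "snippet": {"title": "Comedy"}},
        {"id": "37", "snippet": {"title": "Family"}},
    ],
}


def find_duplicate_titles(response: dict) -> dict[str, list[str]]:
    """Map each category title that appears more than once to the IDs sharing it."""
    ids_by_title = defaultdict(list)
    for item in response.get("items", []):
        ids_by_title[item["snippet"]["title"]].append(item["id"])
    # Keep only titles attached to more than one category ID.
    return {title: ids for title, ids in ids_by_title.items() if len(ids) > 1}


if __name__ == "__main__":
    print(find_duplicate_titles(SAMPLE_RESPONSE))  # {'Comedy': ['23', '34']}
```

Grouping by title rather than by ID is what exposes the oddity: the two entries are distinct categories to the API but look identical to a viewer.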
YouTube also used to associate channels with certain “topics”. Evidence of these topics can still be seen in the YouTube Data API documentation, where the channel resource returned when you look up a channel includes a `topicDetails` field. As per those same docs, these topics have long been deprecated (as of November 2017), and those fields are always empty now.
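If you want to inspect a channel yourself, the topic data lives on the channel resource returned by `channels.list` when you request `part=topicDetails`. The sketch below only builds that request URL and reads the field defensively, treating a missing or empty `topicDetails` as “no topics”. `build_channel_topics_url` and `extract_topic_ids` are hypothetical helpers, and `YOUR_CHANNEL_ID` / `YOUR_API_KEY` are placeholders; no request is actually sent.

```python
from urllib.parse import urlencode

# Endpoint for the channels.list method of the YouTube Data API v3.
API_BASE = "https://www.googleapis.com/youtube/v3/channels"


def build_channel_topics_url(channel_id: str, api_key: str) -> str:
    """Build a channels.list request URL that asks only for the topicDetails part."""
    query = urlencode({"part": "topicDetails", "id": channel_id, "key": api_key})
    return f"{API_BASE}?{query}"


def extract_topic_ids(channel_resource: dict) -> list[str]:
    """Read topicDetails.topicIds from a parsed channel resource, defaulting to []."""
    # The field may be missing entirely or present but empty; treat both as "no topics".
    details = channel_resource.get("topicDetails") or {}
    return list(details.get("topicIds") or [])


# Placeholder ID and key: substitute your own before sending the request.
print(build_channel_topics_url("YOUR_CHANNEL_ID", "YOUR_API_KEY"))
print(extract_topic_ids({"kind": "youtube#channel", "id": "YOUR_CHANNEL_ID"}))  # []
```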
Google probably still has channel categories embedded in the backend systems that power its own recommendation algorithm, but it certainly doesn’t surface them anymore. This leaves us with no choice but to gather the data on our own. Although I have racked up a lot of hours watching videos, I don’t think any one person (or even a sizeable group of people) could make a dent in cataloguing every channel on YouTube. However, if enough people contribute their knowledge of the parts of YouTube they are familiar with, we can eventually build a data set big enough to get started.
How we plan on categorising YouTube
Before we start putting channels into categories, we need to come up with a list of categories that they could possibly fit into.
It’s tricky to come up with a single long list of categories, since some YouTubers are embedded in deep niches of YouTube content while other creators try to appeal to a broader audience by making content that covers a wider range of topics. Our categorisation system needs to handle channels of any breadth.
In our database, categories will be structured as a “tree”: a data structure in computer science that represents information as a set of nodes connected by edges [1]. Amazon uses a tree to solve a very similar problem of representing product categories. For example, under the “Electronics” section of the Amazon menu (a node in the Amazon product tree), there are links (edges) to “Laptops”, “TVs”, and “Tablets”, and each of those has its own edges leading to more specific product categories (nodes).
For Relevant, we’ll have a set of very high-level nodes at the top, like “Science”, “Technology”, “Gaming”, and “Autos and Vehicles”, and each of those will recursively list further subcategories, getting more and more specific until (in theory) we end up with categories that each describe only a handful of channels dedicated to that niche.
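A category tree like this can be modelled with a very small node type: each category keeps a name and a list of child categories, and each entry in that list is an edge. The sketch below rebuilds the Amazon-style example; `CategoryNode` and its methods are hypothetical names for illustration, not Relevant’s actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    """One category (a node); `children` holds the edges to more specific categories."""
    name: str
    children: list[CategoryNode] = field(default_factory=list)

    def add_child(self, name: str) -> CategoryNode:
        """Create a subcategory under this node and return it."""
        child = CategoryNode(name)
        self.children.append(child)
        return child

    def find(self, name: str) -> CategoryNode | None:
        """Depth-first search for a category by name in this subtree."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None


electronics = CategoryNode("Electronics")
for product in ("Laptops", "TVs", "Tablets"):
    electronics.add_child(product)

print([child.name for child in electronics.children])  # ['Laptops', 'TVs', 'Tablets']
```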
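Because every subcategory can have its own subcategories, most operations on the tree are naturally recursive. The sketch below assumes channels can be attached to a category at any depth (a broad channel near the top, a niche channel at a leaf) and shows two recursive walks: listing every path from broad to niche, and collecting every channel inside a category. The subcategory names, channel names, and the `Category` type are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    channels: list[str] = field(default_factory=list)       # channels tagged directly here
    children: list[Category] = field(default_factory=list)  # more specific subcategories


def all_channels(category: Category) -> list[str]:
    """Return channels in this category and, recursively, in every subcategory."""
    result = list(category.channels)
    for child in category.children:
        result.extend(all_channels(child))
    return result


def paths(category: Category, prefix: tuple[str, ...] = ()) -> list[str]:
    """List every category as a 'Parent > Child > ...' path, from broad to niche."""
    here = prefix + (category.name,)
    lines = [" > ".join(here)]
    for child in category.children:
        lines.extend(paths(child, here))
    return lines


# Hypothetical slice of the tree: one broad channel and two niche ones.
mechanical_keyboards = Category("Mechanical Keyboards", channels=["Niche Channel A"])
home_servers = Category("Home Servers", channels=["Niche Channel B"])
hardware = Category("Hardware", children=[mechanical_keyboards, home_servers])
technology = Category("Technology", channels=["Broad Tech Channel"], children=[hardware])

for line in paths(technology):
    print(line)
print(all_channels(technology))
```

Attaching channels at different depths is one way to cope with the mix of broad and deeply niche creators: a general tech channel sits on “Technology” itself, yet still shows up when someone browses that whole branch.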
Since new channels, and new kinds of content, appear literally every single day, this categorisation system will have to evolve over time. Once enough channels are creating a new type of content that doesn’t fit into any existing category, a new category will be added to the tree to represent them.
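The post doesn’t pin down what “enough channels” means, so the sketch below makes it an explicit, hypothetical threshold (`PROPOSAL_THRESHOLD`). Community members flag channels that don’t fit any existing category along with a suggested name, and a suggestion is only surfaced as a candidate new category once it has been flagged for that many distinct channels. The class, the threshold value, and the name normalisation are all assumptions for illustration.

```python
from collections import defaultdict

PROPOSAL_THRESHOLD = 3  # hypothetical: distinct channels needed before a suggestion counts


class UncategorisedTracker:
    """Tracks channels that don't fit the tree, grouped by suggested category name."""

    def __init__(self, threshold: int = PROPOSAL_THRESHOLD) -> None:
        self.threshold = threshold
        # Suggested category name -> set of channel IDs flagged with that suggestion.
        self._flags = defaultdict(set)

    def flag(self, channel_id: str, suggested_category: str) -> None:
        """Record that a channel doesn't fit any existing category."""
        # Normalise the name so "Keyboard Modding" and "keyboard modding " match.
        self._flags[suggested_category.strip().lower()].add(channel_id)

    def ready_for_new_category(self) -> list[str]:
        """Suggestions backed by at least `threshold` distinct channels, sorted by name."""
        return sorted(name for name, channels in self._flags.items()
                      if len(channels) >= self.threshold)


tracker = UncategorisedTracker()
for channel in ("chan-1", "chan-2", "chan-2", "chan-3"):  # chan-2 flagged twice, counted once
    tracker.flag(channel, "Keyboard Modding")
tracker.flag("chan-4", "Retro Calculators")
print(tracker.ready_for_new_category())  # ['keyboard modding']
```

Counting distinct channels rather than raw flags means one enthusiastic user flagging the same channel repeatedly can’t push a category into existence on their own.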
In the same way that I can’t categorise all of YouTube by myself, I also can’t keep track of what new content is appearing each week. To stay on top of the latest changes in content, some members of the community — known as category managers — will have the power to adapt sections of the category tree (to add new categories).
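One way to read “adapt sections of the category tree” is that each category manager is assigned one or more branches and may add subcategories anywhere inside them. The sketch below checks that by walking upward from the proposed parent through all of its parents (a category may have several, as the DAG footnote allows). The graph, the manager names, and `can_add_subcategory` are hypothetical.

```python
from collections import deque

# Hypothetical category graph: each category maps to its parent categories.
PARENTS: dict[str, list[str]] = {
    "Science": [],
    "Technology": [],
    "Gaming": [],
    "Electronics": ["Science", "Technology"],
    "Retro Computing": ["Technology", "Gaming"],
}

# Hypothetical assignments: manager -> roots of the branches they look after.
MANAGED_SECTIONS: dict[str, set[str]] = {
    "alice": {"Technology"},
    "bob": {"Gaming"},
}


def ancestors_and_self(category: str) -> set[str]:
    """Every category reachable by following parent edges upward, plus the category itself."""
    seen = {category}
    queue = deque([category])
    while queue:
        for parent in PARENTS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def can_add_subcategory(manager: str, parent_category: str) -> bool:
    """True if the proposed parent sits inside one of the manager's branches."""
    sections = MANAGED_SECTIONS.get(manager, set())
    return bool(sections & ancestors_and_self(parent_category))


print(can_add_subcategory("alice", "Electronics"))  # True: Electronics is under Technology
print(can_add_subcategory("bob", "Electronics"))    # False: not under Gaming
```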
Limitations of the beta
Although we hope that Relevant will one day be a comprehensive catalogue of YouTube, for today’s beta launch we’re starting a lot smaller and hope to work our way up to that end state over time. The scope of the website will initially be limited so that it is manageable by our very small team of content moderators, and so that we can test our processes before scaling up.
Initially, the category tree will be scoped to only a handful of broad categories (“Science”, “Technology”, “Autos and Vehicles”, and “Gaming”) and a collection of subcategories under each of these. Even these categories alone cover plenty of channels, which should teach us valuable lessons about how to properly manage subcategories and whether the crowdsourcing system will scale.
The second concession during the beta concerns the recommendation engine. For people who aren’t interested in diving into the full category tree, we will show recommendations based on their current subscriptions. To recommend content, our systems need data showing that “people who watch channels in category A also enjoy watching channels in category B”. Until we have data about what people watch, and which categories those channels fall under, we can only produce rudimentary recommendations.
In time, both of these limitations will be lifted, and I aim to support the classification of all types of YouTube content and recommendations that fit everyone’s interests. For now, we’ll take it one step at a time to make sure we get things right.
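The “category A, also category B” signal can be approximated with simple co-occurrence counting: for every user, take the set of categories their subscriptions fall into and count each pair. The sketch below is one such approach under that assumption, not necessarily how Relevant’s engine works; the user data and both helpers are hypothetical. It also illustrates why early recommendations are rudimentary: with only a few users, the counts are tiny and many categories never co-occur at all.

```python
from collections import Counter
from itertools import combinations


def category_cooccurrence(user_categories: list[set[str]]) -> Counter:
    """Count, for each ordered pair (A, B), how many users follow channels in both."""
    pairs: Counter = Counter()
    for categories in user_categories:
        for a, b in combinations(sorted(categories), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs


def recommend_categories(my_categories: set[str], pairs: Counter, limit: int = 3) -> list[str]:
    """Score categories the user doesn't follow yet by co-occurrence with ones they do."""
    scores: Counter = Counter()
    for (a, b), count in pairs.items():
        if a in my_categories and b not in my_categories:
            scores[b] += count
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


# Hypothetical data: the categories each user's subscriptions fall into.
users = [
    {"Gaming", "Speedrunning", "Retro Computing"},
    {"Gaming", "Speedrunning"},
    {"Technology", "Retro Computing"},
]
pairs = category_cooccurrence(users)
print(recommend_categories({"Gaming"}, pairs))  # ['Speedrunning', 'Retro Computing']
```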
Let’s get categorising!
I hope others can connect with my passion for YouTube and also want to know what else it has to offer — without taking Google’s word for what’s worth watching. I look forward to learning from all of you!
I hope to publish more blog posts with updates about Relevant, technical deep dives into the system behind the website and plans for future features — so keep an eye out for those. In the meantime, if you have any questions or want to learn more about Relevant, I urge you to join the community Discord server.
[1] More precisely, we are using a DAG (Directed Acyclic Graph). A DAG is a generalisation of a tree rather than a type of one: a category can sit under more than one parent, but no edge may connect back up to a higher node in the graph and form a cycle. In the case of Relevant, this means a niche category can never contain one of the more general categories above it.
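Keeping the graph acyclic comes down to one check whenever an edge is added: linking `parent -> child` is only safe if `parent` is not already reachable from `child`, because otherwise the new edge would close a loop. The sketch below is a straightforward depth-first version of that check, assuming categories are identified by name; `CategoryGraph` is a hypothetical name.

```python
from collections import defaultdict


class CategoryGraph:
    """Categories as a directed acyclic graph: edges point from broader to more niche."""

    def __init__(self) -> None:
        self.children = defaultdict(set)

    def _reachable(self, start: str, target: str) -> bool:
        """Depth-first search: can we walk from `start` down to `target`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.children[node])
        return False

    def add_edge(self, parent: str, child: str) -> None:
        """Add parent -> child, refusing any edge that would create a cycle."""
        if self._reachable(child, parent):
            raise ValueError(f"{child!r} already contains {parent!r}; edge would form a cycle")
        self.children[parent].add(child)


graph = CategoryGraph()
graph.add_edge("Technology", "Electronics")
graph.add_edge("Science", "Electronics")  # a second parent is fine in a DAG
graph.add_edge("Electronics", "Microcontrollers")
try:
    graph.add_edge("Microcontrollers", "Technology")  # niche containing general: rejected
except ValueError as err:
    print(err)
```

The same check also rejects a category being made its own child, since `_reachable` returns `True` immediately when `start` and `target` are the same.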