What lies behind YouTube Search’s infinite results
Reminiz was asked by brands to test tags’ accuracy when bidding on YouTube video keywords. Our results showed that when brands use celebrities’ names to target videos, they miss their target almost 4 out of 10 times.
It happens far more often than you would wish: the answer does not match the request. While YouTube keeps getting better at pushing related videos based on what you just watched and liked, it may not be as advanced at providing a wide range of videos that actually relate to a requested celebrity. Of course, it takes more than the first results page to notice. But YouTube search suffers from much the same syndrome as Google when it comes to providing accurate results in the long run: millions of results, far fewer interesting ones.

Being able to provide hundreds of thousands of videos for a single search is one thing. Making sure that the results match the actual request is another, especially when you wish to bid on these videos and advertise on their content. So what lies behind YouTube Search’s infinite results? And more importantly, can we trust these results when assessing thousands of them?
A FICKLE AND UNSTABLE INVENTORY
Over time at Reminiz, we became experts in facial recognition and video understanding. Naturally, the first videos we started to work on a few years ago were movies, television series, and all kinds of “clean” content that would limit uncertainty about a celebrity’s presence. In addition, long videos allowed us to gather a significant number of identical tracks for the same celebrity. When real-time processing became possible, meaning all faces and brands appearing on screen could be recognized with no latency, new kinds of opportunities emerged, such as monitoring live television.
YouTube videos represent a paradigm shift: they mix professional and personal content, very old videos and recent ones, featured celebrities and random people, long and short formats. But the main complexity lies elsewhere: this inventory has no end. It is a nightmare if you “try to look for someone” in millions of videos. Every day, more than 600K hours are uploaded to YouTube: how can you have any control over the inventory’s boundaries? The least you can do is target your search so that it returns the most accurate videos for a set of criteria. Whether you are a user or an advertiser, you expect the videos that are pushed to be, as promised, relevant to the search. But are they?
A MATTER OF RELEVANCE
The tricky part with relevancy is that it remains, to some extent, very subjective. One might think that a relevant celebrity-related video is a video in which the celebrity appears. But let’s take an example to understand how blurry the line is. You are looking for relevant videos related to Donald Trump. You might be interested, of course, in all videos where Trump appears. All of them, really? What about a 3-year-old video of him, compared to a very recent video talking about him without him appearing? And what about an Alec Baldwin impression of him versus an actual video of him playing golf? At some point, relevancy stops being a strict notion and is mostly accurate for whoever sets the rules.
We do not intend to solve this relevance matter yet at Reminiz. However, we tend to believe that this lack of clarity works in YouTube’s favor when it presents not-so-relevant suggestions. After all, one could always argue that relevance is also based on tags. But tags are based on what people tag, and people tag anything their own way. Again, relevancy introduces a real bias into search results. A simple test showed us how right our intuition was.
Thanks to Reminiz’s neural recognition network, we checked the first 1,000 videos displayed when searching for “Tsonga[1]” and sorting by relevance. We were expecting flaws in the results, but they came sooner than expected. In the first 500 videos, Tsonga appears in only 77.2% of them. Past the first 500 videos, more than 40% of the videos no longer feature Tsonga.
When observing these results, we dug into some of these videos to understand why the presence rate had dropped so drastically. It turned out to be impossible to identify a single pattern explaining how and why such videos were pushed. Among the videos not featuring Tsonga, some were only tennis-related, others sport-related, others carried the tag Tsonga, and others had no link whatsoever to Tsonga himself. On the other hand, some of these “Tsonga-unrelated” videos were linked to previous searches we had made. At this point, we were led to think that, for YouTube, relevancy was more a matter of “who searches” than “what is searched”.
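The presence rates quoted above boil down to a simple ratio computed per block of search ranks. The following is a minimal sketch of that calculation, not Reminiz’s actual pipeline: it assumes the recognition step has already produced one boolean per video (celebrity detected or not), ordered by search rank. The function names and the toy data are hypothetical and purely illustrative.

```python
from typing import List, Sequence


def presence_rate(detected: Sequence[bool]) -> float:
    """Share of videos in which the celebrity was detected."""
    if not detected:
        raise ValueError("no videos to evaluate")
    return sum(detected) / len(detected)


def presence_rate_by_bucket(detected: Sequence[bool], bucket_size: int) -> List[float]:
    """Presence rate for each consecutive block of `bucket_size` search ranks."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    return [
        presence_rate(detected[i:i + bucket_size])
        for i in range(0, len(detected), bucket_size)
    ]


# Toy input (made up, NOT the article's data): one flag per video, in rank order.
toy_detections = [True, True, True, False, True, True, False, False, True, False]
rates = presence_rate_by_bucket(toy_detections, bucket_size=5)
print(rates)
```

With real data, `bucket_size=500` would give the kind of first-500 versus next-500 comparison described in the test.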

USER OVER CONTENT
To get deeper insights into this, we checked a new batch of videos with another celebrity, Kylian Mbappé[2], who had just won the football World Cup. We changed the method and searched for results over a longer period of time: we decided to take the 500 most relevant videos on Kylian Mbappé published in each 10-day period over the last year. We ended up processing around 10K videos.
Of these 10,000 videos, 37% did not feature Kylian Mbappé at all. From a user’s perspective, it might not be such a big deal. After all, who really gives a rat’s ass about a 6-month-old video appearing on page 8 of the search? And does that really change the trust a user can have in YouTube’s search engine? However, it matters far more to advertisers. If you bid on Kylian Mbappé only, assuming that you want to target all videos where the young football star appears, you would miss your target almost 4 out of 10 times. Talk about efficiency.
In all the tests we ran afterwards, even though results varied depending on the celebrity, one thing remained stable: the search accuracy would always drop sooner than expected, and the search would end up pushing videos with no apparent link to the celebrity. This says a lot about the conceptual approach of YouTube (and the internet in general): user rules over content. In other words, the content of the video is less important than the user who is watching it.
Taking advantage of Google’s ecosystem, YouTube pushes suitable videos based on any kind of interaction you might have on its platform or on any other Google platform: previous videos watched, likes, profile information and so on. Nothing new about that. But it might explain why content itself, and the ability to “understand” it automatically, has been overshadowed so far. Not only have advertisers done the same, relying on more and more user data, but they have accepted that this would be the only information they rely on when advertising on YouTube.
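The sampling scheme described above can be sketched as a list of consecutive 10-day windows covering one year, each of which would then be queried for its most relevant videos. This is only an illustration under assumptions: the end date, the function name and the handling of the last, shorter window are hypothetical, and the query itself is left out because the source does not say how the search was performed.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def ten_day_windows(
    end: date, days_back: int = 365, step: int = 10
) -> Iterator[Tuple[date, date]]:
    """Yield (start, end_exclusive) windows covering the `days_back` days before `end`."""
    cursor = end - timedelta(days=days_back)
    while cursor < end:
        # The final window is truncated so it never runs past `end`.
        window_end = min(cursor + timedelta(days=step), end)
        yield cursor, window_end
        cursor = window_end


# Hypothetical end date, chosen only to make the example concrete.
windows = list(ten_day_windows(date(2019, 1, 1)))
print(len(windows), windows[0], windows[-1])
```

A cap of 500 videos per window only gives an upper bound on the batch size; the number of videos actually returned and processed can be lower.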
THE END OF THE COOKIE ERA
To put it plainly, advertisers have agreed to bid on criteria set by YouTube, assuming that these would be the best ones to hit the target. Even worse, they have agreed to bid on content they do not even know, both by letting YouTube decide the video inventory for them and by using cookies to organize retargeting campaigns. That means that, in some cases, advertisers bid on “useless views”: views of a video that has nothing to do with their advertising strategy.
Again, user data over content.
Although we have nothing against user data, we see three main reasons why this can be a problem. First, because you do not have control over the actual content. And content matters. If you want to bid on Kylian Mbappé, you want to make sure that all your ads run when Mbappé is onscreen. If you know for a fact that your ad only runs on Mbappé 40% of the time and on unrelated videos 60% of the time, we expect you to be reasonably dissatisfied.
Second, because content is context. As viewers become more and more ad-averse, the impact of display ads, TrueView and other inventive formats is being questioned. Advertisers would then see a real advantage in contextualizing their presence. Sports ads if I’m watching sports content. Movie trailers if I’m watching movie scenes. Endorsed commercials if I’m watching celebrity-featured content.
Last but not least, because GDPR is creating a new order of things. Advertisers won’t be able to rely so much on user data anymore and will need to find new ways to assess what users are doing and how to reach them. Video understanding might be a very interesting lead on this path. As YouTube grows bigger day after day, making it possible to automatically scan content in detail to refine an advertising campaign might become the smartest next move to keep control of your content inventory.
[1] A French tennis player
[2] A French football player

