Targeting keywords on YouTube: 90% of your budget might go to trash

Paul Chaumont
Nov 5, 2018 · 5 min read

In October, Reminiz put its video understanding AI to good use and processed over 50,000 YouTube videos to check whether 25 celebrities actually appear in them. Using the YouTube search API, our algorithm processed around 2,000 videos per celebrity and assessed how accurate YouTube is when it pushes thousands of videos that allegedly answer our requests. Our results are confusing, to say the least.


The little exercise we ran is actually very simple: search YouTube for a targeted celebrity’s name and determine whether each result matches the request. Last summer, we had already started this experiment with the tennis player Jo-Wilfried Tsonga and the first 1,000 videos that came up when searching for his name.

Challenged by these first results, we decided to extend the scope of the search and let our algorithm process 50,000 videos on 25 celebrities. Not randomly, though. We limited the search to the past year to make sure we would get recent results, and to the most relevant videos per celebrity: the videos that YouTube pushes first under the “relevance” criterion. It would not make sense to run this assessment on six-year-old videos that appear way down the search list.
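As a rough illustration of that selection step, a request of this kind could be sketched as below. The code only builds query parameters named after the YouTube Data API v3 `search.list` endpoint (`q`, `type`, `order`, `publishedAfter`, `maxResults`) and makes no network call. The helper name `build_search_params`, the 365-day window and the page size are assumptions made for this example, not the exact setup Reminiz used.

```python
from datetime import datetime, timedelta, timezone


def build_search_params(celebrity, now=None, page_size=50):
    """Build parameters for a relevance-ordered video search over the past year.

    Hypothetical helper: the parameter names follow the YouTube Data API v3
    search.list endpoint, but the chosen values are assumptions.
    """
    now = now or datetime.now(timezone.utc)
    one_year_ago = now - timedelta(days=365)
    return {
        "part": "snippet",
        "q": celebrity,            # the celebrity's name is the search query
        "type": "video",           # ignore channels and playlists
        "order": "relevance",      # the videos YouTube would push first
        "publishedAfter": one_year_ago.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "maxResults": page_size,   # results per page; further pages need a pageToken
    }


params = build_search_params(
    "Peter Dinklage", now=datetime(2018, 10, 1, tzinfo=timezone.utc)
)
print(params)
```

Collecting around 2,000 results per name would then mean repeating the request page after page and passing every returned video to the recognition step.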

With a few exceptions, the results show a real pattern in YouTube’s inability to provide celebrity videos that match the original request.


To make things clear and make sure our results would not be skewed by little-known celebrities, we chose very famous ones, mainly actors, from Jennifer Lawrence to Brad Pitt, Peter Dinklage to Millie Bobby Brown, Kate Winslet to Chris Pratt. Apart from ensuring they are famous enough, we wanted to make sure each and every one would make perfect “commercial material”. To put it another way: does YouTube give us access to an accurate inventory if we want to bid on a celebrity who has strong visibility?

Of the 50K videos processed, around 2K per name, the celebrity actually appears in the video only 31.09% of the time. In other words, more than two-thirds of the YouTube inventory is off target and out of context relative to the celebrity requested.

Depending on the celebrity, it gets worse. It even goes below 20% for Kate Winslet (19.39%), Terry Crews (15.84%) and Peter Dinklage (13.82%). By contrast, a few manage to rise above the average: Brad Pitt (59.91% of videos featuring him), Tom Cruise (62.45%) and Dwayne Johnson (63.28%) lead the race and get more matching results than incorrect videos.

However, even though those are the best results, they are still pretty low when you consider that, out of a $100K budget, nearly $40K goes straight to the trash. Imagine: the best-case scenario here shows that 36.72% of your inventory on THE ROCK will not display the action movie star. A bit annoying if you are trying to pre-empt all advertising spaces featuring Dwayne Johnson on YouTube. Far more troubling if you decide to bet on Peter Dinklage.
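The budget arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes spend is spread evenly across the returned videos, which is a simplification of how a real campaign is billed; the function name `wasted_budget` is ours.

```python
def wasted_budget(budget, match_rate_pct):
    """Amount spent on videos that do not feature the celebrity.

    Simplifying assumption: the budget is spread evenly over the videos.
    """
    return round(budget * (100 - match_rate_pct) / 100, 2)


# Match rates (percent of videos featuring the celebrity) quoted in the article
match_rates = {
    "Dwayne Johnson": 63.28,
    "Average over 25 celebrities": 31.09,
    "Peter Dinklage": 13.82,
}

for name, rate in match_rates.items():
    print(f"{name}: ${wasted_budget(100_000, rate):,.0f} of $100,000 off target")
```

Under that assumption, the best case still sends $36,720 to videos without Dwayne Johnson, and the Peter Dinklage case sends $86,180 off target.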



But video content and a celebrity’s presence mean little unless they are related to the number of views. And that is where things get dramatically worse. While 69% of the processed videos do not feature the requested celebrity, 92.71% of the total views come from videos that do not match your initial search. Going back to our previous example, it means that of $100K invested, more than $90K goes to the trash.

In other words, valid videos get fewer views than invalid ones: 31.09% of the inventory (the good part) gathers only 7.29% of all the views. Again, this is an average and, in some cases, the results are truly low. Sarah Paulson is a good example. Out of the 2,000 videos processed for Sarah Paulson, only 22% feature her, and the views from those valid videos represent just 0.94% of all views. Out of the roughly 13 billion views from her 2,000 videos, only about 120 million really match Sarah Paulson. Naomie Harris and Penelope Cruz join her in the “less than 3% actual views” club.
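The gap between the two percentages comes from weighting each video by its views. The sketch below shows one way to compute both shares from a list of (view count, features the celebrity) pairs; the sample is made up for illustration and is not Reminiz data.

```python
def validity_shares(videos):
    """Return (percent of valid videos, percent of views on valid videos).

    `videos` is a non-empty list of (view_count, features_celebrity) pairs.
    """
    total_videos = len(videos)
    total_views = sum(views for views, _ in videos)
    valid_videos = sum(1 for _, is_valid in videos if is_valid)
    valid_views = sum(views for views, is_valid in videos if is_valid)
    return 100 * valid_videos / total_videos, 100 * valid_views / total_views


# Made-up sample: two valid videos with few views, three invalid popular ones
sample = [
    (5_000, True),
    (12_000, True),
    (900_000, False),
    (2_500_000, False),
    (83_000, False),
]

video_share, view_share = validity_shares(sample)
print(f"valid videos: {video_share:.2f}% / valid views: {view_share:.2f}%")
# prints "valid videos: 40.00% / valid views: 0.49%"
```

In this toy case 40% of the videos are valid, yet they account for less than half a percent of the views, which is the same kind of imbalance as the Sarah Paulson figures.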

Only a few can boast decent results. Leonardo DiCaprio manages to gather 32.91% of valid views out of the 2,000 allegedly most relevant videos featuring him this past year.

Apart from very random and unreliable results, what can we draw from this experiment?


The first lesson we can draw from this is probably that human video processing and the tagging that goes with it do not work. We should all know by now that tagging a video with a celebrity’s name is not enough to validate that celebrity’s presence. More than that, invalid videos get more views than valid ones. When YouTube is not able to find enough videos featuring the requested celebrity, it starts looking for popular videos, whatever their relevance to the topic, even if they are not related to the initial topic at all.

Let’s take a precise example to understand YouTube’s search strategy.

If I’m looking for Peter Dinklage videos, only 13.82% of YouTube’s results will actually feature him. Views from these valid videos will represent only 1.60% of all views. When YouTube’s algorithm starts to doubt Peter Dinklage’s presence, it compensates with related popular videos, such as Game of Thrones trailers. The further down the list you go, the less relevant the videos become. After the 1,500th result, videos feature neither Peter Dinklage nor Game of Thrones. It might as well be Lady Gaga’s latest movie clip, just because it will ensure some views.
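One simple way to observe this drift is to measure the share of valid videos per slice of the ranked list. The sketch below does that on a made-up list of 12 flags; the function name `precision_by_bucket` and the bucket size are assumptions for illustration only.

```python
def precision_by_bucket(flags, bucket_size):
    """Percent of valid videos in each consecutive slice of the ranked results.

    `flags` holds one boolean per video, in search-rank order
    (True when the celebrity actually appears in the video).
    Returns (first_rank, last_rank, percent_valid) tuples.
    """
    buckets = []
    for start in range(0, len(flags), bucket_size):
        chunk = flags[start:start + bucket_size]
        percent_valid = 100 * sum(chunk) / len(chunk)
        buckets.append((start + 1, start + len(chunk), percent_valid))
    return buckets


# Made-up ranking: relevant at the top, irrelevant further down
ranked_flags = [True, True, True, False,
                True, False, False, False,
                False, False, False, False]

for first, last, pct in precision_by_bucket(ranked_flags, bucket_size=4):
    print(f"ranks {first}-{last}: {pct:.0f}% valid")
```

On a real result list of 2,000 videos, buckets of a few hundred ranks would show where the valid share starts to collapse.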

This explains very well why the sum of views from invalid videos is far higher than the sum from videos featuring your target: YouTube compensates for the absence of the celebrity with views. It balances a very poorly qualified inventory with its capacity to draw massively on hugely popular videos.


In the end, whether you’re just looking to watch a video or trying to advertise, you might end up with an incorrect result. Although YouTube has been widely criticized for its opacity, we do not believe this is done on purpose to monetize every video in its inventory rather than only a few groups of videos.

The main limitation of YouTube’s advertising products, or of any company that claims to contextualize ads based on tags, is that tags themselves are totally erratic. They are not verified, they are not homogeneous, and the weight given to each tag varies a lot from one video to another.

Indeed, as long as videos are tagged by individuals with different sensibilities, and therefore different interpretations of what they see, tags cannot be reliable. Tagging videos at scale with homogeneous data while remaining impartial is only possible with “video understanding”. Knowing that video is expected to represent 80% of global internet traffic in 2020, this mission has become Reminiz’s leitmotif.

One thing is certain, though: as we start digging seriously into content and how it matches the target, we’ll realize that the industry, often dazzled by views or impressions as its main standards, is far from delivering accurate campaigns, even when those views do not match the initial request.

Context Insights

Knowledge Builds Trust
