On topic annotation: how to extract relevant labels from videos?

Axel de Romblay
Feb 4 · 8 min read
Dailymotion topic sections

“But why do we care at dailymotion about being able to accurately categorize content at scale?”

  • Watching interface: push videos with trending and popular topics, recommend videos related to a topic.
  • Search engine: retrieve videos from a given topic.
  • SEO and acquisition: increase the external visibility of our video catalog and get more new visitors.

“What technical challenges do we face for a relevant topic annotation algorithm?”

  • Relevance & quality of the topics: we want topics that are both relevant and specific. E.g.: “2018 FIFA World Cup” rather than “Football”.
  • High precision/coverage tradeoff: we want to tag (or cover) as many videos as possible with at least one topic while keeping the error rate to a minimum. E.g.: we could reach 100% coverage with (almost) random topics, or 50% coverage with very accurate topics.
  • Fast and up-to-date annotation: we need a fast annotation pipeline that proposes up-to-date topics. E.g.: “Juventus” rather than “Real Madrid” for a video related to Cristiano Ronaldo.
  • Multi-lingual annotation: we need to tag videos in every language. E.g.: French, English, Korean videos…
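
To make the precision/coverage tradeoff concrete, here is a minimal sketch of how the two quantities could be computed. The function names, the data layout and the definitions themselves (coverage as the share of videos with at least one topic, precision as the share of predicted topics judged relevant) are assumptions made for illustration, not the exact metrics used in production.

```python
def coverage(predicted):
    """Share of videos that received at least one topic."""
    if not predicted:
        return 0.0
    tagged = sum(1 for topics in predicted.values() if topics)
    return tagged / len(predicted)


def precision(predicted, relevant):
    """Share of predicted (video, topic) pairs that are judged relevant."""
    total = sum(len(topics) for topics in predicted.values())
    if total == 0:
        return 0.0
    correct = sum(
        len(topics & relevant.get(video, set()))
        for video, topics in predicted.items()
    )
    return correct / total


# Toy, made-up data: video id -> set of topics.
predicted = {
    "v1": {"2018 FIFA World Cup"},
    "v2": {"Football", "Juventus"},
    "v3": set(),
    "v4": set(),
}
# Hypothetical human judgments of which topics are relevant.
relevant = {
    "v1": {"2018 FIFA World Cup"},
    "v2": {"Juventus"},
}

print(coverage(predicted))             # 0.5: two videos out of four are tagged
print(precision(predicted, relevant))  # 2 relevant topics out of 3 predicted
```

Tagging more videos raises coverage, but every doubtful topic that is added lowers precision, which is why the two have to be tuned together.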

Demystifying our topic annotation pipeline

In this section, we present the different steps of the topic annotation pipeline currently running at dailymotion.

Let’s get a bit technical!
  • Additional data: the channel, the country/language, date, …
Schema of the pipeline currently running at dailymotion. Each step is presented below.

1. Text extractor and language detector

This step takes all the metadata related to the video as input and outputs the description and the associated language.
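
As a rough illustration of this step, the sketch below builds a description from a video's metadata and guesses its language by counting stopwords. The metadata field names (`title`, `description`), the tiny stopword lists and the stopword-counting heuristic are all assumptions chosen to keep the example self-contained; they do not describe the detector actually used in the pipeline.

```python
import re

# Tiny, illustrative stopword lists (assumption); a real detector
# would cover many more languages and use a stronger model.
STOPWORDS = {
    "en": {"the", "of", "and", "is", "in", "to", "a"},
    "fr": {"le", "la", "les", "de", "et", "est", "un", "une", "des"},
}


def extract_description(metadata):
    """Concatenate the textual fields of a video into one description."""
    parts = [metadata.get("title", ""), metadata.get("description", "")]
    return " ".join(p.strip() for p in parts if p and p.strip())


def detect_language(text):
    """Return the language whose stopwords appear most often, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical metadata for one video.
video = {
    "title": "Tata Motors acquisition of Jaguar",
    "description": "The story of the deal",
}
description = extract_description(video)
print(description, "->", detect_language(description))
```

Returning `None` when no stopword matches lets the later steps skip videos whose language could not be determined instead of guessing.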

2. Topic maker on text

This step takes the video description and the corresponding detected language as inputs and outputs candidate topics related to the description.

Example of a Wikidata entity defined by its id: “Q15869”.

Preprocessing phase

As usual, we first need to clean the descriptions and tokenize the words.
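
A minimal sketch of what such a cleaning and tokenization pass might look like is given below. The specific cleaning rules (unescaping HTML entities, dropping tags and URLs, lowercasing) and the regular expressions are assumptions for illustration; the rules applied in the real pipeline are not detailed here.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"<[^>]+>")
# A token is a run of word characters, optionally joined by ' or -.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*")


def clean(text):
    """Unescape HTML entities, drop tags and URLs, collapse whitespace."""
    text = html.unescape(text)
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Return the lowercased tokens of a cleaned description."""
    return TOKEN_RE.findall(clean(text).lower())


raw = "Tata Motors acquisition of <b>Jaguar</b> &amp; more: https://example.com/x"
print(tokenize(raw))
# ['tata', 'motors', 'acquisition', 'of', 'jaguar', 'more']
```

This regex-based tokenizer relies on spaces and punctuation between words, so languages written without spaces would need a dedicated tokenizer.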

Disambiguation phase

For a given word, we would like to pick the appropriate sense in case of polysemy.

  • The relatedness score between the word and the sense. The idea is to pick the sense that best fits the context of the text: we use a voting scheme in which all the other words vote for the sense. The vote of a word for a sense is based on the overlap between their in-linking pages in Wikipedia. Back to the example, if our description is “Tata Motors acquisition of Jaguar”, the word “jaguar” is related to the car…
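
The voting scheme can be sketched as follows. This is a simplified illustration, not the production implementation: the overlap is measured here with a Jaccard ratio (an assumption, since the exact overlap formula is not given), each context word is assumed to be already resolved to a single page, and the in-link sets are made up for the example.

```python
def relatedness(inlinks_a, inlinks_b):
    """Jaccard overlap between two sets of in-linking pages (assumed measure)."""
    if not inlinks_a or not inlinks_b:
        return 0.0
    return len(inlinks_a & inlinks_b) / len(inlinks_a | inlinks_b)


def disambiguate(candidate_senses, context_inlinks):
    """Pick the sense that collects the highest total vote from the context."""
    votes = {
        sense: sum(relatedness(inlinks, ctx) for ctx in context_inlinks)
        for sense, inlinks in candidate_senses.items()
    }
    return max(votes, key=votes.get), votes


# Made-up in-link sets for the two senses of "jaguar".
jaguar_senses = {
    "Jaguar Cars": {"Land Rover", "Tata Group", "Coventry", "Ford Motor Company"},
    "Jaguar (animal)": {"Panthera", "Amazon rainforest", "Big cat"},
}
# Made-up in-link set of the only other entity in the text, "Tata Motors".
context = [{"Tata Group", "Land Rover", "Mumbai"}]

best, votes = disambiguate(jaguar_senses, context)
print(best)  # Jaguar Cars
```

With these toy sets, “Tata Motors” shares two in-linking pages with the car maker and none with the animal, so its vote goes entirely to the car sense.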

Pruning phase

Once we have mapped each word to a unique sense (or Wikidata entity), we want to select only the meaningful entities to generate candidate topics.

“Tata Motors” is very likely to be a link in Wikipedia.
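
One way to turn “likely to be a link in Wikipedia” into a number is a link probability: how often a phrase appears as link text divided by how often it appears at all. The sketch below uses that idea to prune mentions. The counts, the threshold and the function names are hypothetical, and the exact criterion used by the pipeline may differ.

```python
def link_probability(times_linked, times_seen):
    """Fraction of a phrase's occurrences that are links."""
    if times_seen == 0:
        return 0.0
    return times_linked / times_seen


def prune(mentions, counts, threshold=0.1):
    """Keep only the mentions whose link probability reaches the threshold."""
    kept = []
    for mention in mentions:
        linked, seen = counts.get(mention, (0, 0))
        if link_probability(linked, seen) >= threshold:
            kept.append(mention)
    return kept


# Made-up counts: phrase -> (times used as a link, times seen in text).
counts = {
    "Tata Motors": (900, 1000),
    "acquisition": (50, 10000),
    "Jaguar": (4000, 8000),
}

print(prune(["Tata Motors", "acquisition", "of", "Jaguar"], counts))
# ['Tata Motors', 'Jaguar']
```

Generic words such as “acquisition” are rarely links, so they fall below the threshold, while named entities survive as candidate topics.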

3. Topic Filter on text

This step takes a set of candidate topics as input and selects the accurate ones.

  • The description is ambiguous or not representative of the video (this is the case if accurate topics come from other metadata like the frames or the audio).
Our model will mimic human decisions.

Ranking of the candidate topics

  • Feature engineering
Schema of the machine learning model

Selection of the candidate topics

The selection allows us to push accurate topics on the dailymotion interface.
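
A simple way to picture the selection is a cut on the ranking scores: only the candidates whose score clears a threshold are kept. The sketch below assumes that each candidate topic comes with a score between 0 and 1; the threshold value, the cap on the number of topics and the scores themselves are hypothetical.

```python
def select_topics(scored_topics, threshold=0.8, max_topics=3):
    """Keep the best-scored topics that reach the threshold, highest first."""
    ranked = sorted(scored_topics.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, score in ranked if score >= threshold][:max_topics]


# Made-up ranking scores for the candidate topics of one video.
scores = {"Jaguar Cars": 0.93, "Acquisition": 0.35, "Tata Motors": 0.88}

print(select_topics(scores))              # ['Jaguar Cars', 'Tata Motors']
print(select_topics({"Football": 0.4}))   # []
```

A video with no candidate above the threshold gets no topic at all: raising the threshold trades coverage for precision, and lowering it does the opposite.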

What’s next?

  • Working on the topics themselves to build an automatic categorization (example: “2018 FIFA World Cup” < “Association Football” < “Sports”).
  • Working on both descriptions and topics/categories to get a “contextual” categorization of our videos.


The home for videos that matter

Thanks to Dailymotion Engineering, Rachel Wignall, and Colas Courjal.

Written by Axel de Romblay

Data Science & Machine Learning @Dailymotion https://www.linkedin.com/in/axel-de-romblay-6444a990/

