How is the Media Sector Responding to Content Crawling for Model Training?

Tech companies increasingly rely on access to high-quality media content to develop AI systems. How are audiovisual media content creators and holders responding to this? Are they forging licensing deals, issuing anti-scraping statements or building solutions to protect their content from unwanted crawls?

Rasa Bocyte

Published in AI Media Observatory, Jun 18, 2024

Authors: Rasa Bocyte, Netherlands Institute for Sound & Vision, and Lidia Dutkiewicz, CiTiP KU Leuven

Yasmine Boudiaf & LOTI / Better Images of AI / Data Processing / CC-BY 4.0

Monitoring media responses

AI4Media established a repository to observe how media organisations are responding to content scraping activities that fuel AI models. You can access it here.

The repository aims to guide media creators and holders as they formulate their own responses and seek solutions to these developments. It will also give researchers and policy-makers insight into the policy gaps and challenges that media makers face in dealing with tech companies.

In the repository you will find content related to the following categories:

Anti-scraping statements — positions expressed by media creators and holders against automated crawling of their content;
Legal action — lawsuits launched against tech companies for unlawful data scraping;
Licensing deals — agreements between media holders and tech companies to use media content for model training;
New techniques & methods — tools that are being developed to prevent or detect content scraping;
Opinion pieces and analysis — general analysis of the topic.

Contribute to the repository

The repository is intended as a living collection of resources that will evolve over time. The first batch of resources was gathered by AI4Media partners, but we rely on crowdsourced contributions to expand it. We welcome contributions covering global developments in any language.

As the repository grows, we hope to observe patterns across different types of responses that can help other players in the media sector formulate their own reactions.

Have you authored or come across content related to this topic? You can submit it to our repository using this link.

If you have any suggestions for improving the repository, please contact Rasa Bocyte (rbocyte@beeldengeluid.nl) or Lidia Dutkiewicz (lidia.dutkiewicz@kuleuven.be).
