It’s in everyone’s interest for algorithms to be trained using the best source material

Enrique Dans
3 min read · Aug 21, 2023

IMAGE: A screen with code and with a text on top, in black and white
IMAGE: Gerd Altmann — Pixabay

The growing use of generative algorithms is prompting lawsuits from companies and writers whose work has been used to train them, and who are demanding that their work not be used for such purposes, at least not without financial compensation.

Journalists have discovered that hundreds of thousands of books and countless web pages and news sites have been used to train generative algorithms, on the basis that it is not illegal to scrape data that is in an open format and publicly available online. Scraping of this kind has allowed the companies developing generative algorithms to get their hands on huge amounts of data that are now routinely used for training.

The question is whether writers should be compensated for their work being used in this way: if someone is inspired by a book and writes something that makes them a millionaire, should the author of the book be compensated? If a painter is inspired by the works of others, are those other artists entitled to a share of the painter's earnings?

But beyond that… what is really being sought? When The New York Times, one of the most prestigious media outlets in the world, decides to prevent its copy from being used to train generative algorithms… what is it trying to do? Negotiate for compensation? And if nobody is…

Enrique Dans

Professor of Innovation at IE Business School and blogger (in English here and in Spanish at enriquedans.com)