This rush to get generative algorithms to market is going to cause big problems down the road

Enrique Dans
2 min read · Dec 22, 2023

IMAGE: The LAION (Large-scale Artificial Intelligence Open Network) logo

LAION, the Large-scale Artificial Intelligence Open Network, is the largest and most open archive of tagged images (5.85 billion image-text pairs in its latest edition, LAION-5B). It is compiled by scraping images from a vast number of web pages, and is routinely used by many artificial intelligence developers, including Google and the makers of Stable Diffusion, to train their generative algorithms.

The organization that manages it is a non-profit based in Germany with a global membership. It is committed to open source, with the aim of making large-scale machine learning models, datasets, and related code available to the public.

The existence of such repositories is critical to the development of artificial intelligence, because it lowers the entry barriers for companies of all types, including open source companies, to train their models. But a study conducted by Stanford researchers has just found that this massive database contains several thousand images of child sexual abuse material (CSAM).

If a not-for-profit company with the best intentions can make this kind of mistake as a result of poor supervision, we can only wonder what is going on in other archives being used to train algorithms. However, I am almost as concerned about the problems being…

Enrique Dans

Professor of Innovation at IE Business School and blogger (in English here and in Spanish at enriquedans.com)