The search is on for reliable data to train generative algorithms

Enrique Dans
Jul 14, 2023

IMAGE: A robot sitting at a table and reading documents
IMAGE: Thank You Fantasy Pictures — Pixabay

I have the impression we’re going to be seeing more news like this: the Associated Press (AP) has signed a partnership deal with OpenAI, the company behind ChatGPT, that will give the US news agency access to OpenAI’s technology and products, while OpenAI’s algorithms will be trained on the agency’s vast archive of news texts dating back to 1985.

In 2014, AP announced a ground-breaking partnership with Automated Insights to use algorithms to generate news items such as summaries of financial results and sports reports, which allowed it to significantly extend its coverage. Bloomberg, meanwhile, uses this type of technology for approximately one third of its news output. AP already provides daily news in English, German, Dutch, French and Spanish, using sophisticated translation technology capable of maintaining a consistent style, which has put many translators out of work.

For OpenAI, using AP news stories to train its generative algorithms provides guarantees of veracity. That is an important issue should lawsuits brought by artists, writers, comedians or image banks such as Getty Images against the companies that create generative algorithms start to succeed. At a time when it is becoming harder to obtain quality data to train algorithms, agreements like this will be key, given that constantly reusing the…

Enrique Dans

Professor of Innovation at IE Business School and blogger (in English here and in Spanish at enriquedans.com)