These books are being used to train AI. No one told the authors

Oct 9, 2023

In a shocking revelation, an estimated 200,000 books are being used to train AI systems, and no one told the authors.

The books come from a collection of pirated e-books known as the Books3 data set, and authors were devastated to discover that their work was being exploited.

This revelation has sparked lawsuits against Meta and other companies, with outraged authors arguing that their books were stolen and their creative efforts undermined.

The Writers Guild of America also went on strike in 2023, with limits on the use of AI in writing films and TV shows among its demands.

As the conversation around Books3 intensifies, authors and artists are joining forces, calling for unity and action to combat this abuse of their work by AI systems, highlighting the urgent need for responsible AI innovation.

Overview

This article examines the controversial use of authors’ books to train AI systems. It covers the number of books involved, the origin of the Books3 data set, the lack of prior notification to authors, the legal consequences faced by Meta and other companies, and authors’ emotional responses. It also covers the tracking of book usage by AI systems, allegations of theft and exploitation, the broader implications for AI and art, the Writers Guild of America’s strike over limits on AI in writing, the relevance to President Joe Biden’s AI executive order, and the call for action and solidarity among authors and artists.

Introduction

The revelation of AI systems using authors’ books without their knowledge has sent shockwaves through the literary world.

Many authors were completely unaware that their works were being utilized to train AI systems, resulting in a mixture of surprise and disappointment.

The scale of the issue is only now coming to light, leaving many authors grappling with the implications of this unanticipated use of their intellectual property.

Number of Books Used for AI Training

Tech companies have used an astounding number of books to train AI systems: an estimated 200,000.

This vast collection of books serves to enhance the capabilities of AI systems, as they learn from a diverse range of literary works spanning numerous genres.

The inclusion of such a large and varied dataset offers AI systems a deeper understanding of language and storytelling.

Origin of the Books3 Data Set

The Books3 data set, at the heart of the controversy, is compiled from a collection of pirated e-books.

This collection encompasses books from various genres, offering a wide range of literary material for AI system training.

While this data set is a rich source of material for AI advancements, its acquisition through illicit means raises serious concerns about intellectual property rights and ethics.

Lack of Prior Notification to Authors

One of the most unsettling aspects of this controversy is the lack of communication and prior notification given to authors.

Tech companies failed to inform authors that their books were being used to train AI systems, leaving them caught completely unaware.

Authors argue that using their books without seeking permission violates their intellectual property rights, which makes the situation all the more problematic.

Legal Consequences Faced by Meta and Other Companies

The unauthorized usage of authors’ books by AI systems has resulted in lawsuits against Meta and other related companies.

Authors whose works were included in the Books3 data set have taken legal action, seeking compensation for copyright infringement.

The legal battle surrounding this issue highlights the gravity of the violations committed by these tech companies, placing their ethical and legal practices under scrutiny.

Authors’ Emotional Response to Discovering Use of Their Books

Upon discovering that their books were being used without permission, authors expressed a range of intense emotions.

Outrage and disappointment were prevalent among these writers, who felt a sense of betrayal and violation of trust.

Many authors went as far as to label the use of their books in AI training as a form of theft, fueling their already strong sense of being exploited.

Tracking the Use of Books by AI Systems

A searchable database of the titles in Books3 gives authors a way to check whether their books were used to train AI systems.

Authors can now see whether their works are part of the data set, which offers a measure of accountability and transparency.

This newfound knowledge empowers authors to protect their intellectual property rights and advocate for responsible AI practices.

Authors’ Allegations of Theft and Exploitation

The use of authors’ books by AI systems has sparked allegations of theft and exploitation.

Authors argue that their works are being taken and utilized without proper consent or compensation.

They feel exploited because their creative works are being harnessed for AI advancements that could generate significant value for tech companies, without due recognition or remuneration for the authors themselves.

Broader Implications for AI and Art

The controversy surrounding the Books3 data set raises significant concerns about the increasing reach of AI into all forms of art.

As AI systems gain the ability to mimic and create artistic works, questions arise regarding the originality, creativity, and ownership of these AI-generated pieces.

The need for responsible AI innovation becomes paramount in safeguarding the integrity and rights of artistic expression.

Writers Guild of America’s Strike for Writing Limits on AI

In response to the use of AI in creative processes such as writing films and TV shows, the Writers Guild of America went on strike, with limits on the use of AI among its demands.

The strike drew attention to the potential erosion of human creativity and the need to maintain human control over artistic endeavors.

It highlights the urgent need for regulations and guidelines to preserve the essential role of human writers in shaping the artistic landscape.

Relevance of Books3 Discussion to President Joe Biden’s AI Executive Order

The discussion surrounding the Books3 data set aligns with President Joe Biden’s plans to introduce an executive order addressing AI.

This executive order seeks to establish principles for the responsible development and deployment of AI systems.

The controversy surrounding the unauthorized use of authors’ books further emphasizes the importance of ensuring ethical AI practices and safeguarding intellectual property rights in the realm of AI technology.

Call for Action and Solidarity among Authors and Artists

Authors and artists across various creative industries are joining forces, recognizing the need for action and solidarity to combat the abuse of their work by AI systems.

They are advocating for stricter regulations, increased transparency, and fair compensation for the use of their intellectual property.

By building coalitions and collectively raising their voices, authors and artists aim to protect their valuable contributions to the arts and secure their rightful place in the rapidly evolving landscape of AI technology.

Conclusion

The controversial use of authors’ books by AI systems has brought to the forefront important discussions about intellectual property rights, ethical considerations, and responsible AI innovation.

The scale of the issue, lack of prior notification, emotional responses of authors, and legal consequences faced by tech companies underscore the urgency with which these matters must be addressed.

As authors and artists call for action and solidarity, it is crucial to recognize the broader implications for AI and art, ensuring the protection of creative works and the preservation of human creativity.


Originally published at https://ainewesttechhub.com on October 9, 2023.