The Writers Strike Back: OpenAI, Microsoft and the (alleged) billion-dollar theft.
On Wednesday, December 27, 2023, The New York Times filed suit against Microsoft and OpenAI in the U.S. District Court for the Southern District of New York. The complaint, which demands a jury trial, can be read below and alleges seven counts spanning copyright and trademark claims:
Let me save you 69 pages of narrative and lawyer-speak — The Times is accusing OpenAI of using copyrighted material produced by The Times to train its generative AI models. For a better understanding of the main issue at hand, I’ll provide a bit of explanation of OpenAI and generative AI in general.
You may have seen OpenAI make headlines over the past two months for its on-again, off-again relationship with co-founder and sometimes CEO, Sam Altman.
OpenAI is an American organization founded in 2015 (as a non-profit) with the purpose of developing “safe and beneficial” artificial intelligence systems. By 2019, the company had transitioned into a “for profit” enterprise partnered with Microsoft. Microsoft not only partnered with the company for future endeavors, it invested $1 billion, presumably to support OpenAI’s new goal: producing commercially available AI technologies.
The most famous of OpenAI’s current AI technologies is ChatGPT, a generative AI system that can not only respond to and answer straightforward questions like, “Where in the world is Carmen Sandiego?” but also create complete code blocks, recipes, articles, poems, and almost any other written product you can think up. Not only does ChatGPT respond to direct inquiries, it can have a conversation with you. It remembers. *insert Jurassic Park gif here if you’re a Millennial*.
As a frequent user of ChatGPT, I must admit that it is truly a step into a sci-fi future for me. As a kid, I remember reading about a future world full of computer assistants and virtual worlds. I never really believed I would see anything like it in my lifetime, but apparently the joke’s on me. Add in OpenAI’s other children, DALL-E and Codex (which powers GitHub Copilot), which are less talked about but just as technologically important, and I truly feel like the future is now. And there are robot brains involved.
I digress. The bigger issue here is how ChatGPT was created. ChatGPT is a type of artificial intelligence that can create new content in response to a user’s inquiry: a generative artificial intelligence. If we think of it as a person whose one job is to be our own personal text-based virtual assistant, then eventually we’ll end up asking, “Jeeves (because all good assistants are named Jeeves), how do you know all of this?” and Jeeves will say, “A large language model taught me. Now, can I provide you with another explanation of why you have to pay taxes or how to make your mom’s cornbread?”
If you’re not Jeeves and you’re reading this article, your next question is probably, “What the hell is a large language model?” Glad you asked!
A large language model (LLM) is a specialized generative AI system for text and language. Essentially, a super-massive computer brain was fed as many books, articles, plays, poems, stories, and other pieces of written material as the engineers and data scientists could find. The computer was then “taught”, by way of lots of math and algorithms, how to understand the structure of language. It learned to understand the patterns of written information and, ultimately, how to replicate them.
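If you want to see the "learn the patterns, then replicate them" idea in miniature, here is a toy sketch in Python. To be clear, this is not how ChatGPT is built: real LLMs use neural networks with billions of parameters, and this is only a simple word-pair (bigram) counter. The function names `train_bigram_model` and `generate`, along with the sample sentences, are made up for illustration.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Record, for each word, every word that followed it in the training text."""
    words = text.split()
    model = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        model[current_word].append(next_word)
    return dict(model)


def generate(model, start_word, max_words=10, seed=0):
    """Build a sentence by repeatedly picking a word that followed the last one."""
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same words
    output = [start_word]
    while len(output) < max_words and output[-1] in model:
        output.append(rng.choice(model[output[-1]]))
    return " ".join(output)


# With varied training text, the model mixes and matches what it has seen.
mixed_model = train_bigram_model("the cat sat on the mat and the dog sat on the rug")
print(generate(mixed_model, "the"))

# With a single source where each word has only one follower,
# the model can do nothing except repeat that source word for word.
single_source = "all the news that is fit to print"
narrow_model = train_bigram_model(single_source)
print(generate(narrow_model, "all"))
```

The second example is the interesting one for this lawsuit. When a model has seen only one way to continue a phrase, "generating" and "reciting" become the same thing, which is a very simplified picture of the regurgitation The Times complains about.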
Now here is where we circle back to where we started: the New York Times lawsuit. The United States Constitution allows for the establishment of a copyright system. In response, the federal government created the Copyright Office as an offshoot of the Library of Congress. The Copyright Office administers federal copyright law, and that law protects the owner of a copyright from infringement by anyone else. These rights are codified in Title 17 of the United States Code if you want to double check, but I promise, they’re there.
The Times is alleging that OpenAI and Microsoft fed their large language model so much content from The Times that their generative AI systems, specifically ChatGPT and Bing Chat, a search assistant, essentially regurgitate copyrighted NYT material in response to relevant user inquiries. OpenAI and Microsoft have claimed that the use of material from The Times in their LLMs is protected as “fair use”. The doctrine of fair use is pretty wordy, so I’m going to leave a link directly to the Copyright Office for you to peruse at your leisure rather than try to succinctly summarize its entirety here.
But for the purposes of this lawsuit, a central question in deciding whether a work is protected as a fair use of another’s work is whether it is transformative in some way. OpenAI and Microsoft assert that their technologies have only been trained on NYT materials or produce them as search results, and that the technologies do not participate in activities that infringe upon the copyrights of these works. They claim that their systems do not reproduce or claim credit for works under copyright by The Times. Needless to say, The Times disagrees and asserts that Microsoft and OpenAI are profiting in the billions of dollars not only by stealing from The Times but also by diverting traffic away from it.
To get even more specific, The Times has alleged that, of the billions of individual pieces of writing within the dataset used to train the OpenAI systems, materials from The Times are over-represented and thus more distinctly damaged by the infringement.
The complaint goes on to illustrate the alleged direct and open infringement by including interactions with ChatGPT and Bing Chat that directly quote material written for and produced by The Times. See pages 29–56, which specify infringement of Guy Fieri restaurant reviews, Pulitzer Prize-winning articles, and Wirecutter recommendations. None of these responses point to or link back to The Times.
Prior to this filing, we have seen popular authors including George R.R. Martin, Mona Awad, and Jodi Picoult file similar suits against OpenAI. Even comedian Sarah Silverman got in on the action. Outside of OpenAI, Tom Hanks and Stephen Fry have alleged that their voices or likenesses have been used to promote products or create voice narration without their consent or knowledge.
The recent onslaught of litigation centered on AI technology has the legal community finally agreeing on something: the need for federal regulation of AI and related technologies. There are existing laws, both federal and across most states, that address areas of concern within AI, such as data privacy, security, and discrimination. But nothing truly encompasses the full field of legal sinkholes opened by the dawn of AI. China and the EU have already begun tackling the issue, yet the U.S. government seems unenthused about beginning a true effort at creating an AI regulation framework.
I predict that the next 5–10 years will see a legal system forced to confront the future and everything that comes with it, from generative AI to human genetic material protection and beyond. I’m almost positive one of these cases will result in a SCOTUS opinion at some point. As someone straddling the line of tech and law, I’m excited to see what’s about to happen.
To continue learning, here are some links to get you started: