AI, Copyright & Fair Use: Avoiding the Artificial in Intelligence & Maintaining our Humanity

Feb 6, 2020

by Neil Turkewitz

The US government, along with governments around the world, is examining the relationship between machine learning for the purpose of training AI and issues related to both the ingestion and the production of original materials that are protected, or theoretically protectable, by copyright. Non-lawyers seem drawn principally to the question of who owns the output of AI trained on copyright-protected materials, and this is indeed an interesting and important question. But copyright lawyers, including me, are perhaps more focused on the predicate act: the use of copyright-protected works to “train machines” so that they can produce original works in the first place.

So while the Patent and Trademark Office and the Copyright Office have posed questions to the public on the range of copyright issues implicated by AI, I have been thinking mostly about the use of preexisting materials, and about how to maintain and advance our Constitutional imperative to sustain creativity and secure the progress of our nation in a way that celebrates our individuality and our diversity. In particular, I think we need to resist any decontextualization of these issues. Our thinking about AI and copyright should reflect broader issues in society related to the effects of automation and the use of algorithms, which increasingly define the parameters of our existence and which, as expressed brilliantly by scholars like Ruha Benjamin (Race After Technology) and Safiya Noble (Algorithms of Oppression), can reify and reinforce existing inequalities in power and produce even greater injustice.

It is within this broader frame that I readily acknowledge my skepticism about AI and the values that may, or may not, be advanced in its pursuit. That is not to say that I believe AI is, on its own, unimportant or without merit. There are clearly many situations in which AI can help provide answers based on patterns that would otherwise remain forever invisible to the human eye. But approaching AI with a healthy dose of skepticism and humility does, I think usefully, underscore the importance of understanding the implications of its development and deployment: its implications for those providing the training sets for AI, as well as for the subsequent use of the “intelligence” garnered from those training sets. We must keep foremost in our minds that we are making choices based on values, and frequently on competing values. Efficiency on its own, without regard to the nature of the future toward which it is propelling us, is not a value we should cherish. And where efficiency (e.g. the ingestion of preexisting materials without securing the permission of the copyright holder or of the subject of a personal image) would erode consent, we need to think very carefully about the trade-offs and about the nature of the object of our desire.

As I wrote earlier, “Let’s assume, just for the sake of argument, that securing consent would complicate the building of databases and therefore slow innovation. Here’s where we confront the issue of values. Do we want to build a society which champions lack of consent as a virtue? Where consent must be forgone to achieve progress? Where personal (and state) sovereignty is decried as a rebuke of modernity? I for one don’t want that world. Let’s eschew a race towards dehumanization and the erosion of free will. We don’t need to chase China down the rabbit hole of technology that offends our values. (Side note: this is not intended to suggest that China is in fact embracing such developments, only to respond to those who suggest we need to avoid complexities like consent in order to compete with China.) Let’s ensure that our technologies reflect our values, or we risk building a world that we don’t care to inhabit.”

It is with these thoughts in mind that I settled down to read the recent submission of the Software Alliance (BSA) to the Patent and Trademark Office. While I tend to see the world somewhat differently than they do, I respect their views, which I generally find to be nuanced, well considered, and well expressed, even when I disagree. They have submitted various briefs on fair use, including in the Google v. Oracle case headed for the Supreme Court, setting forth a measured position that simultaneously embraces its importance while warning against overbreadth. Their 2017 amicus brief before the Court of Appeals for the Federal Circuit captured it perfectly:

“The Court should also ensure that courts applying fair use defenses to infringement in software cases do so correctly. Fair use may be important in various circumstances, but it should not be interpreted so broadly as to swallow the commercial value of an infringed underlying work by failing to fully and carefully weigh all four of the factors set out in 17 U.S.C. § 107. ... Courts recognize uniformly that evaluating the fair use defense is case-by-case and fact-driven. It is not to be simplified with bright-line rules. Harper & Row Publ’rs v. Nation Enters., 471 U.S. 539, 560 (1985).”

It was thus with great consternation that I confronted the lack of restraint in their recent submission on AI, in which they essentially argue that the national imperatives of the race for AI dominance justify an expansive view of fair use, one that would limit the use of preexisting materials only where the output was perceptibly infringing. In short, while they nominally eschew any rigid test for fair use in all instances, their position emerges quite clearly: fair use will allow any use of copyrighted works as long as the expression in the resulting AI-produced work isn’t substantially similar to the works on which it was “trained.” I quote: “creating a database of lawfully accessed works for use as training data for machine learning will almost always be considered non-infringing in circumstances where the output of that process does not compete with the works used to train the AI system.” I think this is fundamentally wrong, both as a matter of law and as a matter of justice (with the latter being infinitely more important).

If current narratives are to be believed, the future of writing, singing, composing, etc. will increasingly be in the hands of machines. Now of course, machines don’t have hands, but neither do they have creativity. The works ingested by the machine are the raw data through which the machine becomes capable of reconfiguring words, symbols, notes, etc. into new works. The machine is not “reading” as such, a point I highlight here because it has copyright implications as well as moral ones. BSA likens machine “learning” to the way a human might ingest a book, combing through the protected expression while retaining the unprotected ideas. But while a human might very well operate in that manner, it’s a terrible stand-in for the operation of machines, which by their very nature “learn” through reproduction, with those reproductions forming the basis of any new output. Those reproductions of expression, however temporary, are the raw materials used to develop new forms of expression. In other words, AI isn’t just inspired by the works it ingests; it owes its very existence to them. As such, BSA’s suggestion that ingested works lack economic or cultural significance couldn’t, in my view, be more incorrect. AI is the distillation of what went before, and as such depends on the past for all of the potential value it may create. BSA boldly and absolutely declares that “intermediate, non-expressive reproductions have no impact on the economic interests that copyright is intended to protect.” Given that copyright includes the right to create derivative works, and that AI is, by definition, the very essence of a derivative work, this statement is both puzzling and problematic. It may be that fair use will permit the unpermissioned, uncompensated use of certain preexisting materials in certain instances (and we should always keep in mind that fair use is indeed the ability to use someone else’s original expression without securing permission or compensating the creator for it), but let’s undertake that analysis without invoking a false test based on the duration of the copy giving rise to potential liability. The reproduction, however brief, lasts long enough to achieve its primary purpose of training the machine. Anything longer would be superfluous to the design.

BSA in fact spends quite a bit of time on what I see as the mostly irrelevant temporal limitations of the reproductions, and even describes them as “incidental” and therefore noninfringing. Not only does this miss the point that the reproductions are fixed for as long as their purpose requires, but it also fails to consider adequately that the incidental nature of a reproduction (where that is the case) is not legally determinative in the US as it might be in the EU. Indeed, specific legislative proposals were made in the US to provide an exception for incidental reproductions, and these were rejected by Congress during consideration of the DMCA. See, for example, the 2001 testimony of Register of Copyrights Marybeth Peters: “Many commenters advocated a blanket exemption for temporary copies that are incidental to the operation of a device in the course of use of a work when that use is lawful under title 17. Such an exemption was originally proposed in the Boucher-Campbell bill as an amendment to section 117.”

Moreover, not only were such proposals rejected, but even the general discussion of incidental copies is distorted in BSA’s paper. The transitory nature of an infringement (e.g. an incidental reproduction) may be relevant in a fair use analysis, where it would be one of the factors considered by the court. But an exception for incidental reproductions is premised on the notion that where a use of a work is otherwise licensed, the fact that an incidental copy is made out of technical necessity in furtherance of the licensed use shouldn’t give rise to an independent claim for licensing or liability. While the EU did adopt a broader formulation, the essential theory justifying an exception for incidental copying relates principally to the notion that one could fairly assume an implied license where the copy is part of a chain of necessary technical steps to achieve a licensed use. This was indeed the fact pattern considered in the US with respect to the temporary buffer copies of music made by services licensed to communicate the works. But here, the issue of incidental reproductions arises not in conjunction with an otherwise licensed use, but in defense of an unauthorized one. There is a predicate for that, but the distinction shouldn’t elude our attention as we consider these complex issues.

There are a variety of other problems and oversimplifications in the BSA paper, including the inexplicable failure to note the limitations imposed by both the Japanese and EU approaches to the use of preexisting materials, such as, in the EU, the right of creators to opt out of the exception created under Article 4 of the 2019 Directive on Copyright in the Digital Single Market (text and data mining by institutions other than research organisations and cultural heritage institutions). But I fear that this piece is already fairly long and complicated, so I will end here. I will merely observe that the choices here are infinitely more complicated than one would assume based on BSA’s paper. Its excessively narrow framing allows the reader to imagine this as a purely technical matter, devoid of human consequence. But allowing the unauthorized use of images and other content to inform the development of AI is part and parcel of the erosion of human agency, and represents a form of indentured servitude for those contributing to a future beyond their control. And perhaps even more fundamentally from a copyright standpoint, it forces creators to “educate” a system that theoretically will remove the basis of their livelihoods: something that BSA innocently suggests has “no impact on the economic interests that copyright is intended to protect.” I think we have different ideas about the meaning of impact.

Written by Neil Turkewitz