Sustainable Text and Data Mining: A Look at the Recent EU Copyright Directive
by Neil Turkewitz
While issues of the publisher’s right and the enhanced duty of platforms to ensure use of licensed materials (for most of the debate, Articles 11 & 13) consumed much of the oxygen and nearly all of the spotlight in the process leading to the adoption of the Copyright Directive, the Directive contains a variety of other important elements, including a specific exception to copyright for data and text mining. Data and text mining, defined in the Directive as “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations,” has become more and more important in a variety of technologies, in particular in connection with AI, and demands the attention of policy-makers outside the relatively small circle of people who have traditionally focused on this matter.
Articles 3 & 4 of the Directive set out the parameters of this exception: Article 3 provides an exception for reproductions of copyright-protected works and extractions from databases made by research organisations and cultural heritage institutions, provided this is done for the sole purpose of scientific research; Article 4 expands the exception to any entity, but specifically permits copyright owners to opt out of the application of the exception, thereby creating a rebuttable presumption of permitted use. Importantly, Articles 3 & 4 are premised on the correct understanding that text & data mining would generally involve acts that implicate copyright in the absence of an exception. As described here: “Text and data mining enables the harnessing of large amounts of information available in digital form and the extraction of its value. This practice is especially important in the context of artificial intelligence. The DSM Directive acknowledges this importance and accordingly introduces two new text and data mining (TDM) exceptions. The introduction of these provisions also clarifies that TDM of copyright-protected works will be, in the absence of any exception, a breach of copyright.”
Given this understanding of the implications of TDM for copyright, and therefore for rights protected under international treaties to which the EU and/or its member states are parties, including in particular the Berne Convention, TRIPS and the so-called WIPO Internet Treaties, there may be some questions about the Directive’s relationship to international law. It is not my intention to question the EU’s judgment on these questions, but I note in passing that the implementation of these provisions and the evolution of their effects bear careful scrutiny, particularly as machine reading of copyright works and the subsequent expression based thereon become more and more central to commerce. The relationship of these provisions with the requirements of the three-step test under Article 9(2) of the Berne Convention (essentially mirrored in TRIPS & the WCT/WPPT) and with the ban on formalities is complex and evolving, and will need to be constantly assessed and reassessed.
To be consistent with international law, exceptions must first of all apply only to “special cases.” Again, the emerging centrality of text and data mining, particularly outside non-profit research institutions, creates a potential tension with this very first requirement. In addition, at this juncture, the EU was, as reflected in Recital 17, principally concerned with whether the exception prejudiced the legitimate interests of the author, and not with whether it permitted uses that conflicted with a normal exploitation of the work. But as TDM becomes more central, it changes the nature of expectations about the meaning of a “normal exploitation.” This is particularly the case for the broader exception provided in Article 4, raising questions about whether the “opt out” is an adequate safeguard that would rescue an exception otherwise incompatible with the Berne Convention, even putting aside the question of formalities.
As the EU noted in Recital 17 with regard to Article 3: “In view of the nature and scope of the exception, which is limited to entities carrying out scientific research, any potential harm created to rightholders through this exception would be minimal.” To be consistent with international legal obligations, this will need to be ensured both in law and in practice. Recital 18, with regard to Article 4, is also a critical part of this puzzle: “Rightholders should remain able to license the uses of their works or other subject matter falling outside the scope of the mandatory exception provided for in this Directive for text and data mining for the purposes of scientific research [i.e. TDM permitted by Article 3] and of the existing exceptions and limitations provided for in Directive.”
While Article 4 is designated an exception, as a practical matter it may be more useful to think of it as a form of extended collective licensing (ECL) permitting opt-out, reflecting an understanding that an exception of this scope would be inconsistent with EU/member state obligations under international law. It also reflects an interesting and consistent emphasis of the Directive as a whole on encouraging licensing, including where appropriate via collective management. I examined that in connection with Article 13/17 in this piece. In this sense, while I remain concerned about certain aspects of how the Directive addresses TDM, I am encouraged by the framing, which clarifies that theories of innovation cannot override the need to ensure the effective protection of copyright or be permitted to undermine licensing opportunities for creators.
Action by the EU on TDM raises an obvious question about the treatment of TDM under US law — but you TDM fans will need to be a little patient. I will address that in my follow-up piece to be posted next week.