Who Polices the Internet?

Trevor Devine CC BY-NC

The Digital Millennium Copyright Act 1998 (DMCA) may be one of the most influential laws to affect the internet. Under the DMCA, copyright owners can send takedown notices to internet service providers, content hosts and search engines to request the removal of allegedly infringing content.

Because removing content from the internet is practically impossible, rightsholders instead often focus on making it more difficult to find. This makes massively popular search engines like Google a target for DMCA requests. In February 2011, Google decided to start collating data in relation to the takedown notices it received. This year, it is set to process over a billion requests to remove content from its search index.

Google released data about DMCA takedown requests via its Transparency Report. In this article, we aim to provide a preliminary analysis of some trends observed in Google’s data from February 2011 to May 2016. While these findings should not yet be relied upon, they highlight some interesting questions that we intend to investigate further.

Who is Sending the Most Takedown Notices?

Over 3.4 million separate takedown notices have been sent to Google since February 2011.

The overwhelming majority of notices sent to Google are sent by a small group of copyright owners. The top 10% of senders are responsible for well over 90% of notices, and this group is also responsible for the large increase in notices being sent over the last five years:

Overall Trends observed in Groups of Notice Senders according to Notices Sent

Importantly, each notice can contain thousands of requests for individual URLs to be removed. When we look at discrete URLs, we see that the top 0.1% of copyright owners are responsible for most takedown requests. This proportionally tiny group currently requests that Google remove over 76 million URLs per month, compared to approximately 50,000 URLs per month for the bottom 99.9%. These massive numbers are attributable to the use of automated notice sending systems, which are used frequently by higher volume notice senders:

Overall Trends observed in Groups of Senders according to number of potentially infringing URLs

Interestingly, the Top 0.1% of Notice Senders slowly built up the number of URLs specified in each notice until October 2012. At that point, the number jumped from fewer than 500 to over 1500 URLs per notice.

In recent months, the number of URLs per notice has grown to over 3000 for the Top 0.1% of Notice Senders. In contrast, the Bottom 90% of Notice Senders have only averaged about 10 URLs per notice.

Overall Trends observed in Groups of Senders according to URLs specified in each notice
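The groupings above can be illustrated with a minimal sketch. It assumes each notice is available as a (sender, URL count) record; the sender names and counts below are invented for illustration and are not drawn from Google's data, and the function names are our own.

```python
from collections import defaultdict

# Hypothetical records: (sender, number of URLs in one notice).
# These values are invented for illustration and are not Google's data.
NOTICES = [
    ("big_agency", 3000), ("big_agency", 2800), ("big_agency", 3200),
    ("big_agency", 3100), ("big_agency", 2900), ("big_agency", 3000),
    ("label_a", 40), ("label_a", 60),
    ("photographer", 5), ("blogger", 10), ("author", 8),
    ("shop", 12), ("band", 9), ("artist", 11), ("studio", 7), ("writer", 8),
]


def totals_by_sender(notices):
    """Return {sender: (notice_count, url_count)} from raw notice records."""
    totals = defaultdict(lambda: [0, 0])
    for sender, url_count in notices:
        totals[sender][0] += 1
        totals[sender][1] += url_count
    return {sender: tuple(pair) for sender, pair in totals.items()}


def split_top(totals, fraction, key_index):
    """Split senders into the top `fraction` (ranked by one total) and the rest.

    key_index 0 ranks by notices sent, key_index 1 ranks by URLs requested.
    """
    ranked = sorted(totals, key=lambda s: totals[s][key_index], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    return ranked[:cutoff], ranked[cutoff:]


def share(totals, group, key_index):
    """Fraction of the overall total that comes from the senders in `group`."""
    overall = sum(pair[key_index] for pair in totals.values())
    return sum(totals[s][key_index] for s in group) / overall


def urls_per_notice(totals, group):
    """Average number of URLs per notice across the senders in `group`."""
    notices = sum(totals[s][0] for s in group)
    urls = sum(totals[s][1] for s in group)
    return urls / notices


totals = totals_by_sender(NOTICES)
top, rest = split_top(totals, 0.10, key_index=0)
print("top senders:", top)
print("share of notices from top group:", share(totals, top, 0))
print("share of URLs from top group:", share(totals, top, 1))
print("URLs per notice (top, rest):",
      urls_per_notice(totals, top), urls_per_notice(totals, rest))
```

Ranking by notices and ranking by URLs can produce different groups, which is why the cut-off is taken against a chosen total rather than fixed once.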

What Proportion of Takedown Notices are Rejected by Google?

Google is tasked with assessing the validity of the notices sent to it as part of the DMCA process. While the specifics of Google’s takedown notice assessment procedure are unknown, Google has implemented a partnership program to enable copyright owners to submit high volumes of infringement notices.

The program, known as Google’s Trusted Copyright Removal Program (TCRP), automates the submission of takedown notices for its members. Google states that TCRP exists for its trusted users who have a ‘consistent need to submit thousands of URLs each day’ if they have a ‘proven track record of submitting accurate notices’.

What Google considers ‘accurate’ can be inferred from the share of requested URLs it rejects, as recorded in the Transparency Data.

Accuracy of Top 10% of Takedown Notice Senders

Google’s data reveals a global URL rejection rate of just 9.35% for the Top 10% of Notice Senders:

Top 10% of Notice Senders URL Rejection Rate

It is interesting to note that the rejection rate for the Top 10% of Notice Senders has not changed drastically over time — there has only been a slight increase in rejections over the last 6 months, with the rate reaching 11.65% on average over that period.

This is despite the massive increase in both notices being sent and the volume of URLs identified in those notices observed since October 2012.

Accuracy of Bottom 90% of Takedown Notice Senders

In March 2016, Jennifer Urban, Joe Karaganis and Brianna Schofield conducted an examination of DMCA notices sent to Google using data collated through the Lumen Database. Their findings were part of a larger study of the practical effectiveness of the DMCA notice-and-takedown system, Notice and Takedown in Everyday Practice, published by the University of California.

Urban, Karaganis and Schofield theorised that the most consistent predictor of a ‘low-quality’ or inaccurate notice was whether it was sent by ‘a first-time, one-off or low volume sender’. Their findings state that low volume senders were ‘most likely to misunderstand the notice and takedown process, mistake the statutory requirements, or use it for clearly improper purposes’. In five years of Transparency data, we see that Google’s own internal verification processes support this conclusion.

The global average rejection rate was 24.3% for the Bottom 90% of Notice Senders. This is more than double the rejection rate for the Top 10% of Notice Senders.

Bottom 90% of Notice Senders URL Rejection Rate
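For readers who want to check the comparison, here is a minimal sketch of one way a URL rejection rate can be computed: rejected URLs divided by requested URLs, pooled over a group of senders. This pooled definition is an assumption on our part, since the Transparency Data could equally be averaged per sender or per month. The monthly counts are hypothetical; only the two percentages are the figures reported above.

```python
def pooled_rejection_rate(rows):
    """Percentage of requested URLs that were rejected, pooled across all rows.

    `rows` is a list of (urls_requested, urls_rejected) pairs.
    """
    requested = sum(req for req, _ in rows)
    rejected = sum(rej for _, rej in rows)
    if requested == 0:
        raise ValueError("no URLs requested")
    return 100 * rejected / requested


# Hypothetical monthly counts for one sender group (not Google's figures).
SAMPLE_ROWS = [(1000, 90), (2000, 190), (1000, 94)]
sample_rate = pooled_rejection_rate(SAMPLE_ROWS)

# Rejection rates reported in this article for the two sender groups.
TOP_10_RATE = 9.35
BOTTOM_90_RATE = 24.3
ratio = BOTTOM_90_RATE / TOP_10_RATE

print(f"sample pooled rejection rate: {sample_rate:.2f}%")
print(f"bottom 90% rate is {ratio:.1f}x the top 10% rate")
```

On the reported figures the ratio comes to roughly 2.6, which is consistent with the 'more than double' description.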

A lack of knowledge of the DMCA process may be contributing to inaccurate notices. Low volume senders who are unfamiliar with the statutory requirements of the DMCA may be filling in Google’s web form incorrectly.

Urban, Karaganis and Schofield also attributed the inaccuracy of low volume senders to use of the DMCA procedures for ‘clearly improper purposes’. Copyright notice-and-takedown provides the easiest mechanism to get material (and links to material) removed from the internet, so it’s not surprising that people might seek to misuse the process for non-copyright purposes. These requests often come from businesses seeking to remove negative reviews, or from individuals looking to get links to defamatory or negative content removed from the web.

Google also has an internal system to label senders as an ‘abuser’ if Google believes them to be abusing the DMCA process.

Of the 450 senders listed as an ‘abuser’ according to Google, 43.77% had sent only one notice and 85% had sent fewer than 10. This supports the conclusions reached by Urban, Karaganis and Schofield, but raises further questions as to how Google classifies an ‘abuser’ of the process and the steps taken in relation to any further notices sent.

What happens when sites are shut down?

Urban, Karaganis and Schofield noted a number of copyright owners who continued to send takedown notices to file-sharing sites for lengthy periods after the sites were formally shut down. They concluded that some notice senders had failed to update their automated sending algorithms or implement basic cross-checking procedures involving human judgment.

We looked at the Transparency Data for a select group of shuttered sites:

There are a few things to note here. First, some proportion of these notices are clearly invalid — requesting takedowns for URLs that are no longer active. The trends observed also suggest that copyright owners may be sending notices without checking whether their content continues to be infringed. This suggests that some of the automated processes for sending copyright infringement notices are flawed and could benefit from greater oversight and review.

Second, these figures show just how difficult it is to shut down sites. Look particularly at the notices sent to Demonoid after it was first shut down in August 2012. Only a few months after the very expensive campaign to have Demonoid shut down, rightsholders were again having to deal with the same site, now hosted on different domains and presumably in different locations. Demonoid emerged under the domain demonoid.ph in March 2014, demonoid.pw in December 2014, and demonoid.ooo in February 2015. Notices sent to these three domains account for 91.71% of the notices sent to Demonoid after it first closed in August 2012. All three have since been taken down.
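A share of this kind can be computed with a simple filter over notices by domain and date. The sketch below is illustrative only: the rows and notice counts are invented, `original-domain.example` is a placeholder, and the exact shutdown day is an assumption (the end of the month mentioned above).

```python
from datetime import date

# Assumption: treat the end of August 2012 as the shutdown date.
SHUTDOWN = date(2012, 8, 31)
RELAUNCH_DOMAINS = {"demonoid.ph", "demonoid.pw", "demonoid.ooo"}

# Hypothetical rows: (domain, month the notices were received, notice count).
ROWS = [
    ("original-domain.example", date(2012, 5, 1), 40),
    ("original-domain.example", date(2012, 11, 1), 10),
    ("demonoid.ph", date(2014, 4, 1), 50),
    ("demonoid.pw", date(2015, 1, 1), 30),
    ("demonoid.ooo", date(2015, 3, 1), 10),
]


def relaunch_share(rows, shutdown, domains):
    """Percentage of post-shutdown notices that target the given domains."""
    after = [(domain, count) for domain, when, count in rows if when > shutdown]
    total = sum(count for _, count in after)
    if total == 0:
        raise ValueError("no notices after the shutdown date")
    hits = sum(count for domain, count in after if domain in domains)
    return 100 * hits / total


print(f"{relaunch_share(ROWS, SHUTDOWN, RELAUNCH_DOMAINS):.2f}% "
      "of post-shutdown notices target the relaunch domains")
```

Notices dated before the shutdown are excluded from both the numerator and the denominator, so the result describes only the period after the site first closed.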

The example of Demonoid’s takedown and subsequent resurgence reflects a wider problem relating to internet copyright infringement. Sites that are taken down do not stay down. Ultimately, we think that this might suggest that the long-term benefits of website blocking and high-cost international enforcement initiatives may be somewhat limited.

Next Steps

This short piece has highlighted a number of questions we’re looking at investigating further:

  1. How accurate are notices? We have seen many reported instances where the DMCA process has been misused. In this brief look at Google’s data, we have relied on Google’s determination of whether a notice is valid. Next, we need to test the accuracy of notices ourselves, and attempt to reproduce the findings of Urban, Karaganis and Schofield in Notice and Takedown in Everyday Practice. This would require in-depth examination of a sample of notices, perhaps making use of the Lumen Database, formerly known as the Chilling Effects Clearinghouse.
  2. How good are Google’s evaluation systems? Given the massive numbers of notices that Google must evaluate, the accuracy of its systems is quite important. We collectively rely on Google to accurately reject notices that do not clearly refer to infringing content, including notices that are sent for fraudulent or invalid reasons. In future research, we would like to test Google’s verification processes, including the mechanisms through which senders are labeled as ‘abusers’ of the DMCA process.
  3. Was there any effect on notice sending behaviour as a result of the Lenz decision? Lenz v Universal Music, also known as the ‘dancing baby case’, was a 2015 decision of the US Court of Appeals for the Ninth Circuit, which held that those who send DMCA takedown notices must first consider the ‘fair use’ exception to copyright infringement. Some commentators have even argued that an automated notice sending process lacks the human input required to assess ‘fair use’. If the Lenz decision has had any effect on the process used by copyright owners in sending notices, we should be able to see it in Google’s transparency data.
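As a first pass at the third question, one could compare a monthly measure before and after the decision. The sketch below is only a starting point under stated assumptions: the monthly values are invented, the cut-off month of September 2015 is our assumption for when the decision was handed down, and a plain before/after comparison cannot separate the effect of the decision from the underlying growth trend.

```python
from statistics import mean

# Assumption: the decision is treated as taking effect from September 2015.
CUTOFF = (2015, 9)

# Hypothetical monthly series: ((year, month), value of some measure,
# for example URLs requested per notice). Not Google's figures.
SERIES = [
    ((2015, 6), 100), ((2015, 7), 110), ((2015, 8), 90),
    ((2015, 9), 80), ((2015, 10), 70), ((2015, 11), 90),
]


def before_after(series, cutoff):
    """Return (mean before cutoff, mean from cutoff onward)."""
    before = [value for month, value in series if month < cutoff]
    after = [value for month, value in series if month >= cutoff]
    if not before or not after:
        raise ValueError("need data on both sides of the cutoff")
    return mean(before), mean(after)


def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return 100 * (after - before) / before


mean_before, mean_after = before_after(SERIES, CUTOFF)
print("mean before:", mean_before, "mean after:", mean_after)
print("change (%):", percent_change(mean_before, mean_after))
```

A more careful analysis would compare against the trend in the months leading up to the decision rather than a simple average.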



