Lumen — The Year in Review
By: Adam Holland, Andromeda Yelton, and Chris Bavitz
September 2019 through the end of August 2020 marked the first year in which Lumen operated with a generous supporting grant from the Arcadia Fund. During that year, the project’s primary objectives fell within three themes: (1) technical improvements to the Lumen site and database; (2) expanding research opportunities, both internal and external; and (3) outreach, both to possible new notice-submitters and to the various constituencies of the Lumen user community. This post draws from Lumen’s first annual report to Arcadia and provides an overview of the project’s key activities during the past year.
To say the least, it was a complex and difficult year, most notably because of the COVID-19 pandemic, which forced us into a remote work mode for much of 2020. That said, we were able to make significant progress on a number of key fronts:
- Lumen’s developers and technical support team achieved a great deal, especially on the backend of the site, and also by adding new user interface features and new notice categories.
- Lumen had success increasing and improving external research using Lumen’s database, with database access credentials granted to 49 new researchers or research teams, ranging from college undergraduates and investigative reporters to law professors and economists. Several of these researchers produced substantive written work along with various other shorter articles and pieces.
- Regarding outreach, we had conversations with a range of organizations, online service providers and individuals regarding working more closely with Lumen. The Lumen team participated in a multi-stakeholder virtual workshop in June of 2020, with more topic-specific workshops planned.
- The onset of the pandemic put a wrench in the works in terms of hiring, but Lumen plans to bring on board a new research fellow in the coming months.
The remainder of this overview addresses and provides more details on these main themes in the order outlined above.
(1) Technical Improvements and Progress
In addition to small-scale bug fixes and one-off requests too numerous to name, the Lumen developers’ activity in the first year fell into several main categories:
- Lumen upgraded Rails (the web application framework the whole system uses) from version 4.2 to version 5.2.
- Lumen upgraded Ruby (the programming language in which Lumen’s site and administrative interface are built) from version 2.3 to version 2.5.
- Lumen upgraded various software libraries.
- Lumen is in the process of upgrading its native search function from Elasticsearch version 5 to Elasticsearch version 7, which is expected to significantly improve the ability of researchers and others to access notices in the database.
In combination, these upgrades strengthened system security and performance, making the database notably faster for users. They also keep the site up to date, which in turn allows developers to implement further improvements without too much work. Finally, the ongoing Elasticsearch upgrade allows Lumen administrators to redact sensitive data in Lumen’s notices more quickly and effectively (in addition to making site search functionality more powerful for users).
Overall, these technical improvements make the Lumen site easier to use for, and more responsive to, both its internal team and the research community. They also serve to “future-proof” the site to the extent possible, making it far more likely that Lumen will be able to continue to exist and thrive indefinitely, and making sustained improvements easier to accomplish.
Improvements to the Lumen administrative interface
- Lumen updated rails_admin from version 1.4 to version 2. As an example of modernization making more modernization easier and possible, this upgrade was only possible because of the underlying Ruby upgrade mentioned above.
- Lumen added a Content Management System (CMS) to the site and migrated all the old blog posts and pages into it. The CMS will also make it much easier for the Lumen team to share rich multimedia content on the Lumen website and to write short pieces more rapidly and effectively. Some recent blog posts made using the CMS include a write-up of Lumen’s workshop on best practices in notice and takedown transparency, “Algorithmic Copyright Management: Background Audio, False Positives and De facto Censorship” and “Pandemic Misinformation Campaign Comes to Lumen.”
Improvements to receiving and sharing notice data
- A series of improvements to the Lumen application programming interface (“API”) improved the quality of data that the database is able to accept and process, most notably with respect to URLs submitted. Some URLs are malformed when submitted, and API improvements dramatically reduced the error rate upon submission.
- Prompted by a series of distributed denial of service attacks on the Lumen site, Andromeda Yelton, the lead developer for Lumen, put into place, documented, and continues to improve a series of changes for managing the requests for data that Lumen receives through its API, which represent a potential vulnerability for the site and database. These changes improved Lumen’s data security, made it easier for legitimate researchers to continue to use the site, and helped better allocate site and system resources. Andromeda later gave a presentation about her work at Code4Lib.
- Lumen added “Counterfeit” and “Placeholder” notice types. Google began accepting takedown notices referencing the presence of counterfeit goods or advertisements for them on Google sites, and in order to accommodate this new notice stream, we created a new template for such notices. Having this new type available also made it possible for the Lumen team to effectively reach out to new possible submitters, including Amazon and eBay. The “Placeholder” notice is another new type that allows Lumen to accept more notice streams. Some large OSPs, like Google, receive takedown requests about which they are unable to share the details for legal reasons. Despite this, they still wish to indicate that they have received a request. In those cases, they can point their users to a “Placeholder” notice that provides what details are available.
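To give a rough sense of the kind of URL checking that can reduce submission errors, here is a minimal Python sketch. It is illustrative only and is not Lumen’s actual code (the Lumen site is built in Ruby); the function name `normalize_submitted_url` and the specific repair rules, such as defaulting to `http` when no scheme is given, are assumptions made for this example.

```python
from urllib.parse import urlsplit


def normalize_submitted_url(raw):
    """Return a cleaned-up URL string, or None if it cannot be repaired.

    Hypothetical helper: the rules below are assumptions for illustration,
    not a description of what the Lumen API actually does.
    """
    candidate = raw.strip()
    if not candidate or " " in candidate:
        # Empty input, or whitespace inside the URL, is treated as malformed.
        return None
    # Assumption: a URL submitted without a scheme is meant to be http.
    if "://" not in candidate:
        candidate = "http://" + candidate
    try:
        parts = urlsplit(candidate)
        hostname = parts.hostname
    except ValueError:
        # urlsplit raises ValueError for some invalid inputs (e.g. bad IPv6).
        return None
    if parts.scheme not in ("http", "https") or not hostname:
        return None
    return candidate
```

A submission pipeline could call a helper like this on each URL in a notice, keeping the repaired value when one is returned and flagging the entry for review when the result is `None`.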
Lumen made a series of changes regarding how visitors to the site see the URLs that are part of each notice. The changes make it possible for Lumen to present notice URLs in a truncated form to casual Lumen visitors, while still granting access to complete URLs to Lumen accredited researchers. Casual Lumen users can view one notice’s full set of URLs by providing an email address. Researchers with credentials can be granted access to notices within a limited time frame, up to a maximum specific number of notices, and with or without use of the Lumen API, and can also be given the ability to generate “permanent” versions of Lumen notice URLs that are suitable for use in published works or for citation.
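The two-tier display described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Lumen’s implementation: the exact truncated form shown to casual visitors is not specified in this post, so the sketch assumes that only the scheme and host are shown, and the names `truncate_for_display` and `urls_for_viewer` are hypothetical.

```python
from urllib.parse import urlsplit


def truncate_for_display(url):
    """Hide the path and query of a URL, keeping only scheme and host.

    Assumption: this particular truncated form is chosen for illustration.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        # Nothing recognizable as a host, so reveal nothing.
        return "[URL hidden]"
    return f"{parts.scheme}://{parts.netloc}/..."


def urls_for_viewer(urls, is_researcher):
    """Return full URLs for credentialed researchers, truncated ones otherwise."""
    if is_researcher:
        return list(urls)
    return [truncate_for_display(u) for u in urls]
```

For example, `urls_for_viewer(notice_urls, is_researcher=False)` would give a casual visitor only the host of each reported URL, while the same call with `is_researcher=True` returns the list unchanged.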
(2) Research Using the Lumen Database
Lumen granted research credentials to forty-nine different researchers during the year in question. These researchers range from college undergraduates who have recently become interested in copyright law or censorship, to law professors, journalists, and others in the United States, to international researchers from a wide range of countries, including Brazil, Turkey, Ukraine, France, India, Austria, Russia, Germany, and the UK, along with EU-affiliated researchers and international NGOs such as the Committee to Protect Journalists.
Many of the projects that these researchers are working on are still ongoing, such as Professor Eugene Volokh’s ongoing series of law journal articles about falsified court orders and online defamation law. Some of the completed research projects include:
- Asher-Schapiro, Avi, Zidan, Ahmed. “India Uses Opaque Legal Process to Suppress Kashmiri Journalism, Commentary on Twitter,” Committee to Protect Journalists (blog), October 24, 2019, https://cpj.org/2019/10/india-opaque-legal-process-suppress-kashmir-twitter/.
- Fuller, Andrea, Grind, Kirsten, Palazzolo, Joe. “Google Hides News, Tricked by Fake Claims,” Wall Street Journal, May 15, 2020, sec. Tech, https://www.wsj.com/articles/google-dmca-copyright-claims-takedown-online-reputation-11589557001.
- Akdeniz, Yaman, Guven, Ozan (2019). “EngelliWeb 2019: An Iceberg of Unseen Internet Censorship in Turkey”. https://ifade.org.tr/reports/EngelliWeb_2019_Eng.pdf
- Hovyadinov, Sergei, Toward a More Meaningful Transparency: Examining Twitter, Google, and Facebook’s Transparency Reporting and Removal Practices in Russia (November 30, 2019). Available at SSRN: https://ssrn.com/abstract=3535671 or http://dx.doi.org/10.2139/ssrn.3535671
- Srivas, Anuj. “At ‘Government Request’, Twitter Blocks Tweet by BJP MP Tejasvi Surya,” The Wire, accessed October 8, 2020, https://thewire.in/tech/at-govt-request-twitter-blocks-hate-speech-including-tweet-of-bjp-mp-tejasvi-surya.
- Matias, J. N., Mou, M. E., Penney, J., & Klein, M. (2020). Do Automated Legal Threats Reduce Freedom of Expression Online? Preliminary Results from a Natural Experiment. https://osf.io/nc7e2/
Over the summer of 2020, the Lumen team also worked closely with a Harvard Law School student research assistant to begin developing a taxonomy of takedown notices, their underlying data, and the various involved stakeholders. This draft taxonomy seeks to cast light on the range of interests and incentives that a given stakeholder in the notice and takedown (“N&TD”) ecosystem must balance with respect to whether a particular piece of information should come down and the degree to which there should be transparency regarding the request and any subsequent action taken. The Lumen team hopes soon to turn this working draft into a white paper, to use it as the raw material for a Lumen workshop, and to draw on it in discussions of any statement of best practices regarding N&TD transparency.
The Lumen team’s original plan had been to hold a fairly intimate in-person workshop over the course of two days, as a way of initiating conversation between the various parts of Lumen’s user and research communities, and to plant the seed for more detailed and targeted workshops to come. Unfortunately, the COVID-19 pandemic got in the way of those plans, and as a result, the June workshop was held virtually. Although the Lumen team members were of course very disappointed to not be able to have the full in-depth workshop we had planned, especially the face-to-face network building and conversations, hosting a virtual event had some positive aspects. These included lower costs and the possibility of drawing more participants. The end result was that we were able to diversify and expand the initial invitee list substantially, including a wider range of interested parties, and — critically — giving the group more international representation. On that note, it meant that some foreign human rights activists who would otherwise not have been able to attend were present — including representatives of EngelliWeb, which has published a human rights report on Turkish takedowns that relies heavily on Lumen. The most recent of EngelliWeb’s reports can be found here.
Using the lessons learned from this first virtual event, and anticipating that virtual events will be the norm for the foreseeable future, Lumen has planned a series of smaller and more topically focused events for the coming fall and winter, the first few of which will be focused on learning more from current and prospective Lumen researchers.
Outreach to New Sources of Notices and Notice Data
Encouraging recipients and senders of takedown notices to share copies of those notices with Lumen has proven to be one of the biggest challenges the team has faced. Although Lumen’s name recognition has clearly improved, due in no small part to the increased publicity from outside journalism and research publications, and although those companies with whom Lumen has existing relationships are generally positive about the benefits of sharing, some institutions are still loath to share notices and notice data. Finding ways to be more effective at turning preliminary outreach into new data-sharing arrangements will be a top priority for the Lumen team in the coming year.
General Outreach and Media Participation
In addition to the June 2020 workshop mentioned above, and their ongoing work with Lumen researchers, members of the Lumen team participated in the following activities:
- Prompted by the increased attention paid to the Internet Archive after it announced its pandemic-motivated National Emergency Library, Adam Holland wrote a Medium piece examining the various points of view on the library, and urging the NEL to share any takedown notices it received with Lumen.
- Lumen provided some statistics on Google takedowns to TorrentFreak for an article.
- Lumen provided statistics on takedowns to Professor Rebecca Tushnet for her testimony to the Senate Judiciary subcommittee on the DMCA.
- Lumen Project Manager Adam Holland answered some questions from a cyberlaw professor about Google’s search index and robots.txt, pertinent to the way in which the Florida Department of Law Enforcement (FDLE) operates its publicly accessible database of Florida residents who have been previously convicted of various sex offenses.
- As noted above, Andromeda Yelton gave a presentation on her work defending Lumen against a Russian botnet to Code4Lib.
- Adam spoke with a member of the data team of The Correspondent, a newsroom based in Amsterdam.
- Adam provided general commentary regarding the DMCA and United States fair use law to Daniel Laufer, a German reporter interested in how a German company, Acromax Media, may be abusing the DMCA to take down critical reporting in Albania, and was quoted in the ensuing article.
- Adam spoke to a New York Times reporter to give background information about DMCA takedowns for an article about a lawsuit between two self-published Amazon authors.
- Adam spoke with CBS News regarding a story about Professor Volokh’s research.
- Adam spoke with the San Antonio Express-News for a story posted online here.
- Additionally, members of the Lumen team provided background information regarding Lumen, the DMCA and notice & takedown generally to inquiries from journalists, activists, legislative staffers, researchers, and other interested parties.
Of special note, on December 16, 2019, Lumen project manager Adam Holland and Lumen PI Chris Bavitz made comments to the Third Meeting of the Stakeholder Dialogue on Art. 17 of the Directive on Copyright in the Digital Single Market in Brussels. Article 17 addresses the “Use of protected content by online content-sharing service providers.”
The presentation was well received and also boosted Lumen’s broader publicity. Lumen was invited to join a multi-stakeholder mailing list regarding ongoing Article 17 discussions, in which it continues to participate. Lumen also made several new EU contacts, including a former member of the European Parliament, who have kept Lumen apprised of opportunities to contribute comments or thoughts to ongoing copyright and intermediary liability-related legislative and regulatory discussions within the EU.
A copy of the remarks can be found at:
Bavitz, Chris, Holland, Adam, “Lumen Presents Comments to the Third Meeting of the Stakeholder Dialogue on Art. 17 of the Directive on Copyright in the Digital Single Market in Brussels” (December 17, 2019) https://www.lumendatabase.org/blog_entries/807
A recording of the day’s proceedings is available at:
“COPYRIGHT STAKEHOLDER DIALOGUES — Streaming Service of the European Commission,” https://webcast.ec.europa.eu/copyright-stakeholder-dialogues-16-12, (accessed October 8, 2020)
Lumen’s participation begins at approximately the 4:00:00 mark.
Other outreach efforts
The Lumen team has also had productive conversations with a variety of other activists and researchers about possible cooperative efforts, including with Carrie Goldberg, an American lawyer specializing in representing victims of so-called “revenge porn”; the “Disinfodex” project emerging from the Berkman Klein Center’s 2019–2020 Assembly Program; the Digital Public Library of America, the Reporters Committee for Freedom of the Press, Harvard’s Caselaw Access Project, and the Humboldt Institute for Internet and Society in Berlin.
Social Media Statistics
Lumen maintains a Twitter account, from which it tweets or retweets about content moderation, takedowns, censorship, academic freedoms, the “right to be forgotten” and other news related to online information. During the period from September 1, 2019 to August 31, 2020:
- The account added 986 new followers, a ~25% increase.
- The engagement rate on Lumen’s tweets and retweets went up each quarter, from 0.9% in the first to 2.0% in the fourth.
- Lumen’s tweets received 1.41 million total impressions, and an average of between 20 and 25 link clicks per day.
- Lumen’s top mention in terms of engagements (an order of magnitude greater than typical) was when a CNN reporter mentioned Lumen in a tweet referencing a takedown notice sent regarding Donald Trump’s account.
Data and Material Produced
During this year, the Lumen database added ~2.6 million more notices, referencing many millions of URLs, involving approximately fifty-eight thousand separate entities. As mentioned above in the technical improvements section, we put into place our planned changes for displaying URLs in a truncated form to casual Lumen visitors, while granting access to full notices with complete URLs to researchers requesting access. We were and are gratified to have received relatively few complaints from users regarding the change, and none from active researchers. Current policy is to grant a single request per email address to view a notice. Lumen has consistently averaged approximately one thousand such requests per day, but may revisit and revise the bounds of that policy in the coming year.
During the time period from September 1, 2019 to September 1, 2020, Lumen received almost six hundred thousand unique visitors, who visited Lumen close to fourteen million times, viewing over nineteen million unique Lumen website pages. These traffic numbers represent an approximately 50% increase in activity from the previous year, which the Lumen team attributes to both more research activity and greater use of the site by the public at large.
The most visited Lumen URL was http://lumendatabase.org/notices/9415, which is a Google placeholder notice for search results that contain URLs reported as illegal under German youth protection laws. There is no way to be certain why this notice is visited so often, but its popularity may be a rough proxy for the number of such removals by Google in Germany and the number of searches the internet-using German public performs for the underlying material. Alternatively, the relative novelty of the new laws may be driving interest. The second most visited Lumen page, close behind the first in terms of total visits, was Lumen’s own search page.
In the year to come, the Lumen team looks forward to continued progress on all fronts, from expanding the scope, scale and impact of research done with Lumen’s data and gathering new sources of takedown notice data, to improving the Lumen user experience and adding new members to the Lumen team. There will be more events, whether virtual or in person, more publications, and more opportunities to get involved.