In 2010 I had a chance to see Sarah Charlesworth’s work “Herald Tribune: November 1977” in an exhibition at the Guggenheim Museum in New York. The prints that make up the work collect the newspaper’s front pages over the course of a month. On photocopies of each page, the columns of text and the headlines are masked out, leaving only the masthead and the images visible.
Sarah Edwards Charlesworth (March 29, 1947 — June 25, 2013) was an American conceptual artist and photographer. She is considered part of The Pictures Generation, a loose-knit group of artists working in New York in the late 1970s and early 1980s, all of whom were concerned with how images shape our everyday lives and society as a whole. — Wikipedia
After a few minutes contemplating the prints and listening to the information about the process she used, I thought to myself that one could apply the same concept to the digital versions of printed newspapers worldwide, using a little programming knowledge and a few hacks.
Four years have passed, but only a few days ago did I find enough time to start coding around this idea. I started by searching for newspapers that publish a digital version of their front page online. The first one I tried was The Herald Tribune, because it was the newspaper I had seen in Sarah Charlesworth’s work at the Guggenheim. Unfortunately, the printed version of The Herald Tribune has not existed since 2013.
Many newspapers make their front page available in an image format such as JPG or PNG. I chose to start with The New York Times, however, because its front pages are available in PDF format. Next, I needed a good sample of PDF front pages to work on. Luckily, the PDF version of The New York Times’s front page can be downloaded using the following URL pattern:
With this pattern, one can easily download the front page in PDF format for any edition since July 6, 2012, which is apparently when The New York Times started releasing its front page as a PDF.
So I wrote a simple bash script that uses curl to download every available PDF version of the printed front page. After downloading approximately 900 files, I started looking for a code library to work with them. I found a good one called PDFBox, from The Apache Software Foundation.
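For readers who want to try the download step themselves, here is a minimal sketch of what such a script might look like. It is not the original script: the `URL_TEMPLATE` value is a hypothetical placeholder rather than the newspaper's real URL pattern, the function names are made up for illustration, and the date arithmetic assumes GNU `date` (for the `-d` option).

```bash
#!/usr/bin/env bash
# Sketch only. Assumptions: GNU date is installed (needed for -d), and
# URL_TEMPLATE is a hypothetical placeholder, not the real URL pattern.
# The %Y, %m and %d fields are filled in by date(1).
URL_TEMPLATE="${URL_TEMPLATE:-https://example.com/frontpage/%Y/%m/%d/scan.pdf}"

# Print every date from $1 to $2 (inclusive), one YYYY-MM-DD per line.
list_dates() {
  local current="$1" end="$2"
  # Compare as epoch seconds so the loop stops right after the end date.
  while [ "$(date -u -d "$current" +%s)" -le "$(date -u -d "$end" +%s)" ]; do
    echo "$current"
    current="$(date -u -d "$current + 1 day" +%F)"
  done
}

# Build the URL for one date by letting date(1) expand the template.
url_for_date() {
  date -u -d "$1" +"$URL_TEMPLATE"
}

# Download every edition between $1 and $2 into ./pdf, one file per day.
download_range() {
  mkdir -p pdf
  list_dates "$1" "$2" | while read -r day; do
    out="pdf/$day.pdf"
    # Skip editions that were already downloaded in a previous run.
    [ -s "$out" ] && continue
    # --fail keeps an HTTP error page from being saved as if it were a PDF.
    curl --fail --silent --show-error --location -o "$out" "$(url_for_date "$day")" \
      || echo "no front page for $day" >&2
  done
}
```

After sourcing the file and setting `URL_TEMPLATE` to the real pattern, something like `download_range 2012-07-06 2014-09-30` would fetch one PDF per day. Skipping files that already exist makes it cheap to re-run the script after an interruption.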
The Apache PDFBox library is an open source Java tool for working with PDF documents. PDFBox is very stable and robust, with lots of examples and a great community, so it wasn’t that difficult to learn how to parse, extract and replace text in PDF files. However, to stay true to the original concept of Sarah Charlesworth’s work, I had to remove all the text and headlines from the front page of the newspaper, leaving only the masthead and images visible. That seemed to be a simple task, but the PDF format is tricky!
After many trial-and-error versions of my code, I finally came up with an algorithm, based on the newspaper’s price tag, that achieved a 90% success rate in stripping out all the text while leaving only the masthead and images visible. You can see a sample in the animation below, which covers the entire month of September 2014.
It’s impressive how contemporary Charlesworth’s work still is, even in a new world of cryptocurrencies, APIs, massive image sharing, ubiquitous advertising and privacy concerns. After almost 40 years, Charlesworth’s concerns about how images shape our everyday lives and society as a whole are even more relevant. Ironically, the business model of the newspaper industry is now definitely broken.
You can see the full exhibition at http://modern-history-2.art.br