Offshoring could do good – for journalism

We talked to data journalist Nicolas Kayser-Bril about the Offshore Journalism Toolkit.

Published in Global Editors Network, Nov 30, 2017

Turns out online content has an expiration date. Following the shutdown of US news websites Gothamist and DNAinfo at the beginning of the month and the fear that ensued regarding the potential loss of years of content, the penny has dropped about the importance of archiving.

Over the last year, longtime journalist Mario Tedeschini-Lalli and data journalist Nicolas Kayser-Bril have been researching the amount of legitimate content that goes missing from the internet and trying to find a way to make it stop. Enter the Offshore Journalism Toolkit: If corporations can go offshore to minimise their tax burden, why can’t newsrooms and publishers do the same to maximise free speech?

While the idea may seem a bit ‘out there’, Tedeschini-Lalli and Kayser-Bril are in the process of turning it into reality.

Why is so much information going missing?

News has an expiration date

According to a June 2017 report by Kayser-Bril and Tedeschini-Lalli, the idea for offshore journalism initially came about as a result of a couple of petty crime cases in the Abruzzo region of Southern Italy, which were covered by a relatively unknown local news website, PrimaDaNoi.it. The stories were legitimate and factually correct, but had to be deleted under court orders on the basis of the ‘right to be forgotten’.

The first story was published in March 2006 and involved a couple who were arrested for ‘attempted extortion’. After they were acquitted the following year, PrimaDaNoi.it went back to the original story and updated it, stating that the couple had been cleared of any wrongdoing. The couple repeatedly asked the newsroom to remove the story, as it kept coming up in web searches for their names, presumably with a negative impact on their professional and personal lives.

The Italian Data Protection Agency was asked to intervene, but it ruled that the information could be kept online, so the couple decided to sue the news organisation. In 2011, the court of Ortona ordered the news organisation to delete the article and pay the couple €5,000 in damages. In this case, the couple’s right to ‘privacy, reputation, and honour’ overrode the right to inform and be informed.

The line of reasoning was that the article had lost its value seeing as time had passed. But who can decide at what point a news story becomes irrelevant to the general public? What if this story involved a political figure? Would such a story, too, have a short shelf life?

The second story, published by PrimaDaNoi.it in 2008, involved a public brawl at a local restaurant between four family members, which resulted in two stabbings and four arrests. Two years later, one of the brawlers sued the website on the grounds that his reputation and that of his restaurant were still being compromised even though the passing of time had dissolved the public’s interest in the story. In 2014, the court ruled against PrimaDaNoi.it on the grounds that the continued publication of the story violated the claimant’s right to privacy and reputation.

But PrimaDaNoi.it would not accept it. The news website appealed to the Supreme Court of Cassation, expecting the judges to rule in favour of freedom of expression. In June 2016, however, the Supreme Court ruled against PrimaDaNoi.it, effectively declaring that news does indeed have an expiration date.

Alessandro Biancardi, editor and publisher of the website, wrote in an editorial: ‘In effect the ruling affirms that the two people who stabbed each other in their restaurant suffered damages to their reputation (personal and of the restaurant) not because of the violence of their actions, but because of the report about them that remained accessible on the web.’

The right to be forgotten or the right to save face?

According to the report, the Italian website has received 80–100 requests for material to be deleted in the last seven years. Following the European Court of Justice ruling on the Google-Spain case, PrimaDaNoi.it routinely asks Google to de-link material when faced with takedown requests. Biancardi said that a majority of newsrooms take articles down to avoid getting into trouble, and most of these articles will be gone forever.

Access denied

Next door in France, due to anti-terrorism measures adopted by the French parliament in November 2014, French police were able to block five websites on suspicion of promoting terrorism and hate speech, without a court order.

The editor and publisher of one of the websites, Islamic-news.info, published an open letter following the shutdown of his site, claiming that no terrorist organisation was in any way involved in his work.

One French journalist from Le Monde revealed that the website was removed because it had put online a speech by al-Baghdadi, leader of the Islamic State. Simply putting up a speech is, in Kayser-Bril’s opinion, ‘hard to consider extremist’. He also points out that the archives from the blocked site suggest a very opinionated editorial line, but don’t point towards anything illegal or specifically violent and hateful.

Brutal censorship

Turkey is ranked 155th out of 180 countries on the Reporters Without Borders 2017 Press Freedom Index and, according to the Twitter Transparency Report released in March 2017, it issues the largest number of censorship requests by court order.

Turkey’s Cumhuriyet came under serious fire after publishing footage showing government trucks transporting weapons to neighbouring Syria. The day the article was published, a court in Adana issued an immediate gag order. All media were banned from reporting on the story, and all content had to be removed under the ‘Anti-Terror Law’. According to Turkey’s ‘Internet law’, online content has to be removed within four hours of the court order being received by website owners.

A large number of newspapers in Turkey are also under threat of losing their archives. The Offshore Journalism report states that ‘in July 2016 only, 45 newspapers, three news agencies, 16 TV stations, 23 radio channels, and 29 publishing houses were taken down on charges of producing propaganda for terror organisations.’ According to the report, members of staff of some of these news organisations claim that government-appointed trustees deleted the newspapers’ archives from their servers. While there is no confirmation of the government deleting documents, it is striking that the archives of Zaman, one of Turkey’s former top-selling newspapers, of its English-language sister publication, Today’s Zaman, and of Birgün have all gone missing.

‘This method is particularly harsh and terrifying since the purpose of deleting an archive is not only the removal of an unwanted content, but also an attempt to wipe out the existence of the newspaper from history, as if it never existed’, states the report.

Money problems

According to Kayser-Bril, another website that risks being deleted from history is Diário Digital, Portugal’s largest and oldest online newsroom, which was shut down at the beginning of 2017 for financial reasons. While the site did not have a paywall or specific archiving instructions that could have stopped pages from being saved, most of the work from its 17 years of service cannot be found on any digital archiving service.

Without archives, how can we hold power to account?

‘Our ability to hold powerful people or institutions to account lies in our ability to examine their past deeds. These deeds, in turn, are chronicled in archived content, often in news articles. Not having access to them or, worse, not knowing which past articles have been deleted, makes one of the basic missions of journalism impossible, which would be (or already is — we don’t know what has been removed so far) dangerous for our societies’, says Kayser-Bril.

While online archiving solutions already exist, such as Arquivo.pt (state owned), Archive.is (privately owned), and the Internet Archive (which runs on donations), none of them focuses specifically on preserving news articles, meaning that they don’t necessarily store the most important or valuable pieces of journalism. The Internet Archive, for example, was set up by a Silicon Valley entrepreneur in the late 1990s to store cultural artefacts in an internet library for researchers, historians, and scholars.

‘The trove they’ve built is extraordinary, but it’s far from comprehensive. Today’s web is more dynamic than ever and therefore more at-risk than it sometimes seems’, writes Adrienne LaFrance in the Atlantic.

In the Columbia Journalism Review, Abigail Grotke, who leads the web archiving team at the Library of Congress, tells Lene Bech Sillesen that the amount of digital news content they are able to archive is limited. ‘We haven’t done a whole lot over the years but we’re trying to do more news […] especially online-only publications and more local, regional papers.’

Why are newsrooms not archiving their digital content? Is it because they are not paid to do so? And if so, why do almost all newsrooms have in-house archiving services for their print editions, which are far more expensive to maintain than digital storage?

‘There has to be a change of culture’, says Edward McCain, digital curator of content at The Missouri School of Journalism’s Donald W. Reynolds Institute, in the CJR article. ‘At some point, [journalists and editors are] gonna have to take an interest’.

What exactly is the offshore toolkit, then?

‘We could have set up a huge financial fund to help publishers fight massive legal cases’, says Kayser-Bril. ‘But that’s not a €50,000 project. That’s not enough money to address the issue of legal threats being made against publishers’.

Instead, the Offshore Journalism Toolkit, funded by Google’s Digital News Initiative (DNI), lets publishers easily save content in other jurisdictions before any legal action can be taken. That way, if publishers do agree to remove content from their servers, it is not deleted for everyone forever.

How does it work?

Kayser-Bril and Tedeschini-Lalli proposed creating a standard made up of two parts:

A meta-tag, which is a very simple piece of HTML code that publishers can add to their articles.
A crawler that scans the tagged articles, takes a copy, and sends it to one or several offshore archiving services.
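To make the two parts concrete, here is a minimal Python sketch of how a crawler might spot tagged articles and pass copies on. The article does not give the real syntax of the standard, so the tag name `offshore-archive`, its `content` value, and the `crawl` function are all hypothetical, and the archiving services are stood in for by plain callables rather than real network requests.

```python
from html.parser import HTMLParser

# Hypothetical tag name: the article does not specify the real syntax.
ARCHIVE_META_NAME = "offshore-archive"


class ArchiveTagFinder(HTMLParser):
    """Records the content of the (hypothetical) archive meta-tag, if present."""

    def __init__(self):
        super().__init__()
        self.directive = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == ARCHIVE_META_NAME:
            self.directive = attributes.get("content") or ""


def find_archive_directive(html):
    """Return the tag's content string, or None if the page is not tagged."""
    finder = ArchiveTagFinder()
    finder.feed(html)
    return finder.directive


def crawl(pages, archivers):
    """Send every tagged page to every archiver.

    pages: dict mapping url -> html (already fetched).
    archivers: list of callables taking (url, html); stand-ins for
    whatever offshore archiving services are actually used.
    """
    archived = []
    for url, html in pages.items():
        if find_archive_directive(html) is None:
            continue  # untagged pages are left alone
        for send in archivers:
            send(url, html)
        archived.append(url)
    return archived
```

On the publisher's side, the only work in this sketch is adding one line such as `<meta name="offshore-archive" content="now">` to an article's HTML; everything else happens in the crawler, which anyone could run.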

What’s so great about it?

Save only important and original content

Publishers will be able to tag articles manually or automatically. For example, articles that contain the names of certain politicians or belong to a certain news section can be given a meta-tag automatically. This keeps publishers from archiving useless content that would only clog up the archiving server, such as the weather section. More important sections that aren’t carried by any other outlets, such as the local news section, could be easily identified and stowed away.
It also ensures that original copies of articles are archived.
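Automatic tagging of this kind could be a simple rule in a publisher's content management system. The Python sketch below is only an illustration: the article shape (a dict with `section` and `body` keys), the section names, and the watched names are all made up for the example.

```python
# Hypothetical rules, for illustration only.
WATCHED_NAMES = {"jane doe", "john roe"}
ARCHIVED_SECTIONS = {"local", "investigations"}
SKIPPED_SECTIONS = {"weather"}


def should_tag(article):
    """Decide whether an article gets the archive meta-tag.

    article: dict with 'section' and 'body' keys (an assumed shape).
    """
    section = article.get("section", "").lower()
    if section in SKIPPED_SECTIONS:
        return False  # never archive throwaway sections
    if section in ARCHIVED_SECTIONS:
        return True  # always archive sections no one else carries
    body = article.get("body", "").lower()
    # Otherwise, tag only articles that mention a watched name.
    return any(name in body for name in WATCHED_NAMES)
```

Plain substring matching is the crudest possible check; a real newsroom would probably rely on its own tagging or entity data instead.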

Save video and audio content

The crawlers could also extract audio and video content from a page (including YouTube videos) and preserve it accordingly. No preservation initiatives deal with video or multimedia content just yet.
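As a rough idea of the first step, the Python sketch below lists the audio and video addresses found in a page so that they could be saved alongside the text. It is an assumption-laden illustration, not the toolkit's code: it only looks at `video`, `audio`, and `source` elements and at YouTube embeds in `iframe` elements, and it does not download anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

MEDIA_TAGS = {"video", "audio", "source"}
# Embed URL patterns treated as video; an assumption for this sketch.
EMBED_MARKERS = ("youtube.com/embed/", "youtube-nocookie.com/embed/")


class MediaFinder(HTMLParser):
    """Collects absolute URLs of audio and video referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.media_urls = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if not src:
            return
        is_media = tag in MEDIA_TAGS
        is_embed = tag == "iframe" and any(m in src for m in EMBED_MARKERS)
        if is_media or is_embed:
            # Resolve relative paths against the article's own address.
            self.media_urls.append(urljoin(self.base_url, src))


def find_media(base_url, html):
    """Return the media URLs found in html, in document order."""
    finder = MediaFinder(base_url)
    finder.feed(html)
    return finder.media_urls
```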

Keep exclusivity on content

The meta-tag can be set up in different ways, allowing for a number of possibilities when it comes to archiving the material. One tag can signal that a certain piece of content should only be archived after a certain amount of time, in case a publisher wants exclusivity on it just after publication. And, perhaps more interestingly, a publisher can add a code to the tag that dictates that a piece of content only be archived once it has been deleted. If the crawler picks up on a 404 HTTP error, or an error message of any kind, the article will be pushed to an archiving service.
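The decision the crawler has to make for each tagged article can be sketched in a few lines of Python. The directive values used here (`now`, `after-days=N`, `on-deletion`) are invented for the example, since the article does not give the real codes. The sketch also assumes the crawler kept its own copy from an earlier visit, because a page that already returns an error can no longer be fetched.

```python
from datetime import datetime, timedelta


def should_archive(directive, published_at, now, http_status):
    """Decide whether the crawler's stored copy should be pushed out now.

    directive: content of the (hypothetical) meta-tag.
    published_at, now: datetime objects.
    http_status: status code from the crawler's latest visit to the page.
    """
    directive = directive.strip().lower()
    if directive == "on-deletion":
        # Any error status (404, 410, 5xx...) is treated as "gone".
        return http_status >= 400
    if directive.startswith("after-days="):
        # Exclusivity window: wait until the delay has passed.
        delay = timedelta(days=int(directive.split("=", 1)[1]))
        return now - published_at >= delay
    # Unknown directives are ignored rather than guessed at.
    return directive == "now"
```

For an article published on 1 November with `after-days=30`, a crawl on 15 November would hold back, while a crawl on 1 December would push the copy to the archives.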

No legal risk for publishers

While publishers could archive articles that are at risk of removal themselves, it would be time-consuming and open to legal risk: if a court requests the deletion of an article, a publisher could be held in contempt if a copy was made after the legal proceedings started, writes Kayser-Bril in an article on Medium. With this tool, ‘the publisher does basically nothing except for adding a meta tag to some articles. And the preservation is done by the crawler, which can be set up by anyone in any jurisdiction’, says Kayser-Bril. ‘It’s extremely easy to set up’. And there is no legal risk for the publisher. ‘They have to pay zero. It’s two minutes of work for one engineer and then they’re done.’

For the time being, Tedeschini-Lalli and Kayser-Bril are only working with the Internet Archive, whose servers are based in the US and Canada, and Archive.is, which has servers in Russia and Germany. Kayser-Bril says that these are probably the largest online archiving services and that once an article is on one of them, it is ‘probably pretty much undestroyable’. They will keep adding archiving services from other countries in order to maximise the offshore potential.

‘We have given a lot of thought to where to archive content, and the conclusion is that the jurisdiction doesn’t matter. What matters is having as many archiving services as possible to make sure that the content will be preserved whatever the political changes in the next couple of decades may be’, says Kayser-Bril.
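The ‘as many archiving services as possible’ idea amounts to sending each copy everywhere and not letting one failure stop the rest. The Python sketch below illustrates that under stated assumptions: the archivers are hypothetical callables standing in for real services, and the threshold of two surviving copies is an arbitrary choice for the example.

```python
def push_to_all(url, html, archivers):
    """Try every archiving service and report which ones succeeded.

    archivers: dict mapping a service name to a callable(url, html);
    both are placeholders for real services.
    """
    results = {}
    for name, send in archivers.items():
        try:
            send(url, html)
            results[name] = True
        except Exception:
            # One failing service must not stop the others.
            results[name] = False
    return results


def is_preserved(results, minimum_copies=2):
    """True if enough independent services hold a copy."""
    return sum(results.values()) >= minimum_copies
```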

The toolkit is currently in a testing phase with a couple of newsrooms in Italy to figure out how much it costs to set up the crawler. Kayser-Bril says they are expecting costs of under a euro per month to monitor and preserve one mid-size news outlet. So far, feedback has been positive.

‘They’re happy that their content is archived and they’re even happier they don’t have to do anything [to get it there]’, says Kayser-Bril. The crawler will be ready in a couple of weeks and ideally, the standard will be adopted by as many publishers as possible in order to preserve today’s journalism for tomorrow.