From Hack to Hacker: How Journalists Mould Computers Into Powerful Assistants

Barnaby Skinner
Oct 28, 2018

Coding journalists may be the refiners of the 21st century: refiners not of oil, but of data.

On 16th October 2016, a young man brutally assaulted and raped a student on the banks of the river Dreisam in the German city of Freiburg, throttled her and threw her unconscious into the river, where she drowned. The man admitted everything.

But he claimed to have acted in the heat of the moment: he had not lain in wait for the young woman, but had encountered her by chance at 2 in the morning. The whole thing, he said, had taken place in the space of a few moments, after which he had instantly fled the scene of the crime in panic. That was the offender’s story.

But when the police analysed the defendant’s encrypted mobile phone, they discovered digital traces that contradicted this version of events. Using the log data of a step-counting health app, they were able to prove that the accused had spent around 30 minutes in the bushes before the victim arrived on the scene, and that the heinous act had lasted a whole hour before he left the scene of the crime. With the aid of the data, investigators could determine, right down to the second, when the man dragged the woman into the water, and even establish his exact position at the time of the murder.

In the city of Zurich, Costas Bekas also works with data — on a rather more cheerful subject. He is the manager of Foundations of Cognitive Computing at IBM Research Zurich. He heads a team of around 150 people who are working on machines intended to assist doctors in their work.

What do your machines do? I asked him. It turns out that they help him and his team detect skin cancer at an early stage by comparing millions upon millions of images of skin diseases.

This raised my next question: So you are working to replace doctors with machines?

This didn’t go down too well with Costas Bekas, a Greek not blessed with the longest of fuses. That’s absurd, he said. If anything, future doctors will be obliged to make use of such aids: as the population ages, there will be ever fewer doctors relative to the number of patients, so they will have to work more efficiently and more quickly. The most important thing about the whole development of artificial intelligence, he added, is not the algorithm itself, but explaining to the doctor exactly what’s going on: explaining to the expert why a system has come up with a certain result, and making sure the doctor has understood what the computer is telling him.

Doctors, said Bekas, generally don’t like working with black boxes. They want to understand how a system works. He illustrated this with an example: he had recently demonstrated the system to a doctor in a Zurich hospital. The doctor looked at the result of the skin cancer diagnosis, sat down at the computer and began clicking through the many images in the system’s database. What are you doing? Bekas asked him. The doctor said he was looking for the samples that the computer had apparently found.

He would have had to trawl through up to 15 million images to do that. An impossible task for a doctor with patients to attend to. But this situation illustrates the huge challenge facing Bekas and his team of data scientists. They not only have to build a system that works, but also one that works in a way that the experts can understand. This is why many members of this team are designers.
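One generic way to make a system’s answer inspectable (a sketch only, not a description of how IBM’s system works) is to let it show its evidence: a nearest-neighbour classifier labels a new case by the most similar stored cases and can hand exactly those cases to the expert. The Python below assumes each image has already been reduced to a short list of numbers; the case IDs, feature values and diagnoses are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One reference case: (case_id, feature_vector, diagnosis).
# HYPOTHETICAL: in a real system the feature vectors would be computed from
# images by some model; here they are tiny made-up numbers.
Reference = Tuple[str, List[float], str]


def euclidean(a: List[float], b: List[float]) -> float:
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def explain_by_examples(
    query: List[float], database: List[Reference], k: int = 3
) -> Tuple[str, List[Tuple[str, str]]]:
    """Classify `query` by its k nearest reference cases and return them.

    The returned neighbours are the explanation: instead of trawling through
    the whole database, the expert can inspect exactly the stored cases
    that drove the result.
    """
    if k < 1 or k > len(database):
        raise ValueError("k must be between 1 and the size of the database")
    # sorted() is stable, so equally distant cases keep their database order.
    ranked = sorted(database, key=lambda ref: euclidean(query, ref[1]))
    neighbours = ranked[:k]
    votes: Dict[str, int] = {}
    for _, _, diagnosis in neighbours:
        votes[diagnosis] = votes.get(diagnosis, 0) + 1
    # On a tie, max() keeps the diagnosis counted first, i.e. the one
    # belonging to the closest neighbour.
    prediction = max(votes, key=votes.get)
    return prediction, [(case_id, diagnosis) for case_id, _, diagnosis in neighbours]
```

For a query lying close to several stored melanoma cases, `explain_by_examples` returns the label together with the IDs of exactly those cases, so the doctor has a handful of images to check rather than 15 million.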

On 30 May 2017, The Intercept, a US publication co-founded by the journalist Glenn Greenwald following the Snowden affair, sent a document to the NSA, the US National Security Agency. It was a classified NSA report showing that Russian military intelligence had tried to hack into US election systems shortly before the 2016 presidential election. The attack was most likely unsuccessful, but the NSA had clearly been able to identify it.

On 3 June 2017, the 25-year-old, delightfully named Reality Winner, a former US Air Force linguist then working as an NSA contractor, was arrested. She had come across the report in the course of her work, printed it out and mailed it anonymously to The Intercept.

When The Intercept then sent the document to the NSA for verification, the publication inadvertently revealed the identity of the source. The US internet magazine had not considered the possibility that, on closer inspection of the original document, the NSA could establish what printer had been used to print it out.

Many colour laser printers print tiny, barely visible yellow dots on every page, like a fingerprint; the pattern can encode details such as the printer’s serial number and the date and time of printing. NSA investigations found that Winner was the only person who could have been the whistleblower. There is some controversy over whether the dots alone enabled the NSA to identify the leaker.
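Published analyses of colour laser printers, notably by the Electronic Frontier Foundation, describe the dots as a small grid repeated across the page, in which the presence or absence of a dot stands for a bit. The Python sketch below shows the principle with a deliberately simplified, made-up layout (six 7-bit fields plus one odd-parity row); it is not the encoding of any real printer model.

```python
from typing import Dict, List

# HYPOTHETICAL, simplified layout (not any manufacturer's real scheme):
#   - the grid has 8 rows and one column per field in FIELDS
#   - row 0 holds a parity bit chosen so each column has an odd number of dots
#   - rows 1..7 hold a 7-bit value, most significant bit first
FIELDS = ["minute", "hour", "day", "month", "year", "printer_id"]
ROWS = 8


def encode_grid(values: Dict[str, int]) -> List[List[int]]:
    """Turn field values into a dot grid (1 = yellow dot, 0 = blank)."""
    grid = [[0] * len(FIELDS) for _ in range(ROWS)]
    for col, name in enumerate(FIELDS):
        value = values[name]
        if not 0 <= value < 128:
            raise ValueError(f"{name} must fit into 7 bits")
        data = [(value >> (6 - i)) & 1 for i in range(7)]
        parity = 1 - sum(data) % 2  # makes the column's dot count odd
        for row, bit in enumerate([parity] + data):
            grid[row][col] = bit
    return grid


def decode_grid(grid: List[List[int]]) -> Dict[str, int]:
    """Read the fields back out of a dot grid, checking parity per column."""
    result = {}
    for col, name in enumerate(FIELDS):
        bits = [grid[row][col] for row in range(ROWS)]
        if sum(bits) % 2 != 1:
            raise ValueError(f"parity error in column {col} ({name})")
        value = 0
        for bit in bits[1:]:
            value = (value << 1) | bit
        result[name] = value
    return result
```

Encoding a timestamp and a printer number and then decoding the grid returns the same values; flipping a single dot trips the parity check, which is how a reader of such a pattern can tell a misread dot from a real one.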

Reality Winner was then remanded in custody, where she was kept in solitary confinement most of the time. She was treated as a traitor to her country; her applications for bail were turned down twice. It was six months before she could discuss the details of her case with her defence lawyer. On 23 August 2018, Winner was sentenced to five years and three months in prison, largely because the journalists at The Intercept had exposed their source to the NSA when they should have known better.

Why have I chosen to recount these three anecdotes for you?

Well, apart from the fact that they are very interesting stories in themselves, they also show that there is no longer any difference between the digital and the real world; but more importantly, these three anecdotes illustrate how journalists need to use this coalescence of the digital and the real world for their research. The example of The Intercept shows they even have an obligation to do this. Only in this way can journalists protect themselves and their sources.

Above all, journalists must learn how to use the huge volume of data being collected for journalistic purposes, in order to identify the most fascinating and relevant stories. Think of how the police in Freiburg did this: they cracked the encryption of the mobile phone and picked out the log data of a health app.

Are journalists allowed to hack into a system?

Digital expert and lawyer Martin Steiger cannot imagine any case in which a hack would be justified. But he is not aware of any case in the Swiss media in which a person affected had lodged a complaint against journalists. The only remotely comparable cases concerned a report on a case of online voting fraud in the canton of Geneva.

It involved a TV journalist who received two sets of ballot papers after moving house, enabling him to cast an electronic vote twice. The journalist, Joël Boissard, put this to the test and voted twice. It worked: his vote counted double. A fiasco for the e-voting system. The experiment led to a TV report, but it also prompted the Attorney General to press criminal charges.

Joël Boissard appealed, without success. On 4 April 2017, the Federal Supreme Court upheld the ruling of the lower court. Boissard was given a suspended monetary penalty of two daily penalty units plus a fine of 400 francs for electoral fraud. Boissard had “hacked” our election system; although he had not penetrated the system, his two votes counted as manipulation. For our supreme court, vote rigging took precedence over the public relevance of reporting on it.

For data privacy activists, too, the case is clear-cut. As the law stands at present, it is basically not possible for Swiss journalists to engage in hacking without rendering themselves liable to prosecution. They have to conduct their hacking in countries such as Great Britain, for example, where journalists can invoke the defence of public interest. Or, like the police in Freiburg, they can work with security experts who carry out the hack, while the journalists invoke their right to protect their sources in order to conceal the security expert’s identity. There must not be any question of incitement, of course. In-depth technological know-how helps journalists understand what is possible and how best to protect the security expert’s identity. The greater the journalist’s technological understanding, the better.

The second anecdote I mentioned earlier is about physicians and their mistrust of the black boxes of computer science. Indeed, we’re increasingly surrounded by algorithms that promise us everything under the sun: they can diagnose diseases, or, in the case of journalism, will apparently soon be able to write articles, or at the press of a button turn complex unstructured data into structured databases and create links between documents.

But just as physicians mistrust these miracle tools, journalists should too. If they don’t understand what these machines do, journalists and readers alike will be at the mercy of the machines. And journalists can only report on algorithms if they understand how they work.

When most people talk about artificial intelligence — or AI, as it is often abbreviated to — they usually mean nothing more than statistics. In general, AI only amounts to estimates of how likely a given scenario is.
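To make the point concrete: at its simplest, such an estimate is nothing more than counting. The Python sketch below estimates how likely an outcome is under a given condition from past observations, and picks the most likely one as its “prediction”. The weather data are invented, and real systems add many more features and refinements such as smoothing, but the principle is the same.

```python
from collections import Counter
from typing import List, Tuple

Observation = Tuple[str, str]  # (condition, outcome), e.g. ("cloudy", "rain")


def _outcomes_for(observations: List[Observation], condition: str) -> List[str]:
    """All recorded outcomes for one condition; error if there are none."""
    matching = [outcome for cond, outcome in observations if cond == condition]
    if not matching:
        raise ValueError(f"no observations for condition {condition!r}")
    return matching


def estimate_probability(
    observations: List[Observation], condition: str, outcome: str
) -> float:
    """P(outcome | condition), estimated as a simple relative frequency."""
    matching = _outcomes_for(observations, condition)
    return matching.count(outcome) / len(matching)


def most_likely_outcome(
    observations: List[Observation], condition: str
) -> Tuple[str, float]:
    """The 'prediction': the most frequent outcome and its estimated probability.

    On a tie, Counter.most_common keeps the outcome that was seen first.
    """
    matching = _outcomes_for(observations, condition)
    outcome, count = Counter(matching).most_common(1)[0]
    return outcome, count / len(matching)
```

With four cloudy days on record, three of them rainy, `most_likely_outcome(observations, "cloudy")` returns `("rain", 0.75)`: a “75 per cent chance of rain” that is simply a frequency from the past.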

So, if journalists understand how algorithms work, they can kill two birds with one stone. They can create their own infrastructure to automate certain tedious aspects of work and generate new ideas for articles. And at the same time they can tap into a new field of reporting: algorithmic bias. Algorithms are developed by humans, and they are fed with data that likewise come from humans. And humans are biased in one way or another.

What do I mean by this? Take a Google search, for example: if you google «professional haircut», the image results, at least when I tried it, mostly show white men with a magnificent head of hair.

But if you then google «unprofessional haircut», you are presented with a host of images showing black women. Do white men have more professional haircuts than black women? For me, this is a prime example of how algorithms often confirm and reinforce human prejudices.
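Journalists can test such suspicions systematically rather than anecdotally: collect, say, the first few dozen image results for each query, label each one by hand, and compare the shares. The Python below is a minimal sketch of that tally; the query names, group labels and counts used with it are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def group_shares(labels: List[str]) -> Dict[str, float]:
    """Share of each group among the hand-labelled results of one query."""
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {group: n / total for group, n in counts.items()}


def audit_queries(annotated: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Compute group shares per query so the queries can be compared."""
    return {query: group_shares(labels) for query, labels in annotated.items()}


def share_gap(
    audit: Dict[str, Dict[str, float]], group: str, query_a: str, query_b: str
) -> float:
    """Difference in a group's share between two queries (a minus b)."""
    return audit[query_a].get(group, 0.0) - audit[query_b].get(group, 0.0)
```

If ten labelled results for one query contain eight from `group_a` and ten for another contain only one, `share_gap` reports a difference of about 0.7, i.e. 70 percentage points: a number a story can be built around, and one that readers and other reporters can check.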

The third anecdote I mentioned earlier was about internet security and encryption. The example of the whistleblower who could be identified by the «fingerprint» of her printer shows that the transparency created by the internet is also palpable outside the internet. Everything becomes traceable and verifiable.

At the same time, there is a discernible countertrend. Innovative encryption techniques are leading to the rise of new trading platforms and currencies. I’m talking about the blockchain revolution, for example. Blockchain is the technology that forms the basis of the digital currencies that have been stealing the headlines in recent times.

I wouldn’t claim to understand how blockchain technology works down to the last detail. It might best be compared to a gigantic logbook that is copied onto thousands upon thousands of computers. When someone makes a transaction using bitcoin, it only counts once the computers in the network have agreed to enter it in this shared logbook.
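The logbook idea can be sketched in a few lines of Python, assuming a toy entry format: each entry stores the SHA-256 fingerprint of the entry before it, so quietly editing an old entry breaks the chain, and comparing the latest fingerprint across copies exposes changes to the newest one. What the sketch leaves out entirely is how bitcoin’s network decides which entries to accept in the first place; that consensus mechanism is the genuinely hard part.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(entry: Dict) -> str:
    """SHA-256 fingerprint of an entry (keys sorted so the hash is stable)."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(logbook: List[Dict], transaction: str) -> None:
    """Add a transaction, linking it to the fingerprint of the previous entry."""
    previous = entry_hash(logbook[-1]) if logbook else GENESIS_HASH
    logbook.append(
        {"index": len(logbook), "transaction": transaction, "previous_hash": previous}
    )


def is_consistent(logbook: List[Dict]) -> bool:
    """Check every link; editing an earlier entry breaks the link after it.

    The newest entry has no successor, so this check cannot protect it.
    """
    if logbook and logbook[0]["previous_hash"] != GENESIS_HASH:
        return False
    return all(
        logbook[i]["previous_hash"] == entry_hash(logbook[i - 1])
        for i in range(1, len(logbook))
    )


def copies_agree(copies: List[List[Dict]]) -> bool:
    """Do all copies of the logbook end in the same fingerprint?"""
    tips = {entry_hash(book[-1]) if book else GENESIS_HASH for book in copies}
    return len(tips) == 1
```

Appending the same transactions to two copies yields matching fingerprints; altering the first entry of one copy makes `is_consistent` fail, while altering its last entry is only caught by `copies_agree`. That is one reason why having many copies matters.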

What is fundamentally new about this is that it allows people to interact without an intermediary: to exchange goods and money without having to disclose their identity. Many people therefore see blockchain as the technology that will complete what the internet started. Blockchain finally does away with the intermediary. You don’t need banks or retailers between the customer and the producer. You can connect with each other directly. And also anonymously. It remains to be seen whether it actually works. The internet has also moved in directions that we hardly imagined it would.

But if journalists want to understand blockchain and comprehend the transformative power of the technology, it will help if they venture into the world of computer science and encryption. I don’t believe it’s enough to be able to operate the latest functions on your iPhone. Or to be an especially good Facebook user. Or to send particularly witty tweets. Journalists need to get to grips with the basic rudiments of these technologies that are changing our lives and our media world.

And the basic rudiments are programming languages. I’m not saying that all journalists must become programmers. Just because we all write doesn’t mean we’re all novelists. But still, a lot of people in the modern world wouldn’t get very far if they couldn’t write. I think you could say something similar today about digitisation: there are few jobs that would not benefit if the people doing them had at least a few basic programming skills. Journalism, with its many generalists, is perhaps one of the occupations that could benefit most of all.

When I say that journalists should learn programming skills, I’m met with a lot of head-shaking from colleagues. Learn programming? As if we didn’t have enough to do!

But I see a lot of parallels between journalists, especially in investigative journalism, and programmers.

Investigative journalists are just as nerdy as programmers. They often sink their teeth into a subject area they are unable to let go of until they thoroughly understand the story in all its detail. Until the story they are telling really makes sense.

Any journalists who undertake rigorous research are only really satisfied when they understand each and every step in a story. Programmers are just as tenacious in their work. They tinker with their code for as long as it takes to make it work.

Both programmers and investigative journalists also have to contend with a long-outdated stereotype: the idea that they spend most of their working time alone with their stories or their code, often not speaking to anyone for days. This is no longer true of investigative journalists, as the research on the Paradise Papers recently demonstrated. The scope and impact of that research would never have been possible if the reporters involved had not been networked and constantly sharing information.

And this is precisely how programmers work, too. They constantly share ideas, information and code with each other.

I would recommend that everyone take a look at the online platform GitHub. GitHub is the largest collection point for computer code and open source software on the internet. The platform enables thousands of developers to work on the same projects asynchronously. Take the machine-learning library scikit-learn: by my estimate, it is used in roughly every second artificial-intelligence application. Its entire source code was developed on GitHub, where it is freely available for viewing. More than a hundred programmers from all corners of the world have worked on this software package.

A further feature that programmers and journalists have in common is their working method, namely documentation. Good, detailed documentation is just as important for the investigative journalist as it is for the programmer. Because both often work with highly complex, exclusive material, it is important to record every step, in most cases chronologically. Unless everything is written down and well organised, it will be impossible to understand later on how a piece of code came about, or how research results can be communicated simply and correctly.

Research is creative. And very much driven by the individual concerned. I don’t know any research-based colleague whose working method can really be compared with another. Of course, there are manuals on how to highlight the most important information from a source in an interview. But basically everyone has his or her own research technique.

The same goes for programmers. It’s fascinating to see how different people find very different approaches to solving programming problems, even among people who have only just learned a few basic commands of a programming language.

And of course there are also differences between journalists and programmers. Perhaps the biggest difference is in the outcome. When a programmer’s code does not work, there is no result. The same does not apply to the journalist’s work: here there is always a research outcome. Whether it is good enough for a story is another question.

Anyone who learns programming quickly forgets that there are people who don’t understand how coding works. It’s a little like learning to ride a bike. For cyclists, it’s difficult to understand how someone cannot ride a bike. This leads to a certain divide between programmers and journalists. And it’s up to journalists to acquire some basic programming skills in order to bridge this gap — and to understand how coding can be used not only for research, but also for journalistic forms and products.

Programming journalists, for example, can create their own simple customised programs. With a few lines of code, you can track a dictator’s aircraft when it enters Swiss airspace; or in a matter of seconds, search all Swiss court rulings ever published on the internet for certain patterns using a cheap laptop; or build a small robot that tells you when an entrepreneur drives his next company to bankruptcy.
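Two of these tools can be sketched in Python. The first assumes the journalist already has a feed of aircraft positions (for example decoded ADS-B messages, which identify each aircraft by a 24-bit ICAO address) and checks whether a watched aircraft appears inside a rough box around Switzerland; the second runs a regular expression over a collection of ruling texts. The box coordinates are approximate, the function names are illustrative, and fetching the data is left out.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Rough bounding box around Switzerland in decimal degrees:
# (min_lat, max_lat, min_lon, max_lon). A box is only a first filter;
# the real border is irregular, so a proper geofence would use a polygon.
SWISS_BBOX = (45.8, 47.9, 5.9, 10.5)


def inside_bbox(lat: float, lon: float, bbox=SWISS_BBOX) -> bool:
    """True if the position lies within the (inclusive) bounding box."""
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def watched_aircraft_in_box(
    positions: Iterable[Tuple[str, float, float]], watchlist: Set[str]
) -> List[str]:
    """Return watched aircraft (by ICAO address) seen inside the box, sorted."""
    return sorted(
        {icao for icao, lat, lon in positions if icao in watchlist and inside_bbox(lat, lon)}
    )


def search_rulings(rulings: Dict[str, str], pattern: str) -> Dict[str, List[str]]:
    """Map each ruling ID to the full-text matches of `pattern` (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = {}
    for ruling_id, text in rulings.items():
        # finditer + group(0) returns whole matches even if the pattern has groups.
        found = [m.group(0) for m in regex.finditer(text)]
        if found:
            hits[ruling_id] = found
    return hits
```

Called with a hypothetical pattern such as `r"konkurs\w*"`, `search_rulings` flags every ruling that mentions a bankruptcy (Konkurs); run daily over newly published notices, the same kind of pattern search is essentially what a bankruptcy-alert bot does.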

When things get more complex — for example, when you want to investigate which politicians swing to the right or the left on which issues — you know which experts to turn to in order to come up with consolidated answers.

To return to the cycling analogy for a moment: journalists don’t have to plan for a Tour de France; they can make headway quickly enough with basic cycling skills. But in this digital age, journalists today should take the time to learn how to cycle. And they should also be provided with the time by their employers.

In short: for a publisher, there is no better investment in journalism than giving journalists the time and opportunity to train in the art of programming. The results are threefold: firstly, sound journalism that readers are willing to pay for; secondly, greater efficiency, because journalists can automate very specific, repetitive work; and thirdly, journalists develop structured data along the way, as well as ideas for new journalistic forms that can exploit the opportunities of new media and platforms.
