Algorithms: garbage in, garbage out
In a recent article, Lewis Bush looked into the repercussions of algorithms entering the world of photojournalism. For the many different reasons outlined by Lewis, the idea of algorithmic photojournalism ought to make us shudder.
On a very fundamental level, I am deeply worried about all the hype surrounding algorithms. First, though, a little personal background.

From 1995 until 2007, I was deeply involved in the world of programming. During my Ph.D. studies (1995–98) and then again later (2002–2007), I worked in the field of computational cosmology, which centers on running very extensive computer simulations of the universe. More precisely, the idea is to take a theoretical model (equations, some of them directly related to Einstein’s theory of gravity), feed it into a computer, and then have the computer calculate what the universe looks like. Why would anyone do that, given that there is only the one universe we live in (let’s ignore multiverse theories)? The reason is simple: while we have a good understanding of what the main theories look like (gravity etc.), we don’t quite know the exact parameters that are part of the model (such as the precise expansion rate of the universe, the exact fractions of the various types of matter, etc.). Oh, and there is dark matter, and, as was discovered at the end of my Ph.D. time, there is what’s called dark energy (see this article). We don’t know what that stuff is, but we can simulate how it might behave. We can then compare the outcome of a simulation with what we observe: does this (the simulation) look like that (our universe)? It usually doesn’t, which calls for a bigger and more refined simulation.
The computer programs I worked with, some of which I contributed to (the vastly complex programs running on massively parallel supercomputers that produced the main simulations), some of which I wrote myself (the simpler programs that would study the simulation results), all involved a series of algorithms. To work with these programs, I had to understand how their algorithms worked or come up with algorithms myself. And it is exactly that background that has me so worried about all the talk about algorithms in the world of photography (and beyond). Algorithms are little more than procedures that produce output data from input data, following underlying models. While algorithms are good at crunching enormous amounts of data, they will never produce anything that lies outside their original model. In other words, an algorithm will only ever produce something that lies firmly within the parameters of what its creators could think of. Of course, there are algorithms that can become “smarter” over time (artificial neural networks, for example), but still: what you get out is essentially what you put in.
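To make the point concrete, here is a minimal, entirely hypothetical sketch (the labels and numbers are invented for illustration): a toy nearest-neighbour classifier can only ever answer with labels that appeared in its training data. Whatever input you hand it, the output stays inside the model it was given.

```python
# A toy nearest-neighbour classifier. It can only ever predict
# labels that were present in its training data -- nothing that
# lies outside its underlying model.

def train(examples):
    # examples: list of (feature_value, label) pairs;
    # "training" here is just memorizing the examples
    return list(examples)

def predict(model, x):
    # return the label of the closest training example
    closest = min(model, key=lambda pair: abs(pair[0] - x))
    return closest[1]

model = train([(1.0, "earthquake"), (2.0, "flood"), (3.0, "fire")])

print(predict(model, 2.1))    # close to a known case: "flood"
print(predict(model, 100.0))  # far outside anything it has seen,
                              # yet still forced into a known label
```

Even an input far outside anything the model has seen gets squeezed into one of the known categories; the algorithm has no way to say "this is something new."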
Against this background, I’m not in the least surprised that, for example, Microsoft’s Twitter bot Tay turned into a Hitler-praising conspiracy theorist in no time: it did exactly what it was programmed to do, namely learn from what it was being fed. In much the same fashion, any time an algorithm produces something that’s biased, that’s not a bug; it’s a feature. Those biases enter the algorithm through the people who design the models and write the code. Luckily, as humans we are becoming more and more aware of how such biases create unequal playing fields. Algorithms, however, cannot do that: algorithms cannot question their own foundations.
To give just one extreme example, I doubt that any photojournalism or news algorithm available today would have been able to process the terrorist attacks of September 11, 2001. For the vast majority of human beings, that event was the unthinkable. You cannot program the unthinkable. What you can do is program what you know. So algorithmic journalism might be able to process, as Lewis outlines, stories around earthquakes, but only to the extent that they conform to earlier stories. Any deviation will either simply disappear or cause a system crash.
To be sure, there are plenty of journalists who think that showing up and reporting the facts is what it takes to be a journalist. And it is. A lot of those facts do indeed conform to something that happened before. But there are always events that did not happen before. There is also the fact that while, say, car accidents have happened before, each one happens to different people. There is almost an ethical dimension to having such events processed by algorithms: do we want to reduce human beings to mere ciphers in our news just so that things can be done a little more automatically?
This then leads to what I personally see as the three main problems with algorithms when they’re applied to the kinds of situations described in Lewis’ article:
- Algorithms work along the lines of their underlying models. They are unable to deal with what humans would call the unthinkable, with what is simply not yet known, with (to use today’s parlance) data that doesn’t exist yet. Unfortunately, the news is filled with exactly such events. In the context of social media, we already have a term for what happens when the unthinkable is excluded: the filter bubble.
- You might see this as a variant of the first point, but I think it deserves to be stated on its own: algorithms work along the lines of their underlying models with all their flaws and biases. As the old computer-science saying goes, “garbage in, garbage out.” Or, in this case, “biases in, biases out.” In a world that is becoming more aware of how the many biases we operate with severely distort the rights and well-being of so many members of our societies, we ought to move towards a bias-free world, not one that cements biases into code.
- Algorithms reduce human beings to data points. As convenient as that might be for some people, I’m not sure that’s a very good basis for quality journalism.
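The “biases in, biases out” point above can be shown with an equally small sketch (the corpus and word pairs are invented for illustration): a trivial co-occurrence “learner” faithfully reproduces whatever skew its training text contains, without ever questioning it.

```python
from collections import Counter

# A toy "learning" algorithm: it associates words that co-occur
# in its training sentences. Feed it skewed text and the skew
# comes straight back out -- biases in, biases out.

def learn_associations(sentences):
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for w in words:
            for v in words:
                if w != v:
                    counts[(w, v)] += 1
    return counts

biased_corpus = [
    "the nurse said she helped",
    "the nurse said she cared",
    "the engineer said he built",
]
assoc = learn_associations(biased_corpus)

# The model "knows" nurse -> she more strongly than nurse -> he,
# purely because the input text said so.
print(assoc[("nurse", "she")])  # appears twice in the corpus
print(assoc[("nurse", "he")])   # never appears
```

Nothing in the procedure is prejudiced; the skew lives entirely in the data and the modeling choices, which is exactly why an algorithm cannot audit its own foundations.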
Of course, it is conceivable that at some stage computers will be so powerful and algorithms so clever that there will be genuine artificial intelligence. I don’t want to speculate about when that will happen. In light of the various examples given in Lewis’ article, we still have a long way to go.
Recommended follow-up reading: Hito Steyerl’s A Sea of Data: Apophenia and Pattern (Mis-)Recognition