Journalists holding algorithms accountable

Warning: This post contains explicit language.

[Figure: Surge multiplier data from Uber for Washington, DC, for the month of February.]

When you type a query into a search engine like Google or Bing, what is actually happening? What is the algorithm doing to come up with the results you see? What are you not seeing? What biases are built into these algorithms?

Nick Diakopoulos and Jennifer A. Stark, computational journalists at the University of Maryland, are asking these questions and more to hold algorithms accountable.

Some interesting areas to investigate when it comes to algorithms include:

  1. Is the algorithm fair, or is it discriminatory?
  2. Does it make mistakes that deny people a service?
  3. Does it censor content?
  4. Does it break a law or a social norm?
  5. Does it lead to false predictions?
  6. Does it violate privacy?

One study by Diakopoulos, called Sex, Violence, and Autocomplete Algorithms and published in Slate, examined which words Bing and Google censor in their suggested search completions. As Diakopoulos points out, in 2013 Google’s FAQ said: “we exclude a narrow class of search queries related to pornography, violence, hate speech, and copyright infringement”. Diakopoulos tested this by passing 110 words to the Google search API and recording the suggested autocompletes.

[Figure: Words that are blocked from providing suggested search results. Left circle: Google; right circle: Bing; the middle is where they overlap.]
[Figure: Words that are blocked from providing suggested search results when paired with the word “child”. Left circle: Google; right circle: Bing; the middle is where they overlap.]

What Diakopoulos found was that Bing blocks only one word from suggested results — ‘homosexual’ — while Google blocks quite a few. However, when the word “child” is added to the search query, both search engines block many additional words.
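
To give a sense of how such an audit works, here is a minimal sketch in Python. It is not Diakopoulos’s actual code: it queries Google’s unofficial autocomplete suggestion endpoint with a stand-in word list and treats an empty suggestion list as a rough proxy for “blocked”.

```python
# A minimal sketch of an autocomplete audit, not Diakopoulos's actual code.
# It queries Google's unofficial suggestion endpoint and treats an empty
# result as a rough proxy for "this query is blocked from autocomplete".
import requests

WORDS = ["violence", "torrent"]  # stand-ins for the 110-word list

def google_suggestions(query):
    """Return Google's autocomplete suggestions for a query."""
    resp = requests.get(
        "https://suggestqueries.google.com/complete/search",
        params={"client": "firefox", "q": query},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()[1]  # response shape: [query, [suggestion, ...]]

for word in WORDS:
    for query in (word, f"child {word}"):
        if not google_suggestions(query):
            print(f"No suggestions for {query!r} (possibly blocked)")
```

Running the same list against Bing’s suggestion service and comparing the two sets of blocked words yields the kind of overlap shown in the diagrams above.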

Diakopoulos also points out that manipulating search engine results could influence the outcomes of elections. To read more on this, check out “The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections” by Robert Epstein and Ronald E. Robertson. (If you don’t have access to the article, here is an abridged blog post by Epstein to give you more of an idea.) In the study, Epstein and Robertson manipulated the ranking of candidates in search results and measured how that affected how people said they would vote.

Beyond the influence of search engines, Stark and Diakopoulos have been looking at how Uber’s algorithm routes drivers for pickups. What they found was that “Uber seems to offer better service in areas with more white people”.

During their research, they found “… a month’s worth of Uber data throughout D.C. suggests an answer: The neighborhoods with better service — defined as those places with consistently lower wait times, the pickup ETA as projected by Uber — are more white.”
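
As a rough illustration of that kind of analysis (a hypothetical sketch, not their actual code; the file and column names are assumptions), one could join sampled pickup ETAs to census-tract demographics and compare wait times:

```python
# A hypothetical sketch of the wait-time comparison, not Stark and
# Diakopoulos's actual code. File names and columns are assumptions:
# sampled pickup ETAs keyed by census tract, joined to tract demographics.
import pandas as pd

etas = pd.read_csv("uber_eta_samples.csv")      # tract_id, eta_seconds
tracts = pd.read_csv("tract_demographics.csv")  # tract_id, pct_white

df = etas.merge(tracts, on="tract_id")

# Median projected wait time in majority-white tracts vs. the rest.
df["majority_white"] = df["pct_white"] > 50
print(df.groupby("majority_white")["eta_seconds"].median())

# A naive linear association between tract whiteness and wait time.
print(df["pct_white"].corr(df["eta_seconds"]))
```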

To dig into how they did this study, check out the repo of data and code.

This type of work falls within the realm of computational journalism, which has been taking off in the past few years. There are now departments or programs at the University of Maryland, where Diakopoulos and Stark are, along with Stanford and more. If you are interested in this area, there is a Computation and Journalism conference coming up in the fall of 2016 at Stanford.

Jacqueline Kazil
Notes from a Computational Social Scientist

Data science, complexity, networks, rescued pups | @InnovFellows, @ThePSF, @ByteBackDC, @Pyladies, @WomenDataSci, creator of Mesa ABM lib