Algorithm Transparency

Jonathan Koren
4 min read · May 19, 2016

or Paying Attention To The Man Behind The Curtain

Disclosure: I worked on algorithmic ranking for Facebook Trending from 2014 to 2015. I no longer work for Facebook.

Recently Gizmodo reported that Facebook’s Trending feature was powered by people. That’s not entirely true, and if you do a close reading of what all the participants are saying, you’ll see that no one is actually contradicting anyone else. What happens is that trends are automatically detected on Facebook, then humans read through the list and approve some of them to appear in the Trending section. When they do that, they write a little description, pick a picture, pick a representative link, and maybe fix up the trend name. That’s pretty much it. The actual trends you see are algorithmically ranked by the probability of a click.
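As a rough illustration, that pipeline looks something like the sketch below. The detect-curate-rank structure comes from the description above; every field name and the toy scoring function are assumptions made up purely for illustration, not Facebook’s actual system.

```python
# Hypothetical sketch of a human-in-the-loop trending pipeline.
# Detect -> human curation -> rank by predicted click probability.
# All names, fields, and the scoring model are illustrative assumptions.

def rank_trends(candidate_trends, click_model):
    """Keep only human-approved trends, then order by predicted click probability."""
    approved = [t for t in candidate_trends if t["approved_by_editor"]]
    return sorted(approved, key=click_model, reverse=True)

def toy_click_model(trend):
    # Stand-in for a learned model that predicts probability of a click.
    return 0.01 * trend["mentions_per_minute"]

candidates = [
    {"name": "#Election2016", "mentions_per_minute": 90,  "approved_by_editor": True},
    {"name": "#SpamTopic",    "mentions_per_minute": 400, "approved_by_editor": False},
    {"name": "#LocalSports",  "mentions_per_minute": 35,  "approved_by_editor": True},
]

for trend in rank_trends(candidates, toy_click_model):
    print(trend["name"])
```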

What really made this article blow up was the revelation that humans were in the loop. As someone who has worked on multiple trending news products, and who has spent his professional life working on these public-facing algorithmic products, I don’t find it surprising. In fact, it would have been surprising if it wasn’t the case. However, if you don’t know what to look for, I can see why someone would be surprised to learn this. To Facebook’s credit, they’ve chosen to be radically transparent about this; there’s no reason not to be, since this is how these products are made. They’re posting internal Quip documents and configuration files.

Many of the reaction stories to the Gizmodo report were cries of either being duped or of political bias. The political bias question doesn’t really interest me; it’s just the latest episode in a controversy that has been going on since before I was born. What I do find interesting is the idea that the public was duped into thinking that it was just computers doing everything.

There’s a popular belief that if you just define and automate a problem, and throw enough data at it, Data Mining / Machine Learning / Artificial Intelligence (or whatever we’re calling it this week) will solve it in a perfectly rational and objective manner, “all watched over by machines of loving grace,” as Richard Brautigan put it. Ironically, this belief may actually be more popular with people outside the AI/ML community, because those who build these systems are forced to confront their many limitations, both technical and human. AI is not some sort of magic pixie dust that makes these problems go away. All it does is automate a human’s poorly defined goals, with all the assumptions and biases therein.

Sometimes these human assumptions are latent but not intentionally obscured, as in the case of Facebook Trending. More recently, though, these biases have been explicit yet actively hidden. To see this, we only have to look at the recent explosion in “artificial assistants”, or more colloquially, “bots”. Operator is people. Facebook M is billed as a robot, but it isn’t really. X.ai has humans verifying the transactions. Many bot startups are actually humans chatting with you, or helping you perform some task behind the scenes. Think about that. Humans pretending to be robots. (Suck on that, Philip K. Dick.) Wasn’t the future supposed to be Rachael Tyrell selling insurance?

The fact is that even if you want to have computers do everything, for technical reasons, resource limitations, and product positioning you may still want humans to oversee the algorithms. Computers are incredibly stupid, and getting things wrong can be merely slightly embarrassing, or it can have very, very bad real-world consequences.

Because these algorithms, these AIs, are influencing information, decisions, and pretty much everything else in 21st century society, there’s been a push for what’s been termed “algorithmic transparency”. Essentially, it’s a call for explaining what data is collected and how it’s used.

Unfortunately, no one seems to know what this means, or even what level of transparency is appropriate in which case. Arguably, a mostly believable, simple explanation like “because you bought A, we’re recommending B” at a shopping site may be good enough, because it really doesn’t matter whether the underlying algorithm is singular value decomposition or Pearson’s correlation. But is the price I pay fixed, or is the site engaging in algorithmic differential pricing? If so, how is that determined, and can I manipulate my behavior to get a lower price? Good luck getting that information from a seller.
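To make the “it doesn’t matter which algorithm” point concrete, here is a minimal item-to-item recommendation sketch using Pearson’s correlation, one of the two techniques named above. The purchase data and function names are entirely made up; a shopper would just see “because you bought A, we’re recommending B” either way.

```python
import numpy as np

# Toy user-item purchase matrix: rows = users, columns = items A..D.
# Entirely invented data; only the shape of the technique matters.
ratings = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
], dtype=float)

def most_similar_item(item_index, ratings):
    """Recommend the item whose purchase pattern correlates most with item_index."""
    corr = np.corrcoef(ratings.T)     # item-item Pearson correlation matrix
    scores = corr[item_index].copy()
    scores[item_index] = -np.inf      # never recommend the item itself
    return int(np.argmax(scores))

# "Because you bought A (item 0), we're recommending ..."
print(most_similar_item(0, ratings))  # -> 1 (item B) on this toy data
```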

Some more radical transparency advocates have suggested that the public should know individual model features and coefficients. While a noble intent, quite frankly this information isn’t always all that useful to know. Coefficients are just numbers that often aren’t meaningful in isolation and can change rapidly. Features, the things that the computer is taking into account when making a decision or prediction, may actually be pretty fine-grained or, in the case of deep learning algorithms, may not even be known or completely understood by the practitioners.
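Here’s a quick, synthetic illustration of why a coefficient’s raw value isn’t meaningful in isolation: fit the same linear model twice, with one feature expressed in kilometers and then in meters. The numbers are invented; the point is only that an arbitrary choice of units changes the coefficient a thousandfold while the model’s predictions stay identical.

```python
import numpy as np

# Synthetic data: price depends linearly on distance, plus noise.
rng = np.random.default_rng(0)
distance_km = rng.uniform(1, 100, size=50)
price = 5.0 + 2.0 * distance_km + rng.normal(0, 1, size=50)

def fit(feature):
    """Ordinary least squares with an intercept; returns [intercept, slope]."""
    X = np.column_stack([np.ones_like(feature), feature])
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    return coef

coef_km = fit(distance_km)           # feature measured in kilometers
coef_m  = fit(distance_km * 1000.0)  # same feature measured in meters

print(coef_km[1])  # ~2.0
print(coef_m[1])   # ~0.002: 1000x smaller coefficient, identical predictions
```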

So what should we do? Clearly, if the public thinks they’re being lied to or manipulated by a secret, esoteric cabal of engineers and data scientists, then something is wrong. As a community, we do need to be more forthcoming about what it is we do, and more specifically about what the limitations are. Among ourselves, we’re pretty open about what we do and how we do it. Perhaps it’s time we level with the general public as well.
