Magical “Algorithms” Are Just Irresponsible Feedback Loops

Software Of The Absurd
6 min read · Mar 3, 2018
Big Red by Alexander Calder. Famous among computer scientists as the cover of Introduction to Algorithms, 3rd ed., by Cormen, Leiserson, Rivest, and Stein, or “CLRS”.

Since 2016, “algorithms” has become a byword for the arcane black magic of companies with predictive software, the esoteric lifeblood of all-consuming artificial intelligence. Modern reporters see algorithms as a sort of hermetic knowledge, whose utterly brilliant initiates and adepts are set apart from the rest of us by the secret they keep. That has made algorithms a favorite scapegoat for nearly everything (why not blame what you see as magic? It worked in 1692!).

And I agree: algorithms are to blame for lots of things. Not because they are sophisticated wizardry or feats of technical genius, but because they are utterly basic statistical processes applied in a pervasive and unashamedly controlling manner. These spooky “algorithms” cause problems because they're stupid, not smart.

Let’s for a moment dive into some math. I promise this will not be that hard. Recall that on a coordinate plane, a point is a pair of values, (x, y). An example of a point is (1, 3). These numbers are called the coordinates. This point exists in two dimensions, because it defines a unique spot in the two-dimensional coordinate plane. But what if we wanted to define a point in space, as opposed to a point on a flat surface? We could use a 3-dimensional point, like (1, 3, 10). If we wanted to define a point in an arbitrary number of dimensions (don’t try to think about what this would physically look like!), we would just need a list of numbers as long as the number of dimensions. We call this list a vector.
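
To make that concrete, here is a tiny Python sketch (purely illustrative; the 2-D and 3-D points are the ones from the text, the five-dimensional one is made up) of points and vectors as plain lists of numbers:

```python
# A point on the plane is just a pair of numbers.
point_2d = (1, 3)

# A point in space adds a third coordinate.
point_3d = (1, 3, 10)


def make_vector(*coordinates):
    """A vector in n dimensions is nothing more than a list of n numbers."""
    return list(coordinates)


five_d = make_vector(2, 7, 1, 9, 4)  # a point in five dimensions
print(len(five_d))                   # 5 -- the number of dimensions
```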

Now let’s step aside to a different topic. Suppose I have a bunch of news articles and I want to describe them in a succinct, consistent way for someone who doesn’t want to read them. I want to make sure the way the articles are described makes them easy to compare, so a prospective reader of my digest can pick out which ones they want to look at in full. One way to approach this is to pick out a bunch of attributes of these articles and give each article a score from 1 to 10 for each attribute, representing how well that attribute describes that article. For example: suppose my attributes are “is about cats”, “is about animals”, “is about cars”, and “length”. In this system, a 200-word article about cats up for adoption may get scores like (10, 9, 1, 1), since it is very much about cats and animals, but has nothing to do with cars and is very short. Likewise, a 5000-word review of a new SUV which spends one paragraph on some dog-friendly features may get a score more like (1, 3, 10, 8). Notice how much these lists of scores look like the vectors I described above.
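
In code, those scores might look like the following sketch (hypothetical, using the same four attributes and the two example articles above):

```python
# Each attribute gets a fixed position in the vector, in this order.
ATTRIBUTES = ["is about cats", "is about animals", "is about cars", "length"]

# Scores from 1 to 10 for each attribute, in that same order.
cat_adoption_article = [10, 9, 1, 1]  # short piece about cats up for adoption
suv_review_article = [1, 3, 10, 8]    # long SUV review with one dog-friendly paragraph

# Because every article is scored against the same attributes in the same
# order, any two articles can be compared position by position.
for name, cats, suv in zip(ATTRIBUTES, cat_adoption_article, suv_review_article):
    print(f"{name}: {cats} vs {suv}")
```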

Now we can describe the magical “algorithm”.

  1. Define a lot of attributes, now called features, to describe articles. This number of features is n.
  2. Read lots of articles and give each one a score for each feature. Store these vectors somewhere; together they make up a set called the test data.
  3. Wait for a person to read some articles (these are called the training data), then take the features of those articles and fit a curve to them using a type of mathematical formula called a regression. A commonly used type is called logistic regression. There are other, somewhat more sophisticated formulae to use here, but people rarely pick one with consideration.
  4. Find the vector in the test data that is closest to the curve. I will skip the math here, but you can do this by minimizing the Euclidean distance, a technique you can learn from a high-school calculus textbook, or a college-level multivariable calculus course for the absolute hardest cases. Once you’ve found this vector, recommend the corresponding article to the person. If they read the article, it becomes part of the training data. (A rough code sketch of these four steps follows this list.)
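
Here is a minimal sketch of those four steps in Python. It is illustrative only: the article vectors are made up, and the “fit” in step 3 is stood in for by a trivial average of the articles the person has read rather than an actual logistic regression. But the shape of the loop (score, fit, find the nearest unread article, recommend) is the one described above.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fit_preferences(training_data):
    """Step 3, drastically simplified: instead of a regression, average the
    vectors of everything the person has read to get one 'preference point'
    in feature space."""
    n = len(training_data[0])
    return [sum(article[i] for article in training_data) / len(training_data)
            for i in range(n)]


def recommend(test_data, training_data):
    """Step 4: recommend the unread article closest to the fitted preferences."""
    preference = fit_preferences(training_data)
    unread = [a for a in test_data if a not in training_data]
    return min(unread, key=lambda article: distance(article, preference))


# Steps 1 and 2: features are (cats, animals, cars, length); the catalog of
# scored articles is the "test data".
test_data = [
    [10, 9, 1, 1],   # cats up for adoption
    [8, 10, 1, 3],   # shelter dogs looking for homes
    [1, 3, 10, 8],   # SUV review
    [1, 1, 9, 9],    # sports-car comparison
]

# Step 3: the person has read one article so far (the "training data").
training_data = [[10, 9, 1, 1]]

# Step 4: recommend, and if the person reads it, it joins the training data.
pick = recommend(test_data, training_data)
training_data.append(pick)
print("Recommended:", pick)  # the shelter-dogs article, the closest match
```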

This is the magic of Facebook, Twitter, and that company I keep getting ads for that “made an algorithm for wine.” Ultimately, all of us most likely have our echo chambers shaped by some variation of this mathematical process.

In here, no-one can hear you downvote.

When this approach and those like it were first invented, they were a revelation. They’ve sparked decades of progress across computer science research. I don’t want to take anything away from how clever, how useful machine learning is.

But these systems are now so easy to build that it has become just as easy to deploy them irresponsibly. You can see this in systems like Netflix and Hulu. When you first open an account, the recommendations are just taken from whatever’s popular, because there is no training data on you specifically. But watch one show, say Broadchurch, and instantly you’re recommended nothing but British cop dramas. Then the more British cop dramas you watch, the less likely you are to ever be recommended something else, because you are tightening down the fit of that curve. The more perfectly the curve fits the data, the less likely you are to be recommended anything far away from it.

The same happens with political posts on Facebook. Read one Breitbart article, and if you don’t have a long and well-established training data set on articles you like, the very high scores for “positive sentiment of President Trump”, “discussion of crime by black people”, and “negative sentiment of immigrants” can dominate the model fit and bend your apparent preferences towards the alt-right.

The key issue is in step 4. When you read an article, even one that was recommended to you, that article becomes part of the training data and is incorporated into the fit of your model. Combine that with the fact that recommended articles are sprayed liberally across your Facebook feed, as well as your ads, and a single outlier in an otherwise stable data stream can suddenly come to dominate a model, turning the true data into the “outliers”. At this point, the echo chamber is solidified, and it has no doors.
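
To see how that feedback solidifies, here is a toy simulation (it reuses the same simplified averaging stand-in for the model fit as the earlier sketch, with made-up article vectors): every recommendation the person reads goes straight back into the training data, and the fitted preferences stay anchored in one cluster.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def preference_point(training_data):
    """Toy stand-in for the model fit: the average of everything read so far."""
    n = len(training_data[0])
    return [sum(row[i] for row in training_data) / len(training_data)
            for i in range(n)]


# Features: (is about cats, is about animals, is about cars, length).
catalog = [
    [9, 10, 1, 2], [10, 8, 1, 1], [8, 9, 2, 2],   # animal articles
    [1, 3, 10, 8], [2, 2, 9, 9], [1, 1, 10, 7],   # car articles
]

training_data = [[10, 9, 1, 1]]  # the person has read one cat article

for step in range(3):
    unread = [a for a in catalog if a not in training_data]
    preference = preference_point(training_data)
    pick = min(unread, key=lambda a: distance(a, preference))
    training_data.append(pick)  # the read recommendation re-enters the fit
    print(f"step {step}: recommended {pick}")

# Every recommendation comes from the animal cluster, and each one the person
# reads keeps the fitted preferences anchored there; the car articles never
# get close enough to surface.
```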

I am not equipped to discuss the technical solutions to this problem, at least not nearly as well equipped as the machine learning PhD graduates and students behind the real-life implementations of these systems. But I want to find the solution. I want to be able to say “we can categorically rank an article’s ‘trustworthiness’ from 1 to 10 and use that in our feature vector!” And I think that’s really where the problem lies: until very recently, people have had an extremely lukewarm attitude about solving it. It doesn’t make money; it’s not a sexy, high-profile research area; it won’t get me published; it’ll just generate controversy; management doesn’t like it; I could lose my job. It just wasn’t pressing enough to warrant attention.

A lot of people have started to step up and tackle the realities of solving this problem, which is great. But hearing Mark Zuckerberg bray about the beautiful perfection of the open, objective internet (which is not that open or objective) as well as the glorious platform for friendship and love that his team built, I can’t help but feel totally incredulous at his fake naïveté. The notion that foreign spies tampering with another country’s electoral process via Facebook is an “unavoidable consequence” of the totally infallible algorithmic magic resonates harmonically with Wayne LaPierre’s deluded assertion that having your kids die a violent death is an unavoidable risk of being a freedom-loving American.

Algorithms aren’t that infallible. If Zuck doesn’t open his eyes to that reality, someone will replace him. Or he’ll replace us. I hope it’s the former.
