The “Black Box” of Machine Learning

Raquel Meneses
Published in MassArt Innovation
5 min read · Jan 4, 2018

We now live in a world where everything is interconnected. Systems are becoming smarter, and designing these systems is the next challenge. The problem we face is that businesses treat data in an opaque way: they prefer not to ask for the user's permission to use it, and the user doesn't know what they are being "targeted" for. It is a black box, and the user doesn't know they are inside it.

The emerging techniques of data processing and knowledge extraction raise many new questions. Some of those questions revolve around ethics, but my focus here will be on a perhaps unavoidable bias that is inherent to these systems. First, let's look at how machines learn.

What is machine learning? Let’s make it simple. Picture a chair.

The chair you imagine is, most likely, very different from mine. When you picture a chair, you picture it based on your understanding of what a chair is; that is your bias speaking. They all have four legs, a seat, and a back, but the shapes and materials are very diverse. This is the image you now must use to teach a machine what a chair is. You will probably show the machine your model of a chair, not all the other chairs that exist. Even if you use a training set that happens to contain many chairs, it will still be a subset of all the chairs ever created. At that moment, you have introduced a certain amount of bias into the machine.

Machine learning is a process created by humans, with results interpreted by humans, and as such it is a reflection of the human mind and carries its inherent bias.

In traditional programming, the solution to a problem is handwritten by a human or a group of humans. With machine learning, we let the computer work out the solution, often by identifying patterns and structures in the data we feed it. It is tempting to think this must be unbiased because there is no human intervention: it is just the machine working its way toward a solution. This is, in fact, inaccurate. The two cannot be separated; the machine learns from the data we feed it, and that data comes from humans. Data is a representation of a model, and like all models, it generalizes for the sake of simplicity and can introduce mistakes. We can thus influence the results with our own beliefs.
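To make that contrast concrete, here is a minimal sketch in Python. The toy chair data and the choice of a decision tree are my own illustration, not anything from a real system:

```python
# Traditional programming: a human writes the rule by hand.
def is_chair_rule_based(legs: int, has_seat: bool, has_back: bool) -> bool:
    # The programmer's assumptions are explicit and readable.
    return legs == 4 and has_seat and has_back

# Machine learning: the rule is inferred from example data.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: [legs, has_seat, has_back] -> is_chair.
# Whatever bias this sample carries, the model inherits.
X = [[4, 1, 1], [3, 1, 1], [4, 1, 0], [4, 0, 0], [0, 0, 0]]
y = [1, 1, 0, 0, 0]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# A one-legged stool with a back never appeared in the training set,
# so the model can only extrapolate from the examples it has seen.
print(model.predict([[1, 1, 1]]))
```

Swap in a different training set and the same code yields a different "definition" of a chair; the bias lives in the data, not in the algorithm.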

Los Angeles started using the PredPol system to "predict" which areas are more likely to have burglaries, based on past data from police reports. This way they can deploy police in the areas most likely to have break-ins. The result? Crime dropped by 25%. Oakland's police, on the other hand, chose not to implement the program because the city perceived it as racial profiling.

Decisions based on data can be misleading. As results are analyzed, we need to be aware of the bias. We can encounter it at two different moments: first, when we code the algorithms and input the data, and second, when we interpret the outputs.

We have the recent example of Facebook marketing and political campaigns. The problem with data is that the companies handling it are doing so in a very opaque way.

A good example is the congressional hearing of Facebook’s general counsel Colin Stretch on October 31st, 2017.

The average user is not aware of this. Even for the teams working on or with the data, it is hard to detect where the bias is when they are overwhelmed by the sheer volume of data.

What are the most common biases in data?

Confirmation Bias:

Confirmation bias appears when there is an intention to validate your hunch, opinion, or assumption. It is one of the most common biases, and one of the easiest to fall victim to. Your mind is focused on what needs to be done, and you settle on the numbers that agree with your theory because the data "feels right."

In the design field, we tend to make our points the same way: I need to find some information, some data, to support this point. We find a study that helps us tell our story. But did we really dig in and cover all the studies, pros and cons, on this subject?

Inside a company, data scientists are understood in the business setting as business intelligence: they are the ones who help companies make decisions about their business. If they are given guidelines that steer their report toward a predetermined conclusion, confirmation bias is transferred, and this can end up being very damaging to the result.
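Here is a toy illustration of that cherry-picking in Python; the user scores are pure random noise I generated for the example, with no real effect hidden in them:

```python
import random
import statistics

random.seed(1)

# Made-up metric: 200 users with purely random scores; there is
# no real effect in this data at all.
users = [{"segment": random.choice("ABCDEFGH"),
          "score": random.gauss(50, 10)}
         for _ in range(200)]

overall = statistics.mean(u["score"] for u in users)

def segment_mean(seg: str) -> float:
    return statistics.mean(u["score"] for u in users if u["segment"] == seg)

# Cherry-pick whichever slice best supports the story we want to tell.
best = max("ABCDEFGH", key=segment_mean)
print(f"Overall mean score: {overall:.1f}")
print(f"Segment {best} mean: {segment_mean(best):.1f}  (pure noise, "
      "but it 'feels right')")
```

Some slice of random data will always look like a win; the bias is in stopping the analysis as soon as you find it.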

Selection Bias:

As designers, we use a lot of surveys in our ethnographic research. The way you formulate the question is very important.

We cannot lead the user. Questions like "does ……….. make you feel frustrated?" need to be avoided: you are putting the word "frustrated" in the user's mouth, something they haven't said yet. Reformulate the question: "How does that make you feel? Why is that?"

Selection bias occurs when the way you choose your sample or formulate your questions predetermines the insights you get.

Also, surveys are sent to a specific group, and then we extrapolate the results to the whole population. The information is incomplete, inaccurate, and of widely varying quality. Ask yourself: who, these days, is going to take the time to give feedback on my survey?

OK, maybe this is my own bias speaking over here, since I have strong feelings about surveys.
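To see how extrapolating from a self-selected group goes wrong, here is a small simulation; the population split, satisfaction scores, and response rates are all made-up numbers:

```python
import random

random.seed(42)

# Hypothetical population: 10% are enthusiasts who love the product
# (satisfaction ~9/10); the other 90% are lukewarm (~5/10).
population = [9] * 1_000 + [5] * 9_000

# Hypothetical response rates: enthusiasts answer surveys far more often.
def responds(satisfaction: int) -> bool:
    rate = 0.5 if satisfaction >= 9 else 0.05
    return random.random() < rate

respondents = [s for s in population if responds(s)]

print(f"True average satisfaction:     {sum(population) / len(population):.2f}")
print(f"Surveyed average satisfaction: {sum(respondents) / len(respondents):.2f}")
# The survey overstates satisfaction because only a biased
# subset of the population took the time to answer.
```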

Interaction Bias:

This is a different type of bias: it results from how users interact with the algorithms. The perfect example is Facebook. They know each user's patterns, and they tailor information according to that user's "taste."
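A minimal sketch of that feedback loop, with an entirely made-up scoring rule: topics are shown in proportion to past clicks, so a few lucky early clicks snowball into a skewed feed.

```python
import random

random.seed(7)

topics = ["sports", "politics", "cooking", "tech"]
clicks = {t: 1 for t in topics}  # start with a uniform prior

def pick_topic() -> str:
    # Show topics in proportion to past clicks (the "tailoring").
    return random.choices(topics, weights=[clicks[t] for t in topics])[0]

# Simulate a user who clicks on whatever they are shown 30% of the time,
# with no actual preference between topics.
for _ in range(1_000):
    topic = pick_topic()
    if random.random() < 0.3:
        clicks[topic] += 1

# Random early fluctuations get amplified into a lopsided feed.
print(clicks)
```

The model never decided anything about the user; it simply amplified its own earlier outputs.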

Latent Bias:

Latent bias is when the algorithm latches onto an unintended, unethical correlation with gender, race, sexuality, income, or another demographic variable.

Look at what happened in 2016 with risk scores in US courtrooms. These systems rate defendants, evaluating their risk of committing another crime based on prior records. The problem is that the algorithm is loaded with bias, discriminating against African American people.
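A hedged sketch of how that can happen even when the protected attribute is excluded from the inputs: the model below never sees group membership, but it does see a made-up neighborhood feature that is correlated with it, and the historical labels are skewed, so the model learns the correlation anyway.

```python
import random
from sklearn.linear_model import LogisticRegression

random.seed(0)

# Made-up data. 'group' is a protected attribute we deliberately leave
# OUT of the features; 'neighborhood' is a proxy correlated with it.
X, y = [], []
for _ in range(5_000):
    group = random.random() < 0.5
    neighborhood = group if random.random() < 0.9 else not group
    priors = random.randint(0, 3)
    # True behavior depends only on prior records, not on group...
    reoffends = random.random() < 0.15 + 0.10 * priors
    # ...but the historical label is biased: one group's offenses
    # were recorded more often (e.g., heavier policing).
    recorded = reoffends and (random.random() < (0.9 if group else 0.5))
    X.append([int(neighborhood), priors])
    y.append(int(recorded))

model = LogisticRegression().fit(X, y)

# Same prior record, different neighborhood -> different "risk."
risk = model.predict_proba([[1, 1], [0, 1]])[:, 1]
print(f"Predicted risk, neighborhood 1: {risk[0]:.2f}")
print(f"Predicted risk, neighborhood 0: {risk[1]:.2f}")
```

Removing the protected column is not enough: as long as a proxy survives in the features and the labels carry historical bias, the score reproduces the discrimination.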

“Predicting” human behavior is something you cannot do.

We are now starting to build relationships between machine learning and the user. Until now, machine learning was developed by data scientists in a room isolated from the user. Design companies focused on user experience are trying to enter this "market" and incorporate machine learning into the design process, bridging the gap between humans and machine learning. One of the things they need to consider is how human bias factors into this. This is an area with many questions that still need answering.

One question I have is: how can we make this algorithmic "black box" become transparent, from both the user's and the team's points of view?
