Transparency matters when dealing with data (Part 1)

Dmitry Lesnik
3 min read · May 1, 2019


Algorithms are having an increasingly large effect on how real-world decisions are made. As the world becomes more dependent on the decisions these algorithms produce, the presence of bias in their results has naturally become a prominent topic of discussion. Amazon’s automated recruitment tool is a perfect example of why bias is such a big deal. Amazon attempted to build a tool that would automatically choose which candidates to recruit, but the tool showed a preference for male candidates over female candidates. Obviously, this isn’t an acceptable outcome, nor the one Amazon intended. The bias Amazon encountered is one of many types that are endemic across applications and industries, and attempting to control for it with standard techniques does not produce the optimal, unbiased outcomes that companies like Amazon are looking for. To achieve unbiased algorithmic decision-making, an alternative approach is needed.

Typically, algorithmic decision-making engines are based on supervised learning and trained on historical data, so any biases reflected in that data will reappear in the decisions — and historical data is filled with biases. To avoid a biased outcome, the biases first have to be identified, and then a way to mitigate them must be found. Amazon’s case shows how challenging this task can be: even after gender-related inputs were removed, the decision engine still showed gender bias. The AI had inferred gender through correlated variables and continued to discriminate against women!
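To make that proxy effect concrete, here is a minimal sketch with synthetic data and hypothetical feature names (not Amazon’s actual system): even with the gender column dropped, one correlated feature lets an ordinary classifier recover gender far above chance, so a model trained on the remaining columns can still encode the same bias.

```python
# Minimal sketch (synthetic data, hypothetical features): the protected
# attribute "gender" is excluded from the inputs, yet a correlated proxy
# feature lets a plain logistic regression recover it well above chance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
gender = rng.integers(0, 2, n)                 # 0 = female, 1 = male (protected)
# A proxy: e.g. membership in a historically male-dominated activity,
# strongly correlated with gender but not labeled as gender.
proxy = (gender + rng.normal(0, 0.4, n)) > 0.5
other = rng.normal(0, 1, n)                    # an unrelated feature

X = np.column_stack([proxy, other])            # gender itself is NOT an input
X_tr, X_te, g_tr, g_te = train_test_split(X, gender, random_state=0)

clf = LogisticRegression().fit(X_tr, g_tr)
print("gender recovered from proxies, accuracy:",
      round(clf.score(X_te, g_te), 3))         # well above the 0.5 baseline
```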

Amazon, Google, and Facebook’s woes in fighting bias may take up all the headlines, but the same issues face companies everywhere, many of which don’t even know that their algorithms are producing biased results. Methods such as resampling the training data or setting different decision thresholds for different classes have been tried as fixes (one is sketched below), but none has reliably produced the outcomes needed by businesses and the regulators who oversee them. A relatively new area of machine learning research, called probabilistic logic, promises to solve this problem. Probabilistic logic allows for an unprecedented level of transparency and control over the AI, both of which are prerequisites for properly identifying and addressing bias in algorithmic decision-making. It has even been suggested as an underlying technique for what academic circles call the Master Algorithm, a modern synonym for strong AI.
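To see why such fixes are blunt instruments, here is a minimal sketch of one of them — group-specific decision thresholds — on synthetic scores (all names and numbers are illustrative assumptions): it can equalize selection rates across groups, but it says nothing about why the underlying scores differ in the first place, which is exactly where transparency is needed.

```python
# Minimal sketch of per-group decision thresholds on synthetic model scores.
# Equalizing selection rates this way treats the symptom, not the cause.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.uniform(0, 1, 1000)          # model scores in [0, 1]
group = rng.integers(0, 2, 1000)          # 0 / 1 protected-group label
# Simulated historical bias: group 0 systematically receives lower scores.
scores[group == 0] *= 0.8

target_rate = 0.30                        # desired selection rate for both groups
decision = np.zeros_like(scores, dtype=bool)
for g in (0, 1):
    mask = group == g
    # Group-specific threshold: the score cutoff that admits the top 30%.
    threshold = np.quantile(scores[mask], 1 - target_rate)
    decision[mask] = scores[mask] >= threshold
    print(f"group {g}: threshold={threshold:.3f}, "
          f"selection rate={decision[mask].mean():.2f}")
```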

So how could Amazon have used probabilistic logic in their situation? And if it can adjust for bias in Amazon’s case, what other use cases and examples of bias can it address? In my next post, I’ll focus on answering these questions by explaining how our company, Stratyfy, has been forging a path forward in probabilistic machine learning, where we don’t let biased data dictate the outcomes. Stay tuned for the details of our solution for identifying and controlling bias to produce fair outcomes.
