Algorithms and black boxes

Enrique Dans

May 29, 2018

Some US banks report problems with their machine learning algorithms, particularly those that decide which customers should be granted loans: as the complexity of the models increases, the interpretability of those models decreases.

Algorithms have given banks efficient models that reduce their exposure to unpaid loans, but those models are essentially black boxes that can also create a range of problems. Hari Gopalkrishnan, technology manager at Bank of America, notes in an article:

“In banking, [w]e’re not fans of lack of transparency and black boxes, where the answer is just ‘yes’ or ‘no.’ We want to understand how the decision is made, so that we can stand behind it and say that we’re not disfavoring someone.”

The problem, which I wrote about a little over a year ago after a series of conversations with my friends at BigML, a company that employs me as a strategic advisor, is one that many companies will face as machine learning becomes more complex. We tend to use the most sophisticated models within our reach without considering how we will interpret increasingly complex models built on larger and larger amounts of data.

Early phases and rapid prototyping of machine learning projects, where a reasonable result is enough, tend to rely on logistic regression or decision trees, which are relatively simple to interpret and do not require lengthy periods of training. As we move toward the intermediate phases, where we are looking for optimized and proven results, we tend to evolve toward more complex models with better representation, such as those based on decision forests. And when we reach the final phases, where an algorithm’s performance proves critical, we lean toward boosted trees and deepnets, or deep learning.

The appeal of this progression is evident, given that it tends to improve the representation, fit and performance of the model, but the downside is clear: these improvements require longer training times and, most importantly, yield models that are harder to interpret. Once a model reaches a certain level of complexity, the chances of tracing a result back to its input variables fall significantly, and interpreting causality becomes more difficult. Demonstrating that a decision was not made on the basis of potentially discriminatory criteria then becomes a problem that can lead to lawsuits under legislation such as the Equal Credit Opportunity Act, which aims to prevent discrimination based on variables such as race, religion, origin, sex, marital status or age.
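To make the interpretability of the simplest models concrete, here is a minimal sketch of why a logistic regression is easy to stand behind: its score is a sum of one term per input variable, so each variable's share of a decision can be read off directly. The coefficients, feature names and applicant below are all invented for illustration and come from no real bank or model.

```python
import math

# Hypothetical coefficients for a toy loan-approval logistic regression.
# Every number here is an assumption made up for this sketch.
INTERCEPT = -1.0
WEIGHTS = {
    "income_k": 0.04,        # annual income, in thousands
    "debt_ratio": -3.0,      # debt-to-income ratio, between 0 and 1
    "years_employed": 0.15,  # years with current employer
}

def explain(applicant):
    """Return the approval probability and each feature's share of the log-odds."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions

applicant = {"income_k": 50, "debt_ratio": 0.5, "years_employed": 4}
probability, contributions = explain(applicant)

# Rank features by how strongly they pushed the decision, in either direction.
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

print(f"approval probability: {probability:.3f}")
for name, value in ranked:
    print(f"{name:>15}: {value:+.2f} log-odds")
```

A decision forest, boosted trees or a deepnet offers no such term-by-term decomposition, which is the loss of interpretability the progression above trades for performance.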

The value of machine learning lies not in building increasingly complex models, but in making it easier to use. Businesses are complex processes, despite our efforts over the 150 years since the Industrial Revolution to run them by simple rules. The black box issue is important: it calls for mechanisms that add transparency and try to explain the predictions made by algorithmic models, and it is a constraint that companies must take into account when scaling their machine learning initiatives. Scaling is a long and complex process in which 90% of the effort goes into tasks such as defining objectives, managing and transforming data, and feature engineering, with only the final 10% applied to what we traditionally consider the result: predictions and impact measurement.
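One example of a transparency mechanism of this kind is permutation importance, a technique the article does not name and that is shown here only as an illustration. The idea is to shuffle one input column at a time and measure how much the model's accuracy drops, which requires only the ability to call the model, not to look inside it. The sketch below uses a hypothetical stand-in for an opaque model and synthetic data; none of it reflects a real lending system.

```python
import random

def black_box(row):
    """Stand-in for an opaque model: we can call it but not inspect it.
    The rule is invented for this sketch and ignores the third feature."""
    income_k, debt_ratio, _unused = row
    return 1 if income_k * (1.0 - debt_ratio) > 30 else 0

def accuracy(model, rows, labels):
    """Fraction of rows where the model's output matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop observed when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild every row with feature j replaced by a shuffled value.
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return drops

# Synthetic applicants: (income in thousands, debt ratio, irrelevant noise).
data_rng = random.Random(42)
rows = [
    (data_rng.uniform(20, 120), data_rng.uniform(0.0, 0.9), data_rng.random())
    for _ in range(500)
]
# Labels are the model's own outputs, so the drops measure how much
# the model's decisions depend on each feature.
labels = [black_box(r) for r in rows]

drops = permutation_importance(black_box, rows, labels, n_features=3)
for name, drop in zip(["income_k", "debt_ratio", "noise"], drops):
    print(f"{name:>10}: accuracy drop {drop:.3f}")
```

A feature the model ignores shows no drop at all, while features it relies on show a clear one. This says which inputs matter, not why, so it is a partial answer to the black box problem rather than a full explanation.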

The issue for most companies nowadays is not that machine learning doesn’t work, but that they struggle to actually use it. It’s going to be a long, hard process, but a worthwhile one: it will separate companies capable of using predictive models from those that are not, which will remain limited to decision-making based on arbitrary criteria, intuition or unscientific rules. Along the way, machine learning will go from being seen as a service to being seen as a commodity or a utility… for those that have done their homework.

(In Spanish, here)
