Whitening the black box

Manuel Morán Peláez
Apr 29 · 5 min read
The Antikythera mechanism is an Ancient Greek analogue computer used to predict astronomical positions and eclipses, for calendar and astrological purposes, decades in advance. (source)

Building a business predictive tool

At Geoblink, one of the modules we have included in our app is the Sales Forecast. This module is intended to help retailers in the network expansion process.

  • The features that drive that prediction and their positive/negative impacts.
Prediction of a given location
Impact of drivers
  • … defined with features that are business-aligned.
  • A methodology to get the impact of those features.

Feature impact

Before we take a deep dive into the technical details of what we call the explainer component (or simply the Explainer), we first have to clarify what we mean by an interpretation of a prediction.

  • It is model-agnostic, so it can be used with any model regardless of its complexity.
  • The model used locally is a linear regression, which is easy to interpret and decomposes the prediction into the impact of each driver, expressed as a magnitude and a sign. This tells us which drivers have weighed most on the prediction and whether each one has pushed it up or down.
  1. The model is used to predict all these new points (including the original one).
  2. A linear regression is fitted with the perturbations as inputs and the predictions as outputs.
  3. For each variable, we multiply the coefficient obtained from the linear model by the value of that variable at the original point to get its impact.
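The numbered steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the actual Explainer: the function name `explain_prediction`, its parameters and the `toy_model` are hypothetical, and the way the perturbed points are generated (Gaussian noise around the original point) is an assumption, since the exact sampling scheme is not described here. The regression is fitted with an intercept using NumPy's least-squares solver.

```python
import numpy as np


def explain_prediction(predict, x, n_samples=500, scale=0.1, seed=0):
    """Approximate a black-box model around x with a linear regression
    and return (impacts, intercept), with one impact per feature."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # Assumption: perturbations are Gaussian noise around the original
    # point, with a spread that grows with the size of each feature.
    spread = scale * (np.abs(x) + 1.0)
    noise = rng.normal(0.0, 1.0, size=(n_samples, x.size))
    points = np.vstack([x, x + noise * spread])  # original point included

    # Step 1: predict all the new points with the black-box model.
    preds = np.asarray(predict(points), dtype=float)

    # Step 2: fit a linear regression (with intercept) that maps the
    # perturbed points to the model predictions.
    design = np.column_stack([np.ones(len(points)), points])
    coef, *_ = np.linalg.lstsq(design, preds, rcond=None)
    intercept, slopes = coef[0], coef[1:]

    # Step 3: impact of each variable = coefficient * original value.
    impacts = slopes * x
    return impacts, intercept


def toy_model(X):
    # Hypothetical stand-in for any black-box model.
    return 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2


impacts, intercept = explain_prediction(toy_model, [1.0, 2.0])
print(impacts, intercept)
```

The sign of each entry in `impacts` tells whether that driver pushes the prediction up or down, and its absolute value gives the magnitude. When the underlying model is itself linear, the local regression recovers its coefficients, so the impacts plus the intercept add up to the original prediction.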

Conclusions

It is very important to understand that Data Analytics is not only about super accurate models and the complex optimization approaches needed to get them. We are using a data-driven approach to solve a business problem and, for that reason, we have to focus on the insights we can draw from the data or from the model we are building. Sometimes, a simple data exploration can solve the problem. In fact, we can think of many more complex approaches to tackle the problem of interpretability, but a simple local regression is both sufficient and powerful.

Geoblink Tech blog

Tech&Data blog from the team powering the Geoblink systems. We are the engineers, developers, data scientists, mathematicians and physicists trying to build the best Location Intelligence tool out there.

Written by Manuel Morán Peláez

Mathematician and Computer Scientist interested in Advanced Data Analytics and Machine Learning. Currently working as a Data Scientist at Geoblink.
