How to prove the value of machine learning and avoid worthless models
Humans don’t like risk and, as with any technology investment, a machine learning solution comes with its own set of risks. Will it predict the right thing? Will it be accurate? Will it be usable? In this post I’m going to take you through how we at Datasparq reduce risk through Proof of Value.
Our machine learning solutions are designed to allow people to perform beyond their potential, whether that’s making more accurate sales forecasts, creating effective retention strategies for customers, or focusing business development effort on leads likely to convert to wins. Machine learning solutions give people the benefit of the predictive power hidden away in data.
Machine learning has proven time and again to be effective at predicting the future by learning from the past, whether that’s identifying potential skin cancer, or helping you decide what to watch next on TV. That’s not entirely surprising as these solutions are fundamentally doing what humans have evolved to do — observe, learn and predict, just with the advantage of being able to process vast amounts more data.
But just because there’s evidence something has worked before doesn’t necessarily mean it’s going to work in every situation.
This is why we start all of our machine learning projects with a Prove phase. It’s a short, targeted mission to reduce ambiguity and test assumptions to minimise risk. Over the course of a few weeks, we validate or test the following:
- That there is a clear target variable for the prediction (we know what we’re predicting)
- That there is a viable means for measuring value from the prediction
- That there is predictive power in the data (signal in the noise)
1. Clear target variable for the prediction
The target variable is the variable in a dataset whose value is to be predicted using other variables. It’s pretty fundamental to any machine learning solution, and picking the wrong target variable can lead to significant wasted effort down the line.
Our goal here is to define which fields in the data will be used to form the ‘target variable’. This is an iterative process: we identify a candidate target variable, then validate how a prediction of it would be beneficial. This in turn raises new questions which allow us to loop back and identify a better target variable.
It sounds like it should be easy. Let’s look at demand forecasting as an example. Imagine you sell jeans in a high street store. In order to effectively manage your stock levels you need to make an estimate in advance of how many you’re going to sell. So a machine learning solution just needs to predict your sales. Simple, right?
In reality, defining the scope of prediction tends to be a significant chunk of work that requires us to deep dive into an organisation’s processes and value chain.
- What product or service are we predicting demand for? Is it one product, or hundreds, each with different demand patterns?
- For what period are we predicting demand — next week, month, year? Or maybe combinations of these, or maybe different products would benefit from different levels of prediction?
- What is ‘demand’? Is it a placed order? A product taken off a shelf? What if that order was later cancelled, returned, or not paid for?
- Do we need to care about who is making the demand — do we need to predict demand from a specific customer or location, or a demand that will be fulfilled by a specific fulfilment centre?
- How far in advance would the prediction be required in order to be valuable to the business? Predicting tomorrow’s demand today is much easier than predicting for a day next month.
Some of these questions (the known unknowns) will have been answered prior to the Prove phase as part of our SparQshop™, others arise from deeper exploration of the data (the unknown unknowns!). All need to be answered if we’re to precisely define the target variable.
2. Measuring value
In order to optimise a model we need to know what metrics we’re using to measure performance. Understanding the key performance metric gets to the heart of some of the difficult questions around model optimisation, for example:
- In a regression problem (where we are predicting a number, like product sales), what’s the relative value in over-predicting vs under-predicting? Under-predicting by 10% might have a significantly bigger cost impact than over-predicting by 10% (you can’t sell what you don’t have, though you may be able to store some excess stock to sell later).
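This asymmetry can be built directly into the evaluation metric. As a minimal sketch (the 2:1 cost weighting and the function name are purely illustrative, not from a real engagement):

```python
import numpy as np

def asymmetric_cost(y_true, y_pred, under_cost=2.0, over_cost=1.0):
    """Average cost of a set of predictions, where under-prediction
    (lost sales) is weighted more heavily than over-prediction
    (excess stock). The 2:1 weighting here is purely illustrative."""
    err = y_pred - y_true
    cost = np.where(err < 0, -err * under_cost, err * over_cost)
    return cost.mean()

# Under-predicting demand of 100 by 10 units costs twice as much
# as over-predicting it by 10 units
print(asymmetric_cost(np.array([100]), np.array([90])))   # 20.0
print(asymmetric_cost(np.array([100]), np.array([110])))  # 10.0
```

A metric like this lets the business state its priorities numerically, rather than optimising a symmetric error by default.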
- In a classification problem (where we are predicting a binary outcome, like customer churn), what’s the impact of a true positive vs a false positive? If we predict that a customer of a subscription service is going to leave (churn), they may get offered a discount to stay. If they take up this discounted offer, that’s great if they would otherwise have left. However, if it was a false positive prediction and they were never actually going to leave, it’s a cost to the business.
Then there may be other factors specific to the problem being solved. In the demand forecasting example, is it more important to accurately predict demand for some products over others (maybe some have a shorter shelf life, or are more expensive to keep in stock)?
The goal here is to come up with a metric to measure model performance which relates directly to the value being created for the business. For example, for customer churn, that metric may be:
Profit ($) = R × TP − C × FP

Where R is the revenue from a true positive, C is the cost of a false positive, and TP and FP are the true positive and false positive rates.
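As a sketch, this metric can be written as a small function and used to compare candidate models against each other (the rates and dollar values below are invented for illustration):

```python
def churn_profit(tp_rate, fp_rate, revenue_per_tp, cost_per_fp):
    """Profit = R*TP - C*FP: revenue retained by correctly flagging
    churners, minus the cost of discounts offered to customers who
    were never going to leave anyway."""
    return revenue_per_tp * tp_rate - cost_per_fp * fp_rate

# Hypothetical model: catches 60% of churners, wrongly flags 20% of
# stayers; each save is worth $200, each wasted discount costs $50
print(churn_profit(tp_rate=0.6, fp_rate=0.2, revenue_per_tp=200, cost_per_fp=50))
# 110.0
```

Scoring models this way means the “best” model is the one that makes the business the most money, not necessarily the one with the highest accuracy.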
3. Predictive power in the data
This is where we create a baseline machine learning model and test the predictive power inherent in the data. This starts with some basic evaluation of the data — specifically is there enough historical data to be able to train and test a model?
Next we need to know where to start with the data. We’ll typically run a hypothesis workshop with people in your organisation who understand the data and the target variable. In the case of demand forecasting this would involve those close to the buying/selling operations to understand, from their experience, what factors affect demand. For example, we may learn that they see sales go up in summer, or are higher at shops in certain locations. From these hypotheses, we extract features from the data, for example “store location”, or “average temperature over the previous 7 days”.
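To make this concrete, here’s a minimal sketch of turning one such hypothesis into a feature with pandas. All column names and values below are invented for illustration:

```python
import pandas as pd

# Toy daily sales history -- names and values are invented
df = pd.DataFrame({
    "date": pd.date_range("2024-06-01", periods=10, freq="D"),
    "store_location": ["London"] * 10,
    "temperature": [18, 20, 22, 25, 24, 23, 26, 28, 27, 25],
    "units_sold": [30, 32, 35, 40, 38, 37, 42, 45, 44, 41],
})

# The hypothesis "sales go up in warm weather" becomes the feature
# "average temperature over the previous 7 days". The shift(1) ensures
# the feature only uses information available before the day being
# predicted -- no leakage from the future.
df["avg_temp_prev_7d"] = (
    df["temperature"].shift(1).rolling(window=7, min_periods=1).mean()
)
print(df[["date", "avg_temp_prev_7d", "units_sold"]])
```

Each hypothesis from the workshop becomes one or more candidate features like this, ready to be tested for predictive power.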
From here we can do some univariate analysis. Are there features which show predictive power on their own? Correlation ≠ causation, so success here isn’t proof of signal in the noise, but it can help steer focus to the most useful areas of the data.
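A simple way to run this check is to rank candidate features by the strength of their correlation with the target. A sketch on toy data (values invented for illustration):

```python
import pandas as pd

# Toy dataset: two candidate features and the target variable
df = pd.DataFrame({
    "avg_temp_prev_7d": [15, 18, 20, 22, 25, 27],
    "day_of_week":      [3, 0, 5, 1, 4, 2],
    "units_sold":       [28, 31, 34, 36, 41, 44],
})

# Absolute correlation of each feature with the target, strongest
# first. High correlation isn't proof of causation, but it tells us
# where to look first.
corr = (
    df.corr()["units_sold"]
    .drop("units_sold")
    .abs()
    .sort_values(ascending=False)
)
print(corr)
```

In this toy data the temperature feature ranks far above day of week, so it would be the first candidate to take into the baseline model.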
Once we have a handful of features we believe to be predictive, we can put them into a baseline model and test it. We measure the performance of the model by testing it against data it hasn’t seen. This is both out-of-time (given complete data up to 2018, can it predict what happened in 2019?) and out-of-sample (given data for all time but removing certain shops, products, or price points at random, can it predict the gaps?).
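As a sketch of what those two holdout strategies look like in practice (the column names, stores, and dates below are invented):

```python
import pandas as pd

# Toy sales history spanning two years -- all values invented
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-15", "2018-06-01", "2018-12-20",
                            "2019-03-10", "2019-09-05"]),
    "store": ["A", "B", "A", "B", "A"],
    "units_sold": [30, 45, 28, 50, 33],
})

# Out-of-time: train on everything up to the end of 2018, test on
# 2019 -- can the model predict the future?
train_oot = df[df["date"] < "2019-01-01"]
test_oot = df[df["date"] >= "2019-01-01"]

# Out-of-sample: hold out whole stores at random -- can the model
# fill in the gaps?
held_out = {"B"}
train_oos = df[~df["store"].isin(held_out)]
test_oos = df[df["store"].isin(held_out)]
```

Passing both tests gives much stronger evidence of genuine signal than a single random train/test split, because the model can’t lean on data from the same period or the same store.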
The Prove phase isn’t designed to deliver a fully optimised model; our goal here is to demonstrate evidence that there is predictive power in the data and to get a sense of how much value might be gained through an optimised model. It’s not an exact science, but working closely with the subject matter experts gives the model the best chance of succeeding early on, and it’s usually possible to prove value in this first phase, or at least get a good sense of whether it’s worth pursuing.
We’re good to go!
…not quite. The above activities bring clarity and definition to the model and its potential value, but even the most accurate demand forecast is of absolutely no value unless it’s used.
The final piece we look at in the Prove phase is how the solution will be used in operations. Are the stakeholders bought in to the idea of having a machine help with their jobs? Is it feasible to integrate the predictions with the systems and tools they use every day? What changes might be needed to operational processes once the predictions are live?
Even if these questions aren’t fully answered at this stage, having the discussion early on is another way to reduce risk.
If you think there may be an opportunity for machine learning to deliver value in your organisation and you’d like to chat about a low risk way to get started and prove the value, please get in touch at firstname.lastname@example.org.