Machine Learning: A one-size-fits-all Data Science solution?

Ryan Gallagher
Predictive Analytics Group
9 min read · Apr 17, 2020

In an era where advanced analytics, artificial intelligence and data science take centre-stage in boardroom discussions, business decision makers are on the hunt for technologies that offer an edge that will affirm their decisions, identify risks early, and promise a clear path to growth.

No industry has been immune to the sales pitches and slide decks that promote optimisation, AI and machine learning, with the promise of unprecedented efficiency gains, endless profits and risk-free decision making. But is this all just a mirage? An unattainable goal? Or can businesses really access the most advanced technologies to help them today?

Credit: slidesgo / Freepik

In this article, we want to revisit the data science continuum in broad terms, and contextualise the field of machine learning as one of many useful tools to support the path to data-driven decision making.

The team at Predictive Analytics Group is the first to agree that AI and ML have provided, and will continue to provide, businesses with the tools and frameworks required to solve some of the most complex problems they face, and to answer questions they hadn’t even thought to ask yet. But notice the keywords: ‘tools’ and ‘frameworks’. We think that any business leader who is about to embark on an all-encompassing data science strategy needs to curb their enthusiasm for a second and start with the basics.

So what is data science then?

Data Science means different things to different people, but put simply, it describes a field that applies a combination of data, mathematics and domain knowledge to solve real-world problems.

A data scientist is a practitioner of these skills. Matthew Mayo, Data Scientist and Deputy Editor of KDnuggets, argues:

The coveted Data Science Venn diagram. Credit: Data Science for the C-Suite, Digital Living Press, 2015.

“When I hear the term data scientist, I tend to think of the unicorn, and all that it entails, and then remember that they don’t exist, and that actual data scientists play many diverse roles in organisations, with varying levels of business, technical, interpersonal, communication, and domain skills.”

… and machine learning?

To the uninitiated, Machine Learning is an umbrella term that describes a set of algorithms and frameworks designed to “learn” and improve automatically from experience, without being explicitly programmed to do so. Put simply, a machine learning algorithm takes in some data, learns something about it (for example, the relationship between income and house prices), and uses what it has learned to make predictions. We will save the deep dive into the types of learning algorithms and their applications for another time, but broadly speaking, machine learning can help us to undertake:

  • Classification Learning — Where we seek to predict the class or category a data point might belong to. For example, does an object in this image resemble a cat or a dog?
  • Regression Learning — Where we seek to examine and quantify the relationship between one variable and one or many others. For example, what is the relationship between airline ticket prices and the oil price?
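
To make the distinction concrete, here is a minimal sketch of both kinds of learning. It uses scikit-learn on synthetic data; the features, targets and numbers are illustrative assumptions rather than anything from a real engagement.

```python
# A minimal, illustrative sketch of classification vs. regression learning.
# All data below is synthetic; in practice it would come from your own systems.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(42)

# Classification: predict a category (e.g. is this image a cat or a dog?).
X_class = rng.normal(size=(200, 2))                         # two illustrative features
y_class = (X_class[:, 0] + X_class[:, 1] > 0).astype(int)   # a made-up 0/1 label
clf = LogisticRegression().fit(X_class, y_class)
print("Predicted class:", clf.predict([[0.5, 0.2]]))

# Regression: quantify a relationship (e.g. income vs. house price).
income = rng.uniform(40_000, 150_000, size=200)
house_price = 3.5 * income + 100_000 + rng.normal(0, 20_000, size=200)
reg = LinearRegression().fit(income.reshape(-1, 1), house_price)
print("Estimated price change per extra dollar of income:", round(reg.coef_[0], 2))
```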

Advancements in machine learning have paved the way for some of the greatest innovations of our time, from driverless cars to automated video surveillance, and have helped to open new doors in the medical and pharmaceutical industries.

This all sounds really cool, but even in 2020, it seems like everyone wants to build or access AI and ML, yet there is very little information on where to start.

PAG’s Data Science for the “Real World” Guide

At Predictive Analytics Group, we recommend evaluating your organisation’s KPIs, data, inefficiencies, strengths, profit margins — whatever measures really matter to the organisation — from the inside out, to determine some kind of baseline to work against. This is what we might call Step 0, where we simply want to establish the status quo in quantitative terms where possible, and to refer back to it in future as prediction and optimisation solutions are implemented.

From here, we follow a simple six-step process to help our clients with their data science strategy.

The important thing to note is that, at least in the early stages, the ordering of the steps is essential: each step depends heavily on the outcome of the one before it.

As time goes on, the ordering might not matter as much, because one of the key objectives of implementing a data science strategy is to develop an automated feedback loop of information. That’s when the really creative side of data science comes into play, facilitating the move from Predictive Analytics to Prescriptive Analytics. But until then, try to follow this process:

  1. Establishing testable hypotheses — i.e., ask the right questions about your business or organisation. As an example, a company with a subscription-based business model is facing a steady increase in subscription cancellations. They might ask themselves, “What factors lead to our customers losing interest in our product, and how long does it take for them to leave us?”. In practice, we have found that many decision makers skip this step entirely, never stopping to ask themselves why they want to build this capability in their organisation.
  2. Collecting data (true random samples) — Based on the hypotheses you outlined in Step 1, what data do you need to test them? In the example above, what information do we hold about our customers that could help us answer this question?
  3. Examining the underlying properties of the data — Undertake what we call “exploratory analysis” to learn more about the data collected in Step 2. This can come in many forms, and any good analytics package can help you create visualisations and charts to better understand your data. As another example, say you’re the CEO of a retail chain and you want to see if there is a relationship between store traffic and good weather. If we obtained the right data, we might use a simple scatter plot to see what the relationship between weather and foot traffic looks like (see the sketch after this list).
  4. Fitting appropriate models / algorithms to the data — Since we know a little more about our data thanks to Step 3, we can now feed it into a range of statistical and machine learning models to see whether they can help us predict the relationship.
  5. Drawing meaningful interpretations (using robust diagnostics) — So we have developed and tuned a robust model in Step 4, but is the model ‘over-fitting’ the data? Under what circumstances does the model fall down? What is its overall performance? In the case of our example from Step 3, which variables have a positive impact on foot traffic, and which have a negative impact? Interrogating the model is just as important as interrogating the data itself if we want to trust it to guide our decision making in future.
  6. Reporting the research findings / deploying the solution — We have arrived at a prototype model that we trust enough to guide part of the decision-making process. Even if a simple report will do, the entire project, like any scientific experiment, needs to be documented for future reference. And if you want to deploy the solution into the “real world” so you can use the model again and again, you need the infrastructure to do so.
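
To ground Steps 3–5 in the store-traffic example, here is a minimal, hedged sketch of what that workflow can look like in code. Everything in it is illustrative: the data is simulated, and the column names (temperature, foot_traffic) are assumptions standing in for whatever your own systems capture.

```python
# An illustrative walk-through of Steps 3-5: explore, fit, then diagnose.
# The data is simulated; the column names are placeholders for your own data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Step 2 (stand-in): "collect" a year of daily temperature and foot-traffic data.
df = pd.DataFrame({"temperature": rng.uniform(5, 40, size=365)})
df["foot_traffic"] = 200 + 8 * df["temperature"] + rng.normal(0, 40, size=365)

# Step 3: exploratory analysis -- a simple scatter plot of the relationship.
df.plot.scatter(x="temperature", y="foot_traffic", title="Foot traffic vs. temperature")
plt.savefig("traffic_vs_temperature.png")

# Step 4: fit a candidate model to the data.
X_train, X_test, y_train, y_test = train_test_split(
    df[["temperature"]], df["foot_traffic"], test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Step 5: basic diagnostics -- does the model hold up on data it has not seen?
print("In-sample R^2:    ", round(r2_score(y_train, model.predict(X_train)), 3))
print("Out-of-sample R^2:", round(r2_score(y_test, model.predict(X_test)), 3))
```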

This is the same process we have used across a wide range of use cases, from emergency call demand forecasting and rostering optimisation to retail inventory modelling, among many others.

An example of Step 3: Exploratory Analysis. Using a scatterplot to examine Customer Satisfaction vs. Credit Score. Some simple analysis might find that not only is there a correlation, but we may find that the observations (customers) fall into distinct categories. Source: AutoStat®
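
The ‘distinct categories’ observation in the caption above is exactly the kind of pattern a simple clustering pass can surface during exploratory analysis. Below is a minimal sketch using k-means on simulated data; the variable names, the numbers and the choice of two clusters are assumptions for illustration, not output from AutoStat®.

```python
# Illustrative only: looking for distinct customer groups across two variables.
# The data is simulated; in practice you would use your own customer records.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Two simulated groups of customers (low score / low satisfaction vs. high / high).
group_a = rng.normal(loc=[550, 3.0], scale=[40, 0.5], size=(100, 2))
group_b = rng.normal(loc=[750, 8.0], scale=[40, 0.5], size=(100, 2))
customers = np.vstack([group_a, group_b])   # columns: credit score, satisfaction

# Ask k-means for two clusters and inspect where their centres land.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(customers)
print("Cluster centres (credit score, satisfaction):")
print(kmeans.cluster_centers_.round(1))
```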

Note on point #4 — this is where you would start with Machine Learning (…that is, if machine learning is the right tool for the job). We have found that businesses starting out on their data science journey are often so excited about ML and AI that they dive right into Step 4 without any consideration for the first three!

The problems with this approach are many, but we have picked out a few to give some context:

  • Machine Learning may not be the right tool for the job. Some ML algorithms perform better or worse on different datasets and in different problem settings. For example, an Artificial Neural Network might be much better under some scenarios, but in other cases you might be better off using a simple Linear Regression. Further, placing faith in the output of a machine learning model without diligent consideration of the model’s inputs might take your analysis completely off-track without you even realising it.
  • By skipping Step 1, you might not even be asking the right question! Sure, you have built a Gradient Boosting model with 92% accuracy, but does it help you to prove or disprove your initial hypothesis? There is an area of Machine Learning called Unsupervised Learning that can surface patterns without a predefined question (which we will explore in a later article), but at the beginning of a data science journey you want to be able to solve complex problems with tangible insights, and quickly. The best way to do this is to compile some carefully considered hypotheses about your business, your customers or your operational processes, and seek to answer them explicitly.
  • Some of the best insights you could gain about your data can be uncovered in Step 3. Take our example from earlier: say you’re a retailer, and you were confident that “the hotter the weather, the more sales we have”. We could model that quite easily with the right data. But say that assertion is only half-right, and once it hits about 30°C / 86°F, foot traffic starts to drop off. Had we plotted the data on a scatter plot and identified this non-linear relationship before modelling, we could have accounted for it to generate more robust predictions. This can be done in a variety of ways (such as using splines in the case of statistical modelling, or simple feature engineering; see the sketch after this list), and it could have helped us to create a smarter learning machine.
  • Data Science strategies with an overemphasis on Step 4 tend to become too expensive, too invasive to the organisation, and risk deviating from the strategy’s original mission entirely. In our experience, taking an organisation down the path of a fully-fledged data-driven journey is an iterative process that starts with asking the right questions, and ends with better insights and information to guide decision-making for the long term.
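
Picking up the temperature example from the list above, here is a hedged sketch of how a quick comparison between a plain linear fit and one with a simple engineered feature (a piecewise ‘degrees above 30°C’ term, a crude stand-in for a spline) might look. The 30°C kink and all of the data are simulated purely to illustrate the point.

```python
# Illustrative only: a simple engineered feature capturing a non-linear kink.
# Simulated data: foot traffic rises with temperature up to ~30 C, then falls.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(7)
temp = rng.uniform(5, 45, size=500)
traffic = 200 + 10 * temp - 25 * np.clip(temp - 30, 0, None) + rng.normal(0, 30, size=500)

# Model A: plain linear fit on temperature alone.
X_plain = temp.reshape(-1, 1)
plain = LinearRegression().fit(X_plain, traffic)

# Model B: add a piecewise "degrees above 30" feature before fitting.
X_kink = np.column_stack([temp, np.clip(temp - 30, 0, None)])
kinked = LinearRegression().fit(X_kink, traffic)

print("R^2, temperature only:    ", round(r2_score(traffic, plain.predict(X_plain)), 3))
print("R^2, with engineered kink:", round(r2_score(traffic, kinked.predict(X_kink)), 3))
```

A model that never ‘sees’ the kink will underfit the hot days; encoding what the scatter plot revealed is often the cheapest route to a more robust prediction.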

So, is Machine Learning the one-size-fits-all solution to all of your organisational woes and goals?

Not quite, but as part of a carefully thought-out data science strategy, it can be an integral tool used to build tangible, repeatable solutions to some of the biggest quantitative challenges you face in your organisation. It all comes down to being able to ask the right questions about your organisation; to interrogate your data objectively; and to apply the right tools to help the broader decision-making process.

Some key takeaways:

  • Keep it simple to start with, and pick the low-hanging fruit first. Don’t assume you need ML for this, or Deep Learning for that. Just start with a few basic questions that you want to ask about your business, your customers, processes or staff, and think about how you might be able to answer them with data. Machine Learning might help, but it is only one piece of the analytics puzzle.
  • Build data science capabilities iteratively. It takes time to build up to a fully-fledged, fully-automated data science capability in-house (and can be quite expensive too). Keep the strategy focused and outcomes-oriented, and build on this momentum as you go.
  • Don’t underestimate the power of exploratory analysis. Simple data analysis and visualisation have a huge role to play in the data science continuum, and the outcome of your analysis can make or break a prediction model. Compared to machines, we humans are still pretty good at pattern recognition, so visualising the data can go a long way towards uncovering relationships and insights.

Predictive Analytics Group is proud to offer free trials of its groundbreaking data science platform, AutoStat®.

Thanks to AutoStat®, business leaders can finally address the challenges they face in their organisations with quantitative solutions, and access cutting-edge data science capabilities without a long-term investment.

Build the predictive enterprise at speed and scale, and integrate the entire data science continuum with AutoStat®’s unified platform.
