The art of driving Business Value with Machine Learning

Karthik Vadhri · Published in Intuition Matters · 9 min read · Mar 17, 2022

Digital transformation is widely believed to fuel the future of every industry, and hence companies across industries have started to innovate on their core business models & fundamentally change the way they do business.

Business Model innovation is different from new product introduction.

As part of digitization, you are not just creating a mobile application; you are creating a process to capture data and feed it back into the business model. Data is no longer at the periphery of an organization; it has taken center stage.

Machine Learning — The Evolution

Data-based decision making is not a new concept; it has been around for decades under different terminologies. The image below shows the three dominant booms, each of which arrived with huge promises but eventually failed to deliver, given significant limitations.

The past decade witnessed the third boom of AI, which is machine learning, and the primary trigger was the increase in computational power and its reduced cost, making it accessible to everyone. Source: Broken Promises & Empty Threats: The Evolution of AI in the USA, 1956–1996

Machine learning might sound fancy because it is about giving systems/machines the ability to automatically learn and improve from experience without being explicitly programmed. Intuitively, however, it is all about generalization from the available data.
Another way to look at machine learning is as a set of methods that automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform various kinds of decision making under uncertainty.

ML originates from statistics. Data, statistics & computation coming together gave rise to a range of algorithms and research areas such as supervised learning, probabilistic models, game theory, etc.

Machine learning is really the art of finding the why behind the what

Intuitively, below are a few important questions to answer while framing a machine learning problem. The answers to these questions will help frame a solution that can scale from the POC stage to the production stage.

  • What should it detect?
  • How should it detect it?
  • What should the output look like?
  • How should it integrate?


Intelligence is not about memory, it's about generalization.

3 Dominant Paradigms in Machine Learning
There are 3 paradigms in machine learning (Supervised, Unsupervised, and Reinforcement learning) and hundreds of algorithm implementations within each paradigm. However, all of these are designed to do one or more of the 3Ps [Prediction, Pattern Recognition, Process Automation].

ML in a Nutshell © Intuition Matters

Paradigm 1 : Supervised Learning
Supervised learning is all about allowing the algorithm to identify patterns & create a relation between the input & output. Most business use cases across most domains are based on supervised learning. One important aspect here is the ability of the algorithm to generalize the learnings from past input-to-output mappings onto new inputs the algorithm has never seen.
Risk assessment, image classification, fraud detection, spam filtering, etc. are a few real-world use cases of supervised learning.

Every Supervised Learning Algorithm is essentially mapping the input and the output. y=f(x)

Classification & Regression are the two common techniques of supervised learning.
In classification, the fundamental problem is to find pure regions, and there are multiple implementations that greedily, iteratively, or hierarchically try to find such pure regions in the feature space. In simple words, classification is used where the number of outcomes is finite.
Regression is predominantly a statistical method and emphasizes estimating the relationship between the input and output variables. In other words, regression is used when the outcome is continuous.
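To make the y=f(x) idea concrete, here is a minimal sketch in plain Python, not any production library: a 1-nearest-neighbour rule for classification (finite outcomes) and a closed-form least-squares line for regression (continuous outcomes). The data and function names are illustrative assumptions.

```python
def nn_classify(train, x):
    """Classification: predict the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_line(points):
    """Regression: closed-form least squares for y = a*x + b in one dimension."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Classification: the learned "f" generalizes to unseen inputs like 7.0.
train = [(1, "small"), (2, "small"), (8, "large"), (9, "large")]
print(nn_classify(train, 7.0))   # large

# Regression: points on y = 2x + 1 recover slope 2 and intercept 1.
a, b = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
print(round(a, 6), round(b, 6))  # 2.0 1.0
```

Both functions learn a mapping from inputs to outputs; the difference is only whether the output space is a set of labels or a continuum.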

Paradigm 2 — Unsupervised Learning
Labelled data (a mapping between input & output data), which is critical for supervised learning, is not only costly to acquire but also carries the risk of providing relatively little information for the algorithm to generalize from.

Unsupervised learning is about discovering unknown patterns in the data, without a desired output for each input. Customer segmentation, recommender systems, anomaly detection, etc. are a few real-world use cases of unsupervised learning.

Unsupervised learning helps discover the unknown unknowns [I don’t know what I don’t know] in the data.
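As a toy illustration of anomaly detection, one of the use cases above, here is a sketch that flags points far from the bulk of the data by their z-score. No labels are given; the "anomaly" emerges from the data itself. The threshold and readings are illustrative assumptions.

```python
# Unsupervised anomaly detection: flag values whose z-score exceeds a threshold.
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Sensor readings cluster around 10; the 25.0 stands out without any labelling.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(find_anomalies(readings))  # [25.0]
```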

Paradigm 3 — Reinforcement learning
Over the past decades, ML algorithms have made the news for beating human champions at games like chess and Go. Have you ever wondered whether that is classification, regression, or clustering? The objective there is to predict the next best action from the current state, so as to maximize a reward. That's reinforcement learning!

Supervised or unsupervised learning algorithms require data samples collected after the events have occurred, whereas reinforcement learning works by interacting directly with the environment.

Reinforcement learning is all about training an agent to take actions (from the action space, within the environment) that maximize the reward & avoid the ones that don’t.

Reinforcement Learning in a nutshell Source

The Markov Decision Process (MDP) is a commonly used framework for solving most RL problems with discrete actions. Check this link for a detailed overview of MDP in action.
Reinforcement learning has a huge untapped potential in creating game strategies in sports. A few noteworthy examples of RL include chess, ping pong, etc.
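To ground the agent/environment/reward vocabulary, here is a hedged sketch of tabular Q-learning on a toy 1-D corridor, a deliberately tiny MDP invented for illustration: states 0 to 4, actions "step left" and "step right", and a reward of 1 for reaching state 4. The hyperparameters are illustrative assumptions, not recommended values.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]              # step left, step right
alpha, gamma, epsilon = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
random.seed(0)

for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: nudge Q toward reward + discounted best future value.
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned policy steps right in every non-goal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Note how the agent never sees labelled input/output pairs; it learns purely from the rewards its own actions produce, which is exactly the contrast with supervised learning drawn above.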

4 things every ML model has in common
Every machine learning problem is also an optimization problem. A different objective function is optimized in each parametric modelling technique. Below are a few examples.
  • Clustering — minimizes the distance between each data point and its cluster center
  • Fisher discriminant — maximizes the separation between classes
  • Decision trees — split a leaf node into the purest possible sub-regions
  • Perceptron — minimizes the misclassification error, and the list goes on.
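Taking the perceptron from the list above as an example, here is a minimal sketch of its update rule: whenever a point is misclassified, the weights are nudged toward classifying it correctly, which drives the misclassification error down on separable data. The 1-D dataset is an illustrative assumption.

```python
def train_perceptron(data, epochs=20):
    """Perceptron learning: update weights only on misclassified points."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, label in data:              # label is +1 or -1
            if label * (w * x + b) <= 0:   # misclassified (or on the boundary)
                w += label * x
                b += label
    return w, b

# 1-D points: negatives below 0, positives above.
data = [(-2.0, -1), (-1.0, -1), (1.0, +1), (2.0, +1)]
w, b = train_perceptron(data)
predict = lambda x: 1 if w * x + b > 0 else -1
print([predict(x) for x, _ in data])  # [-1, -1, 1, 1]
```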

Machine learning is not magic, it's just doing the math.

Understanding the objective function of an algorithm makes it easier to fine-tune the algorithm for practical implementations. Breaking every ML algorithm into the following 4 components is all it takes to understand it intuitively.

1. Intuition
We develop an initial understanding of the problem at hand, and think of approaches to solve it, in terms of what we are solving for, how we might derive the metrics, what is being optimized, etc.
Example: In K-Means clustering, we try to group similar points together. When all the points are represented in a multidimensional space, we find points that are close to each other and group them together, i.e. we want to find sets of points that have a minimum distance between them.

2. Representation / Formulation
We convert the understanding into a mathematical representation, i.e. the objective function in terms of the data, and see how we can modify it into something simpler or more solvable. For example, in the case of clustering, we are trying to minimize the sum of squared distances between each data point and its cluster center.
An objective function has 4 parts: the unknown parameters we are optimizing, the data, the constraints (in this case there are no constraints), and the formulation itself. One way to formalize the problem is as function approximation. We assume y = f(x) for some unknown function f, and the goal of learning is to estimate f given a labelled training set. Most supervised learning algorithms leverage this technique for formalizing the problem.
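The clustering formulation above can be sketched directly: Lloyd's algorithm for k-means in one dimension alternates between assigning each point to its nearest center and moving each center to the mean of its points, and each iteration can only decrease (or leave unchanged) the sum of squared distances. The data and starting centers are illustrative assumptions.

```python
from statistics import mean

def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: (p - c) ** 2)
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster,
        # which minimizes the squared distance for that cluster.
        centers = [mean(pts) if pts else c for c, pts in clusters.items()]
    return sorted(centers)

# Two obvious groups around 1 and 9; the centers converge onto them.
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print(kmeans_1d(data, centers=[0.0, 5.0]))  # [1.0, 9.0]
```

The unknown parameters here are the centers, the data is the point list, there are no constraints, and the formulation is the sum of squared point-to-center distances, exactly the four parts described above.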

Identifying the right objective function is the crux of the problem!!

3. Evaluation
What are the success criteria, and how can we evaluate them?
The choice of metrics influences how the performance of machine learning algorithms is measured and compared, so it is important to align on the metrics in the initial phases of problem definition.
Several evaluation metrics are available, and different metrics suit different kinds of problems.
The confusion matrix, AUC-ROC curve, F1 score, precision, MSE, etc. are a few examples of evaluation metrics, and cross-validation is a common strategy for estimating them reliably.
Deciding which is more costly, a false positive or a false negative, is critical to the success of the machine learning model.
A complete guide on Performance Metrics in Machine Learning on neptune.ai
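As a small sketch of the metrics mentioned above, the following computes precision, recall, and F1 from the cells of a binary confusion matrix. Which error is costlier, a false positive or a false negative, determines whether precision or recall deserves more weight. The fraud-detection counts are illustrative assumptions.

```python
def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)      # of the flagged cases, how many were right
    recall = tp / (tp + fn)         # of the true cases, how many were caught
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: a fraud model caught 80 frauds, raised 20 false alarms,
# and missed 20 frauds.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(p, r, round(f1, 3))  # 0.8 0.8 0.8
```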

4. Optimization
Solving for the objective function using traditional optimization approaches.
Clustering, for example, belongs to a class of optimization problems that can be solved using Expectation–Maximization (EM) algorithms, which alternate between two steps in each iteration: an expectation step and a maximization step.
A lot of concepts from calculus (which most of us would have mastered in high school) come into play in this part of machine learning, e.g. derivatives are used to find local minima.
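The derivative-based idea above can be sketched in a few lines: gradient descent on the toy objective f(w) = (w - 3)^2, whose derivative f'(w) = 2(w - 3) always points away from the minimum at w = 3, so stepping against it converges there. The learning rate and step count are illustrative assumptions.

```python
def gradient_descent(start, lr=0.1, steps=100):
    """Minimize f(w) = (w - 3)^2 by repeatedly stepping against its derivative."""
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of (w - 3)^2
        w -= lr * grad       # move against the gradient
    return w

print(round(gradient_descent(start=0.0), 6))  # 3.0
```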

Machine learning is really an art of formulating an intuition into an objective function.

Machine learning has always been a flashy thing, and it will continue to be in the coming decade. It can be very tempting to do some quick POCs and really cool demos. The real question is whether or not these POCs translate into business value.

Machine learning does not exist in isolation; it is built on a foundation of compute, storage and analytics.

Philosophies of Modelling:

Alongside the ML Paradigms we discussed earlier, there are a few modelling philosophies which have been adopted by the machine learning fraternity. Below are the most common ones:

1- Occam’s Razor

Occam's razor is a 14th-century principle which states that when you have two competing theories that make exactly the same predictions, the simpler one is better.
Rephrasing that in ML terms: simpler models are easier to execute & explain.

2- Bias Variance Trade Off

This is a central problem in the Supervised Learning paradigm.
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with a high bias often pays little attention to the training data and oversimplifies the model. A high error on both training and test data could be attributed to High Bias.

Variance is the variability of the model's prediction for a given data point; it tells us the spread of our predictions.

High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs

A model with High Bias often UNDERFITS, whereas a model with High Variance OVERFITS.

A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.

In layman's terms, it's like the trade-off between giving a bad customer a high credit limit versus not giving a good customer a credit card.

Bias Variance Tradeoff Explained. Source
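The trade-off can be made tangible with a toy experiment on assumed synthetic data drawn from y = 2x + noise. A constant "predict the mean" model underfits (high bias: poor on both train and test), while a memorizing 1-nearest-neighbour model overfits (high variance: perfect on train, noticeably worse on test). Everything here, the data generator included, is an illustrative assumption.

```python
import random

random.seed(42)

def make_data(n):
    """Synthetic samples from y = 2x + Gaussian noise, x uniform on [0, 10]."""
    return [(x, 2 * x + random.gauss(0, 1))
            for x in [random.uniform(0, 10) for _ in range(n)]]

train, test = make_data(30), make_data(30)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# High-bias model: ignores x entirely and predicts the training mean.
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# High-variance model: memorizes the training set (1-nearest neighbour).
overfit = lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

print(f"underfit: train={mse(underfit, train):.1f} test={mse(underfit, test):.1f}")
print(f"overfit:  train={mse(overfit, train):.1f} test={mse(overfit, test):.1f}")
```

Running this shows the signature patterns described above: the underfitting model has a similar, large error on both sets, while the overfitting model has zero training error but a clearly higher test error.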

3- No Free Lunch Theorem

This slightly less adopted philosophy originates in the mid-1990s and, in simple terms, states that there is no universally best model. It all primarily boils down to the set of assumptions made, which may work well in one domain and not in another.

XGBoost might not always be the best possible model for all classification problems.

Another result under the same name, in mathematical folklore, states that “all algorithms perform almost equally well when their performance is averaged across all possible objective functions.”

As I outlined in my previous article “Why Intuition Matters”, it is neither feasible nor necessary for a data scientist to understand the complete math behind all the ML algorithms. Intuitive understanding is all about understanding these algorithms well enough to choose the right model, with the right complexity, for a given business problem.

Stay tuned to this publication for upcoming articles on understanding the most common algorithms at a level where the right choice of algorithm can be made for a given problem. The focus will be on dissecting machine learning algorithms by intuition!
