Supervised machine learning for consultants: Part 3

Joe Feldman
Cervello, a Kearney Company
Nov 10, 2020

Algorithms: the engine behind machine learning

Here’s the situation: you’ve spent valuable time creating a curated data set with ample observations and features that exhibit noteworthy relationships with the output you would like to predict. Now, you’re trying to decide what kind of model to build. Neural networks? Support vector machines? What about random forests or plain old linear regression?

Before you decide, let’s clarify what’s going on under the hood. When it comes down to it, the method dictates what we learn from our data: it sets the structure of our model, and that structure changes when we choose a different method.

Then there’s the algorithm, which controls how the computer learns the method’s specific structure. This process is remarkably consistent across methods, and it’s quite intuitive.

Most machine learning algorithms are iterative: they repeatedly modify the model according to feedback they receive from the data. This feedback is captured in what is known as a loss function, which mathematically represents some notion of our model’s predictive accuracy. Loss functions are aptly named: bad models that offer poor predictions incur large values of the loss. Algorithms therefore seek to minimize the loss function, and once the model no longer needs to be modified, we terminate the algorithm.
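To make this concrete, here’s a minimal sketch of one popular loss function, mean squared error, in Python. (The numbers are purely illustrative assumptions; many other loss functions exist, and the right choice depends on your method.)

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared gap between observed outputs and model predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# An accurate model incurs a small loss...
print(mean_squared_error([10, 20, 30], [11, 19, 31]))  # 1.0
# ...while a bad model incurs a large one.
print(mean_squared_error([10, 20, 30], [25, 5, 50]))   # ~283.3
```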

But how do these algorithms actually work?

First, you, the data scientist, need to set a budget of iterations, or turns, that the algorithm is allowed to take to find the best model. Once this limit has been set, your algorithm will run. Here’s how:

(In a robot’s voice)

Start the estimate of the model somewhere. (This can be random or an educated guess of what the model should be.) Call this the current model. Set the turn number to zero.

For each turn number, as long as the turn number is less than the number of turns budgeted:

1. Evaluate the loss function using the current model.

2. Adjust the current model so that the loss function decreases.

3. Change the current model to the adjusted model.

4. Add one to your turn number.

When the iteration budget is exhausted, return (obtain) the current model.
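Here’s that recipe as a short Python sketch. It fits a simple one-feature linear model by gradient descent, one common way to “adjust the current model so that the loss decreases,” and the loop mirrors the robot’s steps exactly. (The data, starting point, and step size are illustrative assumptions, not a prescription.)

```python
import numpy as np

def fit_linear_model(x, y, budget=1000, step_size=0.01):
    """Learn y ≈ intercept + slope * x by iteratively reducing squared-error loss."""
    # Start the estimate of the model somewhere; call it the current model.
    intercept, slope = 0.0, 0.0
    # For each turn, as long as the turn number is below the budget:
    for turn in range(budget):
        # 1. Evaluate the loss function using the current model.
        errors = (intercept + slope * x) - y
        loss = np.mean(errors ** 2)  # the number the algorithm drives down
        # 2. Adjust the current model so that the loss decreases
        #    (a gradient descent step on the squared-error loss).
        adjusted_intercept = intercept - step_size * 2 * np.mean(errors)
        adjusted_slope = slope - step_size * 2 * np.mean(errors * x)
        # 3. The adjusted model becomes the current model.
        #    (Step 4, adding one to the turn number, happens in the loop itself.)
        intercept, slope = adjusted_intercept, adjusted_slope
    # Budget exhausted: return the current model.
    return intercept, slope

# Tiny demo on made-up data generated roughly as y = 1 + 2x.
spend = np.array([1.0, 2.0, 3.0, 4.0])
sales = np.array([3.1, 4.9, 7.2, 8.8])
print(fit_linear_model(spend, sales, budget=5000))  # close to the least-squares fit
```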

Boom! You have just learned how many supervised machine learning algorithms work. As your method becomes more complicated, so too does the loss function, making it harder to minimize. In that case, we allot a higher budget of iterations (think thousands; machine learning can take time).

Next, run your algorithm, which gives you an estimate of the model you wanted to build. Now, we need to evaluate it.

Is my model good?

First off, we cannot feed all of our data into the algorithm: a model evaluated on the same data it learned from will look deceptively accurate. We need a train–test split, which amounts to randomly splitting your data into two groups. The first, bigger group (usually around 80 percent of the original data) is known as the training data, and this is the data you use to build your model.

The second group, the testing data, is used to verify that your model makes accurate predictions. Simply pass the testing data’s inputs into the learned model to obtain a prediction for each observation, then compare those predictions to the testing data’s observed outputs, which serve as the ground truth.
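In practice, you rarely build this split by hand; scikit-learn’s train_test_split does it in one line. Here’s a minimal sketch, where X and y are stand-ins for your own features and outputs:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for your curated features (X) and observed outputs (y).
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# Hold out a random 20% for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(len(X_train), len(X_test))  # 40 10
```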

Let’s return to the advertising data set from my previous post (see Table 1).

Table 1: the advertising data set (advertising spend and sales)

We’re using information on advertising spend to predict sales. Suppose we have chosen our model structure (neural network, random forest, etc.) and learned a model according to that structure with an iterative, loss-based algorithm using a random 80 percent of the data. Now, we feed the drivers of the other 20 percent into our model and see how well the model’s predictions of sales compare to the true sales.
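Here’s a sketch of that workflow with plain old linear regression as the chosen method. The column names and figures below are hypothetical stand-ins, so substitute whatever your advertising data actually contains.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Hypothetical advertising data: spend by channel drives sales.
ads = pd.DataFrame({
    "tv_spend":    [230, 44, 17, 151, 180, 8, 57, 120, 199, 66],
    "radio_spend": [38, 39, 46, 41, 11, 2, 33, 19, 3, 6],
    "sales":       [22, 10, 9, 18, 13, 5, 12, 13, 11, 9],
})
drivers, sales = ads[["tv_spend", "radio_spend"]], ads["sales"]

# Learn on a random 80 percent, hold out the other 20 percent.
X_train, X_test, y_train, y_test = train_test_split(drivers, sales, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Feed the held-out drivers into the model and compare predictions to true sales.
predictions = model.predict(X_test)
error = mean_absolute_percentage_error(y_test, predictions)
print(f"Roughly {100 * (1 - error):.0f}% accurate on the test set")
```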

This comparison yields a number. Let’s say your model is 80 percent accurate on that test set. Is this good? Well, that question sits at the center of one of the most difficult tasks in supervised machine learning: model selection.

See, the model we just built is an estimate of an underlying true dynamic between advertising spend and sales. We will never know this true dynamic, but hopefully our model is a good approximation. Still, we may be able to find a better model using some other machine learning method or different engineered features. But again, what does better mean? We need tools to compare models so we can judge which one we should actually use. In my next post, I’ll introduce these tools: popular methods for selecting a model.

About Cervello, a Kearney company

Cervello is a data and analytics consulting firm and part of Kearney, a leading global management consulting firm. We help our clients win by offering unique expertise in data and analytics and in the challenges associated with connecting data. We focus on performance management, customer and supplier relationships, and data monetization and products, serving functions from sales to finance. Find out more at Cervello.com.



Joe Feldman is a third-year Ph.D. student in the Department of Statistics at Rice University and a data scientist at Cervello, a Kearney Company.