A non-techie intro to Machine Learning

Machine learning is one of the hottest buzzwords around at the minute. The Health and Public sector, Financial Services, Manufacturing: pretty much everyone is trying to deploy Machine Learning, or at least exploring how they could be using it. The tech giants have also been busy in the ML space, as these recent ML acquisitions demonstrate!

ML is popular because it unlocks the potential of all that Big Data which has been built up in silos and is now stored in lakes, hubs, clusters and even the odd swamp. ML promises the ability to automatically detect unknown patterns, uncover deep insights and leverage high performing predictive models.

Some common uses for ML include Financial Trading, Medical Diagnosis (in particular cancer), Fraud Detection, Recommendation Systems (Netflix, Amazon), Security, Risk and Compliance and Web Personalisation. However, an ever-increasing number of sectors are exploring how they can realise the benefits of ML, now that they have addressed their data management challenges.

Even with all the chatter around ML, asking 10 people to explain what it is may yield 15 definitions, particularly from a non-technical crowd. So in this series we will explore the key elements of machine learning. In this, the first post in the series, I’ll provide a non-techie intro to machine learning as an area, describe some of the key problems it is used to solve and call out some of the main algorithms employed. Then in later posts I will delve deeper into the algorithms, before exploring recent developments in Deep Learning. In each post in the series I’ll provide a case study to demonstrate how machine learning can be applied to deliver business benefits, drive real value from data and enable data-driven decision making.

But before we get into that, let’s take a step back.

It’s important to get a little bit of context around where ML sits in the very crowded and confusing Data landscape that exists today. Frequently you will hear terms like Deep Learning, Machine Learning, Artificial Intelligence, Data Mining and more, all used pretty much interchangeably.

While confusing, it’s understandable, as each of these areas overlaps significantly with the others. Brendan Tierney’s diagram below shows the interdisciplinary nature of Data Science and nicely demonstrates that each of these areas makes up a piece of the puzzle that is today’s data landscape.

Data Science Landscape — Brendan Tierney http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html

Not included in the diagram are terms such as Advanced Analytics and Predictive Modelling, which can be used to describe some or all of machine learning’s capabilities, but that’s a discussion for another day.

Machine Learning defined

So what is this Machine Learning lark? To me, ML is a specialist field which uses algorithms to learn from data and create generalisable models which can produce predictions or identify patterns in previously unseen data. That may not help too much at this point, but hopefully as we work through the rest of this post it will become clearer.

The basic ML process

Imagine a dataset as a table, where the rows are individual observations (data about something in your organisation e.g. a customer, product, sale etc.), and the columns for each observation represent the features (e.g. sales price, income, age) of that observation and their values.

Typically, at the start of a machine learning project, the dataset is split into two or three subsets. The minimum subsets are the training and test datasets, and often a third validation dataset is created as well. Think of maths problems at school: you have a set of worked examples that show you what the answer should be (training set), end-of-chapter questions with solutions so that you can check your answer (test set), and additional practice questions (validation set).

Once these data subsets are created from the primary dataset, a predictive model or classifier (made up of one or more algorithms) is trained using the training data, and then the model’s predictive accuracy is determined using the test data.
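For the more curious, this splitting step can be sketched in a few lines of Python using scikit-learn, a popular ML library. The dataset here is made-up illustrative data, and the 60/20/20 proportions are just one common choice:

```python
# A minimal sketch of splitting a dataset into training, validation and
# test subsets with scikit-learn. The observations below are invented.
from sklearn.model_selection import train_test_split

# 100 observations, each with 3 feature values, plus a label per observation
X = [[i, i * 2, i % 5] for i in range(100)]
y = [i % 2 for i in range(100)]

# First carve off 20% as the test set...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# ...then split the remainder into training (75%) and validation (25%) sets
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

The model only ever learns from the training set; the other two subsets are held back so its performance can be checked on data it hasn’t seen.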

Types of learning

Now that we have a bit of a feel for what ML is at a high level let’s dig into a little more detail.

Machine Learning breaks down into two main categories: supervised and unsupervised. For completeness, I should point out that there is a third category, semi-supervised, but this series will focus on the first two, which are the most common.

In supervised learning, you start with a dataset which includes examples of what you are trying to model. The model is trained by showing it examples of labelled data (the label is the answer you should get) and then tested by passing in new, unseen data. The performance of the model is assessed by comparing the predicted label against the actual label. In reality, assessing performance is a little more involved, but more on that in later posts.

You can think about supervised learning a bit like trying to solve a problem in a textbook which has the answers at the back. You work through some examples and then try the test questions. You can check if you got the correct answer by checking the back of the book. If you are incorrect then you can try to solve the problem again. This gives you a feel of the iterative nature of the learning that happens within the algorithms.

The difference with unsupervised learning is that there are no labelled examples, so the model must try to identify patterns within the data. It does this by examining the values of the features which make up each observation and grouping those observations which are similar. This enables the data to be split into groups of similar things. The unsupervised approach is particularly useful when trying to identify anomalies: things which don’t fit into any of the groups.

A Simple Example

The table below is an example of a (very) simplified dataset that might be held by a bank on their customers who have applied for a loan. Each row describes a particular customer (an observation) and the columns contain data that describes the customer (features). There has been some feature engineering here as age is now a band (Young, Middle, Old) rather than an age in years and months, while Employed and HomeOwner are boolean flags (True, False). The Label in this example is the feature which describes whether the loan was approved or not. A supervised learning approach would learn what makes observations fit into each of the two classes (Yes or No) and then when presented with new (unseen) data would use the learned approach to place each unseen observation into one of these two classes.

A simple example of a modelling base table with observations, features and label
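To make the supervised approach concrete, here is a minimal sketch using scikit-learn. The handful of customers and the encoding (age bands and True/False flags mapped to numbers, as most algorithms expect numeric inputs) are invented for illustration:

```python
# A toy supervised classification sketch. Features are encoded as numbers:
# age band (Young=0, Middle=1, Old=2); Employed and HomeOwner as 0/1.
from sklearn.tree import DecisionTreeClassifier

# Each row is one customer (an observation): [age_band, employed, home_owner]
X_train = [
    [0, 0, 0],  # Young, unemployed, renting  -> No
    [1, 1, 1],  # Middle, employed, owns home -> Yes
    [2, 1, 1],  # Old, employed, owns home    -> Yes
    [0, 1, 0],  # Young, employed, renting    -> No
    [1, 0, 0],  # Middle, unemployed, renting -> No
    [2, 1, 0],  # Old, employed, renting      -> Yes
]
y_train = ["No", "Yes", "Yes", "No", "No", "Yes"]  # the labels

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)  # learn from the labelled examples

# A new (unseen) applicant: middle-aged, employed, owns their home
print(model.predict([[1, 1, 1]])[0])  # prints "Yes"
```

The model has learned what makes an observation fit into each of the two classes, and places the new applicant into one of them.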

In an unsupervised example, your intent would be to find patterns in the historic customer data and learn something that you don’t already know, or to group customers in a particular way at a point in time. To achieve this, you could use a clustering approach which groups (clusters) the data automatically, allowing you to analyse it. For example, when looking at a customer’s credit card transactions it would be possible to identify anomalous transactions which may or may not be fraudulent.
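Here is a sketch of that clustering idea with scikit-learn’s KMeans algorithm. The “transactions” are invented two-feature points (amount spent, hour of day) chosen to form two obvious groups; notice there are no labels anywhere:

```python
# A minimal unsupervised clustering sketch. Each transaction is described
# by two features: [amount_spent, hour_of_day]. The data is invented.
from sklearn.cluster import KMeans

transactions = [
    [10, 9], [12, 10], [11, 9],    # small daytime purchases
    [950, 3], [990, 2], [970, 3],  # large late-night purchases
]

# Ask for 2 clusters; KMeans groups similar transactions automatically
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(transactions)

print(labels)  # the two groups receive different cluster ids
```

No one told the algorithm which transactions belong together; it grouped them purely by how similar their feature values are.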

Now you might be saying “can’t all of this be achieved without machine learning?” and the answer is yes, it can.

Financial institutions have always assessed loan applications, and they have always investigated particular trends within their customer segments, but how much effort and time has it taken to perform these tasks manually or with limited technology support? The advantage of machine learning is the speed with which these insights can be surfaced and acted upon, plus the ability to spot unconsidered scenarios by examining the cases which fall outside of the clusters (groups): the outliers. More on those later!

Machine Learning Problem Types

Okay, so far, we know that there are two main types of ML, supervised and unsupervised. Next, we need to explore some of the problem types or use cases which are served by each type of learning.

Clustering (Unsupervised) is a technique for discovering the composition and structure of a given dataset. It is a process of gathering data into clusters to see what, if any, groupings exist. Each cluster is characterised by a contained set of data points, and a cluster centroid. The cluster centroid is basically the mean (average) of all of the data points that the cluster contains.
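That “mean of all the data points” is literally an average taken feature by feature. A tiny plain-Python sketch, with three made-up two-feature points in one cluster:

```python
# The centroid of a cluster is the per-feature mean of its points.
# Three invented data points, each with two feature values:
points = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# zip(*points) pairs up the first features, then the second features,
# so we can average each feature across all points in the cluster
centroid = [sum(feature) / len(points) for feature in zip(*points)]

print(centroid)  # [3.0, 4.0]
```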

Classification (Supervised) splits observations (e.g. customers) into pre-defined classes or categories. This can be a single assignment (e.g. Approved or Not Approved) or a probability of the observation belonging to each of the available classes (e.g. likelihood to purchase).

Regression (Supervised) gives a continuous value as an output rather than a class. Think house price prediction or the closing price of a stock market index. Just to confuse you a little more, regression also appears in the name of an algorithm (logistic regression) that is actually used for classification problems. I’ll clear that up in later posts on algorithms.
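A sketch of the house price idea with scikit-learn. The data is invented and deliberately simple (price rises in step with floor area), but it shows the key difference from classification: the output is a number, not a class:

```python
# A minimal regression sketch: fit a straight line to invented house
# data mapping floor area (square metres) to price (thousands).
from sklearn.linear_model import LinearRegression

areas = [[50], [75], [100], [125], [150]]  # feature: floor area
prices = [100, 150, 200, 250, 300]         # continuous label: price

model = LinearRegression()
model.fit(areas, prices)

# Predict a continuous value for an unseen 110 m^2 house
print(round(model.predict([[110]])[0]))  # 220
```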

Anomaly Detection (Supervised & Unsupervised) discovers data that is “misbehaving”, or rather is not what you would expect to see. This might be due to errors in measurement or collection, or due to fraud by customers. It may also be used to identify devices or components which are not operating as expected, as in condition monitoring and predictive maintenance.
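Real anomaly-detection systems use dedicated algorithms, but the core idea can be illustrated with nothing more than the Python standard library: flag any value that sits unusually far from the average. The card transaction amounts below are invented:

```python
# A very simple anomaly-detection sketch: flag values far from the mean.
# (Production systems use dedicated algorithms; this shows the idea.)
import statistics

amounts = [20, 22, 19, 21, 23, 500]  # invented transactions; 500 looks odd

mean = statistics.mean(amounts)
stdev = statistics.pstdev(amounts)

# Anything more than two standard deviations from the mean is "misbehaving"
anomalies = [a for a in amounts if abs(a - mean) > 2 * stdev]

print(anomalies)  # [500]
```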

Machine Learning Algorithms

So far, we’ve covered a high level intro to Machine Learning, the two main types of learning and some of the problems ML addresses. The last part of this post calls out some of the algorithms used in ML. They’re broken down by problem area and you will notice that some algorithms are used in more than one problem area.

A sample of the algorithms available for each learning and problem type

On occasion, a single model (built using an algorithm above) will not give you the performance that you hoped for. We’ll discuss the reasons for this when we look at the algorithms in detail, but a common solution is to combine multiple models. This approach is called ensemble modelling, and we will look deeper at ensembles, such as Random Forests, later in the series.
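As a taster of what’s to come, here is a sketch of one such ensemble: a Random Forest builds many decision trees on random variations of the data and combines their votes. The data is synthetic, generated from a simple made-up rule:

```python
# An ensemble sketch: a Random Forest combines the votes of many
# decision trees. The data is synthetic, purely for illustration.
from sklearn.ensemble import RandomForestClassifier

# Simple synthetic rule: the label is 1 when the first feature is large
X = [[i, (i * 7) % 10] for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)  # each of the 50 trees learns a slightly different view

# One observation with a clearly "large" first feature, one "small"
print(forest.predict([[35, 3], [5, 5]]))
```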

That’s it! This has been a whistle-stop intro to machine learning; hopefully you now have a feel for what ML is, the types of learning it involves and some of the common approaches. The next post in the series will delve deeper into the supervised learning algorithms, with examples of how they can be used to solve complex problems across a variety of industries.