A Guide To Supervised Learning

A General Recipe for all the Supervised Learning Techniques.

Aditya Oke
Machine Learning Magazine
10 min read · Sep 4, 2019


Machine Learning theory is split into three domains, namely Supervised Learning, Unsupervised Learning, and Reinforcement Learning. (If you are unsure of these, please check the previous blog.)

What is Supervised Learning?

Let me tell you what supervised learning is, in one line.

Teach me and I will learn.

Supervised learning requires someone to guide us while we are learning.
This is also the most natural way in which humans learn. We are taught how to read and write, and we are taught how to do arithmetic. Since this form of learning is so natural, making computers learn in this way is a great idea.

Supervised learning is the most common form of learning that we encounter in Machine Learning. In fact, Andrew Ng once said that more than 80% of problems involve supervised learning. Supervised learning spans multiple techniques such as regression algorithms, neural networks, decision trees, and support vector machines, to name a few (in case you don’t know them, it’s okay; we will look into them soon). Supervised learning is a very powerful approach, as it tends to be more accurate than the other two when labelled data is available.

But often people complain….

There are too many algorithms in Supervised Learning. Is there any general recipe?

It is true that there are many algorithms in this domain, and it is difficult to understand and relate them, since they all try to solve similar problems. However, all the supervised learning algorithms have 6 things in common. Let us make 6 jars and then see them one by one.

[Image: The six jars of Machine Learning]
  • Data: (Always necessary)
  • Task: (We don’t solve a problem without knowing what is our task 😃)
  • Model: (Yes! You heard it right, that same word from my previous blogs)
  • Loss Function: (Don’t worry I will explain this 😅)
  • Learning Algorithm: (A generic algorithm for all 😇)
  • Evaluation Technique: (Nothing is learnt till it’s evaluated 😃)

The Data Jar

We consider real-world data here. Recall that in supervised learning we require labelled data, i.e. data which has attributes (the X values) and labels (the Y values). Let’s take an example to make this clear.
Suppose you have tabular data; maybe the data is your medical profile and it says whether you are healthy or not.

You may have various columns in your medical profile, such as Age, Blood Group, Gender, BP, Sugar value, etc. These belong to the X values, the attributes of the data.

What do we want to label this as?
We want to label whether the person is healthy or not. So, for each row, we have a label, healthy or unhealthy; this is the Y value.
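To make this concrete, here is a minimal sketch in Python using pandas; the column names and values below are hypothetical, just to show how the X attributes and the Y labels sit in one table:

import pandas as pd

# Hypothetical medical records; every column name and value is made up.
data = pd.DataFrame({
    "Age":   [25, 61, 43],
    "BP":    [120, 150, 135],
    "Sugar": [90, 160, 110],
    "Label": ["healthy", "unhealthy", "healthy"],
})

X = data[["Age", "BP", "Sugar"]]  # the attributes (X values)
Y = data["Label"]                 # the labels (Y values)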

Now that we are given this data, we need to do some analysis with it.
So we need to define a task for the data.

The Task Jar

The task jar decides what needs to be done with the data. In supervised learning, we can have only two tasks. They are regression and classification.

Let us dive into them briefly.

Regression Tasks: -

In regression, we try to predict a numeric value, often forecasting the future from current data. Examples of this are stock market prediction and weather forecasting. Think about it: how do we predict tomorrow’s weather? We consider various factors such as the season, today’s weather, reports from the news, and personal experience, and then we say what tomorrow will be like. Regression tries to perform the same analysis on the data: it learns from the previous values and forecasts the future.
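As a small hedged sketch (using scikit-learn, with made-up numbers), a regression model learns from known values and forecasts a new one:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: yesterday's temperature (X) and today's temperature (y).
X = np.array([[20.0], [22.0], [21.0], [25.0], [24.0]])
y = np.array([21.0, 22.5, 22.0, 24.5, 24.0])

model = LinearRegression().fit(X, y)  # learn from previous values
print(model.predict([[23.0]]))        # forecast for a day following a 23-degree day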

Classification tasks: -

The first problem that we discussed in the data section, saying whether a person is healthy or not, is a classification task. Given the attributes of the data, we try to guess which class the person belongs to: healthy or unhealthy.

This way of classifying a data point into one of two possibilities is called binary classification.

Let’s take another example. Suppose I give you data of handwritten digits 0–9. I then pick any one data point and ask you: which digit is this? You have 10 choices to pick from, right? The digit belongs to only one of them. Here we have labels belonging to several different classes. This type of problem is called Multi-Class Classification.
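To make the distinction concrete, here is what the label column looks like in each case (the values below are hypothetical):

# Binary classification: every data point gets one of two labels.
y_binary = ["healthy", "unhealthy", "healthy", "unhealthy"]

# Multi-class classification: every handwritten digit gets one of ten labels.
y_multiclass = [3, 7, 0, 9, 4]  # one label from 0-9 per image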
Regression and classification form the tasks in supervised learning.

But who will do this task?
We need someone who can do these tasks on the given data. For that, we have the next jar.

The Model Jar

A Machine Learning model tries to map the given X attributes to the output Y. So the model jar is a very simple jar: we try various algorithms, each attempting to learn the mapping between X and Y. A model has to be trained; it cannot give correct outputs until it is.
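As a hedged sketch in scikit-learn (the choice of model and the data here are placeholders, not a recommendation), training and using a model looks like this:

from sklearn.tree import DecisionTreeClassifier

# X and Y in the spirit of the medical-profile example above (placeholder values).
X = [[25, 120, 90], [61, 150, 160], [43, 135, 110]]
Y = ["healthy", "unhealthy", "healthy"]

model = DecisionTreeClassifier()        # one of many possible models
model.fit(X, Y)                         # training: learn the X -> Y mapping
print(model.predict([[50, 140, 100]]))  # predict for a new, unseen person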

But there is a confusion here…

How do I know which model is best or which algorithm is best?

Clearly, there is no single best algorithm (otherwise we wouldn’t have so many algorithms at all). The best model depends on the data given and the task required.

This part is highly empirical, and this is where the art of Machine Learning lies. There is no clear winner; just keep trying. I will give you guidelines as we go along.

How will the models learn?

The models need to learn from the data. Initially, the models make wrong predictions, but as we keep training them, they improve. To make them learn, we need the next two jars.

Loss Function Jar

Let us understand how we learn from our mistakes.
Consider this conversation between an interviewer and a candidate.
(Make sure you have a sense of humor)

Interviewer:- What is the value of 2 + 2?
Candidate:- It’s 2.
Interviewer:- No, you are not quite right; your answer is a bit less than it should be. Try again.
Candidate:- It’s 5.
Interviewer:- No you missed it by a slight margin, try a bit less.
Candidate:- It’s 4.
Interviewer:- You’re selected as Machine Learning Engineer.

Hope you had a laugh. What exactly happened here? What if the candidate were the model we are trying to train? How did the candidate learn?
The candidate tried to minimize the error. First he answered 2, and the error was 2. Then he answered 5, and the error was 1. Finally the error was 0, and he was right!

We similarly train our Machine Learning models, we first get a random output, then see what the actual output should be, recognize the mistake and learn from it. But to learn, we need to understand the error first.

This error is captured by the Loss Function. The loss function is simply a measure of how inaccurate your prediction was. It is a function of Y_predicted and Y_given.
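For example, one common loss function for regression is the squared error; here is a minimal sketch in Python:

def squared_error_loss(y_predicted, y_given):
    # Mean squared error: the average of the squared differences.
    return sum((p - g) ** 2 for p, g in zip(y_predicted, y_given)) / len(y_given)

# In the interview story, the candidate answered 2 when the truth was 4.
print(squared_error_loss([2], [4]))  # 4.0 -> large loss, a bad guess
print(squared_error_loss([4], [4]))  # 0.0 -> a perfect answer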

These functions have already been designed by statisticians and studied for a long time. Our objective is to minimize this error value, or, technically speaking, the loss value. But to minimize it, we require a mechanism or algorithm that helps us minimize the loss function. This is given in the next jar.

Learning Algorithm Jar

Every Machine Learning model needs to learn, so we need to train it. For it to learn, it must have some algorithm that does the job of minimizing the error made by the model. Every model has its own learning algorithm, but all follow the same underlying principle.
Let me write a generic recipe to make it clear.

Initialize the model with its default parameters.
while (not satisfied):
    Update the parameters of the model such that the error is reduced.
    Save this new model.
That’s it, it’s that simple. All the machine learning algorithms follow the lines above. Wait, there are two terms that I never explained.

What is satisfaction?

Satisfaction can vary from model to model and from person to person. Satisfaction may mean being able to reduce the loss (error) to its minimum value. Sometimes it may take very long to reduce the error that far; then satisfaction may mean running the code for 10 minutes or some other fixed time. So satisfaction varies; it depends on what we intend to do.

What are parameters?

Suppose my machine learning model is y_pred = aX + b.
Here, X is the input and y_pred is the predicted output. What I am trying to learn is a straight line. But how do I adjust this straight line? I need a slope and an intercept: ‘a’ will learn the slope and ‘b’ will learn the intercept.
So we say that ‘a’ and ‘b’ are both parameters.
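In code, this model and its two parameters are simply (a hypothetical helper, just for illustration):

def predict(x, a, b):
    # A straight-line model: 'a' is the slope, 'b' is the intercept.
    return a * x + b

print(predict(2.0, a=3.0, b=5.0))  # 11.0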

But how do you update them?

Suppose we somehow know that y_actual = 3X + 5, but my model started with the values a = 2 and b = 9; that is, y_pred = 2X + 9. We can see that we have to increase ‘a’ and decrease ‘b’. This information is captured by the loss function, which guides our model to update these parameters. So minimizing the loss function (the error function) tunes the parameters as well. Each model has its own method for minimizing the loss function; we will look into them as we learn the models.
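Here is a minimal runnable sketch of this idea, again using gradient descent on the squared-error loss (one common method; each model has its own):

import numpy as np

# Data generated from the "true" relation y = 3X + 5.
X = np.linspace(0, 10, 50)
y_actual = 3 * X + 5

a, b = 2.0, 9.0                  # the model's poor starting values
learning_rate = 0.01

for _ in range(5000):
    error = (a * X + b) - y_actual
    # Gradients of the mean squared error with respect to 'a' and 'b'.
    a -= learning_rate * 2 * np.mean(error * X)
    b -= learning_rate * 2 * np.mean(error)

print(round(a, 2), round(b, 2))  # close to 3.0 and 5.0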

Evaluation Jar

Imagine you are going to write a test on trigonometry. How do you prepare for it? You start like a machine learning model, initially untrained. Then you take the textbook and start practising questions; the more you solve, the better trained you are and the better you become at the subject.

But does that mean you do well in the test?

Not necessarily. We need to check whether you learnt to apply the concepts properly, and we need to ensure that you are not mugging up (rote-memorizing) the textbook.
We must evaluate your performance in the testing phase, not during training. Once you are able to solve unseen questions in the test, we can say that you learnt well.

How does this relate to Machine Learning models?

Our Machine Learning model is trained on given data for which we know the output. But we need to ensure that the model actually learns from the data and does not mug up the relation. For that, we need to test it on some unseen data whose answers we do not give it. We ask our model to solve for this unseen data, and then we verify its answers against the answers that we have. If it does well on the unseen data, then we say that the model learnt well.
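In scikit-learn, this idea is captured by a train/test split; here is a hedged sketch reusing the medical-profile placeholders from before:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Placeholder data in the spirit of the medical-profile example.
X = [[25, 120, 90], [61, 150, 160], [43, 135, 110], [55, 145, 150],
     [30, 118, 95], [70, 160, 170], [40, 130, 105], [65, 155, 165]]
Y = ["healthy", "unhealthy", "healthy", "unhealthy",
     "healthy", "unhealthy", "healthy", "unhealthy"]

# Keep some data aside as the "unseen questions" for the test.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25,
                                                    random_state=0)

model = DecisionTreeClassifier().fit(X_train, Y_train)  # train
print(accuracy_score(Y_test, model.predict(X_test)))    # evaluate on unseen data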

Conclusion: -

That’s how every supervised learning algorithm works. Once we know the 6 jars for each model, we are done. The six jars analogy will help us in future blogs and will make things easy to understand even if you are uncomfortable with the math.

It is always simple to learn if the way of learning is simple.

Let us summarize all the six jars.

The six jars model provides us with a simple way to understand all the machine learning algorithms.

[Image: The six jars of Machine Learning]

Data Jar

  • Data comes from the real world as a collection of attributes.
    These are our X values.
  • In supervised learning, we have labels for the data.
    These are our Y values.

Task Jar

  • We have two tasks in supervised learning: regression and classification.
  • The regression task involves predicting future values based on known data.
  • The classification task involves classifying the data into subgroups.
  • If we classify into two subgroups then it is called Binary Classification.
  • If we classify into more than two subgroups then it is called Multi-Class Classification.

Model Jar

  • Contains the machine learning model which converts the given X attributes to a predicted Y value.
  • The model is just an approximation; it does not guarantee that the predicted outputs will match the expected values.
  • We need to train the model before using it.

Loss Function Jar

  • Calculates the error between the output produced by our model and the actual output.
  • We have to minimize this error. To minimize this, we need the learning algorithm jar.

Learning Algorithm Jar

  • We need to design an algorithm that minimizes the loss function until we are satisfied.
  • It updates the parameters of the model as it minimizes the loss function.

Evaluation Jar

  • Ensures that our model has learnt from data, not mugged up the inputs and the outputs.

That’s it for this blog. Hope that I made it simple for all to understand how supervised learning is structured. Stay tuned for the next one 😄 where we will discuss our first supervised model.

Who am I?

I am a University student trying to bring AI closer to the common man. Through my blogs, I want to educate everyone about Machine Learning.
I intend to reach out to everyone, even to those who have no clue what AI is.
You can view all publications by clicking here.

You can view me on LinkedIn by clicking here.

Thanks to Anushchandra Shetty for the proofreading and suitable edits.

Citations

Thanks to Prof. Mitesh Khapra from the Indian Institute of Technology, Madras for the six jars idea and the six jars image.
Without him, this blog would be impossible.

Thank you to Towards Data Science for the regression and classification images.
