Machine Learning — Basic idea on how machine learns

I have been working on some complex problems in the field of Machine Learning for a few months, so I decided to write a series of posts on what I have learned so far. This is the first of the series.

Machine Learning (ML), for me, is giving a machine the (mathematical) ability to learn and then predict on its own once it has been properly trained. Using ML techniques you can define a model that can predict numbers like property prices, classify objects like which animal a given image shows, and even do very complex learning like what self-driving cars do.

How does it do that?

Based on the nature of learning, ML can be broadly classified into three classes:
1) Supervised Learning: Most problems fall under this category. In this type of learning, we already know what the machine has to learn, like detecting whether an email is spam or not. For these cases, we need a labelled training dataset in which we have values for the input parameters and the exact output for those input parameters. Because the dataset contains the output along with the input, it is called a labelled dataset. For predicting property prices, for example, we need a training dataset with many examples from which our model learns certain patterns so that it can later predict the price of a property.

In such a dataset, the input parameters are the property area, the number of bedrooms and the number of bathrooms. To build a good model there should be a large dataset, and there can be a substantial number of additional parameters as well, like how old the property is. After the model is trained properly, when we provide new parameters it will predict the price of the property. A model is nothing but some mathematical function. We will come back to this later.
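As a sketch (with made-up numbers purely for illustration), such a labelled dataset could be represented as an input matrix and a target vector:

```python
import numpy as np

# Each row is one property: [area in sq ft, bedrooms, bathrooms]
# The values below are invented just to show the shape of the data.
X = np.array([
    [1400, 3, 2],
    [1600, 3, 2],
    [2400, 4, 3],
    [3000, 5, 4],
])

# Labels: the known price of each property (the "output" column
# that makes this a labelled dataset)
y = np.array([245000, 312000, 369000, 540000])

print(X.shape)  # (4, 3): 4 examples, 3 input parameters each
print(y.shape)  # (4,): one known output per example
```

Each row of `X` pairs with one entry of `y`; that pairing is exactly what "labelled" means here.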

Supervised learning can further be divided into two categories:
Regression problem: The output is a real value, for example the above-mentioned problem of predicting the price of a property.
Classification problem: The output is one of a fixed set of classes, or in other words only a fixed number of outputs can occur, like in spam filtering where the output is whether an email is spam or not.

2) Unsupervised Learning: In this type of learning we do not know exactly what the machine has to learn. There is no labelled dataset. The goal is to learn more about the given data. There is no fixed output; the primary task is to find structure or patterns in the data. For example, Google News groups similar news stories into clusters.
Unsupervised learning is also of two types:
Clustering: We want to discover the inherent grouping in the data, for example grouping customers based on their shopping patterns. We try to keep each object in a group where it is as similar to the other objects in that group as possible, and as dissimilar to objects from other groups as possible.
Association: We try to find relationships between different entities, for example: if X buys some product, what are the chances that Y will also buy that product?
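To make the clustering idea concrete, here is a minimal 1-D k-means sketch (k-means is a standard clustering algorithm; this toy version is not tied to any product mentioned above):

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: group numbers around k centroids."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups: values near 1 and values near 10
points = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
centroids, clusters = kmeans_1d(points, k=2)
print(sorted(round(c, 1) for c in centroids))  # [1.0, 10.0]
```

Note there are no labels anywhere: the algorithm discovers the two groups purely from the structure of the data, which is the essence of unsupervised learning.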
 
3) Reinforcement Learning: This draws on behaviourist psychology. The primary goal is to maximize the cumulative reward in a defined environment. It is a continuous learning setup in which, for each action the model takes, it is either rewarded or punished. Here too there is no fixed output, and there is no labelled data either. Over time it discards the actions for which it was punished and tries to take only those actions for which it will be rewarded most. One of the most popular examples is self-driving cars.

To give an idea of how machine learning works, we will take the simplest model, the linear regression model.

Linear Regression: It is a supervised learning approach in which, based on the training dataset, we try to map the input variables onto a continuous function that produces real-valued output.

Linear Regression

It is like the equation of a straight line, y = mx + c, where m and c are the parameters we have to learn.
If we had to represent the model, it would look somewhat like this:

General Model of any ML algorithm

The hypothesis function, or score function, we get is what the ML algorithm uses to produce the output for the input provided:

h(x) = ϴ0 + ϴ1·x1 + ϴ2·x2 + … + ϴn·xn

This is the general form, where x1, x2, … xn are the input parameters, while ϴ0, ϴ1, … ϴn are the trainable parameters, which decide the output for the given input values.
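As a small sketch of the hypothesis in code (the parameter values are made up for illustration):

```python
import numpy as np

def hypothesis(theta, x):
    """h(x) = theta0 + theta1*x1 + ... + thetan*xn (theta[0] is the bias)."""
    return theta[0] + np.dot(theta[1:], x)

theta = np.array([5.0, 2.0, 3.0])  # illustrative trainable parameters
x = np.array([1.0, 4.0])           # one example's input parameters
print(hypothesis(theta, x))        # 5 + 2*1 + 3*4 = 19.0
```

Training means finding the `theta` values that make this output match the labels; the inputs `x` are fixed by the data.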

How does the ML model check whether its predictions are correct?

Loss function: It is used to measure how far the model's predictions deviate from the actual output. As an example we will use the mean squared error function:

J(ϴ) = (1/2m) · Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)², summing i over the m training examples,

where ϴ are the trainable parameters, x are the input parameters, h is the score function, y is the actual output and m is the number of examples in the training dataset.

Where is learning happening?

The whole idea is to minimize this loss function over the complete dataset. What we have to achieve is to choose, or learn, theta parameters for which this loss is as small as possible.

How is it done?

If we think about it, finding the minimum of a function is nothing but differential calculus. But this is generally not how the minimum of a loss function is found in practice. A closely related approach is Gradient Descent. There are also variants of Gradient Descent which are better and faster, but we will cover all of that in another article.

Gradient Descent
It is an algorithm to find the values of all the ϴs, the parameters of the model, such that the value of the loss function is minimized.
Algorithm: repeat until convergence, updating every parameter simultaneously:

ϴj := ϴj − 𝛂 · ∂J(ϴ)/∂ϴj

where
𝛂 is called the learning rate, which decides the rate of convergence towards the local minimum.

How Gradient Descent works and how the theta parameters are updated will be discussed in another article.

Points to remember about gradient descent:
1) If alpha is too small, the process of convergence may take a very long time.
2) If alpha is too large, gradient descent can overshoot the minimum and may fail to converge.
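Putting the pieces together, here is a minimal gradient-descent loop for linear regression (a sketch with made-up data and an illustrative learning rate):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Repeat: theta_j := theta_j - alpha * (1/m) * sum((h(x_i) - y_i) * x_ij),
    updating all thetas simultaneously, with h(x) = X @ theta."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        error = X @ theta - y        # h(x_i) - y_i for every example
        gradient = X.T @ error / m   # partial derivatives of the MSE loss
        theta = theta - alpha * gradient
    return theta

# Made-up data generated from y = 1 + 2*x
# (first column of X is all ones, standing in for the bias parameter)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta = gradient_descent(X, y)
print(np.round(theta, 3))  # close to [1. 2.], recovering y = 1 + 2*x
```

With alpha = 0.1 this converges comfortably; shrinking alpha makes the loop need far more iterations, while a much larger alpha makes the updates diverge, matching the two points above.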

That is all for the basics of ML. If you are just a beginner in the Machine Learning world and could not catch most of it, do not worry: I will come up with more articles in which we will dig deeper into how every individual part works.