Getting Started with Machine Learning, Part 2 (#GO-ML)
Link to the previous tutorial: Chapter 1
So let's get started with the second part of our course, Getting Started with Machine Learning with Python. In this part, we will discuss the following:
- Types of learning in ML
- Classification (KNN algorithm)
Machine learning is commonly divided into the following types of learning:
- Supervised learning
- Unsupervised Learning
- Semi-supervised Learning
- Reinforcement Learning
Now let's have a quick introduction to each of these types of learning:
Supervised learning, as the name indicates, involves a supervisor acting as a teacher. We teach or train the machine using well-labelled data, that is, data in which every example is already tagged with the correct answer. After that, the machine is given a new set of examples, and the supervised learning algorithm uses what it learned from the training examples to predict the outcome for them.
Unsupervised learning is a class of machine learning techniques for finding patterns in data. The data given to an unsupervised algorithm is not labelled, which means only the input variables (X) are given, with no corresponding output variables. Examples: clustering, LDA.
As you may have guessed, semi-supervised learning algorithms are trained on a combination of labelled and unlabelled data. This is useful for a few reasons. First, the process of labelling massive amounts of data for supervised learning is often prohibitively time-consuming and expensive.
What’s more, too much labelling can impose human biases on the model. Including lots of unlabelled data during training can therefore improve the accuracy of the final model while reducing the time and cost spent building it.
A reinforcement learning algorithm, or agent, learns by interacting with its environment. The agent receives rewards for performing correctly and penalties for performing incorrectly. It learns without human intervention by maximizing its reward and minimizing its penalty.
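Now we shall look at a classification algorithm from supervised learning.
We will focus on the KNN (k-nearest neighbours) algorithm, which is widely used in classification problems.
Let's make this algorithm simpler by breaking it down into pieces.
The KNN algorithm is very simple. It measures the distance (here, the Euclidean distance) between a new data point and every training point, and gives the new point the label that is most common among its k closest training points.
Simple, right?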
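For reference, the Euclidean distance between two points with n features each can be written as follows. The symbols p, q and n are notation chosen here for illustration, not taken from the original post.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```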
So let's see how this algorithm can be used for classification. We will use the iris dataset for this. Each row of the iris dataset holds four measurements of a flower (sepal length, sepal width, petal length and petal width, in centimetres) together with its species.
The task is to classify the species of an iris flower (setosa, versicolor or virginica).
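To give a feel for the data, here is a minimal sketch that prints a handful of rows from the iris dataset. The column names and the helper `format_rows` are assumptions made for this illustration; the full dataset is normally loaded from a file or a library rather than typed in by hand.

```python
# Column names are an assumption for this sketch; files name them differently.
HEADER = ("sepal_length", "sepal_width", "petal_length", "petal_width", "species")

# A few rows from the iris dataset, two per species (measurements in cm).
IRIS_SAMPLE = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
]


def format_rows(header, rows):
    """Return the header and rows as right-aligned text columns."""
    lines = ["  ".join(f"{name:>12}" for name in header)]
    for row in rows:
        lines.append("  ".join(f"{value:>12}" for value in row))
    return "\n".join(lines)


print(format_rows(HEADER, IRIS_SAMPLE))
```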
First, let's get the Euclidean distance into code.
The function starts with the distance set to 0. It then loops over the feature indices from 0 to length - 1, adding the squared difference between the two rows at each index, and finally takes the square root of the total. In our case, each row has 4 features, so length is 4 and the loop executes 4 times.
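The description above can be sketched roughly as follows. This is a minimal version under assumptions: the function name `euclidean_distance`, its parameter names and the sample rows are chosen here for illustration and may differ from the code in the notebook.

```python
import math


def euclidean_distance(row1, row2, length):
    """Euclidean distance between the first `length` values of two rows."""
    distance = 0.0  # start at 0 and accumulate squared differences
    for i in range(length):  # i runs from 0 to length - 1
        distance += (row1[i] - row2[i]) ** 2
    return math.sqrt(distance)


# Hypothetical test row and one iris training row, 4 features each.
test_row = [5.0, 3.4, 1.5, 0.2]
train_row = [5.1, 3.5, 1.4, 0.2]
print(euclidean_distance(test_row, train_row, 4))
```

Passing `length` explicitly lets a row carry its species label as a fifth element without that label taking part in the calculation.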
This is how the KNN algorithm works.
Suppose there are two classes, A and B, and we introduce a new test point. Our ultimate goal is to predict its label, i.e. A or B. To do so, we calculate the Euclidean distance between the test point and the points in the training data. Take a small example with one test row and two training rows.
We calculate the Euclidean distance between the test row and each training row.
In this example, the Euclidean distance to the virginica row is the smallest, so our nearest neighbour is virginica and the test row is labelled virginica. This is how the KNN algorithm works.
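A minimal sketch of this step, using two real iris rows as training data and a hypothetical test row chosen for illustration (these are not necessarily the values used in the original example). `math.dist` from the standard library (Python 3.8+) computes the Euclidean distance between two points.

```python
import math

# Two training rows from the iris dataset, keyed by species.
train_rows = {
    "setosa": [5.1, 3.5, 1.4, 0.2],
    "virginica": [6.3, 3.3, 6.0, 2.5],
}

# Hypothetical test row (an assumption for this sketch).
test_row = [6.1, 3.0, 5.6, 2.1]

# Distance from the test row to each training row.
distances = {
    species: math.dist(test_row, row) for species, row in train_rows.items()
}

# The nearest neighbour is the training row with the smallest distance.
nearest = min(distances, key=distances.get)

for species, distance in distances.items():
    print(f"{species}: {distance:.3f}")
print("nearest neighbour:", nearest)
```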
The example above used only two training rows. Now think of a bigger dataset such as the full iris dataset, which has 150 rows. We calculate the distance from the test row to every training row and then sort the distances in ascending order, so that the nearest neighbours come first. In Python, this sorting takes only a few lines.
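The sort-and-pick step could look roughly like the sketch below. The names `get_neighbors` and `predict`, the choice of k = 3, the six training rows and the test row are all assumptions made for illustration, not the code from the notebook.

```python
import math
from collections import Counter


def get_neighbors(train_rows, test_row, k):
    """Return the k training rows closest to test_row, nearest first."""
    distances = []
    for row in train_rows:
        features = row[:-1]  # the last element of each row is the label
        distances.append((math.dist(features, test_row), row))
    distances.sort(key=lambda pair: pair[0])  # ascending by distance
    return [row for _, row in distances[:k]]


def predict(train_rows, test_row, k):
    """Predict a label by majority vote among the k nearest rows."""
    neighbors = get_neighbors(train_rows, test_row, k)
    votes = Counter(row[-1] for row in neighbors)
    return votes.most_common(1)[0][0]


# Six rows from the iris dataset; a real run would use all 150.
train = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [4.9, 3.0, 1.4, 0.2, "setosa"],
    [7.0, 3.2, 4.7, 1.4, "versicolor"],
    [6.4, 3.2, 4.5, 1.5, "versicolor"],
    [6.3, 3.3, 6.0, 2.5, "virginica"],
    [5.8, 2.7, 5.1, 1.9, "virginica"],
]
test = [6.1, 3.0, 5.6, 2.1]  # hypothetical test row

print(predict(train, test, k=3))
```

Sorting the whole list is fine for a dataset of this size. For much larger datasets, `heapq.nsmallest` would avoid a full sort.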
The code, as well as a working example, is available in the notebook below. You can also find the whole tutorial on my GitHub.
Show your love by clapping if you like it. Follow my machine learning courses on GitHub.