Part 1: ABC of Machine Learning
A computer is an idiot, and computer programming is common sense.
In this blog series, I will write about the ABC of Machine Learning. I believe that learning ML (or any technology) is similar to learning a new language.
There are mainly 3 stages of learning a language:
Stage 1: Learning its ABC.
Stage 2: Learning to construct words from the ABC.
Stage 3: Learning to construct sentences/paragraphs from those words.
In this post, we will go through the terminology used in Machine Learning.
Terminology
We will use the Iris dataset to learn ML terminology.

Iris is a flower dataset covering 3 different kinds of Iris, collected by Edgar Anderson (a botanist). He measured the sepal length, sepal width, petal length and petal width of each flower, and each flower belongs to one of 3 unique classes of Iris. The dataset consists of 150 samples, 50 for each class.
A data point is a single sample in our dataset. Every data point can be divided into a set of independent variables and a dependent variable. In our Iris dataset, every data point (row) splits into Sepal Length, Sepal Width, Petal Length and Petal Width (the independent variables) and the Class/Label (the dependent variable).
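To make the split concrete, here is a minimal sketch of one data point in plain Python. The dictionary keys and the variable names (data_point, independent, dependent) are my own choices for illustration, and the row is typed in by hand rather than loaded from a file.

```python
# One hand-typed row in the style of the Iris dataset (illustrative only).
data_point = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
    "label": "Setosa",
}

# Independent variables: the measurements we are given.
independent = {name: value for name, value in data_point.items() if name != "label"}

# Dependent variable: the class that depends on those measurements.
dependent = data_point["label"]

print(independent)
print(dependent)
```

Keeping the label apart from the measurements is the habit that matters here: the model only ever sees the independent variables when it predicts.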
A dataset is a collection of data points (samples). Datasets are usually generated by machines (e.g. at Internet-based companies) or recorded by specialists (e.g. a botanist). A dataset can come as images, speech or text, which need to be converted to a tabular format using different techniques before further analysis.
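As a small sketch of that conversion step, the snippet below turns raw text into a table of (features, label) rows using Python's built-in csv module. It only covers the text case, and the column names, RAW_TEXT and the to_table helper are assumptions made up for this example. Images and speech need very different techniques.

```python
import csv
import io

# Raw text as it might arrive from a file; hand-typed rows for illustration.
RAW_TEXT = """sepal_length,sepal_width,petal_length,petal_width,label
5.1,3.5,1.4,0.2,Setosa
7.0,3.2,4.7,1.4,Versicolor
6.3,3.3,6.0,2.5,Virginica
"""


def to_table(raw_text):
    """Convert comma-separated text into a list of (features, label) rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        label = record.pop("label")  # dependent variable
        features = {name: float(value) for name, value in record.items()}
        rows.append((features, label))
    return rows


dataset = to_table(RAW_TEXT)
print(len(dataset))   # number of data points
print(dataset[0])     # first data point
```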
A class/label is the category that a data point belongs to. Many data points can share the same label, so we can divide our dataset into groups, one per unique class. Our Iris dataset has 3 classes, i.e. Versicolor, Virginica and Setosa.
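Here is a minimal sketch of finding the unique classes and grouping data points by them. The six rows are typed in by hand for illustration, and the names DATASET, classes and groups are my own.

```python
from collections import defaultdict

# (sepal_length, sepal_width, petal_length, petal_width), label
DATASET = [
    ((5.1, 3.5, 1.4, 0.2), "Setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Virginica"),
]

# The unique classes are simply the distinct labels in the dataset.
classes = sorted({label for _, label in DATASET})

# Group the data points by their class.
groups = defaultdict(list)
for features, label in DATASET:
    groups[label].append(features)

print(classes)
for label in classes:
    print(label, len(groups[label]))
```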
A query point is a sample for which the machine needs to predict a class. This sample consists only of independent variables, and from these values we have to predict the dependent variable. In our Iris dataset, a query point contains the Petal Length, Petal Width, Sepal Length and Sepal Width, and from these properties the machine needs to predict its Iris class.
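To show what "predicting the class of a query point" can look like, here is a rough sketch that copies the label of the closest known data point. This is just one simple strategy picked for illustration, not the only way to predict. The rows are hand-typed, and predict_class is a hypothetical helper name.

```python
import math

# (sepal_length, sepal_width, petal_length, petal_width), label
DATASET = [
    ((5.1, 3.5, 1.4, 0.2), "Setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Virginica"),
]


def predict_class(query_point, dataset):
    """Return the label of the data point closest to the query point."""
    _, nearest_label = min(
        dataset, key=lambda row: math.dist(query_point, row[0])
    )
    return nearest_label


# A query point has only independent variables, no label.
query_point = (5.0, 3.4, 1.5, 0.2)
print(predict_class(query_point, DATASET))
```

Note that the query point is a plain tuple of 4 measurements. The label is exactly the piece that is missing and has to be predicted.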
Features/dimensions are the independent variables of a data point. A data point can be seen as a point in a multi-dimensional feature space, with one dimension per feature. If a data point consists of only 2 features, then we can use a 2D plot to visualize it. In our Iris dataset, each data point has 4 dimensions, i.e. Sepal Length, Sepal Width, Petal Length and Petal Width.
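Since 4 dimensions cannot be drawn directly, one common trick is to pick 2 features at a time and plot those. The sketch below only computes the (x, y) coordinates and leaves the plotting out. FEATURE_NAMES and project are names I made up for this example.

```python
FEATURE_NAMES = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# One hand-typed data point: a point in 4-dimensional feature space.
data_point = (5.1, 3.5, 1.4, 0.2)

# The number of dimensions equals the number of features.
n_dimensions = len(data_point)


def project(point, x_feature, y_feature):
    """Keep only two features so the point can go on a 2D plot."""
    x = point[FEATURE_NAMES.index(x_feature)]
    y = point[FEATURE_NAMES.index(y_feature)]
    return (x, y)


xy = project(data_point, "petal_length", "petal_width")
print(n_dimensions)
print(xy)
```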
Outliers are exceptional data points that can distort the predictions of our model. These points can occur due to noise in the dataset, manual human error or genuinely exceptional cases. ML engineers usually handle these points during data cleaning, often by removing them, as they can hurt the performance of the model. Our Iris dataset is generally treated as a clean dataset, so we will not need to worry about outliers here, thanks to Edgar.
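One common rule of thumb for spotting outliers in a single feature is the 1.5 x IQR rule: anything far outside the middle half of the values gets flagged. The sketch below applies it to made-up sepal widths where I deliberately injected a typing error (9.9). The values and the find_outliers helper are assumptions for illustration, not part of the real dataset.

```python
import statistics

# Made-up sepal widths; 9.9 is a deliberately injected "human error".
sepal_widths = [3.0, 3.1, 3.2, 3.0, 2.9, 3.1, 3.3, 3.0, 9.9]


def find_outliers(values, factor=1.5):
    """Flag values lying more than factor * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    return [value for value in values if value < lower or value > upper]


outliers = find_outliers(sepal_widths)
cleaned = [value for value in sepal_widths if value not in outliers]

print(outliers)
print(cleaned)
```

Whether a flagged point should really be removed is a judgment call: a typing error should go, but a genuine exceptional flower may be worth keeping.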
Machine Learning
Machine Learning is the process of teaching a machine from past experience (a dataset), so that it can predict the output of an unseen future event (a query point).
In upcoming posts, we will have a look at some basic Machine Learning algorithms.
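The two halves of that definition, learning from the dataset and predicting for a query point, can be sketched as two small functions. Here "learning" just means averaging the data points of each class, and "predicting" means picking the class whose average is closest. This is a deliberately simple approach chosen for illustration, with hand-typed rows and hypothetical function names fit and predict.

```python
import math
from collections import defaultdict

# Past experience: (sepal_length, sepal_width, petal_length, petal_width), label
DATASET = [
    ((5.1, 3.5, 1.4, 0.2), "Setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Virginica"),
]


def fit(dataset):
    """Learn one average point (centroid) per class from the dataset."""
    grouped = defaultdict(list)
    for features, label in dataset:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, query_point):
    """Predict the class whose centroid is closest to the query point."""
    return min(model, key=lambda label: math.dist(query_point, model[label]))


model = fit(DATASET)                 # teaching from past experience
query_point = (6.0, 2.9, 4.5, 1.5)   # unseen future event
print(predict(model, query_point))
```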
Stay Tuned.
Happy Reading.
