An Introduction to Machine Learning

Assume you have been given a simple task of recognizing different shapes.

You can easily tell that one of these shapes is a square, one a circle, and one a triangle. Did you ever think about how you knew which shape was which? Since childhood, you have been taught to call each shape by a particular name. A computer can be taught in a similar way, by training it to do a particular task. This training of the computer is called Machine Learning.

Starting with the A B C’s of Machine Learning

In order to teach a computer how to solve a task, different algorithms are used.

Two common methods of making a machine learn are:

Supervised Learning

In supervised learning, we give the machine inputs and their expected results in pairs. Using different algorithms, the machine learns the relationship between input and output. The training data is therefore a labeled dataset.

Unsupervised Learning

In unsupervised learning, the data has no explicit labels. The algorithm groups similar data points into clusters, and the properties of the data are extracted from those groups.

This article covers one of the basics of supervised learning: Regression.

Regression is one of the most basic techniques in Machine Learning, used when you want to fit a best-fit function to the data. The equation of regression is:
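
For the single-variable case, the fitted line is commonly written in the form below. The symbols are a conventional choice and an assumption here, not taken from the original article.

```latex
y = \beta_0 + \beta_1 x + \varepsilon
```

Here x is the independent variable, y is the value being predicted, \beta_0 is the intercept, \beta_1 is the slope, and \varepsilon is the error the line cannot explain.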

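If there is only one independent variable, we use simple Linear Regression; if there are several independent variables, we use multiple Linear Regression. Logistic Regression, despite its name, is used for classification, where the output is a category rather than a number.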

We will be using scikit-learn to access the different Machine Learning models. The entire code will be written in Python. Use Jupyter Notebook or Google Colab for coding.

Let’s take an example of Linear Regression

Predicting the brain size if the weight of the brain is given.

Dataset :

https://www.kaggle.com/jemishdonda/headbrain/version/1#headbrain.csv

Code:

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis.
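
A minimal sketch of the import step these two paragraphs describe. The aliases np and pd are the usual convention and are assumed here.

```python
# Conventional aliases, assumed for the rest of the walkthrough
import numpy as np
import pandas as pd

# Printing the versions is a quick check that both libraries are installed
print(np.__version__, pd.__version__)
```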

The pd.read_csv() function reads the CSV file and assigns its contents to the variable ‘data’.

data.head() returns the first 5 rows of the dataset.

data.info() gives us information about the dataset, such as the column headings, the number of non-null values in each column, and the data type of each column.
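
The three calls above can be sketched as follows. To keep the example runnable without downloading the file, a few made-up rows stand in for headbrain.csv; the values are placeholders, not real measurements, and the column names are an assumption about the dataset.

```python
import io

import pandas as pd

# With the downloaded file this would simply be:
#   data = pd.read_csv("headbrain.csv")
# The rows below are made-up placeholders, and the column names are assumed.
sample_csv = io.StringIO(
    "Gender,Age Range,Head Size(cm^3),Brain Weight(grams)\n"
    "1,1,4000,1400\n"
    "1,1,3800,1350\n"
    "1,2,3600,1300\n"
    "2,1,3500,1250\n"
    "2,2,3400,1200\n"
    "2,2,3300,1150\n"
)
data = pd.read_csv(sample_csv)

print(data.head())  # first 5 rows (the default)
data.info()         # column names, non-null counts and dtypes
```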

This assigns the data in the columns to variables. scikit-learn cannot take the input values as a single row of n entries; it needs them as a 2-D array with n rows and one column. So, we use the reshape function to arrange the values into a single column.
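
A small sketch of the reshape step, using made-up stand-in values rather than the dataset columns. The call reshape(-1, 1) is one common way to get a single column and is assumed here; the original code may have done it differently.

```python
import numpy as np

# Made-up stand-in values; in the article these come from a dataset column
x = np.array([3500.0, 3700.0, 3900.0, 4100.0])
print(x.shape)  # (4,)  one-dimensional

# -1 lets NumPy infer the number of rows; 1 fixes a single column
X = x.reshape(-1, 1)
print(X.shape)  # (4, 1)  n rows, one column
```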

Here, we import the LinearRegression class and the train_test_split function from the scikit-learn library.

We split the data into a training dataset and a testing dataset. The testing dataset is 33% of the entire data. The function returns 4 arrays: the training and testing inputs, and the training and testing outputs.
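
A sketch of the import and split steps on synthetic stand-in data. The variable names and random_state=0 are assumptions added so the split is repeatable; they are not taken from the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression  # used in the next step
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, one feature
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# test_size=0.33 keeps roughly a third of the rows aside for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```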

We create an instance of the Linear Regression model. The fit function finds the best-fit line through the training data points.
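
A minimal sketch of creating and fitting the model. The training values are made up so that they lie exactly on the line y = 2x + 1, which makes the learned slope and intercept easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data lying exactly on y = 2x + 1
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X_train, y_train)

# coef_ holds the slope(s), intercept_ the constant term of the fitted line
print(model.coef_, model.intercept_)
```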

Since we split the data points above, we use the X_test array to make predictions with the model that we just built using the fit function.
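
The prediction step can be sketched like this, again on made-up data that follows y = 2x + 1. The name y_pred is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: training points on y = 2x + 1, plus two unseen inputs
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])
X_test = np.array([[5.0], [6.0]])

model = LinearRegression()
model.fit(X_train, y_train)

# predict returns one value per row of X_test
y_pred = model.predict(X_test)
print(y_pred)
```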

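The score function tells us how well our model can predict the size of the brain. For a regression model it returns the coefficient of determination (R²), where 1.0 means a perfect fit. In our run, the model reported a score of 100%. A perfect score on real-world data is unusual, so it is worth checking that score is given the true test values (y_test) and not the model's own predictions.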
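
A sketch of the scoring step on synthetic, noisy stand-in data (not the article's dataset). It assumes the score in question is LinearRegression.score, which returns R², and it shows one common slip that produces a perfect score by accident.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, noisy stand-in data
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 2.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# score() computes R^2 from the true test targets
r2 = model.score(X_test, y_test)
print(r2, r2_score(y_test, y_pred))  # the same value, computed two ways

# Pitfall: scoring against the predictions themselves always gives 1.0
print(model.score(X_test, y_pred))
```

With noisy data the first score stays below 1.0, while the second call reports a perfect fit only because the model is being compared with itself.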

Hope this article has helped you get a good understanding of what Machine Learning is and how to start with it.

The link to the Google Colab notebook:

https://colab.research.google.com/drive/1d1l9ZM26XAOHSfPowdJnkSIAfQ7DsSMt

Happy Learning!

Author: Aakar Mutha
