Introduction to Machine Learning Algorithms: Linear Regression

Muktha Sai Ajay
Published in The Startup
4 min read · Feb 19, 2020

Artificial Intelligence (AI) makes it possible for machines to learn from experience, adjust to new inputs and perform human-like tasks. People across different domains are trying to apply AI to make their tasks a lot easier.

For example, doctors use AI applications to provide personalized medicine and X-ray readings. In retail, AI provides virtual shopping capabilities that offer personalized recommendations and discuss purchase options with the consumer. In banking, AI techniques can be used to identify which transactions are likely to be fraudulent.

What is Linear Regression?

Regression is a predictive modelling technique that investigates the relationship between a dependent variable (the target) and one or more independent variables (the predictors). It is a Supervised Learning technique.

The overall idea of regression is to examine two things:

  1. Which variables/features, in particular, are significant predictors of the outcome variable?
  2. Does a set of predictor variables do a good job of predicting the outcome?

This method is mostly used for forecasting and for exploring cause-and-effect relationships between variables. In linear regression, the dependent variable is continuous, the independent variables can be continuous or discrete, and the regression line is linear.

Linear Regression

Simple Linear Regression establishes a relationship between the dependent variable and one independent variable. It is represented by the equation
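
In its usual textbook form, simple linear regression is written as follows. The symbols are a conventional choice and an assumption here, not necessarily the notation the author used: y is the dependent variable, x the independent variable, b_0 the intercept and b_1 the slope of the line.

```latex
y = b_0 + b_1 x
```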

The above equation can be used to predict the value of the target variable from the given predictor variable(s).

Supervised Learning

Supervised learning, as the name indicates, involves a supervisor acting as a teacher. The training data consists of inputs paired with the correct outputs. During training, the algorithm searches for patterns in the data that correlate with the desired outputs. After training, it takes in new, unseen inputs and predicts their outputs based on what it learned from the training data. The goal is to predict the correct output for new input data: a label in classification, or a continuous value in regression. It can be written as
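
A common way to express this idea is as a function that maps inputs to outputs. The notation below is an assumption (a standard convention), with X standing for the inputs, Y for the outputs, and f for the mapping the algorithm tries to learn from the training pairs.

```latex
Y = f(X)
```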

Let’s Code!

Here we use the scikit-learn library to import the linear regression model and use it directly. There are many datasets available online for linear regression. You can find the dataset and code at the link below

Import Libraries
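
A minimal sketch of the imports this step relies on, assuming only the two libraries named in this section. The aliases `pd` and `plt` are the usual community conventions, not something confirmed by the article.

```python
# Pandas loads the CSV file into a DataFrame and helps inspect it.
import pandas as pd

# pyplot is the plotting interface used to draw the data and the fitted line.
import matplotlib.pyplot as plt
```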

In the above lines of code, I imported all the libraries I will need along the way.

Pandas -> To load the data file as a Pandas data frame and analyze the data.

Matplotlib -> I’ve imported pyplot to plot graphs of the data

Import data

Our file is in CSV (Comma Separated Values) format, so we load it using pandas. Then we split the data into dependent and independent variables: X holds the independent variable and Y holds the dependent variable.
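
The loading step might look roughly like the sketch below. The article's dataset is not reproduced here, so a tiny inline CSV with hypothetical column names `x` and `y` stands in for it. With a real file you would pass its path to `pd.read_csv` instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the article's CSV file (made-up values and column names).
csv_text = """x,y
1,2.1
2,3.9
3,6.2
4,7.8
5,10.1
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

# Double brackets keep X two-dimensional, which scikit-learn expects for features.
X = data[["x"]]  # independent variable
Y = data["y"]    # dependent variable

print(data.head())
print(X.shape, Y.shape)
```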

Train set and Test set

From scikit-learn's model_selection module, I've imported train_test_split, which splits the data into a training set and a test set. The test_size = 0.33 argument means that 33% of the data is held out for testing, and the remaining 67% is used for training.
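
As a sketch of the split, the snippet below uses synthetic data in place of the article's dataset (an assumption, including the made-up slope and intercept). The `random_state` argument is also an addition here, used only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: a noisy straight line (not the article's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
Y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=100)

# Hold out 33% of the rows for testing; the rest is used for training.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42
)

print("training rows:", len(X_train))
print("test rows:", len(X_test))
```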

Now let’s fit the data

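From scikit-learn's linear_model module, we import LinearRegression and fit the model on the training data. We use the R2 score (the coefficient of determination) to measure how well the model fits; a value close to 1 means the line explains most of the variation in the target.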
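
A minimal sketch of the fitting step, again on synthetic stand-in data rather than the article's dataset. `LinearRegression.score` returns the R2 score for the data passed to it, so it is used here for a quick check on the training set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: y is roughly 3x + 2 plus noise (an assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
Y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=100)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42
)

# Fit the line on the training data only.
model = LinearRegression()
model.fit(X_train, Y_train)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("R2 on training data:", model.score(X_train, Y_train))
```

The learned slope and intercept should land near the values used to generate the data, which is a useful sanity check before moving to a real dataset.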

Predict Test Results
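
Prediction on the held-out rows could be sketched as follows. The data is synthetic and stands in for the article's dataset, and the name `Y_pred` is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the article's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
Y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=100)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42
)

model = LinearRegression().fit(X_train, Y_train)

# Predict the target for inputs the model has not seen during training.
Y_pred = model.predict(X_test)

# Compare the first few predictions with the true values.
for predicted, actual in zip(Y_pred[:5], Y_test[:5]):
    print(f"predicted={predicted:.2f}  actual={actual:.2f}")
```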

R2 Score On Test Data
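
One way to compute the score is with `r2_score` from `sklearn.metrics`, shown here on synthetic stand-in data (an assumption, since the article's dataset is not reproduced). Note the argument order: true values first, predictions second.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the article's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
Y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=100)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42
)

model = LinearRegression().fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# R2 compares the model's squared error with that of always predicting the mean.
test_r2 = r2_score(Y_test, Y_pred)
print("R2 on test data:", test_r2)
```

Calling `model.score(X_test, Y_test)` gives the same number in one step; the explicit version just makes the comparison between predictions and true values visible.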

Let’s plot our model!!
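
The two plots could be produced along the lines of the sketch below, with one panel for the training data and one for the testing data. The data is synthetic, and the colours, labels, and output file name are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the article's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
Y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=100)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42
)
model = LinearRegression().fit(X_train, Y_train)

# One panel per subset: observed points plus the line learned from the training set.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
panels = [("Training data", X_train, Y_train), ("Testing data", X_test, Y_test)]
for ax, (title, X_part, Y_part) in zip(axes, panels):
    ax.scatter(X_part[:, 0], Y_part, color="tab:blue", label="observed")
    ax.plot(X_part[:, 0], model.predict(X_part), color="tab:red", label="fitted line")
    ax.set_title(title)
    ax.set_xlabel("X")
    ax.legend()
axes[0].set_ylabel("Y")
fig.tight_layout()

# Hypothetical file name; in an interactive session plt.show() works as well.
fig.savefig("linear_regression_fit.png")
```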

Training data
Testing data

Here is a summary of what I did: I loaded the data, split it into train and test sets, fitted a regression model to the training data, used the model to make predictions on the test set, and evaluated those predictions with the R2 score.

Every Machine Learning enthusiast should know about regression, and it is also a good place to start for people who want to learn Machine Learning. That’s all for now! I hope you enjoyed this post.
