# What is linear regression and how to use it with Python scikit-learn

Linear regression is a supervised learning algorithm, meaning it learns from examples with known answers. We use the model in two stages:

- We train our model on a dataset with known answers (and test it to estimate how well the model performs).
- Then we can use the trained model to predict values for a dataset with unknown answers.

# How linear regression works

The idea of linear regression is pretty simple. Let’s imagine the following dataset (a simple set of x and y values):

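Now what if we tried to draw a line that stays as close as possible to all of our points? (Ordinary least squares, the usual method, minimizes the sum of squared vertical distances between the points and the line.) We’ll end up with something like this: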

This red line is the best-fit line for our set of points. It is basically a calculated function (e.g. `y=2x+5`). The process of finding this function is called linear (because we get a line as a result) regression (because we simplify the set of points to a function).
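To make "finding this function" concrete, here is a minimal sketch of the closed-form least-squares solution for a single `x` variable. The five sample points are hypothetical, chosen so that they scatter around `y=2x+5`; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical sample points scattered around y = 2x + 5 (an assumption for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([5.5, 6.5, 9.0, 10.5, 13.5])

# Closed-form least-squares solution for one feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(f"y = {slope:.1f}x + {intercept:.1f}")  # prints "y = 2.0x + 5.0"
```

None of the points sits exactly on the line, yet the line that minimizes the squared vertical distances recovers the function the points were scattered around.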

Now, if we calculate the `y` values again for our known points, we get values that lie on the best-fit line:

Yes, we will see some error (how much depends on the quality of our model). But instead of a limited set of points we now have a continuous line that allows us to predict `y` values for new `x` coordinates. And that is exactly what we need, to predict new values:
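As a small worked example, assume the best-fit function is the `y=2x+5` mentioned earlier and take a handful of hypothetical known points. The sketch below recomputes `y` on the line for the known points, measures the error at each point, and then predicts `y` for `x` values that were never in the dataset.

```python
import numpy as np

# Hypothetical known points (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([5.5, 6.5, 9.0, 10.5, 13.5])


def best_fit(x_values):
    """Example best-fit function from the text: y = 2x + 5."""
    return 2 * x_values + 5


# y values on the best-fit line for the known points
y_on_line = best_fit(x)

# Error at each known point: actual value minus value on the line
errors = y - y_on_line
print(errors)  # prints [ 0.5 -0.5  0.  -0.5  0.5]

# The line is continuous, so we can predict y for x values we have never seen
x_new = np.array([10.0, 12.5])
print(best_fit(x_new))  # prints [25. 30.]
```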

# Using linear regression in Python

## Prepare dataset

Let’s generate 100 points based on a known function with random deviations to simulate “randomness”:
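One way to do this with NumPy is sketched below. The known function (`y=2x+5`), the `x` range, the noise level and the fixed seed are all assumptions made for illustration, not values taken from the original dataset.

```python
import numpy as np

# Fixed seed so the example is reproducible (an assumption, not a requirement)
rng = np.random.default_rng(seed=42)

# 100 random x values in [0, 10); scikit-learn expects features as a
# 2D array of shape (n_samples, n_features), hence the (100, 1) shape
x = rng.uniform(0, 10, size=(100, 1))

# Assumed known function y = 2x + 5, plus Gaussian noise as the random deviation
noise = rng.normal(loc=0.0, scale=1.0, size=100)
y = 2 * x.ravel() + 5 + noise

print(x.shape, y.shape)  # prints (100, 1) (100,)
```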

## Create and train model

A common practice is to use 75% of the dataset to actually train the model and keep the remaining 25% to estimate its quality. Let’s create and train a model on the first 75 points of our dataset:
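A minimal sketch of this step with scikit-learn's `LinearRegression` follows. The data setup is repeated so the block runs on its own; the generating function `y=2x+5`, the noise level and the seed are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed dataset: 100 points around y = 2x + 5 with Gaussian noise
rng = np.random.default_rng(seed=42)
x = rng.uniform(0, 10, size=(100, 1))
y = 2 * x.ravel() + 5 + rng.normal(0.0, 1.0, size=100)

# First 75 points for training, remaining 25 for testing
x_train, y_train = x[:75], y[:75]
x_test, y_test = x[75:], y[75:]

# Create the model and fit it to the training data
model = LinearRegression()
model.fit(x_train, y_train)

# The learned line: slope is in coef_, intercept is in intercept_
print(f"y = {model.coef_[0]:.2f}x + {model.intercept_:.2f}")
```

Because the `x` values here are generated in random order, taking the first 75 points works as a random split. If your data is sorted, shuffle it first, or the model will never see part of the `x` range during training.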

## Predict

To predict values, we can now use the `predict()` method of our trained model:
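The sketch below shows the call on the 25 held-out test points. The data setup and training are repeated so the block runs on its own and, as before, the generating function, noise level and seed are assumptions. The `score()` call is an extra not mentioned in the text: it returns the R² coefficient, a quick numeric estimate of quality.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed dataset: 100 points around y = 2x + 5 with Gaussian noise
rng = np.random.default_rng(seed=42)
x = rng.uniform(0, 10, size=(100, 1))
y = 2 * x.ravel() + 5 + rng.normal(0.0, 1.0, size=100)

x_train, y_train = x[:75], y[:75]
x_test, y_test = x[75:], y[75:]

model = LinearRegression().fit(x_train, y_train)

# Predict y for the 25 test points the model has not seen during training
y_pred = model.predict(x_test)
print(y_pred.shape)  # prints (25,)

# R^2 on the test set: 1.0 would be a perfect fit
r2 = model.score(x_test, y_test)
print(f"R^2 on test data: {r2:.3f}")
```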

To estimate the quality of our model, let’s plot the training points together with the actual and predicted values for the test points:
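A possible way to draw this with Matplotlib is sketched below, using the colors described in this article. Matplotlib itself, the labels, and the dataset parameters (function `y=2x+5`, noise level, seed) are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed dataset: 100 points around y = 2x + 5 with Gaussian noise
rng = np.random.default_rng(seed=42)
x = rng.uniform(0, 10, size=(100, 1))
y = 2 * x.ravel() + 5 + rng.normal(0.0, 1.0, size=100)

x_train, y_train = x[:75], y[:75]
x_test, y_test = x[75:], y[75:]

model = LinearRegression().fit(x_train, y_train)
y_pred = model.predict(x_test)

fig, ax = plt.subplots()
# Green: the 75 training points
ax.scatter(x_train.ravel(), y_train, color="green", label="train (75 points)")
# Blue: actual values of the 25 test points
ax.scatter(x_test.ravel(), y_test, color="blue", label="test, actual (25 points)")
# Red: values the model predicted for the same 25 test points
ax.scatter(x_test.ravel(), y_pred, color="red", label="test, predicted (25 points)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

Call `plt.show()` to display the figure, or `fig.savefig("plot.png")` to write it to a file.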

What we will see is:

Green points are our 75 training points. Blue points are the actual values of the 25 points used for testing our model. Red points are the predicted values for those 25 test points. As we can see, the predicted values all lie on a single line: the best-fit line that our trained model found.

# Summary

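Linear regression is one of the basic supervised machine learning algorithms. It is easy to use with Python's scikit-learn library: create a `LinearRegression` model, train it on data with known answers, and call `predict()` on new data.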
