ML with Scikit-learn: Regression

Regression investigates the relationship between a dependent variable (output) and independent variables (inputs). It falls into the continuous supervised learning category of Machine Learning. Hence regression is the best choice for building a model whose output is continuous rather than discrete.

Insight into linear regression:

Linear regression predicts the output by following these steps:

  1. It tries to fit the best line through the input and output data, so that it covers as many points as possible (y = mx + c, where m is the slope and c is the intercept).
  2. While fitting, it minimizes the sum of squared errors (actual value − predicted value, squared). This can be done either with the ordinary least squares method (used by scikit-learn) or with gradient descent.
  3. After fitting the line, it has the values of m and c.
  4. Then, with the help of the slope m, it predicts the output for new inputs.
  5. The intercept c decides where the line crosses the y-axis. If c = 0, the line passes through the origin.
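The steps above can be sketched with scikit-learn on a small made-up data set. The numbers below are assumptions, chosen to lie close to the line y = 2x + 5, so the fitted slope and intercept are easy to sanity-check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical points lying close to y = 2x + 5, with a little noise
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # inputs, one feature
y = np.array([7.1, 8.9, 11.2, 12.8, 15.0])         # outputs

model = LinearRegression()  # fits via ordinary least squares
model.fit(x, y)

print(model.coef_[0])    # slope m, close to 2
print(model.intercept_)  # intercept c, close to 5
```

Because the points sit near one straight line, minimizing the sum of squared errors recovers m and c close to the values used to generate the data.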


Let's train a model to predict a person's net worth from their age. The following is just an assumption, not a real data set.

Roll up your sleeves for the scikit-learn implementation:

Let’s get started !!!

# It is a good practice to split 70% of our data as the training set
# and use the remaining 30% to check our model's accuracy.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# ages and net_worths are the input and output arrays described above
input_train, input_test, output_train, output_test = train_test_split(
    ages, net_worths, test_size=0.3)

model = LinearRegression()
# Training or fitting the data
model.fit(input_train, output_train)
# Prediction
predictions = model.predict(input_test)
# Getting slope
print(model.coef_)
# Getting intercept
print(model.intercept_)
# Finding accuracy on the training data
print(model.score(input_train, output_train))
# Finding R squared score on the test data
print(model.score(input_test, output_test))

The ‘R squared score’ for a reasonable model varies from 0 to 1 (it can even be negative for a model that fits worse than simply predicting the mean). Hence, if it is anywhere near 1, we can conclude that our model is doing great.
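As a quick illustration with made-up actual and predicted values, the R squared score can be computed by hand from its definition and checked against scikit-learn's `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values
actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([11.0, 19.0, 29.0, 41.0])

# R^2 = 1 - (sum of squared residuals) / (total sum of squares)
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(manual_r2)                    # 0.992, near 1, so a good fit
print(r2_score(actual, predicted))  # same value from scikit-learn
```

Here the predictions are close to the actual values, so the residual term is small and the score lands near 1.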

I used simple one-variable linear regression to explain regression, but the same holds for multivariate linear regression as well; there are simply extra features/variables available to predict the output.
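As a sketch of the multivariate case, here is a hypothetical two-feature example. The target is generated from an assumed relationship (y = 3·feature1 + 10·feature2 + 50), so the fitted model should report one slope per feature plus one intercept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two input features per sample
X = np.array([[25, 2], [30, 5], [35, 8], [40, 12], [45, 15]], dtype=float)
# Target generated from an assumed relationship: y = 3*x1 + 10*x2 + 50
y = 3 * X[:, 0] + 10 * X[:, 1] + 50

model = LinearRegression()
model.fit(X, y)

print(model.coef_)       # one slope per feature, close to [3, 10]
print(model.intercept_)  # close to 50
```

The API is identical to the one-variable case; the only difference is that `coef_` now holds one coefficient per column of the input.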