Introduction to Linear Regression — sklearn Diabetes Dataset

Deepak P Nair
Published in Analytics Vidhya
Feb 21, 2021
Linear Regression: nothing but finding the equation of a line that minimizes the total squared vertical distance to the surrounding data points.
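To make "distance" concrete: ordinary least squares adds up the squared vertical gaps (residuals) between each point and the line, then picks the line with the smallest total. Below is a minimal sketch that uses made-up points (hypothetical data) and NumPy's `np.polyfit` with `deg=1` to find the least-squares line. It then compares that line's squared error with an arbitrary hand-picked line. The helper `sum_squared_residuals` is illustrative and does not come from the original article.

```python
import numpy as np

# Made-up example points (hypothetical data, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sum_squared_residuals(m, c):
    """Sum of squared vertical distances between the points and y = m*x + c."""
    residuals = y - (m * x + c)
    return float(np.sum(residuals ** 2))

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
m_best, c_best = np.polyfit(x, y, deg=1)

sse_best = sum_squared_residuals(m_best, c_best)
sse_other = sum_squared_residuals(2.5, 0.0)  # an arbitrary hand-picked line

print(f"least-squares line: y = {m_best:.3f}x + {c_best:.3f}, SSE = {sse_best:.4f}")
print(f"hand-picked line:   y = 2.5x + 0.0, SSE = {sse_other:.4f}")
```

No other straight line, whether the hand-picked one or a slightly nudged version of the best one, can beat the least-squares line's SSE on these points.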

We all know the equation of a line that we learnt in high school:

y = mx + c

If you know this, you already know the equation for simple linear regression. Many a time, big words like ‘Regression’ sound as if they mean big things, when they can be as simple as the equation above.

In Linear Regression,

y : the variable to be predicted (aka the dependent variable). It is a continuous numerical variable.

m : the coefficient ‘m’, which is nothing but the slope of the line (how much y changes when x increases by one unit).

x : the independent variable, i.e. the input used to predict y.

c : a constant value, aka the y-intercept (the value of ‘y’ when ‘x’ is zero, i.e. the point at which the line crosses the vertical axis).
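For simple linear regression, the least-squares values of m and c have a well-known closed form. The slope is the covariance of x and y divided by the variance of x. The intercept then makes the line pass through the point (mean of x, mean of y). The sketch below is a minimal illustration. The function name `fit_simple_linear_regression` is hypothetical, and the data points are made up so that they lie exactly on y = 2x + 1.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (m, c) for the least-squares line y = m*x + c (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line always passes through (x_mean, y_mean)
    c = y_mean - m * x_mean
    return m, c

# Points lying exactly on y = 2x + 1 (made-up data)
m, c = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c)  # prints 2.0 1.0
```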

With Multiple Linear Regression, there is more than one x (predictor / feature), and the equation looks like this:

y = m1x1 + m2x2 + … + c
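With several predictors, the same least-squares idea applies. One common trick is to add a column of ones to the feature matrix, so that the constant c is estimated just like another coefficient. Below is a minimal sketch using NumPy's `np.linalg.lstsq` on synthetic, noise-free data generated from y = 3·x1 − 2·x2 + 5. The data and the chosen coefficients are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))  # 50 samples, 2 features (x1, x2)
y = 3 * X[:, 0] - 2 * X[:, 1] + 5     # synthetic target: y = 3*x1 - 2*x2 + 5, no noise

# Append a column of ones so the constant c is estimated like any other coefficient
X_design = np.column_stack([X, np.ones(len(X))])

# Least-squares solution of X_design @ coefs ≈ y
coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
m1, m2, c = coefs
print(f"m1 = {m1:.3f}, m2 = {m2:.3f}, c = {c:.3f}")
```

Because the synthetic data has no noise, the recovered m1, m2 and c match the generating values up to floating-point error.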

Linear Regression is simple and easy to understand, yet a very powerful machine learning algorithm. Its basic assumption is that the response / target variable is linearly related to the independent variables / features.
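The linearity assumption matters in practice. In the rough sketch below, a straight line is fitted (via `np.polyfit`) to two made-up relationships: one truly linear, and one U-shaped (y = x²). The fit is then scored with R² (1 − SS_res / SS_tot). The helper `r_squared` and both datasets are illustrative assumptions. On a symmetric range of x, the best straight line through a U-shape is essentially flat, so its R² is close to zero.

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

x = np.linspace(-3, 3, 61)
y_linear = 2 * x + 1  # truly linear relationship
y_curved = x ** 2     # non-linear (U-shaped) relationship

# Fit a straight line to each and measure how much variance it explains
m_lin, c_lin = np.polyfit(x, y_linear, deg=1)
m_cur, c_cur = np.polyfit(x, y_curved, deg=1)
r2_linear = r_squared(y_linear, m_lin * x + c_lin)
r2_curved = r_squared(y_curved, m_cur * x + c_cur)

print(f"linear data: R^2 = {r2_linear:.3f}")
print(f"curved data: R^2 = {r2_curved:.3f}")
```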

For now, we will focus on how to perform Linear Regression in Python and analyze the results. We will use ‘Diabetes’, a dataset that ships with the sklearn (scikit-learn) package.

Linear Regression Analysis Using the Diabetes Dataset
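Here is a minimal sketch of such an analysis with scikit-learn. It is not the author's notebook. It loads the dataset with `load_diabetes`, holds out 20% of the rows as a test set, fits `LinearRegression`, and reports one coefficient per feature, the intercept, and test-set R² and RMSE. The 80/20 split and `random_state=42` are arbitrary choices made for this illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 442 patients, 10 baseline features (mean-centred and scaled by sklearn);
# the target is a quantitative measure of disease progression one year later
data = load_diabetes()
X, y = data.data, data.target

# Hold out 20% of the rows to evaluate the model on unseen data (arbitrary split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# One fitted coefficient per feature, plus the intercept c
for name, coef in zip(data.feature_names, model.coef_):
    print(f"{name:>4}: {coef:9.2f}")
print(f"intercept: {model.intercept_:.2f}")

# Evaluate on the held-out rows
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
print(f"test R^2 = {r2:.3f}, test RMSE = {rmse:.2f}")
```

Each coefficient plays the role of one of the m values in the multiple regression equation. Because sklearn has already scaled the features, their magnitudes can be compared roughly with one another.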

Thank you for reading. If you want to read more on ML topics, please follow me and motivate me by clapping and sharing the content. Thank you and happy learning!
