Correlation and Linear Regression

@IanChriste
2 min read · Nov 16, 2019


Analyzing their relationship

Correlation

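Correlation, used here as short for the Pearson correlation coefficient, measures the strength of the linear relationship between two numeric attributes. It ranges from -1 to 1, and an absolute value of about 0.7 or higher is usually taken to represent a strong relationship.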
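To make the definition concrete, here is a minimal sketch of the Pearson coefficient in plain Python. The function name `pearson_r` and the `hours`/`scores` numbers are made up for illustration and do not come from a real data set.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of each attribute around its own mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: undefined (division by zero) if either attribute is constant
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 3))
```

For this toy data the coefficient comes out a little above 0.7, which by the rule of thumb above would count as a strong linear relationship.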

If we take the example of Body Mass Index (BMI) we can see how correlation and linear regression are related.

Essentially, BMI takes a number of attributes (height and weight) as input and maps them to a new value, creating a new attribute. This derived attribute, BMI, has a high correlation with the target attribute, DISEASE. Your height and weight on their own may not be highly correlated with DISEASE, but your BMI will be.
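As a small illustration of deriving a new attribute, the sketch below computes BMI with the standard definition (weight in kilograms divided by height in meters squared). The formula is not spelled out in this post, and the `people` values are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

# Hypothetical (weight_kg, height_m) pairs, for illustration only
people = [(70.0, 1.75), (90.0, 1.80), (55.0, 1.60)]

# Two input attributes are mapped to one derived attribute per person
bmis = [round(bmi(w, h), 1) for w, h in people]
print(bmis)
```

The derived column `bmis` is what you would then correlate with the target, instead of height or weight separately.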

Linear Regression

Linear regression is used when your data set is numeric. It estimates the expected value of the target given fixed input attributes. Linear regression functions are easy to read and understand.

Linear regression function: Y = a + bX, where Y is the target variable, X is the input variable, a is the intercept, and b is the slope.

Calculating error in linear regression:

Sum of Squared Errors (SSE): the error of the function at each point is squared, and the squares are added up. A prediction can overestimate or underestimate the target, so the raw error can be positive or negative; squaring makes it positive in both scenarios.

Linear Regression looks to minimize the SSE.

You can think of the error, or residual, as: residual = target (known value) - prediction (the linear regression output).
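The residual and SSE definitions can be sketched in a few lines of Python. The helper names `predict` and `sse`, the data, and the two candidate lines are assumptions chosen for illustration.

```python
def predict(a, b, x):
    """Linear regression function Y = a + bX."""
    return a + b * x

def sse(a, b, xs, ys):
    """Sum of squared residuals (target minus prediction) for the line a + bX."""
    return sum((y - predict(a, b, x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Residuals for one candidate line: some are negative, some positive
residuals = [y - predict(2.2, 0.6, x) for x, y in zip(xs, ys)]

# Compare two candidate lines; the lower SSE is the better fit
sse_line_1 = sse(2.2, 0.6, xs, ys)
sse_line_2 = sse(2.0, 0.5, xs, ys)
print(sse_line_1, sse_line_2)
```

The raw residuals here sum to roughly zero because the positive and negative ones cancel, which is why they are squared before being summed.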

Your best-fit line in linear regression is the line that minimizes the sum of the squared residuals. The best-fit line always passes through the point (average X, average Y).
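For a single input variable, the line that minimizes the SSE has a well-known closed form: the slope is the co-variation of X and Y divided by the variation of X, and the intercept is chosen so the line passes through the means. Below is a minimal sketch under that assumption; `fit_line` and the data are hypothetical names and values, not part of this post.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = a + bX for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept, forces the line through the means
    return a, b

# Hypothetical illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# The prediction at the average X equals the average Y (up to rounding)
prediction_at_mean_x = a + b * mean_x
print(a, b, prediction_at_mean_x, mean_y)
```

In practice you would more likely use a library routine, but the closed form makes it easy to see why the line goes through the point of averages.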

Note: Because residuals are squared, a point's influence grows with its distance from the line, so outliers have a disproportionate effect. It is extremely important to check for outliers and their impact on your regression line.
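A quick way to see the outlier effect is to fit the same data with and without one extreme point. This is a sketch with made-up numbers; `fit_line` is a hypothetical helper using the closed-form least squares solution for one input variable.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = a + bX for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a_clean, b_clean = fit_line(xs, ys)

# Same data plus a single extreme point at (6, 20)
a_out, b_out = fit_line(xs + [6], ys + [20])

print("slope without outlier:", round(b_clean, 3))
print("slope with outlier:   ", round(b_out, 3))
```

One added point out of six is enough to multiply the slope several times over, which is the disproportionate effect described above.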
