Photo by Isaac Smith on Unsplash

Regression Analysis

Faridun Mamadbekov
The Startup
Aug 15, 2020

Regression analysis is a statistical method for determining whether, and how strongly, a variable of interest is influenced by one or more other variables. One advantage of regression is that it can account for several influencing variables at once. It can also be used for prediction.

You have to understand the two types of variables to get started with regression analysis:

Dependent variable: the variable that you want to examine, understand or predict.

Independent variable(s): all the other variables that you hypothesize to influence the dependent variable.

To start a regression analysis, first choose the dependent variable. Then choose the independent variable or variables that you hypothesize affect it.

The next step is to obtain data for the analysis, usually a dataset that contains both the dependent and the independent variables. For instance, if separate datasets are available for each variable, the variables of interest can be extracted and combined into a new dataset.
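As a minimal sketch of the combining step, assuming two hypothetical sources keyed by month (the names `ad_spend`, `sales`, and `combine` are made up for illustration), only the records present in both sources are kept:

```python
# Hypothetical data: two separate sources keyed by month.
ad_spend = {"Jan": 10.0, "Feb": 12.0, "Mar": 15.0, "Apr": 11.0}   # independent variable
sales = {"Jan": 130.0, "Feb": 142.0, "Apr": 138.0, "May": 150.0}  # dependent variable


def combine(independent, dependent):
    """Return (key, x, y) rows for the keys that appear in both sources."""
    shared_keys = sorted(independent.keys() & dependent.keys())
    return [(key, independent[key], dependent[key]) for key in shared_keys]


rows = combine(ad_spend, sales)
for key, x, y in rows:
    print(key, x, y)
```

Months that appear in only one source ("Mar" and "May" here) are dropped, because a regression needs a value of both variables for every observation.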

A scatter plot where the points are scattered but follow a positive slope

After that, the data should be plotted. By convention, the independent variable goes on the x-axis and the dependent variable on the y-axis.

From the plot, you can observe initial trends and correlation that suggest what kind of relationship the dependent and independent variables have. In the example scatter plot, the hypothetical data points show an increasing trend: as the independent variable increases, the dependent variable increases as well.

A trend can be observed from the plot, but to what precise degree is the dependent variable influenced by the independent variable? To answer that, a regression line should be calculated. This is usually done in software such as Stata or Excel. The regression line is the line that best approximates the data points on the plot.

In other words, as Tom Redman explains in Gallo (2015), where the regression line is drawn in red, “The red line is the best explanation of the relationship between the independent variable and dependent variable.”

Calculating the regression line

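Calculating a regression line means finding the best-fit line through all the data points. For simple linear regression, the least-squares method is usually used: it picks the line that minimizes the sum of the squared vertical distances between the data points and the line.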
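For reference, the least-squares criterion can be written out as follows. The notation is an assumption added here for illustration: the data points are written (x_i, y_i) for i = 1, ..., n, and m and b are the slope and intercept of the candidate line.

```latex
\min_{m,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
```

Each term in the sum is the squared vertical distance between an observed y value and the value the line predicts for the same x.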

The linear regression line has the form y=mx+b, where m is the slope and b is the y-intercept. To find the best-fit line for your data, first find five summary statistics:

1. The mean of the x values

2. The mean of the y values

3. The standard deviation of the x values (denoted sx)

4. The standard deviation of the y values (denoted sy)

5. The correlation between x and y (denoted r)
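The five summary statistics can be sketched in Python using only the standard library. The data below is hypothetical, and the function name `summary_statistics` is made up for illustration. The sketch assumes sample standard deviations (dividing by n - 1) and the Pearson correlation coefficient.

```python
import math
import statistics


def summary_statistics(xs, ys):
    """Return (mean_x, mean_y, sx, sy, r) for paired observations."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sx = statistics.stdev(xs)  # sample standard deviation (divides by n - 1)
    sy = statistics.stdev(ys)
    # Sample covariance, then Pearson correlation r = cov / (sx * sy).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / (sx * sy)
    return mean_x, mean_y, sx, sy, r


# Hypothetical data with an increasing trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

mean_x, mean_y, sx, sy, r = summary_statistics(xs, ys)
print(f"mean_x={mean_x}, mean_y={mean_y}, sx={sx:.4f}, sy={sy:.4f}, r={r:.4f}")
```

A positive r, as in this data, matches the increasing trend seen in a scatter plot with a positive slope.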

The formula for calculating the slope m of the regression line is the following:

$$ m = r \frac{s_y}{s_x} $$

This formula gives the slope for the regression line equation of the form y=mx+b. The last part to calculate is the y-intercept b, which can be calculated using the formula below:

$$ b = \bar{y} - m \bar{x} $$

Here $\bar{x}$ and $\bar{y}$ are the means of the x values and y values respectively, and m is the already calculated slope.
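As a small worked sketch, the slope and intercept can be computed directly from the five summary statistics. It assumes the standard summary-statistic formulas m = r * sy / sx and b = mean_y - m * mean_x. The function name `regression_line` and the input values are hypothetical.

```python
def regression_line(mean_x, mean_y, sx, sy, r):
    """Return (m, b) for the line y = m*x + b from five summary statistics."""
    m = r * sy / sx          # slope: correlation scaled by the ratio of spreads
    b = mean_y - m * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return m, b


# Hypothetical summary statistics, chosen only for illustration.
m, b = regression_line(mean_x=3.0, mean_y=4.0, sx=2.0, sy=3.0, r=0.8)
print(f"y = {m:.2f}x + {b:.2f}")
```

With these inputs the slope is 0.8 * 3 / 2 = 1.2 and the intercept is 4 - 1.2 * 3 = 0.4, up to floating-point rounding.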

A regression equation reported in practice, for example from Excel, will look something like y=6x+70+error_term. It differs from the simple regression line calculated above in that it includes an error term.

Regression equations include an error term because, in reality, independent variables are never perfect predictors of the dependent variable.

In reality, the dependent variable might be determined by a number of different factors. The regression line is only an estimate based on the data available to you, and the larger the error term is, the less certain your regression line is.
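One way to make the error term concrete is to look at the residuals, the differences between each observed y value and the value the line predicts. The sketch below is an illustration under assumptions: the data is hypothetical, the line y = 0.6x + 2.2 is the least-squares line for that data, and the residual standard error is used as one common summary of how large the errors are.

```python
import math


def residuals(xs, ys, m, b):
    """Observed y minus the value predicted by the line y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def residual_standard_error(res):
    """Typical size of a residual; n - 2 because m and b were both estimated."""
    n = len(res)
    return math.sqrt(sum(e * e for e in res) / (n - 2))


# Hypothetical data and its least-squares line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
m, b = 0.6, 2.2

res = residuals(xs, ys, m, b)
rse = residual_standard_error(res)
print("residuals:", [round(e, 2) for e in res])
print("residual standard error:", round(rse, 4))
```

A smaller residual standard error means the points lie closer to the line, so predictions from the line are more trustworthy.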

Conclusion

Regression analysis helps determine the effect of some variables on another. It is widely used in business analysis to identify the factors that influence a target variable and to predict its future values.

We’ve discussed what regression analysis is and how to calculate the regression line.

References:

1. Gallo, A. (2015). A Refresher on Regression Analysis. Retrieved from: https://hbr.org/2015/11/a-refresher-on-regression-analysis
2. Deborah J. R. (2020). How to Calculate a Regression Line. Retrieved from: https://www.dummies.com/education/math/statistics/how-to-calculate-a-regression-line/
