An Intuitive Introduction to Linear Regression

Digya Acharya
6 min read · Feb 5, 2019


Suppose you have recently started a clothing company, X. You are new to the business and do not know the right approach to increasing your sales. Meanwhile, a friend hands you data from clothing store Y showing the amount invested in advertisement over fifteen years and the corresponding sales.

Table 1: Advertisement spend and corresponding sales of company Y

Let’s assume that you have allocated a total budget of 86 million rupees for advertisement. How much in sales can you expect from this investment?

Try to answer this question from the table above.

You might guess 207, 208, 209 or some other value. Since the intervals between entries are not constant and the values do not change in a perfectly consistent way, it is tedious to read a reliable answer off the table.

The problem becomes worse with large, noisy datasets in which the relationship between the attributes cannot be grasped by inspection. So, we need some other way to predict the approximate sales for a given investment in advertisement.

Let’s visualize the relationship between advertisement and sales with a scatterplot.

Figure 1: Visualization of sales and advertisement
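If you want to reproduce a plot like this yourself, here is a minimal matplotlib sketch. The fifteen values below are stand-ins chosen to resemble the trend in Table 1, not the actual figures from the table:

```python
import matplotlib.pyplot as plt

# Stand-in data resembling Table 1: advertisement spend and sales, both in million rupees.
advertisement = [20, 24, 30, 36, 40, 46, 50, 56, 60, 66, 70, 74, 78, 80, 84]
sales = [86, 94, 111, 124, 137, 149, 161, 176, 183, 201, 209, 221, 229, 236, 246]

plt.scatter(advertisement, sales)
plt.xlabel("Advertisement (million rupees)")
plt.ylabel("Sales (million rupees)")
plt.title("Sales vs. advertisement")
plt.show()
```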

Since sales increase roughly linearly with advertisement expense, there seems to be a direct relationship between the investment in advertisement and the attained sales. Let’s plot the best possible line that fits these points.

Figure 2: Best-fit line through the points

With the help of this line, we can now confidently predict that an investment of 86 million rupees yields sales of approximately 248 million rupees. The line has boosted our predictive capability and made our answer less prone to error. This is exactly what linear regression does: given an independent and a dependent variable, we fit the equation of a line and use it to make predictions on unseen data.
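As a sketch of how such a prediction could be computed, continuing with the stand-in lists from the scatterplot sketch above, np.polyfit can fit the line for us:

```python
import numpy as np

# Fit a degree-1 polynomial (a straight line) to the stand-in data above.
slope, intercept = np.polyfit(advertisement, sales, deg=1)

# Predict sales for an 86-million-rupee advertisement budget.
print(slope * 86 + intercept)  # roughly 248 for data resembling the article's
```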

We use linear thinking of this kind in everyday life as well: estimating the time required to reach a destination walking at a certain speed, the speed needed to throw a stone farther into the river, and so on.

Since our linear model consists of only one independent variable, this is an example of simple linear regression. Our equation can be modeled as:

y = β₀ + β₁x + 𝜺

This resembles the slope-intercept equation of a line, with slope β₁ and intercept β₀. Additionally, it includes a term 𝜺 to model random error.

As we can see, the equation comprises both a deterministic and a random component. Let’s go through the deterministic part first; then we will walk through the random error component as well.
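To make the two components concrete, here is a small simulation under assumed values β₀ = 35, β₁ = 2.5 and error standard deviation 5 (illustrative numbers, not fitted ones):

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, sigma = 35.0, 2.5, 5.0      # assumed illustrative parameters
x = np.linspace(20, 84, 15)               # advertisement spend

deterministic = beta0 + beta1 * x         # part fully explained by the predictor
epsilon = rng.normal(0.0, sigma, x.size)  # random error drawn from N(0, sigma)
y = deterministic + epsilon               # the outcome we actually observe
```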

Deterministic Component:

Regression slope (β₁)

The regression slope is the change in the outcome associated with a one-unit increase in the predictor. As seen from the graph below, our slope, the β₁ coefficient, is 2.8, which means that a one-unit increase in advertisement (ten million rupees) is associated with an increase of 2.8 million rupees in sales.

For instance, an advertisement spend of 50 million rupees yields 2.8 million rupees more in sales than a spend of 40 million rupees.

Figure 3: Sales increase by approximately 2.8 units for a one-unit increase in advertisement
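A quick check of that arithmetic, using the article’s own figures:

```python
slope = 2.8                      # million rupees of sales per unit of advertisement
unit = 10                        # one unit of advertisement = ten million rupees
extra_sales = slope * (50 - 40) / unit
print(extra_sales)               # 2.8, since going from 40 to 50 million is one unit
```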

Regression intercept or constant (β₀)

The regression intercept is the expected outcome when all the predictors, or independent variables, are zero. Graphically, it is the value at which the regression line crosses the y-axis. In the figure below, the line appears to meet the left edge of the plot near 170, but extrapolating it back to zero advertisement gives an intercept of approximately 35.

Figure 4: Extrapolation of the regression line to obtain the value of an intercept
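In code, the intercept is simply the fitted line evaluated at zero; a tiny sketch with the article’s approximate coefficients:

```python
intercept, slope = 35.0, 2.8        # approximate values read off the figures
advertisement_spend = 0.0
expected_sales = intercept + slope * advertisement_spend
print(expected_sales)               # 35.0, the prediction when the predictor is zero
```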

Random Error Component in Regression (𝜺):

It is practically impossible for our data to cover every value in the population, so we cannot recover the population parameters exactly. Instead, we take samples from the population and calculate point estimates. Thus, the regression outcome is always associated with some amount of error.

Figure 5: Samples used to draw statistical inferences about the population

The probability distribution of the random error is given by:

𝜺 ~ N(0, σ²), where σ² is the variance of the errors

Figure 6: Errors assumed to be normally distributed. Source[image]: http://blog.nguyenvq.com/blog/2009/05/12/linear-regression-plot-with-normal-curves-for-error-sideways/

The errors are assumed to be normally distributed with a common variance. The normality assumption is not compulsory; however, it eases the analysis behind many statistical procedures and tests. Sampling error also shrinks as the sample size approaches the size of the population.

Moreover, as the sample size increases, the normality assumption on the residuals matters less and less: across repeated samples, the distribution of the ordinary least squares estimates approaches a normal distribution as a consequence of the central limit theorem.
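The following simulation sketches this: errors are drawn from a deliberately non-normal (uniform) distribution, yet across repeated samples the OLS slope estimates pile up in a bell shape. The true coefficients here are assumed for illustration, not taken from the article’s data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(20, 84, 15)

# Refit the line on many samples whose errors are uniform, not normal.
slopes = []
for _ in range(5_000):
    y = 35 + 2.5 * x + rng.uniform(-10, 10, x.size)
    slope, _ = np.polyfit(x, y, deg=1)
    slopes.append(slope)

plt.hist(slopes, bins=50, density=True)
plt.xlabel("estimated slope")
plt.title("Sampling distribution of the OLS slope looks normal")
plt.show()
```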

Determining the best fit line:

We have now discussed the use cases of linear regression and the deterministic and random components of its equation. Yet how to determine the line that best fits the points is another important question to ponder.

Figure 7: Some of the many lines that could be drawn through the points

Since many lines can be drawn through the points in the dataset, the probability of finding the best-fitting one by random guessing is extremely low, and the search only gets harder as the dataset grows.

One of the most popular ways to determine the best fit line is Ordinary Least Squares (OLS) regression, which estimates the relationship between one or more independent variables and a dependent variable by minimizing the sum of the squared differences between the observed and the predicted values.

Figure 8: Residual = observation − fitted value
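For simple linear regression, the OLS coefficients have a closed form. A minimal sketch on noiseless stand-in data, so the recovered coefficients are exact:

```python
import numpy as np

# Noiseless stand-in data generated from y = 35 + 2.5x.
x = np.array([20, 30, 40, 50, 60, 70, 80], dtype=float)
y = 35 + 2.5 * x

# Closed-form OLS: beta1 = cov(x, y) / var(x), beta0 = mean(y) - beta1 * mean(x).
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

residuals = y - (beta0 + beta1 * x)          # observation minus fitted value
print(beta0, beta1, np.sum(residuals ** 2))  # 35.0, 2.5, ~0; OLS minimises the last term
```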

Another way to obtain the best line is gradient descent, in which we iteratively adjust the coefficients to minimize the sum of squared residuals.
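A bare-bones sketch of that idea on the same stand-in data; the feature is scaled so a fixed step size stays stable, and the loop recovers roughly the same coefficients as the closed form:

```python
import numpy as np

x = np.array([20, 30, 40, 50, 60, 70, 80], dtype=float)
y = 35 + 2.5 * x                 # stand-in data with known coefficients

x_scaled = x / x.max()           # scale the feature so one step size suits both parameters
beta0, beta1, lr = 0.0, 0.0, 0.1

for _ in range(20_000):
    residuals = y - (beta0 + beta1 * x_scaled)
    beta0 += lr * residuals.mean()                # step along the negative gradient (intercept)
    beta1 += lr * (residuals * x_scaled).mean()   # step along the negative gradient (slope)

print(beta0, beta1 / x.max())    # undo the scaling: approximately 35 and 2.5
```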

Let’s conclude

In this article, we have explored simple linear regression. The same concepts also apply when you have multiple independent variables: sales of the clothing company depend not only on advertisement but also on climate, changes in marketing strategy and so on. We can use multiple regression to model more than one independent variable as:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + 𝜺

where β₁, β₂, and β₃ are the coefficients, β₀ is the intercept and 𝜺 is the random error component.
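As a sketch, here is a multiple regression fit on simulated data; temperature and discount are made-up extra predictors (the article only names climate and marketing strategy as examples), and least squares recovers the assumed coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Hypothetical predictors: advertisement plus two invented drivers of sales.
advertisement = rng.uniform(20, 90, n)
temperature = rng.uniform(5, 35, n)
discount = rng.uniform(0, 30, n)

# Simulate sales from assumed coefficients plus normal random error.
sales = 35 + 2.5 * advertisement + 0.8 * temperature + 1.2 * discount + rng.normal(0, 5, n)

# Design matrix with a column of ones so lstsq also estimates the intercept.
X = np.column_stack([np.ones(n), advertisement, temperature, discount])
coefficients, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(coefficients)              # roughly [35, 2.5, 0.8, 1.2]
```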

Thanks for Reading!
