Time-Series Forecasting using Linear Regression

Michael Okoro
3 min readAug 3, 2022

--

Forecasting is one of the most important concepts in data analysis and machine learning. It is a statistical technique that involves the estimation of one or more quantities in relation to other quantities or factors. In the case of time-series forecasting, we are examining the progression of a quantity in relation to time. A good application of this can be seen in financial forecasting where the Gross Domestic Product of a country is estimated with the help of the past data values each year, quarter, and so on.

It uses past data values to make predictions for the future. There are many different techniques or algorithms that do this, but in this post, we will be analyzing the linear regression technique as in most cases, it is the simplest and most introductory time-series forecasting technique in the field.

Linear regression is a statistical technique that uses a linear approach to modeling the relationship between a scalar quantity and an explanatory variable. If there is one explanatory variable, we can refer to this as simple linear regression while multiple explanatory variables can be described as multiple linear regression.

An example of this scalar response and exploratory variable relationship can be seen in a relationship such as test scores and grade awarded where the grade awarded is an exploratory variable of the test scores. In time-series forecasting, we can see this in the relationship between time and GDP where GDP is an exploratory variable of time.

You can picture linear regression as a graphical approach. You map the points using one quantity as the y-axis and the other quantity usually time (in the case of time-series forecasting) as the x-axis. After mapping and marking the points. A straight line is then drawn to connect the dots by using the line of best fit. This same line will be used to map out future occurrences of the graph by means of an extension of the line.

Graphical Representation of the Population of a Town between 2000 and 2022 and the line of best fit

In the graph above, we are trying to find the population of Town X in 2025 given the population values between 2000 and 2022, the blue line represents the line of best fit between 2000 and 2022. The orange line represents the extended line we have plotted in order to predict future values. By tracing the graph to 2025 using the green line, we can see that this corresponds to a population of 773 people. This means that Town X is expected to have 773 occupants in 2025.

Linear Regression, although basic and relatively easy to learn, is a very important technique in statistical analysis and prediction. It can be used as a forecasting mechanism to properly plan, estimate, and predict the trajectories of values which facilitates proper planning.

--

--

Michael Okoro

Exploring the Worlds of Data Science and Product Development/Management.