Crash course in Forecasting — Time Series Forecasting Techniques, Tools, and Best Practices

Cibaca Khandelwal
Published in AI Skunks
13 min read · Apr 10, 2023

This crash course is a guide to time series forecasting. It covers the techniques, tools, and best practices that can be used to build accurate forecasting models.

The dataset used in this crash course is the Microsoft Stock data from Kaggle, which can be downloaded from the link below:

Microsoft_stock.csv

Table of Contents

1. Introduction to Time Series Forecasting

2. Understanding Time Series Data

3. Time Series Forecasting Techniques

4. Evaluating Time Series Forecasts

5. Best Practices in Time Series Forecasting

6. Tools and Libraries for Time Series Forecasting

7. Conclusion and Future Directions

8. References

Introduction to Time Series Forecasting

What is Time Series Forecasting?

Time series forecasting is a technique used to predict future values of a time-dependent variable based on its past values. It is a widely used technique in many domains, including finance, economics, weather forecasting, and engineering. In this article, we will provide an overview of time series forecasting, including its definition, importance, and applications.

  • Time series forecasting is the process of predicting future values of a time-dependent variable based on its past values.
  • Time-dependent variables are those that change over time and can be measured at regular intervals.
  • Examples of time-dependent variables include stock prices, temperature readings, and website traffic.

Importance of Time Series Forecasting

Industry Usage of Time Series Forecasting

Understanding Time Series Data

What is Time Series Data?

Time series data refers to a type of data that is collected over time, with each observation corresponding to a specific point in time. Examples of time series data include stock prices, weather patterns, and economic indicators.

Types of Time Series Data

  1. Univariate Time Series Data: time series data that contains only one variable
  2. Multivariate Time Series Data: time series data that contains multiple variables
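
To make the distinction concrete, here is a minimal sketch in pandas. The dates and numbers are made up purely for illustration, and the column names are assumptions chosen to resemble a stock dataset: a univariate series fits naturally in a Series, while a multivariate one fits in a DataFrame that shares the same time index.

```python
import pandas as pd

# Hypothetical daily timestamps; the values below are illustrative only.
dates = pd.date_range("2023-01-02", periods=5, freq="D")

# Univariate: a single variable observed over time.
close = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0], index=dates, name="Close")

# Multivariate: several variables observed at the same timestamps.
prices = pd.DataFrame(
    {"Close": close.values, "Volume": [100, 120, 90, 150, 130]},
    index=dates,
)

print(close.ndim)    # number of dimensions of the univariate series
print(prices.shape)  # (observations, variables) of the multivariate data
```

In both cases the index carries the time information, which is why the examples later in the article set the 'Date' column as the index.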

Components of Time Series Data

Time Series Forecasting Techniques

There are several time series forecasting techniques that we can use to predict future values of a time-dependent variable. We will discuss the ten techniques listed below.

  1. Autoregressive Integrated Moving Average (ARIMA): ARIMA is a widely used technique for time series forecasting that models the relationship between the current value of a time series and its past values and past forecast errors. It handles trend through differencing, but seasonal patterns require the seasonal extension, SARIMA (item 3 below).
  2. Exponential Smoothing (ETS): ETS (Error, Trend, Seasonality) is a family of methods that models the time series as a combination of trend, seasonality, and an irregular error component. It is a simple yet effective technique for short-term forecasting.
  3. Seasonal Autoregressive Integrated Moving Average (SARIMA): SARIMA is an extension of ARIMA that takes into account the seasonal patterns in the data. It is suitable for data that exhibit both trend and seasonal patterns.
  4. Prophet: Prophet is a time series forecasting framework developed by Facebook that uses a decomposable model to capture trend, seasonality, and holiday effects. It is designed to handle datasets with missing values and outliers.
  5. Long Short-Term Memory (LSTM): LSTM is a type of recurrent neural network that is well-suited for processing sequential data such as time series. It can handle complex patterns in the data and is often used for long-term forecasting.
  6. Vector Autoregression (VAR): VAR is a technique for modeling the relationships between multiple time series. It can capture the interdependencies between different variables and can be used for forecasting in various domains such as finance and economics.
  7. Gaussian Process Regression (GPR): GPR is a probabilistic technique that models the time series as a sample from a Gaussian process. It is suitable for datasets with a small number of observations and can handle non-linear relationships in the data.
  8. Seasonal and Trend decomposition using Loess (STL): STL is a technique that decomposes the time series into three components: trend, seasonal, and residual. It is useful for understanding the underlying patterns in the data and can be used for short-term forecasting.
  9. Dynamic Linear Models (DLM): DLM is a framework that models the time series as a combination of different components, including trend, seasonal, and regression effects. It is suitable for data with multiple variables and can handle missing data.
  10. Ensemble Methods: Ensemble methods involve combining multiple forecasting models to improve the accuracy of the forecasts. Examples include Bagging, Boosting, and Stacking.
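
Several of the techniques above build on the idea of regressing the current value on its own past values. As a rough sketch of just that autoregressive piece (not a full ARIMA, which also differences the data and models past errors), the following fits an AR(p) model by ordinary least squares with NumPy. The helper names `fit_ar` and `forecast_ar` and the simulated series are assumptions made for this illustration.

```python
import numpy as np

def fit_ar(series, p):
    """Estimate AR(p) coefficients (intercept first) by ordinary least squares."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # The row for time t holds [1, y[t-1], ..., y[t-p]].
    lag_columns = [y[p - k : n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lag_columns)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def forecast_ar(series, coef, steps):
    """Roll the fitted model forward, feeding each forecast back in as a lag."""
    p = len(coef) - 1
    history = list(np.asarray(series, dtype=float)[-p:])
    forecasts = []
    for _ in range(steps):
        lags = history[::-1][:p]  # most recent value first
        next_value = coef[0] + float(np.dot(coef[1:], lags))
        forecasts.append(next_value)
        history.append(next_value)
    return np.array(forecasts)

# Simulated AR(1) series with a true lag coefficient of 0.8 (illustrative data).
rng = np.random.default_rng(0)
y = np.zeros(2000)
for t in range(1, len(y)):
    y[t] = 0.8 * y[t - 1] + rng.normal()

coef = fit_ar(y, p=1)
preds = forecast_ar(y, coef, steps=5)
print(coef)   # intercept and estimated lag-1 coefficient
print(preds)  # five-step-ahead forecasts
```

On a long enough simulated series the estimated lag coefficient should land close to the value used to generate the data, which is a quick sanity check before trusting the same code on real data.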

Example with python code using ARIMA model on Microsoft Stock data

In this code, we first load the Microsoft Stock dataset using the read_csv function from the pandas library. We then convert the 'Date' column to a datetime object using the to_datetime function and set it as the index of the dataframe using the set_index function.

Next, we create a time series from the ‘Close’ column and create an ARIMA model with (p,d,q) = (5,1,2) using the ARIMA class from the statsmodels.tsa.arima.model module. We fit the model to the data using the fit method.

Finally, we make a prediction for the next 30 days using the forecast method and plot the historical and predicted data using the plot function from the matplotlib.pyplot library. We add labels to the x and y-axis using the xlabel and ylabel functions, respectively, and add a title to the plot using the title function. We also add a legend using the legend function and display the plot using the show function.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Load the Microsoft Stock dataset from Kaggle
df = pd.read_csv('Microsoft_Stock.csv')

# Convert the 'Date' column to a datetime object and set it as the index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Create a series of the 'Close' column
ts = df['Close']

# Define the ARIMA parameters
p = 5 # AR (autoregressive) order
d = 1 # Differencing order (the "Integrated" part)
q = 2 # MA (moving average) order

# Fit the ARIMA model to the Microsoft Stock data
model = ARIMA(ts, order=(p, d, q))
results = model.fit()

# Make predictions for the next 30 days
forecast = results.forecast(steps=30)

# Plot the Microsoft Stock data and the forecasted values
plt.plot(ts.index, ts, label='Microsoft Stock')
plt.plot(pd.date_range(start=ts.index[-1], periods=31, freq='D')[1:], forecast, label='Predicted Data')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.title('Microsoft Stock Price Prediction with ARIMA')
plt.legend()
plt.show()
[Figure: output of the code above]

Example with python code using Exponential Smoothing model on Microsoft Stock data

In this code, we first load the Microsoft Stock dataset using the read_csv function from the pandas library. We then convert the 'Date' column to a datetime object using the to_datetime function and set it as the index of the dataframe using the set_index function.

Next, we create a time series from the ‘Close’ column and create an Exponential Smoothing (ES) model with trend='add', seasonal='add', and seasonal_periods=12 using the ExponentialSmoothing function from the statsmodels.tsa.holtwinters library. We fit the model to the data using the fit method.

Finally, we make a prediction for the next 30 days using the forecast method and plot the historical and predicted data using the plot function from the matplotlib.pyplot library. We add labels to the x and y-axis using the xlabel and ylabel functions, respectively, and add a title to the plot using the title function. We also add a legend using the legend function and display the plot using the show function.

Exponential Smoothing (ES) is a popular time series forecasting method that forecasts with weighted averages of past observations, where the weights decay exponentially as the observations get older. With trend and seasonal components added (the Holt-Winters form used here), it can follow trend and seasonality in the data. The ExponentialSmoothing class from the statsmodels.tsa.holtwinters module allows us to easily create an ES model. Its fit method estimates the smoothing parameters from the data, and the forecast method of the fitted model allows us to make predictions for future time periods.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Load the Microsoft Stock dataset
df = pd.read_csv('Microsoft_Stock.csv')

# Convert the 'Date' column to a datetime object
df['Date'] = pd.to_datetime(df['Date'])

# Set the 'Date' column as the index
df.set_index('Date', inplace=True)

# Create a series of the 'Close' column
ts = df['Close']

# Create the ES model with an additive trend and additive seasonality (seasonal period of 12 observations)
model = ExponentialSmoothing(ts, trend='add', seasonal='add', seasonal_periods=12)
model_fit = model.fit()

# Make a prediction for the next 30 days
forecast = model_fit.forecast(steps=30)

# Plot the historical and predicted data
plt.plot(ts.index, ts, label='Historical Data')
plt.plot(pd.date_range(start=ts.index[-1], periods=31, freq='D')[1:], forecast, label='Predicted Data')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.title('Microsoft Stock Price Prediction with Exponential Smoothing')
plt.legend()
plt.show()
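
To see what the smoothing itself does, here is a minimal hand-written sketch of simple exponential smoothing, the level-only member of the family (no trend or seasonal terms, unlike the model fitted above). The function name, the choice to initialise the level with the first observation, and the toy values are assumptions for illustration; libraries may initialise differently.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation.

    Each new level is a weighted average of the latest observation and the
    previous level, so older observations get exponentially smaller weights.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # assumption: start the level at the first observation
    levels = [level]
    for x in values[1:]:
        level = alpha * x + (1 - alpha) * level
        levels.append(level)
    return levels

levels = simple_exponential_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
print(levels)      # [10.0, 11.0, 11.0, 12.0]
print(levels[-1])  # the forecast for every future step is the last level
```

A larger alpha makes the level react faster to recent observations, while a smaller alpha gives a smoother, slower-moving forecast.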

Evaluating Time Series Forecasts

Once we have generated time series forecasts, we need to evaluate their accuracy. There are several metrics for evaluating time series forecasts, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). We can also use cross-validation techniques to evaluate the performance of our models on new data.

  1. Mean Absolute Error (MAE): The MAE is a commonly used metric to evaluate the accuracy of time series forecasts. It measures the average absolute difference between the predicted and actual values.
  2. Mean Squared Error (MSE): The MSE is another popular evaluation metric that measures the average squared difference between the predicted and actual values.
  3. Root Mean Squared Error (RMSE): The RMSE is the square root of the MSE, which puts the error back in the same units as the data.
  4. Mean Absolute Percentage Error (MAPE): The MAPE expresses the average absolute error as a percentage of the actual values. It is undefined when an actual value is zero.
  5. Symmetric Mean Absolute Percentage Error (SMAPE): The SMAPE is a variant of the MAPE that divides each absolute error by the average of the absolute actual and predicted values, which keeps the result bounded.
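
The metrics above can be sketched in a few lines of NumPy. The function names and the toy numbers are assumptions for illustration, and the SMAPE shown is one common definition (others differ by a factor of two). The last helper, `rolling_origin_mae`, is a hypothetical example of time-ordered cross-validation: it scores a naive "repeat the last value" forecast on an expanding training window, so the model is never evaluated on data from its own past.

```python
import numpy as np

def mae(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

def mse(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean((actual - predicted) ** 2))

def rmse(actual, predicted):
    return float(np.sqrt(mse(actual, predicted)))

def mape(actual, predicted):
    # Undefined if any actual value is zero.
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100 * np.mean(np.abs(actual - predicted) / np.abs(actual)))

def smape(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    denom = (np.abs(actual) + np.abs(predicted)) / 2
    return float(100 * np.mean(np.abs(actual - predicted) / denom))

def rolling_origin_mae(series, initial, horizon=1):
    """Expanding-window evaluation of a naive last-value forecast."""
    series = np.asarray(series, float)
    scores = []
    for end in range(initial, len(series) - horizon + 1):
        train, test = series[:end], series[end : end + horizon]
        forecast = np.repeat(train[-1], horizon)  # naive: repeat the last value
        scores.append(mae(test, forecast))
    return float(np.mean(scores))

actual = [100, 200, 300]
predicted = [110, 190, 330]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
print(rolling_origin_mae([1, 2, 3, 4, 5], initial=2))
```

Swapping the naive forecast inside the loop for a fitted model gives the same evaluation scheme for any of the techniques discussed earlier.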

Best Practices in Time Series Forecasting

To generate accurate time series forecasts, we need to follow best practices in time series forecasting. These practices include feature engineering, hyperparameter tuning, dealing with missing data and outliers, and ensembling multiple models.
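
As a small sketch of three of these practices in pandas, the example below fills missing values, tames an outlier, and builds lag features. The data are made up for illustration, and the specific choices (time-based interpolation, clipping at 1.5 times the interquartile range, two lags) are assumptions rather than recommendations; the right choices depend on the data.

```python
import numpy as np
import pandas as pd

# Illustrative daily series with two gaps and one obvious outlier.
dates = pd.date_range("2023-01-01", periods=8, freq="D")
raw = pd.Series([10.0, 11.0, np.nan, 13.0, 100.0, 15.0, np.nan, 17.0], index=dates)

# 1. Missing data: fill gaps by interpolating along the time index.
filled = raw.interpolate(method="time")

# 2. Outliers: clip values outside the 1.5 * IQR fences.
q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
iqr = q3 - q1
cleaned = filled.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3. Feature engineering: use previous values as predictors (lag features).
features = pd.DataFrame(
    {"y": cleaned, "lag_1": cleaned.shift(1), "lag_2": cleaned.shift(2)}
).dropna()

print(cleaned)
print(features)
```

The first rows are dropped after creating the lags because their lagged values do not exist, which is worth remembering when aligning features with forecasts.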

Tools and Libraries for Time Series Forecasting

As the use of time series data becomes increasingly common across industries, there are now many tools available to help with the process of forecasting. In this article, we’ll take a look at some of the top tools available for time series forecasting.

  1. Python: Python is a popular language for data analysis and has a number of libraries that are widely used for time series forecasting. Some popular libraries include StatsModels, Prophet, and scikit-learn. Python provides an extensive range of tools for data preprocessing, model selection, and evaluation, making it a popular choice for time series forecasting.
  2. R: R is another popular language used in data analysis, and it also has a number of libraries specifically designed for time series forecasting. Some popular libraries include forecast, tseries, and zoo. R has been widely used for time series forecasting, and its user community is very active and supportive.
  3. Excel: Excel is widely used for business applications, and it has a number of built-in tools that can be used for time series forecasting. Excel provides a range of statistical functions that can be used to analyze time series data, including the ability to perform regression analysis and time series decomposition. Excel’s ease of use and familiarity make it a popular choice for basic time series forecasting tasks.
  4. Tableau: Tableau is a powerful data visualization tool that can also be used for time series forecasting. Tableau provides a range of visualization options for time series data, including line charts, scatter plots, and heatmaps. Tableau’s drag-and-drop interface and interactive features make it an excellent tool for exploring and visualizing time series data.
  5. MATLAB: MATLAB is a powerful computing environment that can be used for scientific computing, data analysis, and visualization. It has a range of built-in functions and toolboxes that can be used for time series forecasting, including the Signal Processing Toolbox and the Econometrics Toolbox. MATLAB provides a range of visualization options and can be used to build custom models for time series forecasting.
  6. SAS: SAS is a statistical software package widely used in the business world. It provides a range of functions for time series forecasting, including the ability to perform ARIMA modeling, exponential smoothing, and regression analysis. SAS is a powerful tool for forecasting and is widely used in industries such as finance and healthcare.
  7. Gretl: Gretl is an open-source software package that can be used for econometric analysis and time series forecasting. It provides a range of statistical functions and models, including ARIMA modeling and regression analysis. Gretl is a user-friendly tool and is a good choice for those who are new to time series forecasting.
  8. Prophet: Prophet is an open-source forecasting tool developed by Facebook. It uses a decomposable time series model to forecast time series data and has become popular due to its ease of use and ability to handle seasonality and non-linear trends.
  9. Keras: Keras is a popular high-level deep learning library that can be used for time series forecasting. It provides building blocks for neural network models, including Long Short-Term Memory (LSTM) layers and Convolutional Neural Networks (CNNs).
  10. TensorFlow: TensorFlow is an open-source machine learning library that can be used for time series forecasting. Keras is its high-level API, so the same LSTM and CNN models can be built in TensorFlow, with lower-level control available when needed.

Conclusion

In this article, we discussed various aspects of time series forecasting, including its importance, benefits, applications, and techniques. We also looked at some popular tools and datasets used for time series forecasting and saw code snippets in Python for applying different forecasting techniques.

From our discussion, we can conclude that time series forecasting is a vital aspect of many industries and domains, including finance, retail, energy, and healthcare. It helps in making informed decisions, reducing risk, and improving the overall efficiency of businesses.

To perform time series forecasting effectively, it is essential to follow best practices such as cleaning and preprocessing data, selecting appropriate forecasting techniques, evaluating models with metrics, and continuously monitoring and updating models as necessary.

Various tools and libraries are available for time series forecasting, such as statsmodels (for ARIMA and ES models) and Prophet, along with visualization libraries like matplotlib and seaborn. These tools enable users to analyze and model time series data, make forecasts, and visualize the results.

However, while these tools provide great capabilities for time series forecasting, it is essential to have a sound understanding of the underlying techniques and concepts to use them effectively. Moreover, it is necessary to choose the right tool for the specific problem and data at hand, along with continuously monitoring and refining the models to maintain their accuracy.

In conclusion, time series forecasting is an important aspect of modern businesses and industries, and it can provide valuable insights and improve decision-making. With the right tools, techniques, and best practices, businesses can harness the power of time series forecasting to stay ahead of the curve and achieve their goals.

Future Scope

Time series forecasting has come a long way in recent years and has become an essential tool for various industries. With advancements in technology and machine learning, time series forecasting has become more accurate and efficient. However, there is still a lot of scope for improvement and development in this field.

One of the major areas of future development in time series forecasting is the integration of deep learning models. Deep learning models have shown great potential in various fields and can be applied to time series forecasting to improve accuracy and efficiency. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) models have been found to be particularly effective in time series forecasting.

Another area of future development is the integration of external factors into the time series forecasting model. This involves incorporating external variables that may have an impact on the time series data, such as weather conditions, economic indicators, and social media trends. This can lead to more accurate predictions and better decision-making.

Furthermore, integrating uncertainty estimates into time series forecasting can support better decisions. Probabilistic forecasting models estimate the level of uncertainty in their predictions, for example as prediction intervals, so decision-makers can weigh the risk before acting.

Another area of future development is explainable AI for time series forecasting. Explainable AI models can help in understanding the underlying patterns and factors that are affecting the time series data. This can lead to more effective decision-making and better understanding of the predictions.

Moreover, with the growth of the Internet of Things (IoT), the amount of time series data being generated is increasing rapidly. This presents both opportunities and challenges for time series forecasting. The challenge is to develop models that can handle large volumes of data, while the opportunity lies in the potential for more accurate predictions and better decision-making.

In conclusion, time series forecasting is an essential tool for various industries, and there is a lot of scope for future development and improvement. With the integration of deep learning models, external factors, uncertainty estimates, explainable AI models, and the growth of IoT, the future of time series forecasting looks promising. These advancements can lead to more accurate predictions, better decision-making, and improved overall performance.

References
