Intermittent Time Series Forecasting

Karthikeswaren R
Published in CodeX · 5 min read · Apr 5, 2022

We surveyed various methods for forecasting intermittent time series. Our survey paper was accepted and published at the International Joint Conference on Neural Networks (IJCNN) 2021. This article is a short summary of that paper.

Here is the link to our paper: https://ieeexplore.ieee.org/document/9533963

A time series in which non-zero values occur only intermittently, separated by runs of zeros, is considered an intermittent time series.

Such series are common in the manufacturing sector. For instance, a manufacturing company might not receive orders for every product every week; a week with demand for a certain product is usually followed by multiple weeks with no demand before the next order arrives. These zero-demand intervals are marked by 0’s. This pattern can be observed in the Parts dataset, and we will use a couple of series from it in this article.

Several models are designed specifically to fit intermittent time series; Croston and DeepAR are two examples. One can also forecast by coupling a regression model with a classification model.

Exponential Smoothing

While it is possible to fit an intermittent time series with a method such as exponential smoothing, that method assumes the given time series is continuous.

Exponential Smoothing Equation
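For reference, the standard single exponential smoothing recursion, written here with notation of our choosing (smoothing parameter α, observation y, forecast f), is:

```latex
f_{t+1} = \alpha y_t + (1 - \alpha) f_t
```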

This can be problematic when trying to fit an intermittent time series. In this code we are trying to predict the final 8 time steps using the first 42 steps:

library(tsintermittent)
# Fit single exponential smoothing to the series and forecast
# the final 8 steps; outplot=TRUE also plots the fit
exp_model <- sexsm(data, h=8, outplot=TRUE)
Exponential Smoothing Used on an Intermittent Time Series. The thicker red line represents exponential smoothing’s average predicted value.

We get an absolute sum error of 0.52 using exponential smoothing.
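To see why exponential smoothing struggles here, a minimal Python sketch of the SES recursion on a toy intermittent series (the `demand` values below are made up for illustration, not taken from the Parts dataset):

```python
# A minimal sketch of single exponential smoothing (SES) on a toy
# intermittent series; `demand` is made-up data, not the Parts dataset.
def ses_forecast(series, alpha=0.1):
    f = series[0]  # initialize the level with the first observation
    for y in series[1:]:
        f = alpha * y + (1 - alpha) * f  # standard SES update
    return f  # SES produces a flat forecast for all future steps

demand = [0, 0, 3, 0, 0, 0, 2, 0]
print(round(ses_forecast(demand), 4))
```

Because the zeros are treated as genuine observations, the smoothed level is repeatedly dragged toward 0 between demand occurrences.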

Croston Model

The Croston model overcomes this issue by fitting two exponential smoothing models to the given time series: the series is separated into the demand sizes (the non-zero values) and the intervals between them, and each of these two series is fitted with its own exponential smoothing model.

d denotes demand and i denotes the interval between the demand occurrences
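Using the article’s d/i notation, a reconstruction of the two update equations (with smoothing parameter α and q the number of periods since the previous demand); both estimates are updated only at time steps where a demand occurs:

```latex
\hat{d}_t = \alpha d_t + (1 - \alpha)\,\hat{d}_{t-1}, \qquad
\hat{i}_t = \alpha q_t + (1 - \alpha)\,\hat{i}_{t-1}
```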

The ratio between predicted demand and interval is returned as the expected demand per time step.

Expected demand per time step
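In the same notation, the expected demand per time step is simply the ratio of the two smoothed estimates:

```latex
\hat{y}_t = \frac{\hat{d}_t}{\hat{i}_t}
```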

As we can observe, the absolute sum error for this solution is 0.16, much lower than the 0.52 obtained with exponential smoothing.

# Croston's method is implemented in tsintermittent as crost()
croston_model <- crost(data, h=8, outplot=TRUE)
Croston model’s Solution. The thicker red line represents the average predicted value for this method.
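A minimal Python sketch of Croston’s method on the same toy series used above; the initialization (seeding the estimates with the first observed demand and its preceding interval) is one common choice, not the only one:

```python
# A minimal sketch of Croston's method: smooth the non-zero demand
# sizes and the inter-demand intervals separately, then forecast
# their ratio. `demand` is made-up data, not the Parts dataset.
def croston_forecast(series, alpha=0.1):
    z = p = None  # smoothed demand size and inter-demand interval
    q = 1         # periods elapsed since the last demand
    for y in series:
        if y > 0:
            if z is None:            # initialize on the first demand
                z, p = y, q
            else:                    # update only when demand occurs
                z = alpha * y + (1 - alpha) * z
                p = alpha * q + (1 - alpha) * p
            q = 1
        else:
            q += 1
    return z / p  # expected demand per time step

demand = [0, 0, 3, 0, 0, 0, 2, 0]
print(round(croston_forecast(demand), 4))
```

Note that nothing is updated during the zero runs, which is exactly the weakness the TSB model addresses next.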

Teunter, Syntetos and Babai (TSB) Model

Despite this improvement in results, the Croston model can be modified further for a better outcome. The Croston model is updated only when a demand occurs, which becomes an issue when the model encounters a long interval of zeros.

The TSB model uses the probability of demand occurrence instead of the interval series. Its update equations during a demand occurrence are given below:

TSB update equations during demand occurrence
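A reconstruction from the standard TSB formulation, with notation of our choosing: p is the smoothed probability of a demand occurrence, z the smoothed demand size, and β and α their smoothing parameters. When a demand y occurs at time t:

```latex
p_t = p_{t-1} + \beta\,(1 - p_{t-1}), \qquad
z_t = z_{t-1} + \alpha\,(y_t - z_{t-1})
```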

Unlike the Croston model, the TSB model updates its estimates even when there is no demand occurrence.

TSB Model update equations during demand intervals
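In the same notation, when no demand occurs only the probability estimate decays while the size estimate is kept, and the forecast is the product of the two:

```latex
p_t = (1 - \beta)\,p_{t-1}, \qquad z_t = z_{t-1}, \qquad \hat{y}_t = p_t\, z_t
```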

We can notice in the below figure that the TSB model updates the parameters even during demand intervals.

Forecasts from the Croston model (left) and the TSB model (right).
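A minimal Python sketch of the TSB updates on the same toy series; the initialization choices (first non-zero value for the size, overall demand frequency for the probability) are illustrative assumptions:

```python
# A minimal sketch of the TSB method: smooth the probability of a
# demand occurrence and the demand size, updating the probability at
# every time step. Initialization choices and `demand` are illustrative.
def tsb_forecast(series, alpha=0.1, beta=0.1):
    z = next(y for y in series if y > 0)               # first non-zero size
    p = sum(1 for y in series if y > 0) / len(series)  # demand frequency
    for y in series:
        if y > 0:
            p = p + beta * (1 - p)   # probability moves toward 1
            z = z + alpha * (y - z)  # size update on demand
        else:
            p = (1 - beta) * p       # probability decays; z is kept
    return p * z  # expected demand per time step

demand = [0, 0, 3, 0, 0, 0, 2, 0]
print(round(tsb_forecast(demand), 4))
```

The probability estimate changes at every step, so the forecast keeps adapting even through long runs of zeros.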

DeepAR

In recent times, deep learning has also been used to forecast intermittent time series; models such as DeepAR have applied it with good results.

The Architecture of DeepAR

DeepAR’s architecture models the time series with a negative binomial distribution. In most time-series datasets, including the Parts dataset, demand takes only non-negative values, so the negative binomial distribution, which is defined over non-negative counts, is a suitable choice. In addition, this architecture can handle any N time series of the same length together. The objective of the architecture is to maximize the log-likelihood of the distribution.
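One common way to write the negative binomial likelihood used in DeepAR, parameterized by a mean μ and a shape α, both produced from the network’s outputs (notation here is our reconstruction, not copied from the paper):

```latex
\ell_{\mathrm{NB}}(y \mid \mu, \alpha) =
\frac{\Gamma\!\left(y + \tfrac{1}{\alpha}\right)}{\Gamma(y + 1)\,\Gamma\!\left(\tfrac{1}{\alpha}\right)}
\left(\frac{1}{1 + \alpha\mu}\right)^{1/\alpha}
\left(\frac{\alpha\mu}{1 + \alpha\mu}\right)^{y}
```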

The log-likelihood function of DeepAR for the N series is optimized by tweaking parameters θ which uses hidden layer values h.
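Sketched in the article’s terms, the objective sums the log-likelihood over all N series and all time steps, with the distribution parameters θ computed from the hidden state h; the indexing below is our own notation:

```latex
\mathcal{L} = \sum_{i=1}^{N} \sum_{t} \log \ell\big(y_{i,t} \mid \theta(h_{i,t})\big)
```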

To conclude, no single model outperforms all the rest; the best choice depends on the dataset at hand. For example, classical methods like Croston can perform well on shorter datasets. Experimenting with multiple methods on a given dataset is essential to find the best solution.

This article only contains an overview of currently existing methods used for intermittent time series forecasting. There are also other ways to approach this problem.

This paper (link) was published by Mastercard’s AI Garage team. It contains extensive explanations about other methods and also includes observations from experimenting on them.

Thank you for reading! Before you go, feel free to connect with me and follow our team on LinkedIn:
