On the Approach to Deep Learning for Time Series Problems

Kien Hao Tiet
Aviation Software Innovation
5 min read · Sep 19, 2020

Goal: This blog aims to give readers a general overview of how deep learning can be applied to time-series problems.

Convolutional Neural Network architecture. Source

At ASI, we are building a management system for the UN, for which we are developing a time-series solution to improve its analysis capabilities. In recent years, with the development of deep learning, many techniques have been applied to time-series problems. In this blog, we will look at three different families of models: Recurrent Neural Network (RNN)-based models, Convolutional Neural Network (CNN)-based models, and attention-based models.

I. RNN-based Models

RNN-based models are among the most classical deep learning approaches to time series, because the current state of the network is computed from the result of the previous time step. See the picture below for an illustration.

Picture from here

Note: Although the diagram shows h0, h1, h2, etc., the same cell architecture (and the same weights) is reused at every time step. Likewise, x1, x2, etc. are the inputs to the network at the corresponding time steps.

However, the problem with the vanilla RNN is the vanishing gradient problem. In brief, the gradients used to update the network's weights shrink toward zero as they are propagated back through many time steps. As a result, the network cannot learn the dependencies between long- and short-term events.
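
To make this concrete, here is a tiny demonstration of our own (not code from the original post) that unrolls a vanilla RNN over a long sequence and compares the gradient flowing back to the first versus the last input step. The sequence length and layer sizes are arbitrary choices for illustration.

```python
# Illustrative sketch: gradients through a vanilla RNN over 100 time steps.
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size = 100, 1, 8, 16
rnn = nn.RNN(input_size, hidden_size)      # vanilla (tanh) RNN

x = torch.randn(seq_len, batch, input_size, requires_grad=True)
out, _ = rnn(x)
loss = out[-1].sum()                       # loss depends only on the last step
loss.backward()

# Gradient magnitude reaching the first vs. the last input step.
print("grad norm at t=0: ", x.grad[0].norm().item())
print("grad norm at t=99:", x.grad[-1].norm().item())
# Typically the t=0 gradient is orders of magnitude smaller: early events
# barely influence the weight updates, i.e. the gradient "vanishes".
```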

Besides the vanilla RNN (the architecture above), we also have the Long Short-Term Memory network (LSTM) [1] and the Gated Recurrent Unit (GRU) [2]. The idea behind both is to add gates to the cell that control the flow of information between the states of the network.

An example of a LSTM cell. Source

Although the LSTM is claimed to address the vanishing gradient problem, it is usually difficult to train and takes longer to converge to a local minimum. It also takes considerable effort, tricks, and expertise to control the weights of RNN-based models and make the training process efficient and effective.
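
Before moving on, here is a minimal sketch (our own illustration, not the post's implementation) of how an LSTM-based forecaster might look in PyTorch: the model reads a window of past values and predicts the next one. The layer sizes and window length are assumptions made for the example.

```python
# Minimal LSTM forecaster sketch: window of past values -> next value.
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, n_features: int = 1, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)   # predict the next value

    def forward(self, x):                       # x: (batch, window, n_features)
        out, _ = self.lstm(x)                   # out: (batch, window, hidden)
        return self.head(out[:, -1])            # use the last hidden state

model = LSTMForecaster()
window = torch.randn(32, 24, 1)                 # e.g. 24 past observations
next_value = model(window)                      # shape: (32, 1)
```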

II. CNN-based Models

The CNN-based approach is inspired by the computer vision community and architectures such as ResNet [3]. In recent years, it has proven effective in terms of training and has reached state-of-the-art performance compared to RNN-based models. Examples include WaveNet [4], which models raw audio such as speech, and ConvTimeNet [5], whose authors report better performance on the UCR datasets than TimeNet [6] (an LSTM-based model).

The power of a CNN comes from sliding the same filter across the time series, which exploits the strong resemblance between past and current information and lets the network capture relationships in the data. However, keep in mind two limitations of using CNNs for time-series problems. First, we are assuming that the relationship between past and current events is time-invariant, an assumption that attention-based models are better at relaxing. Second, the receptive-field size k, i.e. the kernel size of the sliding window, needs to be tuned carefully to achieve the best performance [7], as the small sketch below shows. We will have another blog about how to make the hyperparameter-tuning process less painful.
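
The following snippet is a simple illustration of our own (not from the post) of how the kernel size k controls how many past steps each output position "sees" in a 1-D convolution, which is exactly the hyperparameter the paragraph above says must be tuned.

```python
# How kernel_size changes the per-layer receptive field of a 1-D convolution.
import torch
import torch.nn as nn

x = torch.randn(32, 1, 128)                      # (batch, channels, time)

for k in (3, 7, 15):
    conv = nn.Conv1d(in_channels=1, out_channels=16,
                     kernel_size=k, padding=k // 2)
    y = conv(x)
    print(f"kernel_size={k:2d} -> output shape {tuple(y.shape)}; "
          f"each output position sees {k} time steps")
```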

ConvTimeNet Architecture from paper

A normal CNN may struggle with long-term dependencies. Dilated convolution was proposed to solve this problem. In simple terms, instead of applying the kernel to k (kernel size) consecutive elements, a dilated convolution applies it to k elements spaced apart, skipping a fixed number of elements between each pair. Stacking such layers makes the receptive field grow quickly, as the sketch after the illustration below shows.

Illustration for dilated convolution. Source
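
Here is a minimal sketch, written by us as an illustration rather than taken from any of the cited papers, of a WaveNet-style stack of dilated causal 1-D convolutions. With kernel size 2 and dilations 1, 2, 4, 8, the receptive field reaches 16 past steps even though each layer only touches 2 inputs; the left-padding keeps the convolution causal (no peeking at future values).

```python
# Stack of dilated causal 1-D convolutions (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalStack(nn.Module):
    def __init__(self, channels: int = 16, kernel_size: int = 2,
                 dilations=(1, 2, 4, 8)):
        super().__init__()
        self.kernel_size = kernel_size
        self.dilations = dilations
        self.layers = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size, dilation=d)
             for d in dilations]
        )

    def forward(self, x):                         # x: (batch, channels, time)
        for conv, d in zip(self.layers, self.dilations):
            pad = (self.kernel_size - 1) * d      # pad only on the left (causal)
            x = conv(F.pad(x, (pad, 0)))
        return x

stack = DilatedCausalStack()
x = torch.randn(8, 16, 100)
y = stack(x)                                      # same time length: (8, 16, 100)
```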

III. Attention-based Models

The Transformer is the work that changed the state of the art in Natural Language Processing (NLP) tasks. Its core idea is the attention mechanism, which assigns different (attention) scores to different time steps in the series. Recent work has shown that attention-based models, especially Transformer-based models, outperform RNN-based models.

Attention-based models. Source

The power of Transformer-based models comes from combining the advantages of CNNs and RNNs. At every time step, the Transformer gathers all past time steps and computes self-attention to assign attention scores to each event or time block in the series. This is similar to the dilated convolution block of a CNN, where long-term dependencies are learned by considering a wider range of information. Because the Transformer can consider such a wide range of time steps, it can also learn relationships that are not strictly time-invariant, much like RNN-based models. In addition, with multiple attention heads, the Transformer can pick up different types of dependencies, for example a shopping event that is affected by a holiday.
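
As a final sketch (again our own illustration, not the post's code), here is how a Transformer encoder could be applied to a univariate time-series window in PyTorch: every time step attends to every other step, and the multiple heads can pick up different kinds of dependencies. The sizes and the use of nn.TransformerEncoder are assumptions made for the example.

```python
# Minimal Transformer-encoder forecaster sketch for a univariate series.
import torch
import torch.nn as nn

d_model, n_heads, window = 32, 4, 24

encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
proj = nn.Linear(1, d_model)                      # embed raw values
head = nn.Linear(d_model, 1)                      # forecast the next value

x = torch.randn(16, window, 1)                    # (batch, time, features)
h = encoder(proj(x))                              # self-attention over all steps
forecast = head(h[:, -1])                         # shape: (16, 1)
```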

IV. Conclusion

We have walked you through a general overview of using deep learning to solve time-series problems. We know this blog may be overwhelming for readers who do not have a solid background in the field, so we will publish more in-depth blogs about each approach. Eventually, we will show why and how time series is used at ASI.


Kien Hao Tiet
Aviation Software Innovation

I am an enthusiast for new ideas that can be applied anywhere in life.