LSTMs with Lagged Data

Should you use lagged data when building an LSTM?

When building your first LSTM, you will quickly realize that your input data must be in the form of a 3-dimensional array. The three dimensions are:

  • Samples
  • Time Steps (or window size)
  • Features

The potentially confusing part for modelers is the Time Steps dimension: most modelers are used to a simple 2-dimensional array of (samples, features). Fortunately, there are plenty of tutorials showing how to reshape your input data to meet this requirement. Here is one of my favorites: https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
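To make the reshape concrete, here is a minimal sketch of turning a 2-dimensional (samples, features) array into the 3-dimensional (samples, time steps, features) array an LSTM expects. The window size of 3 and the toy data are illustrative assumptions, not values from the experiment discussed here.

```python
import numpy as np

# Toy 2-D data: 10 rows (observations) x 2 feature columns.
rows, n_features = 10, 2
data = np.arange(rows * n_features, dtype=float).reshape(rows, n_features)

window = 3  # time steps per sample (assumed for illustration)

# Slide a window over the rows so each sample contains `window`
# consecutive observations of all features.
samples = np.stack([data[i:i + window] for i in range(rows - window + 1)])

print(samples.shape)  # (8, 3, 2) -> (samples, time steps, features)
```

Each sample overlaps the next by two rows, which is the standard sliding-window construction most reshaping tutorials use.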

Once you have the input shaping figured out, you may ask yourself…

Do I need to create lag variables?

Most tutorials and examples online do not include this step. My hypothesis is that, because the input array already has a time step dimension, most modelers feel they simply don't need lag variables. After all, one of the unique values of an LSTM is its ability to find patterns along the time step dimension!

Despite this intuition, I have found that including lagged features produces superior results. Here is one example.

I created two versions of the input array: one with only the base features and one with seven additional lag features. The key result was that the best performance of the model without lags (a loss of 0.046) never matched the best performance of the model with lags. After this experiment I tried additional hyperparameter setups and continued to see superior performance with the added lag features.
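For reference, here is a hedged sketch of how seven lag features can be added with pandas before the 3-D reshape. The column name `target` and the toy series are assumptions for illustration; only the count of seven lags mirrors the experiment above.

```python
import numpy as np
import pandas as pd

# Toy series standing in for the base feature/target column.
df = pd.DataFrame({"target": np.arange(20, dtype=float)})

# Add seven lagged copies of the column as extra feature columns.
for lag in range(1, 8):
    df[f"target_lag_{lag}"] = df["target"].shift(lag)

# The first seven rows have incomplete lags, so drop them.
df = df.dropna()

print(df.shape)  # (13, 8) -> 13 usable rows, 1 base + 7 lag columns
```

From here, the lag-augmented frame is reshaped into (samples, time steps, features) exactly as before, just with a wider feature dimension.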

The reason for this may be intuitive. Although the individual values in the lag features are duplicative, they occupy separate positions in the input vectors and therefore receive their own weights, giving each lag the potential to make a unique contribution.