Selecting LSTM Timesteps

Caner
Mar 25, 2020

Besides the batch size, which I explained in a previous article, the number of timesteps is another important hyperparameter to choose carefully, especially for LSTM models. In this article, I will describe why we need the timesteps hyperparameter and how to select it when developing LSTM models.

First of all, note that one timestep corresponds to one timestamped sample. Consider Figure 1 below. The blue rectangle is a rolling (sliding) window, divided into two equal halves by a red vertical dashed line through its center:

  • the midpoint represents the current time t
  • the right edge represents time t+timesteps
  • the left edge represents time t-timesteps

Within each sliding window, if we assume timesteps is 10, the LSTM learns from the 10 timesteps to the left of the midpoint and attempts to predict the next 10 timesteps to the right. The whole window then slides to the right by timesteps (here 10), and the procedure starts again.

Figure 1: Sliding window
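The sliding-window procedure above can be sketched in NumPy as follows. This is a minimal illustration, not the author's code: the helper name `make_windows` and its `stride` parameter are hypothetical, and the series is assumed to be univariate. Each window spans 2 × timesteps points; the half to the left of the midpoint t is the model input and the half to the right is the target. The stride defaults to timesteps, matching the shift described in this article; passing stride=1 would instead slide the window one step at a time.

```python
import numpy as np


def make_windows(series, timesteps, stride=None):
    """Split a 1-D series into (input, target) pairs (hypothetical helper).

    Each window is centred on a midpoint t: series[t - timesteps:t] is the
    input (left half) and series[t:t + timesteps] is the target (right half).
    """
    if stride is None:
        stride = timesteps  # shift the window centre by `timesteps` each time
    series = np.asarray(series)
    inputs, targets = [], []
    # The midpoint must leave room for a full half-window on both sides.
    for t in range(timesteps, len(series) - timesteps + 1, stride):
        inputs.append(series[t - timesteps:t])
        targets.append(series[t:t + timesteps])
    return np.array(inputs), np.array(targets)


series = np.arange(50)          # illustrative data: the values 0..49
X, y = make_windows(series, timesteps=10)
print(X.shape, y.shape)         # (4, 10) (4, 10)
```

With a 50-point series and timesteps=10, the midpoints are t = 10, 20, 30, 40, so the first input covers the values 0 to 9 and its target covers 10 to 19.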

As illustrated in Figure 2 below, when you shift the window, in other words when you move it towards the future for the next prediction step, you move the window's center to the right by the timesteps value, e.g. 10. The LSTM model learns from the 10 timesteps to the left of the red center line within the window in order to predict the data to the right of the new red vertical line within the shifted window. Also note that by default the timestep is set to 1 in the LSTM model, so we need to set it explicitly to the desired value, e.g. 10.

So after one shift, the window's:

  • new midpoint is at time t+timesteps
  • new right edge is at time t+timesteps+timesteps = t+2*timesteps
  • new left edge is at time t-timesteps+timesteps = t

Figure 2
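The edge arithmetic above can be checked with a small sketch. The function name `window_positions` and the starting time t0=10 are illustrative assumptions; each row gives the (left edge, midpoint, right edge) of the window after k shifts of timesteps.

```python
from typing import Iterator, Tuple


def window_positions(t0: int, timesteps: int, n_windows: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (left edge, midpoint, right edge) for successive window shifts.

    The midpoint starts at t0 and moves right by `timesteps` at each shift.
    """
    for k in range(n_windows):
        mid = t0 + k * timesteps
        yield mid - timesteps, mid, mid + timesteps


for left, mid, right in window_positions(t0=10, timesteps=10, n_windows=3):
    print(left, mid, right)
# 0 10 20
# 10 20 30
# 20 30 40
```

The second row matches the formulas above with t = 10: the new left edge is t = 10, the new midpoint is t+timesteps = 20, and the new right edge is t+2*timesteps = 30.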

Because the window center moves 10 timesteps at a time, this is known as discrete-time prediction based on discrete-time inputs.
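Setting the number of timesteps explicitly mostly comes down to the shape of the input array. Recurrent layers such as Keras' `LSTM` expect 3-D input of shape (samples, timesteps, features). The sketch below uses only NumPy and assumes a univariate series already split into windows of 10 past values; the array contents are illustrative.

```python
import numpy as np

timesteps = 10   # chosen explicitly rather than left at 1
n_features = 1   # assumption: one variable per time step

# Illustrative data: 4 input windows of 10 past values each.
X = np.arange(4 * timesteps, dtype=np.float32).reshape(4, timesteps)

# (samples, timesteps, features): each sample is a sequence of 10 steps.
X_lstm = X.reshape((X.shape[0], timesteps, n_features))
print(X_lstm.shape)  # (4, 10, 1)

# For comparison, treating every value as its own one-step sequence
# (timesteps = 1) gives the LSTM no history within a sample.
X_single = X.reshape((-1, 1, n_features))
print(X_single.shape)  # (40, 1, 1)
```

The same 40 values are used in both cases; only the grouping changes, which determines how many past steps the LSTM sees in each sample.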
