Predicted vs. Actual Values of Inflation from an LSTM

6 Tips for Improving the Performance of LSTMs/BiLSTMs

Yousuf Mehmood


LSTMs (Long Short-Term Memory networks) are a type of recurrent neural network often used to forecast financial time series such as sales, stock prices, and inflation. Tuning their performance is usually a process of trial and error, and varying the number of neurons per layer, the number of epochs, the batch size, and so on can become cumbersome if there’s no clear direction to work towards.

In my (admittedly short) experience with LSTMs, a few basic rules have emerged for getting the best performance out of them. These tweaks also work for BiLSTMs (bidirectional LSTMs).

These tips are based on a project I did to predict the rate of inflation based on interest rates, bank lending and offer rates, and oil prices.

How to Tweak the Performance of LSTMs

  1. The “Number of Neurons” should be proportional to the “Number of Features in the Data Set”

The number of neurons in your LSTM should scale with the number of features in your data set.

This is not to say that if your data set has 30 features, you should use 30 neurons. It just means that the more features you have, the more neurons you’ll require to effectively reduce the loss/error in your neural network.

Mean Squared Error from an LSTM with 75 Neurons

Finding the right count is a trial-and-error process. However, once you have a rough sense of how many neurons give you a low loss value, you can fine-tune from there.

Note: For BiLSTMs, a smaller number of neurons usually does the trick in comparison to vanilla LSTMs.

Mean Squared Error from a 2-Layer BiLSTM with 14 Neurons per layer
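The article doesn’t show its code, so here is a minimal sketch of the idea in Keras (assumed stack); the 2× multiplier, the feature count of 4, and all layer sizes are illustrative assumptions, not the author’s actual configuration:

```python
import numpy as np
import tensorflow as tf

n_features = 4     # e.g. interest rate, lending rate, offer rate, oil price
n_timesteps = 6    # lag window length
units = 2 * n_features  # start small; grow until loss stops improving

# Vanilla LSTM whose width scales with the feature count
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_timesteps, n_features)),
    tf.keras.layers.LSTM(units),
    tf.keras.layers.Dense(1),  # single regression target: inflation rate
])
model.compile(optimizer="adam", loss="mse")

# A BiLSTM runs two LSTMs (forward and backward), so fewer units per
# direction are often enough — consistent with the note above.
bi_model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_timesteps, n_features)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units // 2)),
    tf.keras.layers.Dense(1),
])
bi_model.compile(optimizer="adam", loss="mse")
```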

2. Increasing “Timesteps” gives diminishing returns

Increasing the number of timesteps, i.e. the lagged features used to predict your label, will only help up to a point.

Mean Squared Error from LSTM with 8 Timesteps

Each time you increase your timesteps, retrain a basic LSTM to set a baseline. If the larger window gives you a higher loss than your previous runs, you shouldn’t proceed further.

Mean Squared Error from LSTM with 10 Timesteps (Higher MSE value)

Note: As with the number of neurons, BiLSTMs usually need fewer timesteps or lags than vanilla LSTMs.

Mean Squared Error from BiLSTM with 6 Timesteps + 50 Epochs
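Sweeping timesteps means re-windowing the data for each run. A small NumPy sketch of that windowing step (the function name and the choice of the first column as the target are illustrative assumptions):

```python
import numpy as np

def make_windows(series, n_timesteps):
    """Turn a (n_samples, n_features) array into lagged windows.

    Returns X of shape (n_samples - n_timesteps, n_timesteps, n_features)
    and y holding the value that follows each window (the first feature
    is used as the target here purely for illustration).
    """
    X = np.stack([series[i : i + n_timesteps]
                  for i in range(len(series) - n_timesteps)])
    y = series[n_timesteps:, 0]
    return X, y

data = np.random.rand(100, 4)   # 100 observations, 4 features
X8, y8 = make_windows(data, 8)
X10, y10 = make_windows(data, 10)
print(X8.shape, X10.shape)      # (92, 8, 4) (90, 10, 4)
```

Retraining the same baseline model on each windowed version makes the loss comparison above a fair one.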

3. Increasing the “Number of Epochs” should be a last resort

Remember that raising the number of epochs is almost always NOT a viable tweak. High loss values usually stem from a badly configured network, bad data, or insufficient computing resources.

Increasing the number of epochs can only serve to prolong the compute process if your approach is flawed.

If your algorithm is configured correctly, then chances are reducing the number of epochs is in your best interest. The right algorithm may be able to give you the lowest loss values in 20–25 epochs.

HOWEVER, increasing the number of epochs is viable when loss values are still steadily decreasing and training stops before it has reached the optimal “stopping epoch”.

Mean Squared Error from BiLSTM with 6 Timesteps + 100 Epochs
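One way to apply this rule is to inspect the training history after a short run: if the loss was still falling at the final epoch, a longer run may help; if it had flattened, more epochs only burn compute. A sketch with a toy Keras model on random data (all sizes and the 5-epoch run are illustrative):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(6, 4)),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(64, 6, 4).astype("float32")  # toy data for illustration
y = np.random.rand(64).astype("float32")
history = model.fit(X, y, epochs=5, verbose=0)

# If the final epoch's loss is still the best seen so far, training was
# still improving when it stopped — a longer run is worth trying.
losses = history.history["loss"]
still_improving = losses[-1] < min(losses[:-1])
```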

4. Vary “Dropout” Between 0.2 and 0.5

Dropout is a regularization technique that has served me well in nearly all experiments. It is a tweak you should apply to your best model to push your results towards greater accuracy and precision.

Mean Squared Error from BiLSTM with 6 Timesteps + 100 Epochs + Dropout = 0.2
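In Keras (assumed stack) dropout can be applied both as an argument on the LSTM layer itself and as a standalone layer before the dense head; the rate of 0.2 matches the caption above, while the layer sizes are illustrative assumptions:

```python
import tensorflow as tf

rate = 0.2  # vary between 0.2 and 0.5, per the tip above

model = tf.keras.Sequential([
    tf.keras.Input(shape=(6, 4)),  # 6 timesteps, 4 features (assumed)
    tf.keras.layers.Bidirectional(
        # `dropout` drops LSTM inputs; `recurrent_dropout` would drop
        # the recurrent state instead
        tf.keras.layers.LSTM(14, dropout=rate)
    ),
    tf.keras.layers.Dropout(rate),  # dropout before the dense head
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```

Sweeping `rate` over, say, 0.2, 0.3, 0.4, 0.5 while holding the rest fixed isolates dropout’s effect on the loss.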

5. Use Early Stopping

Aside from Dropout, Early Stopping is another tweak which can:

a) Give you better results

b) Save computing resources

Use early stopping to monitor one specific loss metric (preferably Root Mean Squared Error, RMSE, in the case of LSTM regression) and halt training once it stops improving.

Mean Squared Error from BiLSTM with 6 Timesteps + 100 Epochs + Dropout = 0.2 + EarlyStopping
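A sketch of early stopping on validation RMSE with Keras’ `EarlyStopping` callback (assumed stack; note that monitoring RMSE requires compiling it as a metric first, and the patience value, layer sizes, and toy data are all illustrative):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(6, 4)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(14)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_root_mean_squared_error",  # the compiled RMSE metric
    patience=10,                # epochs to wait without improvement
    restore_best_weights=True,  # roll back to the best epoch's weights
)

X = np.random.rand(64, 6, 4).astype("float32")  # toy data for illustration
y = np.random.rand(64).astype("float32")
history = model.fit(X, y, validation_split=0.25, epochs=5,
                    batch_size=16, callbacks=[stop], verbose=0)
```

This is how the tweak delivers both of its benefits: `restore_best_weights` gives you the best result seen, and stopping early saves the wasted epochs.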

6. Also vary Batch Size and use Batch Normalization (Optional)

In my experience, varying the batch size has no dramatic impact on LSTM performance, but neural networks as a rule train better when the batch size is tuned.

Using Batch Normalization, however, has consistently improved algorithm performance in my experiments.
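Both knobs in one Keras sketch (assumed stack; the layer sizes and the suggested batch sizes are illustrative assumptions): Batch Normalization is a layer in the model, while batch size is set at fit time.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(6, 4)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(14)),
    tf.keras.layers.BatchNormalization(),  # normalize the recurrent output
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Batch size is a fit-time argument; powers of two (16, 32, 64) are
# common starting points for a sweep:
# model.fit(X, y, batch_size=32, ...)
```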
