How to build LSTM neural networks in Keras

David Lengacher
Feb 10, 2018


There is some confusion about how LSTM models differ from MLPs, both in input requirements and in performance. One way to become more comfortable with LSTM models is to generate a data set that contains some lagged components, then build both an LSTM and a regular MLP model to compare their performance and function.

First we generate the uni-dimensional input that both models will need.

#Load Packages
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Activation
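#Generate 2 sets of X variables
#LSTMs have unique 3-dimensional input requirements
seq_length = 5 #number of time steps per sample
X = [[i+j for j in range(seq_length)] for i in range(100)]
X = np.array(X).reshape(100, seq_length, 1) #(samples, time steps, features)
X_simple = np.array([[i] for i in range(4,104)]) #(samples, features)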

The LSTM-ready array has a shape of (100 samples, 5 time steps, 1 feature).

The MLP-ready array has a shape of (100 samples, 1 feature). Note that the key difference is the lack of time steps, or sequence.
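To make the two shapes concrete, here is a minimal NumPy-only sketch that rebuilds both arrays and prints their dimensions. It assumes a window length of 5 (the name `seq_length` and its value are taken from the shape described above, not from a definition in the original code).

```python
import numpy as np

seq_length = 5  # assumed window length, matching the 5 time steps described above

# LSTM input: 100 overlapping windows, reshaped to (samples, time steps, features)
X = np.array([[i + j for j in range(seq_length)] for i in range(100)])
X = X.reshape(100, seq_length, 1)

# MLP input: only the last value of each window, shaped (samples, features)
X_simple = np.array([[i] for i in range(4, 104)])

print(X.shape)         # (100, 5, 1)
print(X_simple.shape)  # (100, 1)
print(X[0].ravel())    # first LSTM sample: the values 0 through 4
print(X_simple[0])     # first MLP sample: the single value 4
```

Each MLP sample is simply the last time step of the matching LSTM sample, so the two models see the same "current" value and differ only in how much history they get.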

Next generate a simple lagged y-variable.

y = [[i + (i-1)*.5 + (i-2)*.2 + (i-3)*.1] for i in range(4,104)]
y = np.array(y) #shape (100, 1): one target per sample

This is what the y-array looks like.
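A quick way to inspect the target array yourself is sketched below. It assumes y is stored as one target per sample, i.e. with shape (100, 1), so that it lines up row by row with the inputs.

```python
import numpy as np

# One target per sample: the current value plus three down-weighted lags
y = np.array([[i + (i - 1) * .5 + (i - 2) * .2 + (i - 3) * .1]
              for i in range(4, 104)])

print(y.shape)                      # (100, 1)
print(np.round(y[:3].ravel(), 2))   # approximately 6.0, 7.8, 9.6
print(np.round(y[-1].ravel(), 2))   # approximately 184.2
```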

So now we can see how the LSTM model is trying to find a pattern from the sequence [0, 1, 2, 3, 4] → 6, while the MLP is only focused on a pattern from [4] → 6.

Next we build the LSTM model.

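model = Sequential()
model.add(LSTM(8, input_shape=(5, 1), return_sequences=False)) #True = many to many
model.add(Dense(1)) #output layer (assumed; needed so the output matches y's single column)
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=2000, batch_size=5, validation_split=0.05, verbose=0)
scores = model.evaluate(X, y, verbose=1, batch_size=5)
#Note: 'accuracy' is a classification metric; scores[0] (the MSE loss) is more meaningful for regression
print('Accuracy: {}'.format(scores[1]))
#Plot the prediction error against the true values
import matplotlib.pyplot as plt
predict = model.predict(X)
plt.plot(y, predict - y, 'C2')
plt.ylim(-3, 3)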

Here we can see the LSTM model doing a fairly good job at prediction until the upper range. Normalization should address this.
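The article does not show how the normalization would be done. One common option, used here purely as an assumption, is min-max scaling of both the inputs and the targets to the range [0, 1], then mapping predictions back to the original scale. The helper names `fit_min_max`, `scale`, and `unscale` are hypothetical, written only for this sketch.

```python
import numpy as np

def fit_min_max(a):
    """Return (min, max) of an array so the same values can be reused later."""
    return float(a.min()), float(a.max())

def scale(a, lo, hi):
    """Map values from [lo, hi] onto [0, 1]."""
    return (a - lo) / (hi - lo)

def unscale(a_scaled, lo, hi):
    """Map values from [0, 1] back onto [lo, hi]."""
    return a_scaled * (hi - lo) + lo

seq_length = 5  # assumed window length
X = np.array([[i + j for j in range(seq_length)] for i in range(100)],
             dtype=float).reshape(100, seq_length, 1)
y = np.array([[i + (i - 1) * .5 + (i - 2) * .2 + (i - 3) * .1]
              for i in range(4, 104)])

x_lo, x_hi = fit_min_max(X)
y_lo, y_hi = fit_min_max(y)
X_scaled = scale(X, x_lo, x_hi)
y_scaled = scale(y, y_lo, y_hi)

# Train on X_scaled and y_scaled, then convert predictions back with
# unscale(predictions, y_lo, y_hi) before comparing them to y.
print(X_scaled.min(), X_scaled.max())
print(y_scaled.min(), y_scaled.max())
```

Keeping the fitted minimum and maximum around matters: the same pair has to be reused to unscale predictions, otherwise the errors are not comparable with the unnormalized run.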

And now for the MLP model with near-identical parameters.

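model2 = Sequential()
model2.add(Dense(8, input_dim=1, activation='linear'))
model2.add(Dense(2, activation='linear'))
model2.add(Dense(1, activation='linear'))
#Compile and fit with the same settings as the LSTM (assumed)
model2.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model2.fit(X_simple, y, epochs=2000, batch_size=5, validation_split=0.05, verbose=0)
scores2 = model2.evaluate(X_simple, y, verbose=1, batch_size=5)
print('Accuracy: {}'.format(scores2[1]))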

The MLP model is virtually perfect. This is why they call them universal function approximators!

The likely reason the MLP outperformed the LSTM is that the lag component spans only 3 time steps. Some sources have stated that when the relationships span longer time frames, LSTMs will tend to perform best.
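As an aside that goes beyond the original experiment: expanding the formula for y gives 1.8i − 1.2, so the target is an exactly linear function of the single value the MLP receives. That may also help explain why a small network with linear activations fits it so well. The sketch below checks this with an ordinary least-squares line fit in NumPy; it does not involve Keras at all.

```python
import numpy as np

i = np.arange(4, 104, dtype=float)
y = i + (i - 1) * .5 + (i - 2) * .2 + (i - 3) * .1

# Fit a straight line y = slope * i + intercept by least squares
slope, intercept = np.polyfit(i, y, 1)

# Largest absolute gap between the fitted line and the true targets
residual = np.abs(slope * i + intercept - y).max()

print(round(slope, 6), round(intercept, 6))  # approximately 1.8 and -1.2
print(residual)
```

A data set where the target depends nonlinearly on older lags would be a fairer test of whether the LSTM's access to the sequence pays off.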


