How to build LSTM neural networks in Keras

David Lengacher
Feb 10, 2018 · 3 min read


There is some confusion about how LSTM models differ from MLPs, both in input requirements and in performance. One way to become more comfortable with LSTM models is to generate a data set that contains some lagged components, then build both an LSTM and a regular MLP model to compare their performance and behavior.

For my other LSTM article, on using lagged variables, see: https://medium.com/@dclengacher/lstms-with-lagged-data-cc03a3a8cfcf

First we generate the input arrays that both models will need.

# Load packages
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Activation

# Generate 2 sets of X variables
# LSTMs have unique 3-dimensional input requirements
seq_length = 5
X = [[i + j for j in range(seq_length)] for i in range(100)]  # sliding windows of length 5
X_simple = [[i for i in range(4, 104)]]                       # a single lagged value per sample
X = np.array(X)
X_simple = np.array(X_simple)

After the reshapes below, the LSTM-ready array has a shape of (100 samples, 5 time steps, 1 feature).

The MLP-ready array has a shape of (100 samples, 1 feature). Note the key difference is the lack of a time-step (sequence) dimension.

Next we generate a simple lagged y-variable.

y = [[i + (i - 1) * .5 + (i - 2) * .2 + (i - 3) * .1 for i in range(4, 104)]]  # y depends on three lagged values
y = np.array(y)
X_simple = X_simple.reshape((100, 1))
X = X.reshape((100, 5, 1))
y = y.reshape((100, 1))
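As a quick sanity check (not in the original post), you can print the shapes at this point to confirm the two input formats:

# Confirm the two input formats (illustrative check, not part of the original walkthrough)
print(X.shape)         # (100, 5, 1) -> samples, time steps, features
print(X_simple.shape)  # (100, 1)    -> samples, features
print(y.shape)         # (100, 1)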

Each y value is the current index plus a weighted sum of its three previous values. So now we can see that the LSTM model is trying to find a pattern mapping the sequence [0, 1, 2, 3, 4] to 6, while the MLP is only focused on a pattern mapping [4] to 6.
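To make that mapping concrete, here is a quick look at the first training pair (an illustrative check; the value follows directly from the formula above):

# First training pair (illustrative check)
print(X[0].ravel())  # [0 1 2 3 4]
print(y[0])          # [6.]  -> 4 + 3*0.5 + 2*0.2 + 1*0.1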

Next we build the LSTM model.

model = Sequential()
model.add(LSTM(8, input_shape=(5, 1), return_sequences=False))  # True would return the full sequence (many-to-many)
model.add(Dense(2, kernel_initializer='normal', activation='linear'))
model.add(Dense(1, kernel_initializer='normal', activation='linear'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=2000, batch_size=5, validation_split=0.05, verbose=0)
scores = model.evaluate(X, y, verbose=1, batch_size=5)
print('Accuracy: {}'.format(scores[1]))

# Plot the residuals (prediction error) against y
import matplotlib.pyplot as plt
predict = model.predict(X)
plt.plot(y, predict - y, 'C2')
plt.ylim(-3, 3)
plt.show()

Here we can see the LSTM model doing a fairly good job at prediction until the upper end of the range. Normalizing the data should address this.
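As a rough sketch of what that normalization might look like (this uses scikit-learn's MinMaxScaler, which is not part of the original code), you could scale the inputs and targets before fitting:

# Rough sketch: scale inputs and targets to [0, 1] before training
# (assumes scikit-learn is installed; not part of the original post)
from sklearn.preprocessing import MinMaxScaler

x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()

# MinMaxScaler expects 2-D arrays, so drop the feature dimension and restore it afterwards
X_scaled = x_scaler.fit_transform(X.reshape(100, 5)).reshape(100, 5, 1)
y_scaled = y_scaler.fit_transform(y)

# A fresh model would then be fit on X_scaled / y_scaled, and predictions
# mapped back with y_scaler.inverse_transform(...)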

And now for the MLP model with near-identical parameters.

model2 = Sequential()
model2.add(Dense(8, input_dim=1, activation='linear'))
model2.add(Dense(2, activation='linear'))
model2.add(Dense(1, activation='linear'))
model2.compile(loss='mse', optimizer='rmsprop', metrics=['accuracy'])
model2.fit(X_simple, y, epochs=2000, batch_size=5, validation_split=0.05, verbose=0)
scores2 = model2.evaluate(X_simple, y, verbose=1, batch_size=5)
print('Accuracy: {}'.format(scores2[1]))

The MLP model is virtually perfect. This is why they call them universal function approximators!
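To see this the same way as the LSTM above, a near-identical residual plot can be drawn for the MLP (a small sketch reusing the matplotlib import and variables from earlier):

# Plot the MLP residuals on the same scale as the LSTM plot above
predict2 = model2.predict(X_simple)
plt.plot(y, predict2 - y, 'C1')
plt.ylim(-3, 3)
plt.show()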

The likely reason the MLP outperformed the LSTM is that the lag component only spanned 3 time steps. Some sources have stated that when relationships span longer time frames, LSTMs tend to perform best.
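A simple way to test that yourself (an illustrative extension, not from the original article) is to regenerate y with lags that reach further back, together with a longer seq_length, and then repeat the comparison:

# Illustrative extension: a target with longer-range lags (5 and 10 steps back)
seq_length_long = 12
X_long = np.array([[i + j for j in range(seq_length_long)] for i in range(100)])
X_long = X_long.reshape((100, seq_length_long, 1))

# y now depends on values 5 and 10 steps back, a harder pattern for a single-input MLP
y_long = np.array([i + (i - 5) * .5 + (i - 10) * .2 for i in range(11, 111)])
y_long = y_long.reshape((100, 1))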
