LSTM — A Time Series Analysis

Trevor O'Hearn
3 min read · Aug 2, 2020


Neural networks can be a hard concept to wrap your head around. I think this is mostly because they can be used for so many different things: classification, identification, or plain regression.

In this article, we will look at how easy it is to set up a simple LSTM model. All you need is your helpful friend Keras and an array of numbers to throw at it.

First thing we always do? Import!

import math
import pandas as pd
import numpy as np
#keras models
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
#scaler
from sklearn.preprocessing import MinMaxScaler
#analysis tool
from sklearn.metrics import mean_squared_error

Next we need a set of numbers to play with. There are a couple of ways to go about this, but the best option is to use data that actually has meaning, and for that I recommend going to Kaggle.
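If you'd rather skip the download for now, a quick synthetic series works just as well for experimenting. Here's a minimal sketch (the sine wave and the file name are just placeholders so the rest of the code in this article lines up; np and pd are already imported above):

# a noisy sine wave as a stand-in time series
t = np.arange(400)
values = np.sin(0.1 * t) + np.random.normal(scale=0.05, size=len(t))
# writing with the default index gives us a second column, matching usecols=[1] below
pd.DataFrame({'value': values}).to_csv('test_df_w_timeshift.csv')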

Once you have your .csv file downloaded, we need to put it into a dataframe and keep only the feature whose values we want to play with. Mine is at column index 1 (the second column in the file).

df = pd.read_csv('test_df_w_timeshift.csv', usecols=[1])
dataset = df.values
#normalize dataset
scaler = MinMaxScaler(feature_range=(0,1))
dataset = scaler.fit_transform(dataset)

We pull the values from the file and normalize them using the MinMaxScaler from sklearn.
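If it helps to see what the scaler is actually doing, here's a tiny illustration with made-up numbers: each value v is mapped to (v - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1.

demo = np.array([[10.], [15.], [20.]])
print(MinMaxScaler(feature_range=(0, 1)).fit_transform(demo))
# [[0. ]
#  [0.5]
#  [1. ]]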

The next step is to separate the data into two groups: the first is a set of training data for our LSTM model to learn from, and the second is a test set held back for evaluation. I like to use about eighty percent of the data for training, but you can play with this number to see how much data the model actually needs before it gives you a satisfactory result.

#split into train and test sets
train_size = int(len(dataset) * 0.80)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
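A quick print is enough to confirm the split (the exact counts depend on your data; with the 400-row synthetic series above you'd see 320 and 80):

print(len(train), len(test))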

The next thing we do is reshape the data into input/output pairs: for each point, the input is a window of the previous few values (the look-back), and the output is the value that comes right after it. In other words, we're asking the model to predict the current value from, let's say, the three values before it.

This create_dataset method was written by Jason Brownlee; it is rewritten below, and a link to his LSTM time series model is at the bottom. I recommend checking it out!

#https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
def create_dataset(dataset, lookback=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - lookback - 1):
        a = dataset[i: i + lookback, 0]
        dataX.append(a)
        dataY.append(dataset[i + lookback, 0])
    return np.array(dataX), np.array(dataY)
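To make the windowing concrete, here is what the helper produces on a tiny toy array (the values are just for illustration):

toy = np.arange(6, dtype=float).reshape(-1, 1)  # [[0.], [1.], ..., [5.]]
X, Y = create_dataset(toy, lookback=3)
print(X)  # [[0. 1. 2.]
          #  [1. 2. 3.]]
print(Y)  # [3. 4.]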

So we use the method above to build an 'X' (the look-back windows) and a 'Y' (the next value) for both the training data and the testing data, then reshape X into the [samples, time steps, features] shape that Keras LSTMs expect.

# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
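It's worth printing the shapes to confirm everything lines up before training (the sample counts depend on your data):

print(trainX.shape, trainY.shape)  # (samples, look_back, 1) and (samples,)
print(testX.shape, testY.shape)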

Once the data is all set up, we simply build the model and feed the data into it.

batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True, return_sequences=True))
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(5):
    model.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()

Once the model has trained for a set number of epochs, in this case five (resetting the LSTM's internal state after each pass, since the layers are stateful), we can ask it to predict.

trainPredict = model.predict(trainX, batch_size=batch_size)
model.reset_states()
testPredict = model.predict(testX, batch_size=batch_size)

Don’t forget to reverse our transformation from the beginning!

trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

To see how well your model did, you can compute the root mean squared error (RMSE) as below.

trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
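The same call works for the test set, which is the number you actually care about (a small addition following the same pattern):

testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Train RMSE: %.2f' % trainScore)
print('Test RMSE: %.2f' % testScore)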

If you wish to see it visually, feel free to plot it with matplotlib. The predictions need to be shifted so they line up with the original series: the train predictions start look_back steps in, and the test predictions start after the training section.

import matplotlib.pyplot as plt
# arrays aligned with the full dataset, NaN everywhere except where predictions exist
trainPredictPlot = np.full_like(dataset, np.nan)
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict
testPredictPlot = np.full_like(dataset, np.nan)
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()

Resources:

Jason Brownlee, Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras: https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
