From scratch — An LSTM model to predict commodity prices

Vinay Arun
8 min read · Jan 7, 2018


Image by François Deloche, via Wikimedia Commons

I am neither a data scientist nor a programmer. However, driven by curiosity and the amount of resources available online, I embarked on a mini-project to build a machine learning system that can predict a commodity’s price at some future time period. The particular case: Brent crude prices for the next month.

This article is an explanation of how I went about the process and the model that was finally built.

How can you use this article? I hope it serves as an introductory guide for you to work with LSTM models. Why are LSTMs important? Because they are a practical way of implementing Recurrent Neural Networks (RNNs), and RNNs hold a lot of promise, as explained in this wonderful article by Andrej Karpathy. You may also use this code on a new dataset for your own application. I would be happy if it is of any help to the reader.

I have uploaded the code and the dataset used for this project in a Github repository. Do go through the dataset, which is a .csv file, so that you understand this example better.

If you need more information on these concepts, I am listing below the resources that helped me a lot and have also served as inspiration for me. You can go through them for an in-depth understanding.

  1. Frank Kane’s book on Python programming
  2. Dr. Jason Brownlee’s website
  3. Siraj Raval’s youtube channel

As previously mentioned, I am not a data scientist or a programmer; however, I completed the very popular Andrew Ng Coursera course two years back and have a basic understanding of machine learning. As an engineer I have also learnt C programming in the past and can read and write code. In my opinion, though, even without past formal training in machine learning you can build the system I am describing in this article. Of course, some basic coding skills are helpful for such a project.

My idea for this project was pretty straightforward. The prices of commodities are obviously linked to the global economy in general and, of course, to supply-demand dynamics. Both these aspects are reflected in the price movements of commodities. So there must be a relationship between the past price movements of Brent crude and other commodities and the future price of Brent over a short horizon, like one month.

Brent price for next month = f(Price trend of Brent crude and other commodities up till this month)

Thus I explored how well the price for the next month can be forecast using the data available up to the current month.

On doing some research and reading online, it was evident that most such implementations are done in Python. For those who are new to all this, Python is an open-source programming language, very popular in the machine learning universe. It is a high-level language with many libraries available, from which you can use functions already written for complex tasks, which means you can focus more on your actual end goal. This is excellent for business managers who are more interested in quickly getting business value than in getting entangled in time-consuming for loops!

Let us go through the Python code segment by segment and understand how this is implemented. If you want to replicate this, simply copy-pasting the code will work.

# load required libraries
from numpy import concatenate
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from matplotlib import pyplot
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import numpy as np

The above segment imports the libraries that will be used in the rest of the code. “numpy” and “pandas” are frequently used Python libraries for numerical computation and data handling. “matplotlib” is a library for plotting graphs. “sklearn” (scikit-learn) is a machine learning library; here it supplies the scaler and the error metric. “keras” is the deep learning library that is key for this project: it is what we use to create the LSTM model and train it.

Next we define certain global values that will be used by the code.

num_features = 53 # Number of features in the dataset
lag_steps = 1 # Number of lagged time features to be generated
label_feature = 'POILBRE' # The column in the dataset that the model is being built to predict

Here the number of features is nothing but the number of data columns in the dataset. Each column is a commodity with its prices in the rows; there are 54 columns in the dataset, but the first column is just the month and year data. lag_steps is the number of months of history we want to use as input for the prediction model; here we use one month back. label_feature is the commodity whose price we want to predict, in this case the price of Brent oil.

Next we define a function that we use to prepare our dataset using the number of lag_steps we have set.

# This function arranges the dataset for supervised learning by shifting
# the feature columns by the number of time steps given in lag_steps

def sequential_to_supervised(data, lag_steps=1, n_out=1, dropnan=True):
    features = 1 if type(data) is list else data.shape[1]  # Get the number of features in dataset
    df = DataFrame(data)
    cols = list()
    feature_names = list()

    # Lagged input columns: t-lag_steps, ..., t-1
    for i in range(lag_steps, 0, -1):
        cols.append(df.shift(i))  # This will be the shifted dataset
        feature_names += [str(df.columns[j]) + '(t-%d)' % i for j in range(features)]  # Names of the shifted features

    # Output columns: t, t+1, ..., t+(n_out-1)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            feature_names += [str(df.columns[j]) + '(t)' for j in range(features)]
        else:
            feature_names += [str(df.columns[j]) + '(t+%d)' % i for j in range(features)]

    agg = concat(cols, axis=1)
    agg.columns = feature_names

    # The shifting leaves NaNs in the first lag_steps rows; drop them
    if dropnan:
        agg.dropna(inplace=True)
    return agg
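
To see what this function produces, here is a minimal sketch, not part of the main script, on a made-up two-column dataset (the column names and values are purely for illustration):

toy = DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})
print(sequential_to_supervised(toy, lag_steps=1))
# Produces columns A(t-1), B(t-1), A(t), B(t); the first row is
# dropped because the shift leaves NaNs in it

Each row now pairs last month’s values with this month’s values, which is exactly the supervised-learning layout we need.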

Next we read in the .csv file containing our dataset, convert it using the function defined above, and scale the data so that all columns have values between 0 and 1, which is important for training the model. We also move the label column that we would like to predict to the end of the dataset.

# Reading in the dataset which is in .csv format, has column headings and has an index column
dataset = read_csv("Dataset.csv", header = 0, index_col = 0, squeeze = True, usecols = (i for i in range(0, num_features+1)))
supervised_dataset = sequential_to_supervised(dataset, lag_steps)

# Move label column to the end of dataset
cols_at_end = [label_feature + '(t)']
supervised_dataset = supervised_dataset[[c for c in supervised_dataset if c not in cols_at_end] + [c for c in cols_at_end if c in supervised_dataset]]

# Dropping the current timestep columns of features other than the one being predicted, which will be the label or y
supervised_dataset.drop(supervised_dataset.columns[(num_features*lag_steps) : (num_features*lag_steps + num_features -1)], axis=1, inplace=True)
#print(supervised_dataset.shape) # Used for debugging
scaler = MinMaxScaler(feature_range=(0, 1))
supervised_dataset_scaled = scaler.fit_transform(supervised_dataset) # Scaling all values
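
If you are curious about what the scaling step does, here is a small standalone sketch; the toy numbers are my own, purely for illustration. Each column is mapped independently onto the range 0 to 1, and the same scaler object can later undo the mapping, which is what we rely on when interpreting the model’s predictions.

# Standalone illustration of MinMaxScaler on made-up numbers
import numpy as np
from sklearn.preprocessing import MinMaxScaler

toy = np.array([[10.0, 200.0],
                [20.0, 400.0],
                [30.0, 600.0]])
toy_scaler = MinMaxScaler(feature_range=(0, 1))
scaled = toy_scaler.fit_transform(toy)
print(scaled)                                # each column now runs from 0 to 1
print(toy_scaler.inverse_transform(scaled))  # recovers the original values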

Then we split the dataset in an 80:20 ratio. The first 80% of the data will be used for training the LSTM model and the remaining 20% for testing and validating the trained model. Reshaping is carried out because the LSTM model requires input data in 3D format: (samples, timesteps, features).

split = int(supervised_dataset_scaled.shape[0]*0.8) # Splitting for training and testing
train = supervised_dataset_scaled[:split, :]
test = supervised_dataset_scaled[split:, :]

train_X, train_y = train[:, :-1], train[:, -1] # The label column is separated out
test_X, test_y = test[:, :-1], test[:, -1]

train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1])) # Reshaping done for LSTM as it needs 3D input
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
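
A quick sanity check I find useful at this point (optional, and an addition of mine rather than part of the original script) is to print the shapes, which should follow the (samples, timesteps, features) layout Keras expects:

# Optional sanity check: LSTM input must be (samples, timesteps, features)
print(train_X.shape, train_y.shape)  # expected: (n_train, 1, num_features*lag_steps) and (n_train,)
print(test_X.shape, test_y.shape)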

Now we come to defining the LSTM network and training it. Training works by randomly initializing the model’s weights, calculating the loss between the model’s predictions and the actual values, and then minimizing that loss by updating the weights with a method called gradient descent, which iteratively moves the weights toward lower loss. Here we use mean squared error as the loss and an optimizer called ‘adam’. The model is trained and a graph of the training progress is displayed.

# Defining the LSTM model to be fit
model = Sequential()
model.add(LSTM(85, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# Fitting the model
history = model.fit(train_X, train_y, epochs=200, batch_size=175, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# Plotting the training progression
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

In the above segment you can observe certain parameters of the model being defined. The LSTM layer has 85 cells and there is a single cell in the output layer. The model is trained for 200 epochs with a batch size of 175. More epochs mean more training, but beyond some point the model starts over-fitting the training dataset and becomes inaccurate on the test dataset. One has to find the right set of parameters, which depends on the model and dataset in question; it is more art than exact science, involving a lot of trial and error.
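
One way to take some of that trial and error out of choosing the number of epochs, not used in the script above but worth knowing about, is Keras’s EarlyStopping callback, which halts training once the validation loss stops improving. A minimal sketch, assuming the same model and data as above:

# Sketch of an alternative fit call using early stopping
# (my addition, not part of the original script)
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=20)
history = model.fit(train_X, train_y, epochs=200, batch_size=175,
                    validation_data=(test_X, test_y), verbose=2,
                    shuffle=False, callbacks=[early_stop])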

Plot of training errors over epochs

Finally, with the trained model we make a prediction using the features in the testing dataset and compare the predictions against the actual values. The root mean square error is calculated and displayed.

# Using the trained model to predict the label values in test dataset
yhat = model.predict(test_X)

# Reshaping back into 2D for inversing the scaling
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))

# Concatenating the predicted label column with the test data input features, needed for inversing the scaling
inv_yhat = concatenate((test_X[:, 0:], yhat), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat) # Rescaling back

inv_yhat = inv_yhat[:, num_features*lag_steps] # Extracting the rescaled predicted label column

test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_X[:, 0:], test_y), axis=1) # Re-joining the test dataset for inversing the scaling
inv_y = scaler.inverse_transform(inv_y) # Rescaling the actual label column values
inv_y = inv_y[:, num_features*lag_steps] # Extracting the rescaled actual label column

rmse = np.sqrt(mean_squared_error(inv_y, inv_yhat)) # Calculating RMSE
print('Test RMSE: %.3f' % rmse)

pyplot.plot(inv_y, label = 'Actual')
pyplot.plot(inv_yhat, label = 'Predicted')
pyplot.legend()
pyplot.show()

Actual vs predicted for next month’s Brent crude price

Is it a good enough prediction? You may be able to tune the model better by adjusting the hyperparameters. It is also good practice to evaluate other predictive models; for this particular case, simple linear regression could also be effective, as sketched below.
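
For completeness, here is a rough sketch of such a baseline, assuming the variables from the script above are still in scope. Note that train_X has to be flattened back to 2D, while test_X was already reshaped back to 2D during the inverse-scaling step.

# Sketch of a linear regression baseline on the same scaled features
# (my addition, not part of the original script)
from sklearn.linear_model import LinearRegression

baseline = LinearRegression()
baseline.fit(train_X.reshape((train_X.shape[0], -1)), train_y)  # flatten 3D LSTM input back to 2D
baseline_yhat = baseline.predict(test_X)  # test_X is 2D again at this point
# The same inverse-scaling and RMSE steps as above can then be applied
# to compare the baseline against the LSTM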

I hope this article has been of help, comments are welcome! You can find out more about me at LinkedIn.

Vinay Arun

A supply chain professional with curiosity in data science and machine learning. https://in.linkedin.com/in/vinayarun