Neural networks for algorithmic trading. Multivariate time series

Alexandr Honchar
Jun 6, 2017 · 6 min read
Illustration of 2-variate time series

In the previous post we discussed several ways to forecast financial time series: how to normalize data, how to make predictions in the form of a real value or a binary variable, and how to deal with overfitting on highly noisy data. But what we skipped (on purpose) is that our .csv file with prices contains much more data that we can use. In the last post only close prices with some transformations were used, but what happens if we also consider the high, low and open prices and the volume of every historical day? This leads us to working with multidimensional, i.e. multivariate, time series, where at every time stamp we have more than just one variable. In our case we will work with the whole OHLCV tuple.

Other posts:

  1. Correct 1D time series forecasting + backtesting
  2. Multivariate time series forecasting
  3. Volatility forecasting and custom losses
  4. Multitask and multimodal learning
  5. Hyperparameters optimization
  6. Enhancing classical strategies with neural nets
  7. Probabilistic programming and Pyro forecasts

In this article we will see how to preprocess multivariate time series (in particular, what to do with every dimension), how to define and train a neural network on this kind of data, and how the results compare with what we had in the last post.

As always, you can jump directly to the code.

Data preparation

Image channels and dimensions

In the case of time series, our image is just 1D (the plot we usually see on a graph) and the role of channels is played by the different values: open, high, low and close prices and the volume of operations. You can also think about it from another point of view: at any time stamp our time series is represented not by a single value but by a vector (open, high, low and close prices and the volume of that day). The image metaphor, however, is more useful for understanding why we will apply convolutional neural networks to this problem today.
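
To make the channel metaphor concrete, here is a minimal NumPy sketch of what a 1D convolution does with a multichannel window: each filter slides along the time axis only, but at every position it looks at all five channels at once. The kernel size, the number of filters and the random data are hypothetical and chosen only for illustration; they are not taken from the network used in this post.

```python
import numpy as np

WINDOW, EMB_SIZE = 30, 5        # 30 days, 5 channels (O, H, L, C, V)
KERNEL_SIZE, N_FILTERS = 4, 8   # hypothetical values, for illustration only

rng = np.random.default_rng(0)
window = rng.normal(size=(WINDOW, EMB_SIZE))                    # one synthetic input window
kernels = rng.normal(size=(KERNEL_SIZE, EMB_SIZE, N_FILTERS))   # every filter spans all channels


def conv1d_valid(x, w):
    """Naive 'valid' 1D convolution (cross-correlation, as in deep learning libraries)."""
    steps = x.shape[0] - w.shape[0] + 1
    out = np.empty((steps, w.shape[2]))
    for t in range(steps):
        patch = x[t:t + w.shape[0]]   # shape (KERNEL_SIZE, EMB_SIZE)
        # Sum over the time and channel axes, leaving one value per filter
        out[t] = np.tensordot(patch, w, axes=([0, 1], [0, 1]))
    return out


features = conv1d_valid(window, kernels)
print(features.shape)  # (27, 8)
```

The output has one row per valid position of the kernel along the time axis and one column per filter; the channel axis is gone because every filter mixes all channels.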

One of the most important points about multivariate time series is that the dimensions can come from different sources, can have a different nature, can be totally uncorrelated and can have different distributions, so we have to normalize them independently! We will use an ugly, but more or less adequate, trick from the last post:

We don’t need to predict an exact value, so the expected value and variance of the future aren’t very interesting for us; we just need to predict the movement up or down. That’s why we will take the risk and normalize our 30-day windows only by their own mean and standard deviation (z-score normalization), assuming that within a single time window they don’t change much, and without touching information from the future.

But we are going to normalize every dimension of the time window independently:

for i in range(0, len(data_original), STEP):
    # slice one WINDOW-long window from every OHLCV series
    o = openp[i:i+WINDOW]
    h = highp[i:i+WINDOW]
    l = lowp[i:i+WINDOW]
    c = closep[i:i+WINDOW]
    v = volumep[i:i+WINDOW]
    # z-score every series with its own window mean and standard deviation
    o = (np.array(o) - np.mean(o)) / np.std(o)
    h = (np.array(h) - np.mean(h)) / np.std(h)
    l = (np.array(l) - np.mean(l)) / np.std(l)
    c = (np.array(c) - np.mean(c)) / np.std(c)
    v = (np.array(v) - np.mean(v)) / np.std(v)
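
The same per-dimension normalization can be written once for a whole window if the five series are stacked as columns. The following is only a sketch: the helper name `zscore_window`, the `eps` safeguard against a constant channel and the synthetic data are my assumptions, not part of the original code.

```python
import numpy as np


def zscore_window(window, eps=1e-8):
    """Z-score each column (channel) of a (WINDOW, n_channels) array independently."""
    window = np.asarray(window, dtype=float)
    mean = window.mean(axis=0, keepdims=True)
    std = window.std(axis=0, keepdims=True)
    # eps is an added safeguard (not in the original code) against a constant
    # channel, whose standard deviation is zero
    return (window - mean) / (std + eps)


# Synthetic stand-in for one 30-day window: prices near 100, volumes near 1e6
rng = np.random.default_rng(42)
prices = 100 + rng.normal(scale=2.0, size=(30, 4))    # O, H, L, C
volume = 1e6 + rng.normal(scale=1e5, size=(30, 1))    # V
normalized = zscore_window(np.hstack([prices, volume]))

print(normalized.shape)                            # (30, 5)
print(np.allclose(normalized.mean(axis=0), 0.0))   # True
```

Although prices and volume differ by four orders of magnitude, after normalization every channel has zero mean and roughly unit standard deviation inside its own window.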

But since we want to forecast the movement of the price up or down on the next day, we need to consider the change of a single dimension, the close price:

# still inside the loop over i
x_i = closep[i:i+WINDOW]
y_i = closep[i+WINDOW+FORECAST]
last_close = x_i[-1]
next_close = y_i
if last_close < next_close:
    y_i = [1, 0]  # price went up
else:
    y_i = [0, 1]  # price went down or stayed the same

So the data we will train on are, like before, time windows of 30 days, but now for every day we consider the whole OHLCV tuple, correctly normalized, to predict the direction of the close price movement. The full code for data preparation and neural network training can be found here.
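
Putting the two steps together, a self-contained sketch of the dataset construction could look like the one below. It runs on a synthetic random walk rather than real prices; the function name `make_dataset`, the values of `STEP` and `FORECAST`, the early loop bound and the policy of skipping windows with a constant channel are all assumptions made for the example. Note that with this indexing the target lies `FORECAST + 1` positions after the last close in the window.

```python
import numpy as np

WINDOW, STEP, FORECAST = 30, 1, 1   # 30-day windows as in the text; STEP and FORECAST are assumed


def make_dataset(ohlcv, window=WINDOW, step=STEP, forecast=FORECAST):
    """Turn a (n_days, 5) OHLCV array into normalized windows and one-hot labels."""
    ohlcv = np.asarray(ohlcv, dtype=float)
    close = ohlcv[:, 3]
    X, Y = [], []
    # Stop early enough that the target index always exists
    for i in range(0, len(ohlcv) - window - forecast, step):
        w = ohlcv[i:i + window]
        std = w.std(axis=0)
        if np.any(std == 0):
            continue  # skip windows with a constant channel (assumed policy)
        X.append((w - w.mean(axis=0)) / std)
        last_close = close[i + window - 1]
        next_close = close[i + window + forecast]   # same indexing as the snippet above
        Y.append([1, 0] if last_close < next_close else [0, 1])
    return np.array(X), np.array(Y)


# Synthetic random-walk data standing in for the real .csv file
rng = np.random.default_rng(1)
close = 100 + np.cumsum(rng.normal(size=200))
ohlcv = np.column_stack([
    close + rng.normal(scale=0.1, size=200),   # open
    close + 1.0,                               # high
    close - 1.0,                               # low
    close,                                     # close
    rng.uniform(1e5, 1e6, size=200),           # volume
])

X, Y = make_dataset(ohlcv)
print(X.shape, Y.shape)  # (169, 30, 5) (169, 2)
```

The labels are computed from the raw close prices, while the inputs are normalized per window, so no statistics from the future leak into a window.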

Neural network architecture

The code for our network for today looks like:

model = Sequential()
model.add(Convolution1D(input_shape = (WINDOW, EMB_SIZE),

The only difference from the architecture in the very first post is that the EMB_SIZE variable is changed to 5 in our case (one channel per OHLCV series).
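
Changing EMB_SIZE only affects the first convolutional layer: each of its filters now has weights for five input channels instead of one. A small sketch of the arithmetic, with a hypothetical kernel size and number of filters (the real values are in the full code, not reproduced here):

```python
def conv1d_param_count(kernel_size, in_channels, n_filters):
    """Weights plus one bias per filter for a standard 1D convolutional layer."""
    return kernel_size * in_channels * n_filters + n_filters


KERNEL_SIZE, N_FILTERS = 4, 16   # hypothetical values, not taken from the original network

univariate = conv1d_param_count(KERNEL_SIZE, in_channels=1, n_filters=N_FILTERS)
multivariate = conv1d_param_count(KERNEL_SIZE, in_channels=5, n_filters=N_FILTERS)
print(univariate, multivariate)  # 80 336
```

The shape of the layer's output does not depend on the number of input channels, which is why the rest of the network can stay unchanged.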

Training process

opt = Nadam(lr=0.002)
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.9, patience=30, min_lr=0.000001, verbose=1)
checkpointer = ModelCheckpoint(filepath="model.hdf5", verbose=1, save_best_only=True)
history = model.fit(X_train, Y_train,
          nb_epoch = 100,
          batch_size = 128,
          validation_data=(X_test, Y_test),
          callbacks=[reduce_lr, checkpointer],
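
To get a feeling for what the learning rate schedule does with these settings, here is a simplified pure-Python sketch of the reduce-on-plateau idea. It is not the Keras implementation (for instance, it ignores options such as cooldown); the function name and the toy accuracy history are assumptions for illustration.

```python
def reduce_on_plateau(val_acc_history, lr=0.002, factor=0.9, patience=30, min_lr=1e-6):
    """Return the learning rate used at each epoch for a given validation-accuracy history."""
    best = float("-inf")
    wait = 0
    lrs = []
    for acc in val_acc_history:
        lrs.append(lr)
        if acc > best:
            # Validation accuracy improved: remember it and reset the counter
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the learning rate
                lr = max(lr * factor, min_lr)
                wait = 0
    return lrs


# Toy history: one good epoch followed by 60 epochs without improvement
lrs = reduce_on_plateau([0.5] + [0.4] * 60)
print(lrs[30], lrs[31])
```

With patience=30 and only 100 epochs, the rate can be reduced at most three times, and each reduction is a mild 10%, so the schedule stays close to the initial learning rate.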

And check performance:

Loss after 100 epochs
Accuracy of binary classification after 100 epochs

From the plots we can clearly see that the network trained adequately (for very noisy data): the loss on the training set decreased over time while the accuracy increased. And, most importantly, compared to the univariate time series from the previous post, we improved the accuracy from 58% to almost 65%!

To check for overfitting we can also compute the confusion matrix:

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
pred = model.predict(np.array(X_test))
C = confusion_matrix([np.argmax(y) for y in Y_test], [np.argmax(y) for y in pred])
# normalize every row (true class) so that it sums to 1
print(C / C.astype(float).sum(axis=1, keepdims=True))

and we will get:

[[ 0.75510204  0.24489796]
[ 0.46938776 0.53061224]]

which shows that days with a true “UP” movement are classified correctly about 75% of the time and days with a true “DOWN” movement about 53% of the time. These results can, of course, be balanced for the test dataset.
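
One detail worth checking in any such row normalization is the shape of the row sums. The sketch below uses a tiny hand-made, deliberately unbalanced set of labels (hypothetical, not the results of this post) and a plain NumPy counter instead of scikit-learn. It shows that dividing by `C.sum(axis=1)` without `keepdims=True` divides columns instead of rows; the two only agree when both classes have the same number of test samples.

```python
import numpy as np


def confusion_counts(y_true, y_pred, n_classes=2):
    """C[i, j] = number of samples with true class i that were predicted as class j."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    return C


# Hypothetical, deliberately unbalanced labels: 0 = "UP", 1 = "DOWN"
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 0, 1]

C = confusion_counts(y_true, y_pred)
row_normalized = C / C.sum(axis=1, keepdims=True)   # every row sums to 1
no_keepdims = C / C.sum(axis=1)                     # divides column j by row sum j

print(C)
print(row_normalized)
print(no_keepdims)
```

Each row of `row_normalized` reads as "of all days that truly belonged to this class, which share was predicted as each class", which is the interpretation used in the text.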

What about regression?

Unfortunately, for returns it still works badly:

Loss decreasing for regression problem
Prediction of the price change

For predicting the value of the close price the situation isn’t better:

Prediction of close price

I am still trying different things for the regression problem on financial data (like custom loss functions). If you have suggestions, I’d like to discuss them in the comments or via PM.

Meanwhile, we can state that the regression problem is still too complicated for us; we will work on it later, choosing appropriate loss functions and activation functions.

In the next post I would like to introduce the concept of multimodal learning, where we will use parameters not just from our .csv file with OHLCV tuples, but also much more interesting things.

Stay tuned!

Follow me also on Facebook for AI articles that are too short for Medium, on Instagram for personal stuff and on LinkedIn!
