Financial bars at the age of deep learning

Véber István · Published in Analytics Vidhya · 9 min read · Mar 29, 2020

In the age of machine learning and deep learning, the representation of financial data needs new approaches. Candlesticks, for example, originated with Japanese rice merchants and were first used in the 18th century. Obviously, they weren’t developed for neural networks. They are very well suited to visualizing price movement, but they hide possibly important information. Models process data differently from us, and they can “understand” features that are hard for humans to apprehend in bulk.

We recognize patterns in images; neural networks recognize patterns in multidimensional datasets. We can use the same OHLC data to display a bar chart and to feed our network, but we can derive very different additional information from the price movement beyond the open, high, low, and close price of the sampled bar.

I will share some ideas about time-series data transformations for neural networks and will flavor them with some sacrilege at the end, but there are plenty of other options worth examining. Stationarity, homoscedasticity, or uncorrelated features are very important for some forecasting methods, but neural networks don’t require such strong conditions. These conditions can improve the performance of some models, while others are built to deal with heteroscedastic data, like the mixture density network in my earlier article.

We will use only the ask price. Combining it with the bid price, volume, or other data could lead to many new features.

This article isn’t an implementation of the ideas in Marcos Lopez de Prado’s book. That is an awesome book, and you can find many articles about the topics in it, for example this one from Maks Ivanov, which explains time bars, tick bars, volume bars, dollar bars, and imbalance bars.

We follow a different approach, but we can mix the ideas in this article with the above-mentioned techniques. For simplicity, we will use time bars, but for most of the examples any of the alternative bar sampling methods mentioned above could be used.

The notebook with the code can be reached here: https://github.com/sinusgamma/bars_with_deep_learning/blob/master/ts_bars.ipynb

Dataset

Our raw data is the USD/JPY tick ask price from 2016. We will examine resampling and feature-building techniques but won’t train a model, so we don’t need a large dataset. The source of the tick data is dukascopy.com.

Good old candlesticks

Why do we need bars at all? The main role of bars is to filter noise, compress information, and transform the data into a form comprehensible to humans or models. This is true for all bar types, not only candlesticks. Noise filtering and information compression help us (or don’t) to see the price movement and to recognize trends, support and resistance levels, volatile regions, and candlestick patterns (which are, in my opinion, more misleading than useful). Unfortunately, because of the filtering and compressing, lots of important information vanishes, or we get false impressions. For example, the candlestick tails are much thinner than the body, which visually implies that most of the price action happened in the body’s region. But that isn’t always true: sometimes most of the price action happens in the upper or lower tail, or the prices have a multimodal distribution.

Example of 15-minute bars

Bar tensors

When we sample “bars” for models, the main goal is similar to sampling bars for human visualization. Most of the time we want less noisy data, less data to speed up the computation, and features that we hope will improve our model’s performance. Of course, we can always use raw tick data, and I prefer tick charts to bars when watching shorter price movements. So why use bars at all when training models?

  • One reason is the required computation power. Depending on the range of the bar and the number of parameters, bar data can be only a fraction of the size of the original tick data.
  • Models based on LSTM layers have problems with long sequences, and the frequency of tick data is too high for them if the forecast timeframe is relatively long.
  • We can produce features from tick data which can improve model performance.

The number of features of a bar is arbitrary. We should consider bars as tensors. We will use one-dimensional vectors as bars, but it isn’t difficult to build multidimensional bars if we want to, provided our model is capable of using them. Just an example of a 2×N bar: N represents the number of features we derive from the tick data, and the two layers are the ask and bid data. You need a proper model for this arrangement of the data, because the same data could also be shaped as 1×2N, where the features derived from ask and bid ticks are concatenated along the same axis. But if you use a 2×N input, you can run a Conv1D layer over it, similar to the Conv2D convolution of an RGB image. We can also expand our features in a very different dimension: we can use the last tick as an anchor point, and instead of sampling only from one timeframe, we can use derived features or statistics of different timeframes ending at this anchor tick. When using deep learning models, we should consider the layers of the model when building the bars, because different layers can explore different connections.
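A minimal Keras sketch of the 2×N idea (not from the notebook; the feature count and layer sizes are made-up placeholders): a Conv1D layer slides along the feature axis while mixing the two channels, just as a Conv2D slides over pixels while mixing the RGB channels.

```python
import tensorflow as tf

N_FEATURES = 12  # hypothetical number of features derived per bar

# One 2xN bar stored channels-last as (N, 2): axis 0 holds the N derived
# features, axis 1 holds the ask and bid channels.
inputs = tf.keras.Input(shape=(N_FEATURES, 2))

# The kernel spans 3 neighboring features and both channels at once.
x = tf.keras.layers.Conv1D(filters=16, kernel_size=3, activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs, outputs)
model.summary()
```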

What kind of new features can we add to our bar tensors? Any kind we can imagine that turns out to be useful. The open, high, low, close (first, max, min, last in the code) can remain among our features, why not, but if you don’t care about the slippage between bars, you can even omit the open price. If you resample daily stock prices you shouldn’t drop it, but the open price of a 1-minute bar may not be so important. A feature can be the mean of the ticks in the bar range, or the volume-weighted mean of the ticks. If we use time-based bars, then the number of ticks during the time range can be another feature. If we use tick bars, then the time range of the ticks in the bar can be a feature. We can use the standard deviation of the ticks in a bar to compare the volatility of bars, and the Spearman’s rank correlation between the price and the time axis to estimate how monotonic the price movement is. The log return can be another feature, and we will calculate quantiles to estimate the regions where the tick price resides within the bar. We could generate other features. If you have something in mind, you can share it in the comments or on LinkedIn :)

Ok, let’s build our “bars”!
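Below is a minimal pandas sketch of this kind of resampling (the notebook has the full version; here I assume `ticks` is a pandas Series of ask prices with a DatetimeIndex, and the column names are illustrative):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def make_bars(ticks: pd.Series, freq: str = "5min") -> pd.DataFrame:
    """Resample a tick price series into feature bars."""
    grouped = ticks.resample(freq)
    bars = grouped.agg(["first", "max", "min", "last", "mean", "count", "std"])
    bars["q1"] = grouped.quantile(0.25)
    bars["median"] = grouped.quantile(0.50)
    bars["q3"] = grouped.quantile(0.75)
    # Spearman rank correlation of price against time: close to +1 or -1
    # means the price moved almost monotonically within the bar.
    bars["spearman"] = grouped.apply(
        lambda s: spearmanr(s.values, np.arange(len(s)))[0] if len(s) > 1 else np.nan
    )
    # Log return between consecutive bar closes.
    bars["log_return"] = np.log(bars["last"]).diff()
    return bars.dropna()
```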

Features of the 5-minute bars

The chart above displays a sequence of our features. Some observations: the body of the candlestick shows a different pattern than the Q1–Q3 range, the log returns move together with the Spearman’s rank correlation, and the highest standard deviations of tick prices are more typical of bars with more ticks. The displayed sequence is short, and analyzing these connections more deeply isn’t the goal of the notebook, but we expect(/hope) that the new features will improve the performance of our model.

Refine the features

Among our features, we have log returns and price information. For neural networks, all of them can be useful, and depending on what we forecast, one can be more important than another. This article explains the advantages of log returns. With our dataset, these advantages can be extended. In the next table, we can see the correlations and standard deviations of our price features. The correlations are almost one, and the standard deviations are identical up to the second decimal, because relative to the long-term price changes the price features within a bar are very close to each other. This means that the model has to find useful patterns among very similar numbers.

Pearson correlation of the original price features
Standard deviation of the original price features
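These statistics can be reproduced from the bars above (a sketch; the column names follow the earlier `make_bars` example):

```python
# Pearson correlations and standard deviations of the raw price features.
price_cols = ["first", "max", "min", "last", "mean", "q1", "median", "q3"]
print(bars[price_cols].corr(method="pearson").round(4))
print(bars[price_cols].std())
```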

How can we help the model? We can “logreturnise” most of the price features.

We use the ‘mean price’ feature as the base of our in-bar log returns and calculate all the other ‘price features’ relative to the mean price of the bar, though any of the other price features would be a plausible base. When using the mean as the base of the calculations, the return-like high and Q3 will always be positive, and the low and Q1 always negative. The new features have far lower correlations. The standard deviations of the new features are more varied when scaled to the same magnitude as the standard deviations of the simple price features, and the aggregated quantile features have considerably lower standard deviations than the one-tick-dependent open, high, low, and close related features. These statistics mean that our new features can more easily lead to model-recognizable patterns.
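A sketch of this transformation, again assuming the column names of the `make_bars` example:

```python
# "Logreturnise" the in-bar price features against the bar's mean price.
price_cols = ["first", "max", "min", "last", "q1", "median", "q3"]
for col in price_cols:
    bars[f"{col}_lr"] = np.log(bars[col] / bars["mean"])
# max_lr and q3_lr are always >= 0; min_lr and q1_lr are always <= 0.

# The mean price itself becomes a between-bar log return.
bars["mean_lr"] = np.log(bars["mean"]).diff()
```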

Pearson correlation of the new price features
Standard deviation of the new price features

Data Augmentation

When training deep learning models on time series, we face two problems: the dataset can be too short to feed a data-hungry neural network, and training on data that is too old can sometimes be worse than dropping it.

Augmenting the data can help increase the number of samples. With tick data, we can sample bars starting from any tick. For example, instead of using only hourly bars with ranges like 17:00–18:00, we can sample bar sequences where the hourly ranges run from (xH:41min) to (x+1H:41min), or even sample a bar ending at every tick.

In the chart below, we can compare a sequence of 5-minute bars with another sequence of 5-minute bars shifted by 2 minutes. The charts look similar at first, but if we examine them closely we can notice differences, and these differences can be very large if we compare individual candles to candles.

Shifting effect on bars
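In pandas, the shifted grid is one argument away (a sketch, assuming a recent pandas version with the `offset` resample argument):

```python
# Same ticks, two sampling grids: the second 5-minute grid starts 2 minutes
# later, producing different bars from identical data.
ohlc = ["first", "max", "min", "last"]
bars_0 = ticks.resample("5min").agg(ohlc)
bars_2 = ticks.resample("5min", offset="2min").agg(ohlc)
```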

Sacrilege!!!

Let’s break some rules, or at least sacrifice some advantages of bars sampled over equal time periods, fixed tick counts, or pre-defined volumes.

Let’s build a bar sequence where the bars further in the past cover larger and larger ranges.

But why would we do that? Because we want to enable our model to process a sequence covering a very long period. If we use equally sampled bars, then for our forecast the last bar of the past, denoted “t”, is the most important, and the earlier bar at “t-1” is probably very important too. The bars at “t-46” and “t-47” alone can be relatively unimportant, but some characteristics of the range between “t-40” and “t-50” can be useful for the model. We can sample this full range as one bar.

The range of a full input sequence can be very large this way. Some simple examples of how large a range can be processed with this technique: if our last bar is a one-minute bar, and every earlier bar is longer by a factor of 1.1, then a 20-bar sequence covers almost an hour, so instead of 60 one-minute bars we can use only 20 bars of unequal length. And if our sequence is 100 bars long, then the range of the full sequence is 137,796 minutes. That is over 95 days!
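The arithmetic is a simple geometric series: n bars with base length b and growth factor g cover b·(g^n − 1)/(g − 1). A quick check of the numbers above:

```python
b, g = 1.0, 1.1  # base bar length in minutes, growth factor per step
for n in (20, 100):
    total = b * (g ** n - 1) / (g - 1)  # geometric series sum
    print(f"{n} bars cover {total:,.0f} minutes ({total / 1440:.2f} days)")
# 20 bars cover 57 minutes (0.04 days)
# 100 bars cover 137,796 minutes (95.69 days)
```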

These bar types have strong disadvantages. They distort or even destroy seasonality, so if we have seasonal data, we should probably use other sampling methods. They also need more computation power than regularly sampled bars, because we have to calculate more bars of different ranges.

I have to admit that I haven’t used this bar type before. This article is part of a sequence of loosely/closely related articles I wrote and will write, and this idea belongs here. The code below is a simple implementation of these bars: the last bar of the sequence is a 5-minute bar, and the 50 bars of the sequence cover 2 days. A live implementation for model training will come later.
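A minimal sketch of such a sampler (assuming the same tick Series with a DatetimeIndex as before; with a growth factor of about 1.08, 50 bars starting from a 5-minute bar cover roughly 2 days):

```python
def growing_bars(ticks: pd.Series, end: pd.Timestamp, n_bars: int = 50,
                 base_minutes: float = 5.0, growth: float = 1.08) -> pd.DataFrame:
    """Sample n_bars bars ending at `end`, where every earlier bar's
    range is `growth` times longer than the bar after it."""
    rows, right = [], end
    for i in range(n_bars):
        left = right - pd.Timedelta(minutes=base_minutes * growth ** i)
        chunk = ticks[(ticks.index > left) & (ticks.index <= right)]
        if len(chunk) > 0:
            rows.append({"start": left, "end": right,
                         "first": chunk.iloc[0], "max": chunk.max(),
                         "min": chunk.min(), "last": chunk.iloc[-1],
                         "mean": chunk.mean(), "count": len(chunk)})
        right = left
    return pd.DataFrame(rows[::-1])  # oldest bar first

# bars = growing_bars(ticks, end=ticks.index[-1])
```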

Bars with an exponentially growing range


Thanks for reading! If you have any remarks, criticism, or ideas you want to share, write in the comments or send a message on LinkedIn: https://www.linkedin.com/in/istvanveber/
