Implementation Journal : Let Machine Read Candlestick Charts Like Human Beings

Hallblazzar · Hallblazzar: Developer Journal · Apr 20, 2019

This article records the development progress of my master thesis, Let Machine Read Candlestick Charts Like Human Beings, and serves as a reminder of those amazing and crazy days of my master's degree.

There are two different versions of my master thesis, based on different deep learning techniques and prediction targets. However, many more unwritten methods came up while I searched for the best solution to the problem, whether on my own or through discussions with my advisors. Most of those methods were discarded because of defective architectures or poor performance. Recording how those methods came up and why they were eliminated is the main purpose of this article.

What’s the problem I want to solve?

As the title of my thesis, 'Let Machine Read Candlestick Charts Like Human Beings', suggests, my goal was to make a computer 'read' candlestick charts to forecast price movement. Instead of forecasting directly from numerical price data, human traders often perform better by reading charted data such as price charts, and the candlestick chart has been a popular choice among traders for a very long time. My work was to reproduce this procedure with machine learning techniques.

The problem can be formulated as follows: given a k-day candlestick chart, the computer should analyze it and forecast the price trend of the (k+1)-th trading day. I simply classify the 'trend' into two classes, up or down: if the close price of the (k+1)-th trading day is greater than the close price of the k-th trading day, the trend is 'up'; otherwise it is 'down'. The definition is shown in Formula 1, where C denotes the close price.

Formula 1. Definition of trend: trend(k+1) = up if C(k+1) > C(k), otherwise down
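For concreteness, here is a minimal sketch of generating these binary labels from a series of close prices. The use of pandas and the variable names are my own choices for illustration; the thesis does not specify the tooling.

```python
import pandas as pd

def make_trend_labels(close: pd.Series) -> pd.Series:
    """Label day k with the trend of day k+1: 1 if C(k+1) > C(k), else 0."""
    next_close = close.shift(-1)          # close price of the (k+1)-th day
    labels = (next_close > close).astype(int)
    return labels.iloc[:-1]               # the last day has no (k+1)-th close

# Example: close prices for 5 trading days
close = pd.Series([100.0, 101.5, 101.0, 102.3, 102.0])
print(make_trend_labels(close).tolist())  # [1, 0, 1, 0]
```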

What is candlestick chart?

As Figure 1 shows, a candlestick chart consists of 'sticks'. Each stick represents the price movement (up or down) within a specific interval, such as 5 seconds, 5 minutes, 1 day, or 20 days. As Figure 2 shows, a stick consists of shadows (the lines) and a body (the colored block), whose lengths are decided by four prices within the interval: the open, high, low, and close price. If the close price is higher than the open price, the direction within the interval is up; otherwise it is down. A candlestick chart consists of a number of such sticks, and human traders read it to help them forecast the future price trend.

Figure 1. Example of candlestick chart
Figure 2. Scheme of candlestick chart
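To make the input format concrete, below is a minimal sketch of rendering sticks from OHLC rows with matplotlib. The column names, sizes, and colors (Taiwanese convention: red for up, green for down) are my own illustrative choices, not necessarily those used in the thesis.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_candlesticks(df: pd.DataFrame, ax=None):
    """Draw one stick per row of a DataFrame with open/high/low/close columns."""
    ax = ax or plt.gca()
    for i, row in enumerate(df.itertuples()):
        color = "red" if row.close >= row.open else "green"
        # shadow: a thin line from low to high
        ax.plot([i, i], [row.low, row.high], color=color, linewidth=1)
        # body: a thick bar between open and close
        body_low, body_high = sorted([row.open, row.close])
        ax.bar(i, body_high - body_low, bottom=body_low, width=0.6, color=color)
    ax.set_xlabel("trading day")
    ax.set_ylabel("price")

ohlc = pd.DataFrame({
    "open":  [100, 102, 101],
    "high":  [103, 104, 102],
    "low":   [ 99, 101,  99],
    "close": [102, 101, 100],
})
plot_candlesticks(ohlc)
plt.show()
```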

Sounds reasonable… but does the approach really perform better than directly using numerical data?

Actually, my experiment results show that forecasting through this procedure performs better than forecasting directly from numerical data. However, it took me a lot of time to construct different machine learning models and tune their hyper-parameters. In contrast, approaches using numerical data are easier to construct and debug, consume fewer computational resources, and train faster. And although forecasting from charts performs better, the improvement in accuracy is only 2–3%.

If anyone asked me which approach I would choose to build an auto-trading system, I would still choose the one that directly uses numerical data. One of my advisors also told me, "If I were working in industry, I would not use such a time-consuming and complex approach. I don't want to waste my life!" But he also said, "But Hallblazzar, you're doing research now." (with a foxy smile).

Data

The data I used across all experiments and baselines was the historical price data of 6 futures merchandises traded on the Taiwan Futures Exchange (TAIFEX); the reasons I chose them can be found in my thesis [1]. Details of the data are listed in Table 1. Note that in all experiments and baselines the prediction target was TX, because it has the highest trading volume, which means an auto-trading system built on TX would not be useless in the real world. However, from a research perspective, generalization of the algorithm is important, so my experiments did not cover all aspects; this should be improved in future work.

Table 1. Basic information of 6 future merchandises

Baseline: IEM

Just as every fairy tale has a villain to beat or a difficulty to overcome, every paper has a baseline to defeat, reasonable or not. In my case, the opponent was a prediction model that forecasts from numerical price data. I implemented the model of [2], called the index-based model (IEM), which uses traditional indices for forecasting. There are a great number of indices that help people make trading decisions, such as the simple moving average (SMA) [3] and the commodity channel index (CCI) [4]. Each index is calculated from a numerical formula based on human knowledge and experience. My approach is based on human behavior, but the knowledge and experience must be learned by the computer itself, so IEM is a good baseline for it.

Figure 3 demonstrates the architecture of IEM. Price histories are converted to 10 index values based on 10 traditional indices. By the definition of each index, each index value directly represents a price trend. [2] introduces a Trend Deterministic Layer, which converts each index value into a trend (binary, up/down) before further analysis. These 10 trends are then combined by a classifier to forecast the price movement. In [2], 4 different classifiers are implemented and compared: Neural Network, SVM, Naive Bayes, and Random Forest. In my thesis I implemented all of them for comparison, and the best performer (highest accuracy) was Naive Bayes, whose accuracy reached 77.52%.

Figure 3. Architecture of IEM
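To illustrate the idea (not the exact pipeline of [2]), here is a rough sketch: an index such as the SMA is mapped to a binary up/down signal, as the Trend Deterministic Layer does, and the signals are fed to a Naive Bayes classifier from scikit-learn. The indices, window sizes, and data below are placeholders; the real IEM uses ten different indices.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import BernoulliNB

def sma_signal(close: pd.Series, window: int) -> pd.Series:
    """Trend-deterministic signal: 1 if the close is above its SMA (bullish), else 0."""
    return (close > close.rolling(window).mean()).astype(int)

# Hypothetical close prices (a random walk, for illustration only)
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 500).cumsum())

# Two of the index-derived trend signals; the real model uses ten indices
features = pd.DataFrame({"sma10": sma_signal(close, 10),
                         "sma20": sma_signal(close, 20)})
labels = (close.shift(-1) > close).astype(int)        # next-day up/down

# Drop the SMA warm-up rows and the final day (which has no next-day label)
X, y = features.iloc[20:-1].values, labels.iloc[20:-1].values
clf = BernoulliNB().fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```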

However, is IEM the best baseline for me? I am still wondering whether my direction was right, because approaches that analyze candlestick charts using human knowledge and experience seem like a better fit for my thesis. But guess what? IEM was actually a side project of mine before I decided to research analyzing candlestick charts with artificial-intelligence approaches. Using a different approach as the baseline might have made my work easier, but my advisors liked to break through difficulties. So… you know….

Just kill me…

VGG16-CAE(Convolutional Auto-Encoder)-Based Model

This was the first model I constructed: a simple, intuitive deep learning model. At the same time, it performed miserably, which badly dented my confidence in doing research and got me bombarded by my advisors with doubts such as "How could you do that?" and "Why did you expect it to work?".

The architecture of the model is shown in Figure 4. It consists of 2 parts: a pre-trained VGG16 network for feature extraction, and fully-connected (FC) layers for classification. The training procedure, shown in Figure 5, was inspired by [5]. The main idea of [5] is to train a VGG16-CAE based model to convert candlestick charts into deep features, use the deep features for clustering, and analyze the clustering results. I thought: if the model of [5] was good enough for clustering, why not use it for classification?

Figure 4. Architecture of VGG16-CAE-based model
Figure 5. Training progress of VGG16-CAE-based model
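A minimal PyTorch sketch of the classification part in Figure 4 might look like the following. The head sizes, the 224x224 3-channel input, and the choice to freeze the whole convolutional base are my own assumptions for illustration; the grayscale charts would have to be replicated across channels to match VGG16's input.

```python
import torch
import torch.nn as nn
from torchvision import models

class ChartClassifier(nn.Module):
    """Pre-trained VGG16 convolutional base + small FC head for up/down."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(pretrained=True)
        self.encoder = vgg.features          # frozen convolutional layers
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 256), nn.ReLU(),
            nn.Linear(256, 2),               # up / down
        )

    def forward(self, x):                    # x: (batch, 3, 224, 224)
        return self.head(self.encoder(x))

model = ChartClassifier()
charts = torch.randn(4, 3, 224, 224)         # placeholder chart images
print(model(charts).shape)                    # torch.Size([4, 2])
```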

The result was not what I expected. The Auto-Encoder in step 1 performed very well: given an arbitrary candlestick chart, it could reconstruct, from the deep features, a chart very similar to the original input. However, in step 2, no matter which candlestick chart was fed to the model, it predicted the same label.

I tried changing the number of FC layers, unfreezing more layers of the Encoder, and using different hyper-parameter combinations such as the loss function and learning rate, but none of these improved the performance; everything I did just made my advisors' faces look more awful.

Pattern Effect Combining Predictor(PECP)

So, what went wrong with the VGG16-CAE-based model? My advisors and I thought the problem might be the input, i.e., the candlestick chart itself. The inputs to the model were 20-day candlestick charts as Figure 2 shows, but in grayscale. However, human traders do not forecast the price trend directly from such long-interval charts; instead, they consider smaller regions. This means the model did not act like a real human being.

As Figure 6 shows, different short-interval candlestick charts, also called 'patterns', imply different effects on the final trend. Human traders combine the effects of the patterns to decide the final price trend. Note that in Figure 6 only 3-stick patterns are considered; in general, people consider patterns of different lengths, such as 1 stick, 2 sticks, or 5 sticks, and take all of their effects into account. In my experiments, for the Pattern Effect Combining Predictor and all of its successors, I fixed the long interval to 20 days (20-day charts) and the short interval to 3 days (3-day charts).

Figure 6. Progress of judging trend of price from combining effects of short interval patterns
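The decomposition itself is just a sliding window: a 20-day chart contains 20 − 3 + 1 = 18 overlapping 3-day windows. Below is a minimal sketch over OHLC rows; whether the thesis slices the rendered image or re-renders each window, this version simply re-slices the price rows.

```python
import pandas as pd

def decompose(ohlc_20d: pd.DataFrame, window: int = 3):
    """Split a 20-day OHLC block into its overlapping 3-day windows, in time order."""
    n = len(ohlc_20d)
    return [ohlc_20d.iloc[i:i + window] for i in range(n - window + 1)]

# A dummy 20-row OHLC block just to show the window count
dummy = pd.DataFrame({"open": range(20), "high": range(20),
                      "low": range(20), "close": range(20)})
print(len(decompose(dummy)))   # 18
```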

Based on this idea, I designed the Pattern Effect Combining Predictor (PECP). Its architecture, shown in Figure 7, contains 18 ResNet-18 networks to generate temporary effects and FC layers to decide the final effect. My original idea was as follows: each 3-day pattern should have an effect on the 21st day's trend, and if I could predict these effects and combine them in time order, I could decide the 21st day's trend precisely. The training procedure, shown in Figure 8, can be split into 2 stages. The first stage trains the 18 ResNet-18 networks, each of which judges the effect of one 3-day pattern on the 21st day's trend (there are 18 3-day candlestick charts within 20 trading days). The second stage trains the FC layers to combine the results in time order and decide the final trend.

Figure 7. Architecture of PECP
Figure 8. Training progress of PECP
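A rough PyTorch sketch of the two-stage idea in Figures 7 and 8 follows. The combiner's layer sizes, the use of torchvision's resnet18, and the tensor layout are my own assumptions, and the two training loops are omitted.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_WINDOWS = 18   # eighteen 3-day windows inside a 20-day chart

def make_window_model():
    """One ResNet-18 per window, its final layer replaced for an up/down output."""
    net = models.resnet18(pretrained=True)
    net.fc = nn.Linear(net.fc.in_features, 2)
    return net

# Stage 1: one model per window, each trained on (3-day chart, 21st-day trend) pairs
window_models = nn.ModuleList(make_window_model() for _ in range(NUM_WINDOWS))

# Stage 2: a small FC combiner over the 18 per-window predictions, kept in time order
combiner = nn.Sequential(
    nn.Linear(NUM_WINDOWS * 2, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

def predict(window_charts):                     # (batch, 18, 3, H, W) chart images
    effects = [m(window_charts[:, i]) for i, m in enumerate(window_models)]
    return combiner(torch.cat(effects, dim=1))  # (batch, 2)

charts = torch.randn(2, NUM_WINDOWS, 3, 48, 48)  # a batch of decomposed 20-day charts
print(predict(charts).shape)                     # torch.Size([2, 2])
```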

However, the results stunned me just as the VGG16-CAE-based model had, because Step 1 failed abysmally. The results of Step 1 were just like those of the VGG16-CAE-based model: no matter which 3-day candlestick chart was fed in, each ResNet-18 predicted the same label, not to mention using these results to train the FC layers in Step 2. PECP failed as well.

I think there were 2 main issues in my design of PECP. One was that I used too deep a network (ResNet-18) to analyze candlestick charts that were only 48x48 pixels, which made me unable to debug quickly or tune the hyper-parameters carefully and often, because training the models took a large amount of time. The other was that, in my original idea, PECP was supposed to be a stacking ensemble. According to my advisors, my design was wrong because the 18 ResNet-18 networks could in fact be reduced to a single ResNet-18, as Figure 9 shows, so the benefit of an ensemble was gone. As a result, I re-designed PECP.

Figure 9. Reduced PECP

Pattern Effect Combining Predictor V2 (PECP-V2)

The re-designed PECP (PECP-V2) is shown in Figure 10. It contains 18 shallow 2D CNNs (3–5 layers) to analyze the candlestick charts and a 1D CNN to predict the trend from the concatenated deep features generated by the 2D CNNs. PECP-V2 was an end-to-end model, which means the 2D CNNs and the 1D CNN did not need to be trained separately; this reduces the complexity of the training procedure. Note that the ensemble issue was not considered in PECP-V2, because I just wanted to verify whether the idea of "combining shorter-interval patterns for forecasting" was viable.

Figure 10. Architecture of PECP-V2

Each input 3-day candlestick chart was converted to a 1D vector called a 'deep represented candlestick chart'. These deep representations were concatenated into a larger 1D vector for further analysis by the 1D CNN. The use of a 1D CNN was based on [6][7][8], which use CNNs for time series forecasting. In this scenario, the 1D CNN analyzes not the spatial but the temporal relations of the input, which allows it to handle time series forecasting; my post [9] also briefly introduces how this works. But my implementation was wrong, and I did not find out until I had finished it and trained the model with different hyper-parameters.

According to [6][7][8], the input should be a time series as Figure 11 shows, where the deep representations of the 18 3-day candlestick charts are concatenated along the vector (time) axis in time order. In my implementation, however, the deep representations were concatenated as Figure 12 shows, along the channel axis in time order. Although the order was still chronological, I could not be sure whether such an approach was valid. I fixed this in later versions.

Figure 11. Input for 1D CNN based on theory
Figure 12. Input for 1D CNN of PECP-V2
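To make the difference between the two layouts concrete, here is a small PyTorch sketch of how I read Figures 11 and 12. The dimensions (18 windows, 32-dimensional representations) and kernel sizes are illustrative only.

```python
import torch
import torch.nn as nn

batch, n_windows, dim = 4, 18, 32            # 18 windows, 32-d deep representation each
reps = [torch.randn(batch, dim) for _ in range(n_windows)]   # in time order

# Figure 11 style: concatenate along the length (time) axis -> one channel,
# sequence length 18 * 32; the 1D convolution slides over time
seq = torch.cat(reps, dim=1).unsqueeze(1)            # (batch, 1, 18 * 32)

# Figure 12 style (what PECP-V2 did): stack along the channel axis -> 18 channels,
# sequence length 32; the convolution now slides over the feature dimension instead
chan = torch.stack(reps, dim=1)                       # (batch, 18, 32)

conv_time = nn.Conv1d(1, 64, kernel_size=dim, stride=dim)   # one step per window
conv_chan = nn.Conv1d(n_windows, 64, kernel_size=3)

print(conv_time(seq).shape)    # torch.Size([4, 64, 18])
print(conv_chan(chan).shape)   # torch.Size([4, 64, 30])
```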

Although the results were still bad — the accuracy was as bad as the VGG16-CAE-based model and PECP — PECP-V2 did not predict the same trend for every input candlestick chart. It seemed the model could read patterns to some extent, but the design of the model limited it. As a result, I improved PECP-V2 further.

1D CNN Based Model (Deep Candlestick Predictor, DCP)

The next version of PECP-V2 was the 1D CNN based model. Following my advisors' suggestion, in my thesis it was named the Deep Candlestick Predictor (DCP), whose architecture is shown in Figure 13. DCP consists of 3 components: a Chart Decomposer that splits the original long-interval charts (20 days) into short-interval ones (3 days), an Autoencoder that converts the input charts into deep representations, and a 1D CNN for analysis and forecasting.

Figure 13. Architecture of DCP

In fact, the Chart Decomposer and the 1D CNN do what PECP-V2 did for pre-processing and for analysis and forecasting, respectively. The difference lies in the Autoencoder and the concatenated deep features. As I had planned to do after constructing PECP, I replaced the dimension-reducing function of the multiple 2D CNNs (and ResNet-18s) with a single 2D CNN. The problem was how to generate lower-dimensional deep features that were 'good enough' to represent the original charts. As far as I knew, an Autoencoder was one way to reach that goal.

Figure 14. Training progress of Autoencoder

As Figure 14 shows, training the Autoencoder can be split into 2 stages. The goal of the first stage is to find an Autoencoder that encodes the original input into deep features and decodes them back, where the more similar the decoder's output is to the input, the better. One might assume that the Encoder performs well (i.e., the deep features are good enough to stand for the original input) whenever the input and the reconstruction are similar. However, the truth might be that the Encoder is poor and the Decoder is simply good enough to reconstruct the input anyway. The only way to check is to use the deep features directly for training — in my case, using the 1D CNN to analyze the deep features for prediction, which is the second stage.
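A minimal sketch of the stage-1 convolutional autoencoder follows, assuming 48x48 grayscale charts; the layer sizes and the 32-dimensional code are illustrative, not the thesis' actual configuration. Stage 2 would then keep only the trained encoder and train the 1D CNN on the code vectors.

```python
import torch
import torch.nn as nn

class ChartCAE(nn.Module):
    """Convolutional autoencoder: 48x48 grayscale chart -> deep feature -> chart."""
    def __init__(self, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),    # 48 -> 24
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),   # 24 -> 12
            nn.Flatten(),
            nn.Linear(32 * 12 * 12, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 32 * 12 * 12), nn.ReLU(),
            nn.Unflatten(1, (32, 12, 12)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),    # 12 -> 24
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 24 -> 48
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = ChartCAE()
charts = torch.rand(8, 1, 48, 48)                    # a batch of 3-day charts
recon, code = model(charts)
loss = nn.functional.mse_loss(recon, charts)         # stage 1: reconstruction loss
loss.backward()
```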

Results showed that after concatenating the deep features in the form shown in Figure 11 and using the 1D CNN for forecasting, the accuracy reached 69.11%, still worse than the baseline (77.52%). As a result, I tried transfer learning to further improve the performance of DCP.

DCP x Transfer Learning

Transfer learning is a widely used approach for training deep learning models. In most situations, enough training data cannot be collected, which keeps a model from reaching its potential performance. Transfer learning trains the model on the (small) target dataset starting from weights trained on a large, highly related dataset. In my previous experiments, transfer learning was also applied when training the VGG16 and ResNet-18 based models, which started from the pre-trained weights released by the models' authors.
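In practice the transfer amounts to initializing the TX model with the weights learned on all six merchandises and continuing training, roughly as in the sketch below. The stand-in model, the file path, and the learning rate are hypothetical; the real network is the DCP of Figure 13.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in for the full DCP network, just to make the sketch runnable
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))

# Stage A: train on data from all six merchandises, then keep the weights
torch.save(model.state_dict(), "dcp_all_merchandises.pt")   # hypothetical path

# Stage B: reload those weights and fine-tune on TX data only,
# typically with a smaller learning rate so the transferred features survive
model.load_state_dict(torch.load("dcp_all_merchandises.pt"))
optimizer = optim.Adam(model.parameters(), lr=1e-4)
# ...continue the normal training loop, but feed only TX samples...
```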

Except when training IEM, I used all the historical price data of the 6 futures merchandises to train each deep learning model. But as I described before, my prediction target was TX. Using all the data was just a temporary way to avoid the insufficient-data problem while searching for a workable model; I still had to make the model focus on the TX merchandise. So when I found that the 1D CNN had high potential to be the best model, I transferred the weights trained on all merchandises and continued training on TX. The result matched my expectation — 80.96%. Based on this success, my advisors and I published a paper at IDAA 2018 [1], and we even flew to Yokohama to present it at the conference. However, things were not as easy as I thought. There were some critical problems in my experiments that forced me to completely rewrite all the code and my thesis. I will discuss them in the next installment.

Reference

[1] Guo et al. (2018), Deep Candlestick Predictor: A Framework Toward Forecasting the Price Movement from Candlestick Charts
[2] Jigar Patel et al. (2015), Predicting Stock and Stock Price Index Movement Using Trend Deterministic Data Preparation and Machine Learning Techniques
[3] Simple Moving Average
[4] Commodity Channel Index (CCI)
[5] Guosheng Hu et al. (2017), Deep Stock Representation Learning: From Candlestick Charts to Investment Decisions
[6] Denny Britz (2015), Understanding Convolutional Neural Networks for NLP
[7] Patty Ryan (2017), Stock Market Predictions with Natural Language Deep Learning
[8] Tal Perry (2017), Convolutional Methods for Text
[9] Hallblazzar (2018), Study Notes: CNN-based NLP Application (學習手記:CNN based NLP Application)
