AI Trend forecasting | Stock prediction
Disclaimer: This article was written for learning purposes, as an exercise in applying Machine Learning and a prediction model to forecast future stock prices.
The third project | Trend forecasting | DSE3 G.9
Let’s start with AI
What’s AI? AI isn’t just a new set of tools. It’s the new world. AI has the potential to create huge economic and societal benefits for businesses, governments, and individuals worldwide. Source: PwC
As the PwC 2021 AI Predictions survey shows, AI companies have realized benefits from AI investment such as better decision-making, higher productivity, and cost savings. However, we think it is not just AI companies that can benefit: you, as a retail investor, can also use this AI technology to win in the stock market.
Have you invested in the stock market?
- Have you won in the short run but lost in the long run?
- How many hours per week do you spend studying and trading stocks? More than 10?
- Are you too lazy to study 500+ stocks just to pick a few for your portfolio?
- If you are a newbie investor, do you not know which stock to buy? Have you asked your friends, or searched the internet?
“If you face the problems above, read this article. We will show you how to apply AI to improve decision-making, reduce bias and risk and, ultimately, save time when investing in the stock market!”
Here is our story
After we completed two challenges, BOTNOI announced a new weekly challenge. This is our third project in the Data Science Essential program, called the Trend forecasting project. We spent one week building the model and writing this article. Even though the timeline was tight, it was worthwhile and enjoyable, and our team learned from each other by exchanging investing, modeling, and programming perspectives.
Our project requirement
- Pick one stock from SET100 with the highest potential gain according to the prediction model
- Select the stock by 4 June 2021
- Buy the selected stock on 7 June 2021 and sell it on 14 June 2021 (at the ATO price)
- The budget for investing in the stock is Baht 20,000.
- Only stocks priced below Baht 200 may be picked.
- Apply a Data Science and Machine Learning pipeline
- The SET100 index is the primary stock index of Thailand. The constituents of the list are companies listed on the Stock Exchange of Thailand (SET) in Bangkok.
- At-The-Open Order (ATO): An order to buy or sell a stock at the session’s opening price. ATO orders are allowed during pre-open sessions (morning and afternoon).
Our investing strategy
When you play a game and want to win, you need a strategy! Investing in stocks is the same: you need an investing strategy. Here is ours, called “a hundred to one”.
What do you need to know to build this investment strategy? Technical analysis, Money management, Fundamental analysis, an AI model for stock prediction, and a Stock selection model.
This article will walk you through this concept before jumping to the AI Prediction Pipeline and results.
I. Technical analysis
Technical analysis is widely used by technical investors, and some hybrid investors combine technical and fundamental analysis to make decisions when investing in a stock.
Technical analysis is a means of examining and predicting price movements in the financial markets, by using historical price charts and market statistics. It is based on the idea that if a trader can identify previous market patterns, they can form a fairly accurate prediction of future price trajectories.
Source | IG.COM/Technical analysis definition
You might have heard from your parents or friends that the stock market is like gambling. Do you believe that? How can you improve your chances of winning in the stock market?
You should know the trend!
What is a Trend? A trend is the overall direction of a market or an asset’s price. In technical analysis, trends are identified by trend lines or price action that highlight when the price is making higher swing highs and higher swing lows for an uptrend, or lower swing lows and lower swing highs for a downtrend.
Source | Investopedia
There are two types of trends: Uptrend and Downtrend.
Why should you buy stock in uptrends?
- Uptrends are characterized by higher peaks over time and imply bullish sentiment among investors.
- A change in trend is fueled by a change in the supply of stocks investors want to buy compared with the supply of available shares in the market.
- Uptrends are often coincidental with positive changes in the factors that surround the security, whether macroeconomic or specifically associated with a company’s business model.
Source | Investopedia
The illustration shows an uptrend and a downtrend. In an uptrend the overall stock price rises over the period, while in a downtrend the price keeps falling. In this article we go one step further: our team uses two indicators, the 25-day EMA and the 100-day EMA, in the stock selection model, representing short-term and long-term price lines. You might notice that when the green line is above the red line, the stock may be in an uptrend. In practice, you do not have to use the 25-day and 100-day EMA; you might use the 12-day and 50-day EMA or other pairs, depending on your experience, the stock, or the market.
Remark: The exponential moving average (EMA) is a technical chart indicator that tracks the price of an investment (like a stock or commodity) over time. The EMA is a type of weighted moving average (WMA) that gives more weighting or importance to recent price data.
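As a rough illustration (our own sketch, not the team's Colab code), the standard EMA recursion is EMA_t = alpha * P_t + (1 - alpha) * EMA_{t-1} with alpha = 2 / (N + 1). The snippet below uses a made-up price series, seeds the EMA with the first price, checks the hand-written loop against pandas' `ewm(span=N, adjust=False)`, and then applies the 25-day versus 100-day EMA comparison described above. `ema_loop` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def ema_loop(prices, span):
    """EMA via the textbook recursion, seeded with the first price."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]
    for p in prices[1:]:
        # The newest price gets weight alpha; the previous EMA keeps the rest.
        out.append(alpha * p + (1 - alpha) * out[-1])
    return np.array(out)

# Made-up closing prices: a slow, noisy climb over 300 trading days.
rng = np.random.default_rng(0)
close = pd.Series(10 + np.cumsum(rng.normal(0.02, 0.1, 300)))

ema25 = close.ewm(span=25, adjust=False).mean()
ema100 = close.ewm(span=100, adjust=False).mean()

# pandas with adjust=False implements the same recursion as ema_loop.
assert np.allclose(ema25.to_numpy(), ema_loop(close.to_numpy(), 25))

# Uptrend signal used in this article: short-term EMA above long-term EMA.
uptrend = ema25 > ema100
print("Uptrend on the last day:", bool(uptrend.iloc[-1]))
```

Because recent prices get more weight, the 25-day EMA reacts faster than the 100-day EMA, which is why it crosses above the longer line when a rise takes hold.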
Some retail investors might ask whether EMA is the only indicator used for technical analysis. The answer is “No”. There are plenty of indicators (e.g. Simple Moving Average, Relative Price Strength, and so on), as you can see in the picture below.
We will not go through the detailed calculation of each indicator. However, there are some useful sources that explain the indicators visually and mathematically.
Infographic: 12 Types of Technical Indicators Used by Stock Traders
II. Money management
As in real life, your resources and budget are always limited, so you need to maximize return under the budget constraint.
Consideration of Price change vs Absolute return:
There are three constraints: (1) the budget is Baht 20,000, (2) we can buy only one stock at a time, and (3) we can buy only in multiples of 100 shares, as required by the Thai stock market.
For example, if the stock price is Baht 150, we can buy only 100 shares, not 133 shares. We therefore invest Baht 15,000 and keep Baht 5,000 in cash. To make this easier to understand, the following picture compares the results of price change and absolute return.
You can see that Stock A's price increased by 33%, more than Stock B's 30%. However, investing in Stock B gives the greater absolute return: the value change from investing in Stock B is 30%, while for Stock A it is 25%.
Remark | %Price change = (target stock price - purchase price) / purchase price, while %Value change = (total value - initial investment) / initial investment.
Do you know why? The answer is “money management”. Investing in Stock B uses 100% of the cash. With all other factors equal, investing in Stock B is the wiser choice. Therefore, our investment strategy focuses on the absolute return on investment instead of price change.
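The arithmetic above can be sketched in a few lines. This is a minimal illustration under stated assumptions: shares come in board lots of 100, leftover cash earns nothing, and fees are ignored. Stock A uses the prices from the example (Baht 150 rising to about Baht 200); Stock B's prices (Baht 100 rising to Baht 130) are hypothetical, chosen only to reproduce a 30% rise. The function names are our own.

```python
BUDGET = 20_000   # Baht, from the project rules
BOARD_LOT = 100   # shares must be bought in multiples of 100

def shares_affordable(price: float, budget: float = BUDGET, lot: int = BOARD_LOT) -> int:
    """Largest multiple of `lot` shares whose cost fits in the budget."""
    return int(budget // (price * lot)) * lot

def price_vs_value_change(buy: float, sell: float, budget: float = BUDGET) -> tuple:
    """Return (%price change of the stock, %value change of the whole budget)."""
    shares = shares_affordable(buy, budget)
    profit = shares * (sell - buy)            # unspent cash earns nothing; fees ignored
    price_change = (sell - buy) / buy * 100
    value_change = profit / budget * 100
    return price_change, value_change

print(price_vs_value_change(150, 200))  # Stock A
print(price_vs_value_change(100, 130))  # Stock B (hypothetical prices)
```

With these inputs, Stock A shows about a 33.3% price change but only a 25% value change (100 shares, Baht 5,000 idle), while Stock B shows 30% for both because 200 shares use the full budget.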
III. Fundamental analysis
“‘Market action discounts everything,’ says Technical Analysis, but sometimes price charts lie.”
Source | technicalanalysts.com
Sometimes looking only at indicators or charts can lead to a significant loss. That is why we also look at key fundamental factors such as Return on Equity (ROE), Net Profit Margin (NPM), and Dividend Payout (DIV), together with recent NEWS. This might improve the chance to win. We summarize the benefit of each factor as follows.
- NPM indicates how much net income a company makes from its total sales.
- ROE indicates management's ability to generate income from the available equity.
- DIV indicates whether the dividend is healthy and appropriate from a dividend investor's point of view.
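For reference, here is a small sketch of the three ratios using their common textbook definitions (ROE = net income / equity, NPM = net income / revenue, dividend payout = dividends / net income), plus the year-over-year change used later in the stock selection model. The financial figures are hypothetical, and we assume the "%change" is a relative change of each ratio; the article does not say which form the team used.

```python
def ratios(net_income: float, revenue: float, equity: float, dividends: float) -> dict:
    """Fundamental ratios in percent, using common textbook definitions."""
    return {
        "ROE": net_income / equity * 100,      # return on equity
        "NPM": net_income / revenue * 100,     # net profit margin
        "DIV": dividends / net_income * 100,   # dividend payout ratio
    }

def pct_growth(new: float, old: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / abs(old) * 100

# Hypothetical financials for one company in two consecutive years.
prev = ratios(net_income=100, revenue=1000, equity=800, dividends=50)
curr = ratios(net_income=120, revenue=1100, equity=850, dividends=66)
changes = {k: pct_growth(curr[k], prev[k]) for k in prev}
print(changes)
```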
NEWS is a very important driver of stock price movement because investors act on emotion: good news or stories can help push a stock price up. So, NEWS is the final factor our team uses in the stock selection model.
IV. AI Model for stock prediction
This project is about stock prediction, as it needs to forecast stock prices. Do you know what kind of data a stock price is? Yes, it's a time series.
What’s Time series forecasting? It uses information regarding historical values and associated patterns to predict future activity. Most often, this relates to trend analysis, cyclical fluctuation analysis, and issues of seasonality. As with all forecasting methods, success is not guaranteed.
What is XGBoost?
XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform all other algorithms or frameworks. For small-to-medium structured (tabular) data, however, decision-tree-based algorithms such as XGBoost are often among the best performers. Our team has summarized XGBoost in the infographic below.
“The most important factor behind the success of XGBoost is its scalability in all scenarios. The system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings.”
XGBoost is designed for classification and regression on tabular datasets, although it can also be used for time series forecasting. The link below shows how XGBoost can be applied to time series forecasting.
How to Use XGBoost for Time Series Forecasting - Machine Learning Mastery
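A common way to use a tabular model like XGBoost on a time series is to reframe the series as a supervised-learning table: each row holds recent prices as inputs and the price some days later as the target. The sketch below assumes plain lagged prices as inputs and an 8-day horizon; `make_supervised` and `n_lags` are hypothetical names, and the team's actual inputs are the features listed in Step 3 of the pipeline.

```python
import pandas as pd

def make_supervised(close: pd.Series, n_lags: int = 5, horizon: int = 8) -> pd.DataFrame:
    """Reframe a price series as rows of (recent prices -> price `horizon` days later)."""
    df = pd.DataFrame({"close": close})
    for lag in range(1, n_lags + 1):
        df[f"lag_{lag}"] = close.shift(lag)       # price `lag` trading days earlier
    df["target"] = close.shift(-horizon)         # price `horizon` trading days later
    # Drop the first n_lags rows (no history) and the last `horizon` rows (no future).
    return df.dropna()
```

For example, `make_supervised(pd.Series(range(20), dtype=float), n_lags=3, horizon=8)` returns 9 usable rows, since 3 rows lack history and 8 rows lack a future price.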
Limitation of XGBoost
Because XGBoost is a tree-based algorithm, it cannot extrapolate: the bounds of the training set are the bounds for every prediction it makes. It therefore does not suit a stock that is reaching a new high or a new low, since the prediction will not go beyond the training-set bounds.
Looking at the SET100 chart, our team saw that the SET100 index had not reached a new high or low, so we expected XGBoost to still work for stocks in SET100.
V. Stock selection model
Some retail investors might ask, “Can we rely only on the result of the AI prediction model to win in the stock market?” The answer is “probably not”.
Since our population is 100 stocks, we first apply the following pre-screening criteria:
- Trend: We buy only when the stock is in an uptrend, indicated by the 25-day EMA being above the 100-day EMA. This gives a better chance of winning.
- Probability: We calculate the accuracy ratio from our back-testing. As this challenge gives us only one chance to invest in one stock, we want a more reliable result, so we keep only stocks whose back-tested probability of winning is above 50%.
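The two pre-screening rules can be expressed as a single filter. In this sketch the tickers and numbers are hypothetical placeholders: in practice the EMA columns would come from the latest row of each stock's indicator table and the win rate from back-testing. `prescreen` is our own helper name.

```python
import pandas as pd

# Hypothetical per-stock summary (latest EMA values and back-tested win rate).
summary = pd.DataFrame(
    {
        "symbol": ["AAA", "BBB", "CCC", "DDD"],
        "ema25": [10.4, 52.0, 7.1, 31.5],
        "ema100": [10.1, 54.3, 6.8, 30.9],
        "win_rate": [0.62, 0.71, 0.45, 0.55],
    }
)

def prescreen(df: pd.DataFrame, min_win_rate: float = 0.5) -> pd.DataFrame:
    """Keep stocks in an uptrend (EMA25 > EMA100) whose win rate exceeds the threshold."""
    mask = (df["ema25"] > df["ema100"]) & (df["win_rate"] > min_win_rate)
    return df[mask]

print(prescreen(summary)["symbol"].tolist())  # BBB fails the trend rule, CCC the win-rate rule
```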
Based on the AI prediction model and the pre-screening results, our team takes the 12 stocks with the highest target profit into the stock selection step, as shown in the following illustration.
Our team brainstormed and came up with the stock selection model illustrated above. It combines three main components: (1) the profit result from the prediction model, (2) the back-testing result, and (3) fundamental analysis.
Of course, the result from the prediction model, i.e. the profit, is the key factor in this stock selection model. Following our money management approach, the targeted profit is the profit amount on the initial investment (Baht 20,000) rather than the percentage increase in each stock's price. We give this factor the highest weight, 40%, in the scoring.
The back-testing result is the second factor: the more often the model predicts correctly, the more confident we can be in the AI prediction model. We give this factor a weight of 30%.
We use the growth in return on equity (%ROE change), dividend payout (%Div change), and net profit margin (%NPM change) to represent fundamental analysis. Each of these three factors gets a weight of 10%, adding up to 30% for fundamental analysis.
To make the scoring systematic, we split the 12 stocks into three tiers for each factor: the top 4 get a score of 1, the middle 4 a score of 2, and the bottom 4 a score of 3. We then combine the factor scores, using the weights above, into a final score and select the 3 stocks with the lowest final scores for the final round.
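The tier scoring can be sketched with pandas as follows. Assumptions, since the article does not spell them out: a higher value is better for all five factors, the final score is the weighted sum of tier scores, and ties are broken by row order. `tier_scores` and `finalists` are hypothetical helper names.

```python
import pandas as pd

# Weights from the text: profit 40%, back-test 30%, three fundamentals 10% each.
WEIGHTS = {
    "profit": 0.40,
    "win_rate": 0.30,
    "roe_chg": 0.10,
    "div_chg": 0.10,
    "npm_chg": 0.10,
}

def tier_scores(df: pd.DataFrame, tier_size: int = 4) -> pd.DataFrame:
    """Score each factor 1 (best tier) to 3 (worst tier), then take the weighted sum."""
    out = df.copy()
    for factor in WEIGHTS:
        # Rank 1 = highest value; ties broken by row order.
        rank = out[factor].rank(ascending=False, method="first")
        out[f"{factor}_tier"] = ((rank - 1) // tier_size + 1).astype(int)
    out["final_score"] = sum(w * out[f"{f}_tier"] for f, w in WEIGHTS.items())
    return out

def finalists(df: pd.DataFrame, n: int = 3) -> list:
    """Index labels of the `n` stocks with the lowest final score."""
    return tier_scores(df).nsmallest(n, "final_score").index.tolist()
```

A stock that is in the top tier on every factor scores 1.0, and one in the bottom tier on every factor scores 3.0; a stock that ranks first on profit but last on everything else scores 0.4 * 1 + 0.6 * 3 = 2.2.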
Final selection by NEWS screening
We take the 3 stocks with the lowest scores and research whether NEWS and the IAA analyst consensus support a price increase. Finally, we pick one stock based on the strongest stories at the time, together with the final score.
Machine Learning Pipeline
Step 1 | Get data
- We listed the stocks in the SET100 index.
Stock Prices - SET100 Index (The Stock Exchange of Thailand)
- We extracted stock information from Yahoo Finance's API using the “pandas_datareader” library. We did not specify an extraction period, so we received data from 2 June 2016 to 2 June 2021 (the extraction date). As far as we understand, the library's default start date is five years before the extraction date, so the extracted data covers only 5 years.
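If you want a fixed window instead of the library default, you can pass explicit dates. The sketch below assumes Yahoo Finance's `.BK` suffix for SET tickers and `pandas_datareader`'s `DataReader(name, "yahoo", start=..., end=...)` call. Note that the Yahoo source in `pandas_datareader` has been unreliable in recent years, so treat the download part as illustrative only; `yahoo_symbol` and `fetch_history` are hypothetical helper names.

```python
import pandas as pd

def yahoo_symbol(set_symbol: str) -> str:
    """Yahoo Finance lists SET stocks with a '.BK' suffix, e.g. 'RS' -> 'RS.BK'."""
    return f"{set_symbol}.BK"

def fetch_history(set_symbol: str, start: str, end: str) -> pd.DataFrame:
    """Download daily prices for one SET stock over an explicit date range."""
    # Imported here so the helper above works without the package installed.
    import pandas_datareader.data as web

    df = web.DataReader(yahoo_symbol(set_symbol), "yahoo", start=start, end=end)
    df["Symbol"] = set_symbol
    return df

# Example call (needs network access and a working Yahoo source):
# rs = fetch_history("RS", start="2016-01-01", end="2021-06-02")
```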
- From each stock we obtained the trade date, open, high, low, close, adjusted close, volume, and symbol, as shown in the example table below. The adjusted close is the price adjusted for dividends, par splits, and other corporate actions. Our team uses the adjusted close to perform stock prediction and as the benchmark for investment return.
Step 2 | Cleansing & transformation
- We scanned through the stock information and found that the dataset is of good quality (e.g. no missing data), so we did not perform data cleansing on it.
- We removed 2 stocks, SCGP and OR, because they were listed on the SET only in October 2020 and February 2021, respectively, so there is too little data to train on.
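The two checks in this step can be automated. The sketch below assumes the data is in long format with `Date`, `Symbol`, and `Adj Close` columns, and uses a hypothetical keep rule: a stock must start trading before the training window and have at least 200 rows, enough for EMA(200). The team applied these checks by inspection; `data_quality_report` is our own name.

```python
import pandas as pd

def data_quality_report(prices: pd.DataFrame, train_start: str, min_rows: int = 200) -> pd.DataFrame:
    """Per-symbol missing values, first trade date, row count, and a keep flag."""
    report = prices.groupby("Symbol").agg(
        missing=("Adj Close", lambda s: int(s.isna().sum())),
        first_date=("Date", "min"),
        rows=("Date", "size"),
    )
    # Keep a stock only if it traded before the training window starts
    # and has enough history for the longest indicator window.
    report["keep"] = (report["first_date"] <= pd.Timestamp(train_start)) & (report["rows"] >= min_rows)
    return report
```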
Step 3 | Feature engineering
We define the base features which are Open, Adj Close, High, Low, and Volume.
We create the following technical indicators as additional features:
- Average volume (20)
- %Return (from 1 to 10 days)
- SMA (10), SMA (25), SMA (100), and SMA (200)
- EMA (10), EMA (25), EMA (100), and EMA (200)
- MACD (12,26)
- STOCH (9,3,3)
- BB (10)
- (x) represents the number of days. For example, SMA(10) is the average price over the last 10 days.
- All technical indicators except average volume are price-based, and we use the adjusted closing price to calculate them.
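The indicator list above can be sketched with pandas as follows. This is our own reconstruction, not the team's Colab code. Assumptions: data for one stock sorted oldest first, the stochastic range taken from the adjusted close (as the text says price indicators use it), a Bollinger multiplier of 2 with pandas' default sample standard deviation, and the MACD signal line omitted.

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add Step 3-style indicators; expects 'Adj Close' and 'Volume', oldest row first."""
    out = df.copy()
    px = out["Adj Close"]

    out["avg_vol_20"] = out["Volume"].rolling(20).mean()

    # %Return over the past 1 to 10 days.
    for n in range(1, 11):
        out[f"ret_{n}"] = (px / px.shift(n) - 1) * 100

    # Simple and exponential moving averages.
    for n in (10, 25, 100, 200):
        out[f"sma_{n}"] = px.rolling(n).mean()
        out[f"ema_{n}"] = px.ewm(span=n, adjust=False).mean()

    # MACD(12,26): fast EMA minus slow EMA (signal line omitted).
    out["macd"] = px.ewm(span=12, adjust=False).mean() - px.ewm(span=26, adjust=False).mean()

    # STOCH(9,3,3) on the adjusted close: raw %K over 9 days, smoothed by 3, then %D by 3.
    low9 = px.rolling(9).min()
    high9 = px.rolling(9).max()
    raw_k = (px - low9) / (high9 - low9) * 100
    out["stoch_k"] = raw_k.rolling(3).mean()
    out["stoch_d"] = out["stoch_k"].rolling(3).mean()

    # BB(10): SMA(10) plus/minus 2 rolling standard deviations (multiplier assumed).
    std10 = px.rolling(10).std()
    out["bb_upper"] = out["sma_10"] + 2 * std10
    out["bb_lower"] = out["sma_10"] - 2 * std10
    return out
```

Rolling indicators are NaN until their window fills; for example, `sma_200` has no value before the 200th row.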
For the detailed calculation of each technical indicator, refer to the link below.
Mathematical Description | Technical Indicators | Stock Charts
- We generate the technical indicators with the code below. We show only the code for calculating %return, SMA, and EMA as examples; further details are in the provided Colab link.
- We use the base features and technical indicators as inputs to the time series prediction model.
- At this stage, we keep only data from 1 January 2018 to 2 June 2021 as the total dataset and filter out data before 1 January 2018, because older time-series data might not be relevant for predicting recent stock prices. However, we still need the longer history to create technical indicators such as EMA(200), which need the prior 200 days of data.
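The order of operations matters here: compute long-window indicators on the full history first, then trim to the modeling window. A minimal sketch, assuming a DataFrame indexed by trading date and using SMA(200) as a stand-in for the longest indicator; `trim_after_warmup` is a hypothetical helper.

```python
import pandas as pd

def trim_after_warmup(df: pd.DataFrame, start: str = "2018-01-01", window: int = 200) -> pd.DataFrame:
    """Compute a long-window indicator on the full history, then keep rows from `start` on."""
    out = df.copy()
    # Uses all available history, so only the first `window - 1` rows are NaN.
    out[f"sma_{window}"] = out["Adj Close"].rolling(window).mean()
    trimmed = out.loc[start:]
    if trimmed[f"sma_{window}"].isna().any():
        raise ValueError("history before `start` is shorter than the indicator window")
    return trimmed
```

Trimming first and computing afterwards would leave roughly the first 200 trading days of 2018 without a value.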
Step 4 | Prediction model
We use XGBoost for Time Series Prediction and perform the following steps.
I. Define training and testing datasets
For model development, we split the dataset into a training dataset (1 Jun 2018 to 21 Apr 2021) and a testing dataset (5 May 2021 to 2 Jun 2021). We use the training dataset to build models that predict the price 2 and 8 days ahead.
Remark | Ultimately we need to predict the price on 14 June 2021, which is 8 working days after the latest date in the dataset (2 June 2021).
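For time series, the split must be chronological rather than random, so the model is always tested on days after those it learned from. This sketch assumes a DataFrame indexed by trading date and uses the date ranges quoted above as defaults; `time_split` is a hypothetical helper.

```python
import pandas as pd

def time_split(df: pd.DataFrame, train_start: str = "2018-06-01", train_end: str = "2021-04-21",
               test_start: str = "2021-05-05", test_end: str = "2021-06-02"):
    """Chronological train/test split (no shuffling) by date labels."""
    train = df.loc[train_start:train_end]
    test = df.loc[test_start:test_end]
    # Every training day must come before every test day.
    if train.index.max() >= test.index.min():
        raise ValueError("training period overlaps the test period")
    return train, test
```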
II. Setting hyperparameters
We set the XGBoost hyperparameters to eta = 0.3, subsample = 0.7, n_estimators = 150, and min_child_weight = 5, based on our tests of model performance.
For further details on XGBoost: XGBoost
III. Evaluating model performance
The accuracy ratio compares the sign of the predicted price movement with the sign of the actual movement. For example, if we predict that the stock price goes up and the actual price goes down, the prediction is incorrect. The accuracy is 0.9 on the training set and 0.7 on the testing set. Although this gap suggests the model is somewhat overfitted, we accept the model because we consider an accuracy of 0.7 to be reasonably high.
Based on the illustration of model accuracy above, we can see that the price movements on most dates in the testing dataset are predicted correctly.
IV. Model deployment
When deploying the model, we predict the target stock prices on 7 and 14 June 2021, the buy and sell dates, respectively. The difference between the two predicted prices is the predicted price change. Note that we retrain the model on the dataset from 1 January 2018 to 2 June 2021.
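The accuracy ratio can be sketched as a directional hit rate. We assume the movement is measured from the last known price at prediction time, and that "no change" counts as correct only when both the predicted and actual moves are zero; `direction_accuracy` is a hypothetical helper.

```python
import numpy as np

def direction_accuracy(last_price, predicted, actual) -> float:
    """Share of cases where the predicted and actual moves have the same sign."""
    last_price = np.asarray(last_price, dtype=float)
    pred_sign = np.sign(np.asarray(predicted, dtype=float) - last_price)
    true_sign = np.sign(np.asarray(actual, dtype=float) - last_price)
    return float(np.mean(pred_sign == true_sign))
```

For example, with a last price of 10 in four cases, predictions [11, 9, 12, 10.5] and actual prices [10.5, 9.5, 9, 11] agree on direction three times out of four, giving 0.75.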
V. Adjust tick sizes
On the SET, each price level has a tick size, the minimum price movement allowed at that level. We round the predicted stock prices to valid ticks before using them.
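Tick rounding is easiest to get right with `decimal`, which avoids floating-point artifacts such as 12.299999. The tick table below is our understanding of the SET schedule and should be verified against the exchange's current rules; rounding to the nearest tick (rather than always up or down) is also an assumption, since the article does not say which the team used.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed SET tick-size schedule: (lower bound of price band, tick size) in Baht.
TICK_TABLE = [
    (Decimal("0"), Decimal("0.01")),
    (Decimal("2"), Decimal("0.02")),
    (Decimal("5"), Decimal("0.05")),
    (Decimal("10"), Decimal("0.10")),
    (Decimal("25"), Decimal("0.25")),
    (Decimal("100"), Decimal("0.50")),
    (Decimal("200"), Decimal("1.00")),
    (Decimal("400"), Decimal("2.00")),
]

def tick_size(price: Decimal) -> Decimal:
    """Tick size of the highest band whose lower bound does not exceed `price`."""
    size = TICK_TABLE[0][1]
    for lower, tick in TICK_TABLE:
        if price >= lower:
            size = tick
    return size

def round_to_tick(price: float) -> float:
    """Round a raw model prediction to the nearest valid tick."""
    p = Decimal(str(price))
    tick = tick_size(p)
    steps = (p / tick).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return float(steps * tick)
```

For instance, a raw prediction of 12.34 becomes 12.3 (tick 0.10), and 150.3 becomes 150.5 (tick 0.50).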
VI. Consideration of money management for profit calculation
As mentioned in the money management section at the beginning of this article, we calculate the predicted profit amount rather than the price change of the stock, so that we can see the absolute return on the initial budget of Baht 20,000.
Step 5 | Result and visualization
Following our stock selection model (described in the first part), we summarized the 12 stocks with the highest target profit and ranked each stock on each factor (i.e. profit, %win/loss, %ROE change, %Div change, and %NPM change). For each factor, we assign a score of 1, 2, or 3, where 1 means the best and 3 the worst, and then calculate the final score. The 3 stocks with the lowest scores, which go to the final round of stock selection, are RS, STA, and ACE.
Remark: The profit shown in the table above takes money management into account. In addition, the prices of the 12 stocks are far below Baht 200, so we consider it very unlikely that any of them will reach Baht 200.
Based on the final round of stock selection, RS comes out best on every aspect:
- RS has the best score (Rank no.1) in the previous round of stock selection
- Based on the IAA analyst consensus, RS looks slightly better than STA: most analysts recommend buying RS, and none recommend selling it
- Based on our NEWS search, RS has good stories to support an increase in its stock price (e.g. continued business expansion and potential growth from its new product development ability and marketing skill)
- ACE has the weakest supporting stories, from both analyst results and NEWS, compared with RS and STA.
Finally, our team picks RS Public Company Limited (RS) as the best stock to buy on 7 June and sell on 14 June at the ATO price.
AI is coming into our lives. It might help you create a better world and give you more time to do what you like. However, you should know how to use it properly. AI models keep developing every day, and a model that works today might not work in the future.
As retail investors ourselves, we do not recommend relying only on the AI prediction model's result to invest in stocks; also study the fundamentals of the stocks. If a stock's fundamentals are good, there is less chance of losing in this game.
We think that there are opportunities for further enhancement such as trying alternative AI models, deploying AI models to a chatbot for providing daily price prediction, or adjusting models more for longer-term prediction.
We hope you enjoyed reading this article and that it gives you some ideas on how to apply AI in real work (e.g. stock prediction) and how to combine an investment strategy with an AI prediction model.
Below is the Colab link for further studies.