Using SVM on top of technical indicators to predict Reliance Stock Prices
Predicting stock prices is an interesting task, isn’t it? Data Science for trading has been on the rise and is picking up quickly. For the past 3–4 weeks, I have been learning about finance and trying to extend my Data Science skills to it, to understand how algorithmic trading firms automate trading and make it profitable.
The other day I came across how Actor-Critic Reinforcement Learning can be used for this and wrote an article about it. Last night I thought of trying something simpler to see how it works, since simpler methods are known to work fairly well for most people. I realised I had explored enough to get my hands dirty and try to build an algorithm that can hopefully generate some profit. Yes, we all know RL is good and amazing and whatnot, but simple methods and technical indicators have worked fairly well for traders for a long time, so why not couple them with Machine Learning to make something useful and, hopefully, profitable?
This is fairly easy to follow and can generate decent results. However, please read the disclaimer below before continuing with this article.
Disclaimer: Whatever I write here and all the code that you see on my GitHub repo is from my own study and research. I am not responsible for any loss you might incur using this code to perform actual trades. I highly recommend that you do NOT use this for actual trading. The purpose of this is to share whatever I have learnt over the past few weeks.
What are these technical indicators?
According to Investopedia,
Technical indicators are heuristic or mathematical calculations based on the price, volume, or open interest of a security or contract used by traders who follow technical analysis.
Long story short, these are a bunch of mathematical and statistical techniques that traders use to help them decide on a trading strategy. In this article, we will go through the most fundamental ones and see that they can generate good features, which we can feed to our SVM classifier to predict whether we should buy or sell.
Yes, you heard that right. To keep things simple, I have framed this as a Binary Classification problem rather than a Regression problem that predicts actual stock prices, and SVMs tend to work fairly well for classification scenarios such as this. Regression or forecasting of prices may be better handled by Time Series Models such as an LSTM network.
We use 1 to denote Buy, if the price is predicted to rise, and 0 to denote Sell, if the price is predicted to drop.
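To make the labelling concrete, here is a minimal sketch of how such a 1/0 target could be built with Pandas. The prices are made up, and comparing tomorrow's close with today's close is my assumption for illustration; the notebook may define the target differently.

```python
import pandas as pd

# Hypothetical closing prices, used only to illustrate the labelling rule.
close = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0], name="close")

# Assumed rule: 1 (buy) when the next close is higher than today's, else 0 (sell).
next_close = close.shift(-1)
target = (next_close > close).astype(int)

# The last row has no "next day" to compare against, so it is dropped.
target = target.iloc[:-1]
print(target.tolist())
```

Whatever rule is chosen, the label for a given day must only depend on prices after that day, and the features only on prices up to that day, otherwise the classifier gets to peek at the answer.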
We will be using TA-Lib and Pandas to calculate most of these technical indicators. So, let’s get started. All the code can be found on my GitHub repository.
Technical Indicators used as Features:
1. Simple Moving Average
- Daily financial data can be noisy. We can use the moving average to get more signal and less noise from the data. The basic concept is to slide a window of a set time period over the series and compute an aggregate statistic (the mean in this case) inside each window.
- For our case, we have considered a simple 5-day moving average and a 10-day moving average as our features.
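The two moving-average features can be sketched with Pandas as follows. The prices and the column names `sma_5` and `sma_10` are placeholders I picked for illustration.

```python
import pandas as pd

# Hypothetical closing prices; in practice this would be the stock's daily close.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0])

features = pd.DataFrame({"close": close})

# Each value is the mean of the current day and the previous (window - 1) days.
features["sma_5"] = close.rolling(window=5).mean()
features["sma_10"] = close.rolling(window=10).mean()

# The first (window - 1) rows are NaN because the window is not yet full.
print(features)
```

Rows with NaN values need to be dropped (or otherwise handled) before the features are passed to a classifier.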
2. Moving Standard Deviation
- This is the same concept as the Moving Average, except that the aggregate statistic is the Standard Deviation instead of the mean. Again, we have considered both a 5-day and a 10-day window to compute the Moving Standard Deviation. One very common application is Bollinger Bands, which are used to gauge the volatility of a stock and are yet another technical indicator to explore.
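A rough sketch of the moving standard deviation, together with Bollinger-style bands built from it, could look like this. The prices are made up, and the band width of two standard deviations is the common convention rather than something taken from the notebook; the 5-day window is reused here purely for illustration.

```python
import pandas as pd

# Hypothetical closing prices for illustration.
close = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0])

window = 5
sma = close.rolling(window).mean()
# pandas computes the sample standard deviation (ddof=1) by default.
moving_std = close.rolling(window).std()

# Bollinger-style bands: moving average plus/minus a multiple of the moving std.
num_std = 2  # assumed width, the usual convention
upper_band = sma + num_std * moving_std
lower_band = sma - num_std * moving_std

print(pd.DataFrame({"sma": sma, "std": moving_std,
                    "upper": upper_band, "lower": lower_band}))
```

The bands widen when recent prices are volatile and narrow when they are calm, which is why they are read as a volatility gauge.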
3. Exponentially Weighted Moving Average
- EWMA assigns more weight to more recent values in the time series and is often a better option than a Simple Moving Average, because of the following weaknesses of SMA:
(i) Smaller window sizes in SMA lead to more noise, rather than signal.
(ii) SMA will always lag by the size of the window.
(iii) SMA will never reach the full peak of the data due to the averaging.
(iv) Extreme values/outliers can skew the SMA significantly.
EWMA mitigates some of these weaknesses of SMA and is sometimes the better choice.
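To see the lag difference, here is a small sketch comparing a 3-day SMA with an EWMA of span 3 on a made-up series that jumps from 10 to 20. The span value and the `adjust=False` setting are my choices for illustration.

```python
import pandas as pd

# Hypothetical series with a sudden jump from 10 to 20 on the fourth day.
close = pd.Series([10.0, 10.0, 10.0, 20.0, 20.0, 20.0])

sma_3 = close.rolling(window=3).mean()

# span=3 gives a smoothing factor alpha = 2 / (span + 1) = 0.5.
# adjust=False uses the recursive form: y_t = alpha * x_t + (1 - alpha) * y_{t-1}.
ewma_3 = close.ewm(span=3, adjust=False).mean()

print(pd.DataFrame({"close": close, "sma_3": sma_3, "ewma_3": ewma_3}))
```

On the day of the jump the EWMA has already moved to 15, while the SMA is still near 13.3, because the EWMA gives the newest price half of the total weight instead of a third.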
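4. Relative Strength Index
- RSI is an oscillator indicator, with a range from 0 to 100.
- The mathematical equation to calculate RSI is given as: RSI = 100 - 100 / (1 + RS), where RS is the average gain divided by the average loss over the chosen period.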
- Generally, we use a 14-day period to calculate RSI, although this can vary as per any specific requirements.
- A lower period will generate a higher frequency of buying/selling signals, some of which can be wrong.
- A higher period will generate a lower frequency of buying/selling signals, with a high probability of excluding some of the genuine buy/sell opportunities.
- Generally, an RSI value over 70 is considered Overbought, while a value below 30 is considered Oversold.
- The RSI thresholds may be moved to 80 and 20 in Bullish and Bearish markets respectively (Google candlestick charts to know more about them).
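Here is a minimal Pandas sketch of RSI, using the standard definition RSI = 100 - 100 / (1 + RS) with RS as average gain over average loss. I have expressed Wilder-style smoothing as an exponential average with alpha = 1 / period, which is an assumption on my part; values can differ slightly from TA-Lib in the early rows because of how the average is seeded. The prices are made up.

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI = 100 - 100 / (1 + RS), with RS = average gain / average loss."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)      # positive moves, zeros elsewhere
    loss = (-delta).clip(lower=0.0)   # magnitudes of negative moves, zeros elsewhere
    # Exponential average with alpha = 1 / period (Wilder-style smoothing).
    avg_gain = gain.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss          # becomes inf when there are no losses at all
    return 100.0 - 100.0 / (1.0 + rs)

# Hypothetical closing prices for illustration.
close = pd.Series([44.0, 44.3, 44.1, 43.6, 44.3, 44.8, 45.1, 45.4, 45.8, 46.1,
                   45.9, 46.0, 45.6, 46.3, 46.3, 46.0, 46.0, 46.4, 46.2, 45.6])
rsi_values = rsi(close)
print(rsi_values.round(2).tail())
```

The overbought/oversold flags can then be derived with simple comparisons such as `rsi_values > 70` and `rsi_values < 30`.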
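5. Williams %R
- It is a Momentum Indicator as well as an Oscillator, with values between -100 and 0.
- The mathematical equation to compute Williams %R, which is quite self-explanatory, is given as: %R = (Highest High - Close) / (Highest High - Lowest Low) × -100, where the Highest High and Lowest Low are taken over the chosen look-back period.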
- A reading above -20 is usually considered overbought.
- A reading below -80 is usually considered oversold.
- Again, the above values might differ slightly for Bullish and Bearish markets.
- In a nutshell, it tells a trader where the current price is relative to the highest high over the last 14 periods (or whatever period is chosen).
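Williams %R is short enough to sketch directly in Pandas, using the standard definition %R = -100 * (highest high - close) / (highest high - lowest low). The prices below are made up, and I use a 3-day period only so that the toy example produces values; 14 is the usual default.

```python
import pandas as pd

def williams_r(high: pd.Series, low: pd.Series, close: pd.Series,
               period: int = 14) -> pd.Series:
    """Williams %R over a rolling look-back window, in the range [-100, 0]."""
    highest_high = high.rolling(period).max()
    lowest_low = low.rolling(period).min()
    return -100.0 * (highest_high - close) / (highest_high - lowest_low)

# Hypothetical daily high/low/close prices for illustration.
high = pd.Series([10.0, 12.0, 11.0, 13.0])
low = pd.Series([8.0, 9.0, 10.0, 11.0])
close = pd.Series([9.0, 11.0, 10.5, 13.0])

wr = williams_r(high, low, close, period=3)
print(wr)
```

On the last day the close equals the highest high of the window, so %R sits at 0, the top of its range; a close at the lowest low would give -100.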
6. Parabolic SAR
- The Parabolic Stop and Reverse indicator is used to identify the trend direction of a particular asset.
- It is represented by dots, either below or above the asset’s price, depending on the direction in which the price is moving.
- In an uptrend, these dots are below the candlesticks; in a downtrend, they are above the candlesticks.
7. ADX (Average Directional Index)
- This particular indicator is used to determine the strength of a particular trend.
- I will skip the calculation part, which can be read and understood very easily here, just to keep this article short and to the point.
Great, we now know about a bunch of technical indicators, which we can use to analyze our stock data of Reliance.
Why use SVM?
- SVM is one of the most popular Machine Learning algorithms. It was extremely popular in the late ’90s and is a simple yet powerful approach (when paired with the appropriate kernel) for tackling Classification and Regression problems.
- SVMs can work very well if we can find the right kernel, and a kernel designed for a specific domain can outperform most other techniques and give phenomenal results. However, just to keep things simple, we will stick with the basic ones: a linear kernel, a polynomial kernel and an RBF kernel, all of which come right out of the box in sklearn.
- The hyperplane is determined only by the support vectors, so outliers that lie far from the margin have less impact.
- For beginners who are new and want to know more about how SVMs work, you can read this awesome blog post and you will know the ins and outs of SVM in no time.
Drawbacks of using SVM?
- SVMs can take a very long time to train and are computationally inefficient on large datasets.
- Training a kernel SVM scales roughly between O(n²) and O(n³) in the number of training samples n.
- Feature scaling helps the optimizer converge faster, so it is an important step when dealing with SVMs.
- The prediction-time complexity, however, is O(k*d), where k is the number of support vectors and d is the dimensionality of our data.
- Finding the appropriate kernel can be a real pain and is not an easy task.
- Hyperparameter tuning for SVMs is not always straightforward.
- We can see in the notebook that we achieved the following results on Reliance stock data from 2009 to the current date:
Linear SVM: Accuracy 51.43%
Polynomial-kernel based SVM: Accuracy 53.15%
RBF-kernel based SVM: Accuracy 52%
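- The random model in the notebook had an accuracy of 33%, and all 3 SVMs score above it. Keep in mind, though, that random guessing between two classes is expected to be right about 50% of the time, so the margin over chance is a few percentage points at most.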
- You might be thinking this is not a great result, and I fully agree. But it was obtained using merely 5–6 technical indicators, and it can probably be improved a lot on the way to a fully-fledged trading algorithm. Below are some ideas which I thought of to make it better. Still, we should appreciate that with just some of the most basic indicators and some very short Python code, we are able to achieve a decent result.
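The three-kernel comparison can be sketched with sklearn as below. This is not the notebook's code: the feature matrix is synthetic random data standing in for the indicator features, the 80/20 chronological split is my assumption, and the resulting scores say nothing about Reliance. A most-frequent-class baseline is included so that the SVM scores have something to be compared against.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the indicator features (6 columns) and buy/sell labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Chronological split: train on the first 80%, test on the last 20% (no shuffling).
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

scores = {}

# Baseline that always predicts the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
scores["baseline"] = baseline.score(X_test, y_test)

for kernel in ("linear", "poly", "rbf"):
    # The scaler is fitted on the training data only, inside the pipeline.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)

print(scores)
```

Keeping the scaler inside the pipeline matters for time-ordered data, since scaling with statistics computed over the full series would leak information from the test period into training.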
How can we make this better?
- Use Sentiment Analysis and news on Reliance Stock/Stocks in your portfolio.
- Use a larger number of technical indicators (since SVMs tend to cope well with high-dimensional data, you can add more features without worrying too much about the curse of dimensionality).
- Apply the same on a diversified portfolio of stocks rather than a single stock.
- Find a domain-specific kernel, which can give the performance a significant boost, i.e., a kernel which works well for finance/time-series data, this needs some research.
- Fine-tune the hyperparameters of the existing SVM (use random search rather than grid search to make life easier).
- Interpret your model with model-agnostic techniques (such as LIME) to understand any bias (if it exists) and to understand which features contribute most to your predictions.
- Use more data (even more than 10 years of data, though be careful not to use too much, because too much data can add more noise than signal) to obtain much better results, which of course comes at the cost of more time required to train the SVM.
- Try other classifiers, such as KNN, which can be quite similar to RBF-based SVM to see if there is any improvement in the performance.
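As a starting point for the tuning idea, here is a rough sketch of a random search over C and gamma for an RBF SVM, using sklearn's RandomizedSearchCV. The data is synthetic, and the search ranges, the number of iterations and the use of TimeSeriesSplit for cross-validation are all my assumptions rather than settings from the notebook.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the indicator features and buy/sell labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Sample C and gamma on a log scale; the ranges are illustrative guesses.
param_distributions = {
    "svc__C": loguniform(1e-2, 1e2),
    "svc__gamma": loguniform(1e-4, 1e0),
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=10,                       # number of random configurations to try
    cv=TimeSeriesSplit(n_splits=4),  # each fold trains on the past, validates on the future
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

TimeSeriesSplit is used instead of ordinary k-fold so that no fold is validated on data that comes before its training data, which would be unrealistic for trading.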
All the code associated with this blog post can be found on this GitHub repository.
Thanks for reading! Have a great day ahead!