When a friend uploads your new beach-body photo to Facebook and the platform suggests tagging your face, it is not because Mark Zuckerberg is secretly stalking you and knows your name. It’s thanks to machine learning. Today, machine learning is all around us. Machine learning is when you say “Ok Google, good night” and Google Home turns off your lights and TV. Machine learning is when you search “Fried Chicken Recipe” online and are later shown an ad for KFC on YouTube. But machine learning is not limited to the tech gadgets we use.
In recent years, it has become a mainstay of the financial industry, particularly the stock market. This has brought one main benefit: instead of noisy alpha-male traders in yellow ties, we now have computers that do the same job, except that they are better, faster, cheaper, more reliable and won’t retire with a fat check at the age of 35.
What is Machine Learning?
Machine learning is the ability of computers to learn new things autonomously. The learning process is based on data, past experience, and observations. Generally, the more data the computer processes, the better its conclusions become. This is exactly why machine learning algorithms have become an integral part of the financial markets’ DNA.
How Does Stock Investing Benefit from Advances in Machine Learning?
The trading process has evolved massively, to the point where traders weigh sophisticated parameters and combinations of factors to reach a decision. From social sentiment scores, through technical indicators, to fundamental information, investing today is more complicated than ever. Machine learning has the potential to ease the whole process by analyzing large volumes of data, spotting significant patterns and generating a single output that guides traders towards a particular decision based on predicted asset prices.
How Does it Work in Practice?
At their core, financial markets tend to be unpredictable and even illogical, just like the outcome of the Brexit vote or the last US elections. As a result, financial data has a rather chaotic structure, which often makes it hard to find sustainable patterns. To address this, the algorithm should be fed as much unbiased information as possible.
Modeling chaotic structures requires machine learning algorithms capable of finding hidden laws within the data and predicting how they will affect it in the future. A methodology well suited to this is “deep learning”. Deep learning can handle complex structures and extract relationships that further increase the accuracy of the generated results.
Machine learning in stock trading does not differ much from the approach human analysts usually employ. The first step is to organize the data set for the preferred instrument. It is then divided into two main groups: a training set and a test set. Why is that? Before the algorithm is tested, it needs to be trained and fine-tuned, which is what the training set is for. Once the algorithm meets all requirements, it is run on the test set, and the results it generates are compared to the real-life performance of the particular stock.
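The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `chronological_split`, the 80/20 ratio and the prices are assumptions, not something the article specifies. The sketch also assumes the split is made by time, so the algorithm is never trained on data that comes after the period it is tested on.

```python
def chronological_split(prices, train_fraction=0.8):
    """Split a time-ordered series into a training part and a test part.

    The oldest observations go to the training set and the most recent
    ones to the test set.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(prices) * train_fraction)
    return prices[:cut], prices[cut:]


# Made-up monthly closing prices, oldest first (illustrative only).
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0, 111.0, 115.0]

train, test = chronological_split(closes, train_fraction=0.8)
print(len(train), "training points,", len(test), "test points")
```

The algorithm is tuned on `train` only, and its predictions for the `test` period are then compared with what the stock actually did.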
An Example of the Logic Behind a Machine Learning Algorithm for Stock Trading
There are plenty of ways to build a predictive algorithm. However, most of them usually follow the logic presented below as it is an easy and efficient way for basic stock market predictions:
- Gathering the needed data
As we have already mentioned, financial markets are chaotic structures. In chaotic processes, past events can have a massive influence on the present and the future. This makes historical data a good source for predicting future prices of instruments.
Let’s assume that the main focus is stock trading. First of all, the trader has to decide which instruments are of interest, then download and prepare the respective historical data in a time series format. Next, the trader should choose a benchmark against which to compare the algorithm’s results. Let’s use the S&P 500 as an example. We should also set a time frame over which to analyze the performance of the index. Let’s take data for the 3-year period from January 2016 to January 2019.
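As a rough sketch of the preparation step, the snippet below keeps only the observations that fall inside the chosen window and sorts them oldest first. The helper `select_period` and the index levels are made up for illustration. Real data would come from whatever source the trader downloads it from.

```python
from datetime import date


def select_period(rows, start, end):
    """Keep (day, close) rows with start <= day <= end, sorted oldest first."""
    return sorted((day, close) for day, close in rows if start <= day <= end)


# Made-up benchmark levels in arbitrary order (not real S&P 500 values).
rows = [
    (date(2019, 2, 1), 2710.0),
    (date(2016, 1, 4), 2010.0),
    (date(2015, 12, 31), 2000.0),
    (date(2019, 1, 31), 2700.0),
    (date(2017, 6, 1), 2400.0),
]

# The 3-year window used in the example: January 2016 to January 2019.
series = select_period(rows, date(2016, 1, 1), date(2019, 1, 31))
for day, close in series:
    print(day.isoformat(), close)
```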
- Constructing the algorithm
The main goal is to build an algorithm that is capable of forecasting price trajectories. A good way to achieve that is to aim for two main factors: signal and predictability. The signal represents the expected movement, that is, whether a price increase or a price decrease is predicted. The predictability factor reveals the correlation between the algorithm’s past predictions and the real movement of each of the observed assets. In other words, it shows how much confidence the signal deserves. To measure it, we will use the Pearson correlation coefficient.
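To make the two factors concrete, here is a minimal sketch that computes the predictability of one stock as the Pearson correlation between past predictions and the movements that actually followed. The numbers are invented, and treating the signal as a predicted one-month return is an assumption made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented history for one stock: what the algorithm predicted each month
# and how the price actually moved (both as fractional returns).
past_predictions = [0.02, -0.01, 0.03, -0.02, 0.01]
actual_moves = [0.015, -0.005, 0.02, -0.03, 0.005]

predictability = pearson(past_predictions, actual_moves)

# Hypothetical latest forecast: the sign gives the expected direction.
signal = 0.025
direction = "up" if signal > 0 else "down"

print(f"signal: {direction}, predictability: {predictability:.2f}")
```

A predictability close to 1 means past signals tracked reality well. A value near 0 means the signal has told the trader very little so far.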
The evaluations are performed on individual stock level and then averaged. The profitability of each trade is calculated via the following formula:
end price / price on the day of the prediction - 1
For easier calculation, this can be performed in Excel. The result from the abovementioned formula shows what the expected profitability is if an investor buys the instrument on the day of the prediction and sells it after 1 month.
This paves the way for finding which are the best instruments to trade from the whole index. The next step is to instruct the algorithm to take the average of all predictions and weigh them accordingly (recent performances usually receive bigger weights).
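The profitability formula and the weighted averaging step are just as easy to express in Python as in Excel. In the sketch below, the linearly increasing weights are an assumption: the article only says that recent performances usually receive bigger weights, not how much bigger.

```python
def trade_return(price_on_prediction_day, end_price):
    """Profitability of buying on the prediction day and selling at end_price."""
    return end_price / price_on_prediction_day - 1


def weighted_average(values, weights):
    """Average of values in which each value counts in proportion to its weight."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equally long")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Buying at 100 and selling at 105 one month later gives roughly 5%.
print(f"{trade_return(100.0, 105.0):.2%}")

# Made-up predictions, oldest first, with the most recent weighted highest.
predictions = [0.01, 0.02, 0.04]
weights = [1, 2, 3]
print(f"{weighted_average(predictions, weights):.4f}")
```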
- Interpreting the results
After the algorithm is done with its predictions, the trader should then filter the most predictable instruments in the list and choose those with the highest signal strengths. That way, he can easily find out which stocks are most likely to experience a price movement and trade on the results.
Next, the trader should compare the results from the top performing stocks with the respective benchmark and make an investment decision.
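The filtering and comparison steps might look like the sketch below. The tickers, the 0.5 predictability threshold, the choice of the top two names and the return figures are all hypothetical. Signal strength is taken here as the absolute size of the predicted move.

```python
def pick_candidates(forecasts, min_predictability=0.5, top_n=2):
    """Keep sufficiently predictable tickers, then rank by signal strength.

    forecasts maps ticker -> (signal, predictability).
    """
    predictable = {
        ticker: values
        for ticker, values in forecasts.items()
        if values[1] >= min_predictability
    }
    ranked = sorted(predictable, key=lambda t: abs(predictable[t][0]), reverse=True)
    return ranked[:top_n]


def excess_return(strategy_return, benchmark_return):
    """How much the strategy gained over (or lost against) the benchmark."""
    return strategy_return - benchmark_return


# Hypothetical forecasts: (signal, predictability) per ticker.
forecasts = {
    "AAA": (0.04, 0.8),
    "BBB": (0.09, 0.2),   # strong signal, but too unreliable to keep
    "CCC": (-0.06, 0.7),
    "DDD": (0.01, 0.9),
}

picks = pick_candidates(forecasts)
print(picks)

# Made-up returns over the same period, for the picks and for the benchmark.
print(f"{excess_return(0.05, 0.03):.2%}")
```

Note that "BBB" is dropped even though it has the strongest signal, because its past predictions correlated poorly with what actually happened.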
The results indicate that the algorithm outperforms the S&P 500 index’s return across all investment horizons.
Alternative Prediction Methodologies
In reality, there are plenty of other ways to conduct stock market predictions via machine learning algorithms. One widely preferred and efficient approach is called “ensemble learning”. The idea behind it is to employ the power of multiple learning algorithms to increase the overall accuracy of the final prediction. Ensemble learning techniques are often preferred because each algorithm is trained independently. Once training is finished, the generated predictions are combined into one generalized analysis. Two of the most popular algorithms that are often combined via ensemble learning are neural networks and support vector machines.
Predicting stock prices a decade ago was an extensive and time-consuming process. Today, the power of machine learning algorithms helps us save time and effort while achieving better performance and higher efficiency. However, the technology still has a long way to go before it becomes fully capable of solving the mystery of financial markets. Still, it is good to know that, if needed, you can always replace your personal financial consultant with an algorithm. At least you will be sure that all it will do is advise you and not try to sell you anything.
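The combining step of an ensemble can be sketched very simply. To stay within the standard library, the snippet uses two trivial stand-in models instead of a real neural network and support vector machine. The model functions and the return history are invented for illustration.

```python
def fit_mean_model(returns):
    """Stand-in model: always predicts the historical average return."""
    mean = sum(returns) / len(returns)
    return lambda: mean


def fit_last_value_model(returns):
    """Stand-in model: predicts that the latest return repeats."""
    last = returns[-1]
    return lambda: last


def ensemble_predict(models, weights=None):
    """Combine independently trained models into one weighted prediction."""
    predictions = [model() for model in models]
    if weights is None:
        weights = [1.0] * len(predictions)
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


# Made-up monthly returns, oldest first.
history = [0.01, -0.02, 0.03, 0.02]

# Each model is trained on its own, then the predictions are merged.
models = [fit_mean_model(history), fit_last_value_model(history)]
combined = ensemble_predict(models)
print(f"{combined:.4f}")
```

In practice, the stand-ins would be replaced by properly trained models, and the weights could reflect how predictable each model has been in the past.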
Chaotic structures: a kind of dynamical system whose behavior can change drastically in response to even a slight change in its initial conditions. This makes them very hard to predict and explore. However, with the power of machine and deep learning, the process of finding hidden laws and patterns within such systems constantly evolves.
Pearson correlation coefficient: a measure of the linear correlation between two variables, ranging from -1 to 1. In the example above, the variables are the algorithm’s past predictions and the actual market movement.
Support-vector machines: supervised learning models intended for classification and regression analysis. An SVM represents the examples as points in space and looks for the boundary that separates the classes with the widest margin.
This article was originally published on: https://www.datadriveninvestor.com/