DamusS: A Stock Market Predictor

May 8, 2022

Description of The Problem

With the advancement of machine learning and deep learning models and algorithms, many applications of these techniques come to mind. Machine learning and deep learning algorithms enable a computer to learn the behavior of a recurring pattern and then predict or recognize it in new data. Learning from labeled data is referred to as supervised learning. This means that if we gather enough data about what we want the model to learn, we can build a system that learns the pattern and makes reasonably accurate predictions based on it.

There is huge interest in the stock market, largely because people would like to know how stocks will trend in the future. The reasoning is obvious: it would be very nice to know whether a stock is going to rise or fall! So, can we build an intelligent model that predicts the future trend of a stock? There is plenty of data available, it is well labeled, the historical data can be used for evaluation, and it is text based, which makes it easy to process. Thus, it makes sense to build a simple system that does the predicting for us. However, there is another aspect to consider, which we discuss in the next paragraph.

Usually, when we start building an application, we first want a prototype of some sort to see how well it works, or to check whether it is even feasible. It also pays to plan ahead, because ignoring future needs can make life difficult very quickly. If the application works, we want to be able to show it to people, distribute it, or even sell it. This means it has to meet certain requirements. First, it should have a GUI (Graphical User Interface) for user interaction. Without one, people will not want to try it, even if it solves a big and interesting problem. Second, what if many people want to use the application? In our case, it should be easy to extend and to scale. Furthermore, since we want to run it as a web service, it needs a client, which is the user's side, and a server, which is the application itself. It is always important to think about these requirements in advance and to build a basic infrastructure that is easy to access and to adjust for future development.

To summarize the whole problem in one sentence: we need a cloud-based, AI-enabled application that can predict the stock market and that users interact with through a simple GUI.

Application Architecture

Before starting the implementation, drawing an abstract architecture helps us understand the problem better and guides later decisions. The following image shows the architecture that we first came up with:

The old architecture: the first architecture that we designed. We changed it during development.

We wanted a Client (written in Go), a Server (written in Go), an AI engine (written in Python), and a stock market API to supply the necessary data. The Client would ask the Server whether a certain stock is going to rise or fall. Based on the Client’s question, the Server would then request the needed data from the stock market API. After receiving the data, it would send it to the AI engine, which would make a decision based on that data. The decision would then be sent back to the Server, which would finally tell the Client what to do.

This was a good starting point for us. However, after some development, we realized that a few minor changes would give us a faster and more user-friendly application. Thus we changed the architecture to the following:

The new architecture: the second architecture, which we defined while developing the program.

First, we replaced the client with a browser. It looks much nicer, and with a little web programming we get a simple but effective GUI. Second, we moved the API call from the server side to the AI engine side, for two reasons. It avoids sending large amounts of data from the server to the AI engine and back again, which makes the system faster and lowers latency. It also makes the system much easier to implement. Thus the second architecture became the final one. Now it is time to look at some of the details and algorithms that we used to implement the application. In the next section we walk through the flow of the program and discuss further details and design decisions. The following paragraphs are more technical.
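To make the division of responsibilities in the second architecture concrete, here is a rough Python sketch. It is not the project's code: plain function calls stand in for the network hops, and the names `Suggestion`, `ai_engine`, `server`, `fake_history`, and `fake_predict` are all hypothetical. The point it illustrates is that the server forwards only the ticker, while the AI engine fetches the market data itself.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Suggestion:
    ticker: str
    predicted: float
    today: float
    action: str  # "buy" or "sell"


def ai_engine(ticker: str,
              fetch_history: Callable[[str], Sequence[float]],
              predict: Callable[[Sequence[float]], float]) -> Suggestion:
    """AI engine: fetches its own data, then predicts and decides."""
    closes = fetch_history(ticker)  # the market-data call lives here now
    predicted = predict(closes)
    today = closes[-1]
    # Assumed rule: a predicted rise means "buy", anything else means "sell".
    action = "buy" if predicted > today else "sell"
    return Suggestion(ticker, predicted, today, action)


def server(ticker: str, engine: Callable[[str], Suggestion]) -> str:
    """Server: forwards only the ticker and formats the answer for the browser."""
    s = engine(ticker.strip().upper())
    return f"{s.ticker}: predicted {s.predicted:.2f} vs today {s.today:.2f} -> {s.action}"


# Stubs standing in for the market-data API and the trained model.
def fake_history(ticker: str) -> Sequence[float]:
    return [10.0, 11.0, 12.0]


def fake_predict(closes: Sequence[float]) -> float:
    return closes[-1] + (closes[-1] - closes[-2])  # continue the last step


reply = server(" demo ", lambda t: ai_engine(t, fake_history, fake_predict))
print(reply)  # DEMO: predicted 13.00 vs today 12.00 -> buy
```

Because only a short ticker string and a small result cross the server boundary, the bulky price history never has to travel between the two services.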

Implementation Detail

The whole application is built on a micro-service architecture, so the first issue we faced was how the different servers would interact with each other. We decided to go with gRPC, a modern open source RPC framework. It is easy to use and scales well. We also chose it because it supports bi-directional streaming, uses a binary format, and works across different programming languages. The server is written in Go. At first, the server responded in plain text. However, our client reaches the server through a browser, where a clean, good-looking HTML page is an important part of attracting users. The Go package html/template allows us to load HTML, CSS, and JS files, so we made our own template HTML page. The server loads the template page whenever the client sends a request.

For the AI engine, we needed a lightweight but accurate model that can be trained on the fly and predict tomorrow's price. As a result, we chose a multilayer perceptron regressor. Every time a client requests a specific stock, this model is trained for 1000 iterations with the Adam optimizer on features extracted from five years of data provided by the Yahoo Finance API. Since the model is fairly light, training takes around 2 seconds and prediction takes a matter of milliseconds. After the prediction, the predicted value is compared with today’s price, and based on that comparison the system suggests whether to sell or buy that specific stock.
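A minimal sketch of this step, assuming scikit-learn's MLPRegressor as the multilayer perceptron and a sliding window of the previous five closing prices as the extracted features. The window size, the hidden layer size, the normalization, and the synthetic price series are illustrative assumptions rather than the project's actual configuration, and the real engine pulls its data from Yahoo Finance instead of generating it.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def make_windows(prices, window=5):
    """Previous `window` closes are the features, the next close is the target."""
    X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
    y = np.array(prices[window:])
    return X, y


def suggest(prices, window=5):
    """Train on the fly, predict the next close, and compare it with today's."""
    prices = np.asarray(prices, dtype=float)
    mean, std = prices.mean(), prices.std()
    scaled = (prices - mean) / std  # normalize so training behaves well
    X, y = make_windows(scaled, window)
    model = MLPRegressor(hidden_layer_sizes=(32,), solver="adam",
                         max_iter=1000, random_state=0)
    model.fit(X, y)
    latest = scaled[-window:].reshape(1, -1)
    predicted = float(model.predict(latest)[0] * std + mean)  # back to price units
    today = float(prices[-1])
    # Assumed rule: a predicted rise means "buy", anything else means "sell".
    return predicted, ("buy" if predicted > today else "sell")


# Synthetic stand-in for roughly five years of daily closes (not real market data).
rng = np.random.default_rng(0)
closes = 100 + np.cumsum(rng.normal(0, 1, 1250))
predicted, action = suggest(closes)
print(f"predicted next close: {predicted:.2f}, suggestion: {action}")
```

A model this small trains quickly, which is what makes retraining on every request practical.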

The whole flow of the program can be seen in the following figure:

Challenges

When writing a program or an application, there will always be challenges along the way. Some of them are shared by almost all applications: how to connect two components, how to handle version control, who works on which part, or how to find the errors that keep the code from running. Other challenges are context specific. They arise only because of the type of problem you are trying to solve or the application you are trying to develop. In this section we discuss a few challenges and potential solutions.

The first challenge we faced was that our deep learning model did not show good accuracy. We checked everything, from the preprocessing of the data to the training itself, and even the way we display the results. Later we realized that we were training on the full history of the past five years for the requested stock, while the behavior of the market was vastly affected by the COVID-19 pandemic. For simplicity, we denote the two kinds of behavior as A and B, only to indicate that a difference was observed. If the market showed behavior A before COVID-19, it showed behavior B during the pandemic, and now it shows A again. Thus we must exclude the time interval of the COVID-19 pandemic and train on the other dates. This removes the bias that the interval adds to the system and lets our model work properly.
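The exclusion described above can be sketched as a simple date filter. The interval bounds below are hypothetical, since the exact dates the project excludes are not stated here, and the sample rows are made up for illustration.

```python
from datetime import date

# Hypothetical bounds for the excluded period; the real cut-off dates are an assumption.
EXCLUDE_START = date(2020, 2, 1)
EXCLUDE_END = date(2021, 6, 30)


def drop_interval(rows, start=EXCLUDE_START, end=EXCLUDE_END):
    """Keep only (day, close) rows that fall outside [start, end]."""
    return [(day, close) for day, close in rows if not (start <= day <= end)]


# Made-up sample rows, not real prices.
history = [
    (date(2019, 12, 31), 72.5),
    (date(2020, 3, 16), 60.1),
    (date(2021, 1, 4), 128.0),
    (date(2021, 9, 1), 151.2),
]
kept = drop_interval(history)
print([day.isoformat() for day, _ in kept])  # ['2019-12-31', '2021-09-01']
```

One thing to watch with this approach: removing an interval leaves a jump in the series, so any features built from consecutive days should probably not span the gap.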

The other challenge was the latency of the system, which is very important for an end-to-end system that people actually use. Since we were not going to store the data, and we were training on the fly, we needed a good model that trains fast enough to run almost in real time. There were various possible solutions to this problem. We could have kept a database recording the training for each individual input and then fine-tuned on later inputs, but we wanted to keep the system as simple as possible and have it work fast even for a new input. Thus, having a better model seemed to be the wiser choice.

Another challenge, which we did not solve in our system, is that it does not consider real-world, real-time events. For example, it does not know whether a new technology was introduced just today that might cause a specific stock to fall. It depends purely on historical data, which is enough most of the time. However, there are times when real-world events affect the market, and relying on this system alone for trading could lead to catastrophic results. In other words, we are not sure how effective and reliable our system is in normal, general use. Thus, please keep in mind that we do not recommend this system as a trading advisor. It can be of help, but you use it at your own risk.

Another important limitation is that our system only works with stocks that have more than 5 years of history. This is because the model expects to see 5 years of data for training. This would be a very easy fix to add to the system.
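One possible shape for that fix, sketched under assumptions: train on up to five years of data but accept a shorter history when that is all a stock has. The function name `training_span`, the 252 trading days per year, and the 60-day minimum are all hypothetical choices, not part of the current system.

```python
TRADING_DAYS_PER_YEAR = 252  # rough average, an assumption
MIN_DAYS = 60                # hypothetical floor below which we refuse to train


def training_span(closes, target_years=5):
    """Return up to `target_years` of the most recent closes.

    Younger stocks contribute their whole history, as long as it has
    at least MIN_DAYS points.
    """
    if len(closes) < MIN_DAYS:
        raise ValueError(f"need at least {MIN_DAYS} closes, got {len(closes)}")
    wanted = target_years * TRADING_DAYS_PER_YEAR
    return list(closes[-wanted:])


old_stock = list(range(2000))  # more than five years of daily closes
new_stock = list(range(300))   # a little over one year
print(len(training_span(old_stock)), len(training_span(new_stock)))  # 1260 300
```

Whether a model trained on a much shorter history is still useful is a separate question that would need to be evaluated.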

Testing and Application

Here you can see how the system works. On the first page, you enter the ticker symbol of a stock. On the next page, you see both the behavior over the past 50 days and the prediction for the day after your search. This is very simple for now, but it can easily be improved.

Future Extension

The future plan is to containerize the servers and deploy them on Kubernetes. This will allow the team to monitor the program flow easily and to scale up the service. It will also allow us to improve the program or fix bugs without interrupting the server. Another direction for improvement is to add a database to the system so that it can cache the trained model and improve latency by training only once for each stock.
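The caching idea can be illustrated with a small sketch in which an in-memory dictionary stands in for the planned database. The class name `ModelCache` and the placeholder training function are hypothetical. The sketch only shows that repeated requests for the same ticker trigger a single training run.

```python
class ModelCache:
    """In-memory stand-in for a database of trained models, keyed by ticker."""

    def __init__(self, train_fn):
        self._train_fn = train_fn
        self._models = {}

    def get(self, ticker):
        key = ticker.upper()
        if key not in self._models:  # train only on the first request
            self._models[key] = self._train_fn(key)
        return self._models[key]


calls = []


def fake_train(ticker):
    """Placeholder for the real training step; records that it ran."""
    calls.append(ticker)
    return {"ticker": ticker}


cache = ModelCache(fake_train)
cache.get("aapl")
cache.get("AAPL")
print(calls)  # ['AAPL']
```

A real version would also need a rule for when a cached model is too old, since new prices arrive every trading day.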

Code

The code is available at: https://github.com/adaneshp/DAMUSS-Stock-Market-Predictor

Team Members

Armin Danesh Pazho, Ghazal Alinezhad Noghre, Yinfei Li, and Rosa Vergara