AlphaGo for Securities Trading

AlphaGo is Google’s Deep Learning system that was awarded professional 9-dan (the highest possible Go ranking) by the Chinese Weiqi Association. It began learning how to play the game of Go by studying games played by human experts. The latest version skipped that step and had the machine learn by playing itself, starting from completely random play. This method was able to beat the previous version in every game played.

This article outlines the initial development of a similar system for securities trading. It is unrealistic to expect results anywhere near AlphaGo's. Instead, this represents a first attempt at a system inspired by its success.


It appears that the price of a security is influenced more by noise (media coverage of that stock) than by the underlying company’s performance. At times, the crowd catches on and the price moves to better reflect the company’s financials. Other times it does not.

This phenomenon is probably due to the fact that a company's financial reports are only published every three months, whereas news can be updated each hour. Regardless, the first part of building the system is focused on creating a deep learning, value-based strategy.

Using the financials from Quarterly SEC Filings (10-Q) on EDGAR as training data, it is hoped that the system will be able to determine how far a stock’s price is from the price expected from the published company fundamentals alone, excluding all noise from the decision.

For example, if ZZZ stock is currently trading at $50 per share, is that above or below what one would expect given knowledge of the company's financials? In other words, is this stock above or below the fundamental expectation?

To clarify, perhaps ZZZ stock is at $50 per share, down from an annual high of $75. The company's total assets and total liabilities have improved over that time. Yet the $50 per share price remains, most likely due to an unflattering article and social commentary on that company. Based on these fundamentals, ZZZ is trading below the expectation and a value investor would consider taking a long position.

If one were building a logic-based system for value investing, it would most likely mimic similar logic. However, a deep learning system is free to come to its own conclusions. That makes witnessing the strategies the computer creates the most anticipated aspect of the project.

As Carlos Perez, author of Artificial Intuition and The Deep Learning Playbook, states, “In Deep Learning like in sports, you can’t win on paper, you actually have to play the game to see who wins. In short, no matter how simple an idea may be, you just never know how well it will work unless the experiments are actually run.”

Training Data

The latest version of AlphaGo was given no human game data or strategy beyond the rules of Go, using only the black and white stones as input. In contrast, the first iteration of the trading system will use a sampling of SEC 10-Q Reports from quarter 1 of 2018. The reports are parsed into the following format:

Training Data Example

Each stock symbol must be converted to a numeric value. To accomplish this, each letter is assigned its position in the alphabet, from A = 1 to Z = 26, and that number is written in quaternary. This base-4 numeral system uses only the digits 0, 1, 2, and 3. The digits for each letter are then concatenated: in the example above, 11100 is AAP (1, 1, 100) and 3030 is LL (30, 30).
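The symbol encoding can be sketched in a few lines of Python. This is a minimal illustration rather than the system's actual code; the function names to_base4 and encode_symbol are assumptions made for this example, and the per-letter digits are simply concatenated, as the AAP and LL examples imply.

```python
def to_base4(n: int) -> str:
    """Return the base-4 (quaternary) digits of a non-negative integer."""
    digits = ""
    while n > 0:
        digits = str(n % 4) + digits
        n //= 4
    return digits or "0"


def encode_symbol(symbol: str) -> str:
    """Encode a ticker: A=1 ... Z=26, each written in base 4, then concatenated."""
    return "".join(to_base4(ord(ch) - ord("A") + 1) for ch in symbol.upper())


print(encode_symbol("AAP"))  # 11100
print(encode_symbol("LL"))   # 3030
```

One caveat with this scheme: because each letter's digits are not padded to a fixed width, two different symbols can share an encoding (AA and E both become 11).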

The system will then trade using an API test account from an online broker with $50,000 of initial funds and run for 7 days — freely making trades and using the results as input. Hopefully, it will discover successful strategies that do not regress towards the mean over time.

Recurrent Neural Network

This trading system requires a deep neural network, an algorithm that clusters and classifies data. Classification requires labeled datasets and is considered supervised learning. Clustering is the detection of similarities within unlabeled data and is referred to as unsupervised learning.

The description within each data example is just for clarity — actual data used is unlabeled. Therefore, this system relies on unsupervised learning — with the goal of creating new strategies from the data.

The issue with developing machine learning in the past has been that the machine must learn from scratch each time. That is fine for simple tasks, but for more complex applications like playing the game of Go or trading stock, a different approach is required.

The fundamental company data and stock price may or may not be correlated. In many observed cases, they are not. However, Warren Buffett has found that at times the two do become correlated and, as a result, large profits are realized. The key to value investing, it seems, is discovering the moment before such a phenomenon occurs.

To accomplish this, a recurrent neural network (RNN) is required. These are neural networks with loops that provide persistence (storage) of the information learned. For example, consider the diagram below. Here, a part of a neural network, A, looks at some input Xt and outputs a value ht. A loop allows information to be passed from one step of the network to the next.

RNNs are made from multiple copies of the same network, each passing a message to its successor. Unrolling the network, as in the diagram below, illustrates this more clearly and reveals that RNNs are closely related to sequences.

RNNs have recently found great success in handwriting analysis, speech recognition, and image captioning among others. So how do they work?

According to Andrej Karpathy in his blog post, The Unreasonable Effectiveness of Recurrent Neural Networks, “At the core, RNNs have a deceptively simple API: They accept an input vector x and give you an output vector y. However, crucially this output vector’s contents are influenced not only by the input you just fed in, but also on the entire history of inputs you’ve fed in in the past.”

Karpathy continues with, “The core reason that recurrent nets are more exciting is that they allow us to operate over sequences of vectors: Sequences in the input, the output, or in the most general case both.”
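The recurrence described above can be sketched in plain Python. This is a minimal illustration of a standard tanh recurrence, not the trading system's model; the names rnn_step and run_rnn, the weight matrices, and the bias term are all assumptions made for this example.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One step: h_t = tanh(W_xh * x_t + W_hh * h_(t-1) + b)."""
    h = []
    for i in range(len(h_prev)):
        total = b[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h


def run_rnn(xs, h0, W_xh, W_hh, b):
    """Unroll the same step over a sequence, passing the state forward."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states


# Illustrative weights: 1 input feature, 2 hidden units.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.0]

# Both sequences end with the same input, yet the final states differ
# because the hidden state carries the history of earlier inputs.
with_history = run_rnn([[1.0], [0.0]], [0.0, 0.0], W_xh, W_hh, b)
no_history = run_rnn([[0.0], [0.0]], [0.0, 0.0], W_xh, W_hh, b)
print(with_history[-1], no_history[-1])
```

The loop in run_rnn is the unrolled network: the same weights are reused at every step, and only the hidden state changes.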

A sequence of vectors is what financial data is, be it market prices or fundamental analysis. Everything a deep learning financial system uses is based upon a time series of input and a sequential output of results. Thus, RNNs appear to be a natural fit for this use case.

Reinforcement Learning

The key to AlphaGo is Reinforcement Learning (RL). This is a means of guiding behavior based upon a long-term reward, giving the machine the mechanisms to exhibit optimized behavior within a given context. According to the Google Brain team, “[AlphaGo] learns what is the best strategy given the current position on the game board. Then it applies reinforcement learning by setting up self-play games. The RL policy network gets improved when it wins more and more games against previous versions of the policy network.”
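One common way to express a long-term reward is the discounted return, in which each step's reward is added to a discounted sum of the rewards that follow it. The sketch below illustrates that idea only; the article does not say how this trading system defines its reward, so the function name discounted_returns and the discount factor gamma are assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_(t+1) for every step t (assumed reward model)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step can reuse the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical per-step rewards, e.g. profit and loss after each trade.
print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

A gamma close to 1 weights distant outcomes almost as heavily as immediate ones, which matches the emphasis on long-term reward.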

AlphaGo arguably has an easier task playing Go than a machine has trading securities. Go has only two players, whereas securities markets have multiple direct and indirect players that affect success, making it difficult to learn from each trade. In addition, there is the issue of near randomness within market prices.

To move away from random events, the system is designed to learn from fundamental data rather than market noise. It does not try to predict prices. Instead, the computer learns to predict market direction over a longer timeframe of one to seven days.
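Predicting direction rather than price can be made concrete by labeling each day with the sign of the price change over a fixed horizon. This is a hypothetical sketch: the article does not describe how direction is measured, and the function name direction_labels, the daily price list, and the three-way up/down/flat labeling are assumptions.

```python
def direction_labels(prices, horizon):
    """Label each day +1, -1, or 0 by the price change `horizon` days later."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    labels = []
    for t in range(len(prices) - horizon):
        change = prices[t + horizon] - prices[t]
        labels.append(1 if change > 0 else (-1 if change < 0 else 0))
    return labels


# Hypothetical daily closing prices for ZZZ.
closes = [50.0, 52.0, 51.0, 51.0, 55.0]
print(direction_labels(closes, horizon=2))  # [1, -1, 1]
```

The last `horizon` days receive no label because their future price is not yet known.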

Network Model

AlphaGo Zero, the latest version at the time of this writing, is based on a single network, a technique used by DeepMind for a deep learning system designed to play Atari games. This architecture allows the machine to teach itself from scratch and has proven better for game strategy than other techniques that include human teaching. The spooky part of machine self-learning is the inability of humans to fully comprehend the strategies produced.

The results of AlphaGo and similar deep learning systems are remarkable. However, some conclude that these machines may have simply discovered a heuristic that researchers were unaware of. Perhaps that is true. Even so, it is still a good use case.

First, there is a Long Short-Term Memory (LSTM) network. This is a type of recurrent neural network (RNN) that allows learning over many time steps, creating a channel to link cause and effect remotely. It makes decisions about what to store, and when to allow reads, writes and deletions, via gates that open and close. The gates pass or block information based upon its importance and strength, filtering it with their own sets of weights as illustrated below:

In other words, LSTM networks filter information based upon what they learn to be most meaningful. Human traders do the same thing. They read news articles, check financial data, and watch for patterns in the prices, using the information to formulate a strategy. As one progresses as a securities trader, the sources of data used become fewer and more specialized. The hope is that the trading system will do the same.
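The gating behavior of an LSTM can be sketched with a single scalar cell. This follows the standard LSTM formulation in miniature and is not the trading system's model; the function names, the parameter dictionary layout, and the example weights are assumptions made for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre("forget"))       # how much old memory to keep (deletion)
    i = sigmoid(pre("input"))        # how much new information to write
    o = sigmoid(pre("output"))       # how much memory to expose (read)
    g = math.tanh(pre("candidate"))  # the new information itself
    c = f * c_prev + i * g           # updated cell state (the stored memory)
    h = o * math.tanh(c)             # output for this step
    return h, c


# Forget gate held open and input gate held shut: the memory passes through.
keep = {"forget": (0.0, 0.0, 100.0), "input": (0.0, 0.0, -100.0),
        "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, params=keep)
print(c)
```

In a trained network the gate weights are learned, so the cell itself decides when to keep, overwrite, or expose what it has stored.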

System Design

The trading system's minimum viable product (MVP) is based upon two components: a trade interface and the LSTM (not LTSM as depicted) network, as conceptualized on the virtual equivalent of a napkin below.

While hard to read, the diagram above illustrates the trade interface obtaining balance and stock data from the broker account. It is controlled by the LSTM network, which is trained on EDGAR data from the SEC. What the diagram does not make clear is that the LSTM communicates only with the trade interface, requesting specific stock symbols and making order requests.

The trade interface is just an API wrapper between the LSTM and the broker account, so that the broker can be swapped out without having to modify the LSTM. Furthermore, the system does not close trades on its own.

Trades close based upon the type of order made by the LSTM. To clarify, this means either special orders such as a stop loss, or reverse-position orders such as a short position closing a long one. Otherwise, the trades remain open.
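The wrapper idea and the closing of a trade by a reverse order can be sketched as follows. Everything here is hypothetical: the class names TradeInterface and PaperBroker, the method names, and the in-memory broker are assumptions for illustration, not the author's code or any real broker's API. Commissions, margin, and stop-loss orders are left out.

```python
from abc import ABC, abstractmethod


class TradeInterface(ABC):
    """Broker-agnostic surface the network talks to (hypothetical method names)."""

    @abstractmethod
    def get_balance(self) -> float: ...

    @abstractmethod
    def get_price(self, symbol: str) -> float: ...

    @abstractmethod
    def get_position(self, symbol: str) -> int: ...

    @abstractmethod
    def place_order(self, symbol: str, quantity: int) -> None: ...


class PaperBroker(TradeInterface):
    """In-memory stand-in for a broker test account."""

    def __init__(self, cash, prices):
        self.cash = cash
        self.prices = dict(prices)
        self.positions = {}

    def get_balance(self):
        return self.cash

    def get_price(self, symbol):
        return self.prices[symbol]

    def get_position(self, symbol):
        return self.positions.get(symbol, 0)

    def place_order(self, symbol, quantity):
        # Positive quantity buys; negative quantity sells or shorts.
        self.cash -= quantity * self.prices[symbol]
        net = self.get_position(symbol) + quantity
        if net == 0:
            self.positions.pop(symbol, None)  # reverse order closed the trade
        else:
            self.positions[symbol] = net


broker = PaperBroker(cash=50_000.0, prices={"ZZZ": 50.0})
broker.place_order("ZZZ", 100)    # open a long position
broker.place_order("ZZZ", -100)   # an opposite order closes it
print(broker.get_position("ZZZ"), broker.get_balance())
```

Because the network would depend only on TradeInterface, a different broker could be supported by writing another subclass and leaving the network untouched.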

When an earlier AI-based arbitrage system capable of making self-guided trades was built, an additional service was created to close them. That system is currently in production with two monitoring services: one to find new trading opportunities and the other to close open trades at what it deems the most profitable point.

Because it tries to capture the success of AlphaGo for trading, this new trading system is much different. The desire is to have all parts self-contained in one network capable of making orders and closing positions on its own. However, for the initial test, closing trades requires either human intervention or special trade types the machine will need to discover.

The specifics of the model’s architecture are a work in progress, and more will be shared once the details are finalized.


Inspired by AlphaGo, this is an overview of a new type of trading system. Having found success with a current, more restrictive system, the hope is to create a new trading system with more decision-making freedom, if for no other reason than to discover novel strategies and lay the groundwork for future improvements.

This first attempt is written in R using Google's TensorFlow framework with the Keras library. As parts are completed, the successes and failures will be shared. Once the system is complete, all code will be made available as open source on GitHub. Thank you for reading.

Like what you read? Give Todd Moses a round of applause.
