Using LSTM to Implement a Sector Rotation Trading Strategy Part I

Christopher Riggio · Published in Analytics Vidhya · Sep 19, 2019

For my final project at the Flatiron School for Data Science, I wanted to forecast stock data using time-series analysis. However, after doing some research, the general consensus seemed to be that traditional time-series models such as ARIMA often produce extremely inaccurate results on stock data and are not well suited for handling it. ARIMA models are better equipped to handle data that can be made stationary and that shows regular seasonality. PATH train ridership data is a good example: a steady upward trend in ridership can be observed over the last decade, with a drop over the summer and a spike in the fall each year. Stock data, on the other hand, is neither stationary nor seasonal.
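To make the stationarity point concrete, here is a minimal sketch of how you might check a price series with the augmented Dickey-Fuller test from statsmodels. The CSV file name and column names are placeholder assumptions, not files from the original project:

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Load a price series; the file name and column names are hypothetical placeholders.
prices = pd.read_csv("prices.csv", parse_dates=["date"], index_col="date")["close"]

# Run the augmented Dickey-Fuller test on the raw series.
# A large p-value means we cannot reject the null hypothesis of a unit root,
# i.e. the series is likely non-stationary (typical for stock prices).
stat, p_value, *_ = adfuller(prices.dropna())
print(f"ADF statistic: {stat:.3f}, p-value: {p_value:.3f}")
```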

This eventually led me to implement an LSTM (Long Short-Term Memory) network for this particular analysis. An LSTM network is a type of recurrent neural network that is often used for tasks like speech, handwriting and image recognition, but it is also capable of producing powerful time-series models. It is trained using Back-propagation Through Time (BPTT), which updates the network weights to minimize error through the derivative chain rule. In addition, it largely takes care of the vanishing gradient problem, an issue that often occurs in recurrent neural networks, in which smaller and smaller gradients cause the weights to effectively stop changing.
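To give a feel for what this looks like in practice, here is a minimal sketch of an LSTM model in Keras. The layer size, lookback window and optimizer are illustrative assumptions, not the settings used in this project:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

window, n_features = 30, 1  # hypothetical: 30-day lookback, one feature (price)

model = Sequential([
    LSTM(50, input_shape=(window, n_features)),  # 50 memory cells
    Dense(1),                                    # predict the next value
])
model.compile(optimizer="adam", loss="mse")

# X has shape (samples, window, n_features), y has shape (samples,)
X = np.random.rand(100, window, n_features)  # dummy data for illustration
y = np.random.rand(100)
model.fit(X, y, epochs=2, verbose=0)
```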

So how does this work exactly?

LSTM networks are extremely complex, but I'm going to take a stab at explaining them as best I can, with a little help from Christopher Olah's blog. Check it out for a deeper dive into the inner workings of LSTMs and all things machine learning.

Unrolled Vanilla Recurrent Neural Network

A quick note on RNNs….

Recurrent networks are very good at learning from sequential data. They have three main components: the input data, the hidden layer and the output layer, denoted as x, A and h above. As you loop through the RNN, it receives not only the next piece of input data but also the hidden state from the previous time step, so it keeps learning from past data points as it moves through the entire sequence. However, as I mentioned above, RNNs are often plagued by the vanishing gradient problem, and that's where LSTMs come in. That's really the most simplistic way to explain it without getting into the calculus and linear algebra going on behind the scenes.
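Here is a minimal NumPy sketch of that unrolled loop, just to show how the hidden state from one step feeds into the next. The dimensions and the tanh activation are standard illustrative choices, not parameters from the original post:

```python
import numpy as np

n_inputs, n_hidden = 4, 8                        # hypothetical dimensions
Wx = np.random.randn(n_hidden, n_inputs) * 0.1   # input-to-hidden weights
Wh = np.random.randn(n_hidden, n_hidden) * 0.1   # hidden-to-hidden weights
b = np.zeros(n_hidden)

def rnn_forward(sequence):
    """Run a vanilla RNN over a sequence of input vectors x_t."""
    h = np.zeros(n_hidden)           # initial hidden state
    outputs = []
    for x_t in sequence:             # one iteration per time step
        # The new hidden state depends on the current input AND the previous hidden state.
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        outputs.append(h)
    return outputs

sequence = [np.random.randn(n_inputs) for _ in range(5)]
hidden_states = rnn_forward(sequence)
print(hidden_states[-1])
```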

There are five essential components of an LSTM that allow it to process and model both short-term and long-term information (a small code sketch follows the list below).

  1. Cell state (ct) — the internal memory, where both short-term and long-term memory is stored.
  2. Hidden state (ht) — essentially decides whether to retrieve the short-term memory, the long-term memory, or both from the cell state.
  3. Input gate (it) — decides how much information flows into the cell state from the current input.
  4. Forget gate (ft) — decides how much information from the previous cell state is carried forward into the current cell state.
  5. Output gate (ot) — decides how much information flows from the current cell state into the hidden state, so the LSTM can expose only short-term memory, only long-term memory, or both, as needed.
Source: https://www.datacamp.com/community/tutorials/lstm-python-stock-market
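To tie the five components together, here is a minimal NumPy sketch of a single LSTM cell step. The weight shapes and the sigmoid/tanh activations follow the standard LSTM formulation; everything here is illustrative rather than code from the project:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x_t] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    n = h_prev.shape[0]
    i_t = sigmoid(z[0 * n:1 * n])     # input gate: how much new info enters the cell state
    f_t = sigmoid(z[1 * n:2 * n])     # forget gate: how much of the old cell state is kept
    o_t = sigmoid(z[2 * n:3 * n])     # output gate: how much of the cell state reaches the hidden state
    g_t = np.tanh(z[3 * n:4 * n])     # candidate values for the cell state
    c_t = f_t * c_prev + i_t * g_t    # cell state: long-term memory
    h_t = o_t * np.tanh(c_t)          # hidden state: current / working memory
    return h_t, c_t

# Hypothetical sizes: 3 input features, 5 hidden units.
n_in, n_hid = 3, 5
W = np.random.randn(4 * n_hid, n_hid + n_in) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in np.random.randn(10, n_in):  # walk through a 10-step sequence
    h, c = lstm_step(x_t, h, c, W, b)
```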

Each gate has its own set of matrix operations and weights, which are continually updated and allow the network to learn over time. The cell state stores memory over time, while the hidden state can be thought of as the place where the current, or working, memory is kept. So how does this network of cells know what to remember and what to forget? In short, through gradient descent. Gradient descent could easily be its own blog topic, but in brief it is an optimization algorithm that minimizes a function by iteratively moving in the direction of steepest descent, where the cost here is the mean squared error. With respect to LSTMs, gradient descent is used to update the weights of our model. For more on gradient descent, check out this article.
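As a quick illustration of that update rule, here is a minimal gradient-descent sketch for a single weight under a mean-squared-error cost. The learning rate and toy data are assumptions for demonstration only:

```python
import numpy as np

# Toy data: y is roughly 2 * x, and we want to learn that slope.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

w = 0.0                # single weight, initialized at zero
learning_rate = 0.01

for step in range(200):
    y_pred = w * x
    error = y_pred - y
    cost = np.mean(error ** 2)       # mean squared error
    grad = np.mean(2 * error * x)    # d(cost)/d(w)
    w -= learning_rate * grad        # step in the direction of steepest descent

print(f"learned w = {w:.3f}, final MSE = {cost:.4f}")
```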

One of the main takeaways is that LSTMs and neural networks in general are loosely modeled after the human brain. We don’t just throw everything away and start thinking from scratch every time we’re presented with a new piece of information. Each word you read or piece of knowledge you acquire is based on your understanding of things that came before it. Does that even make sense? I think so at least.

In the next part of this blog series I will get into the meat of this project. I will cover pulling stock data from Yahoo Finance for each of the nine sectors of the S&P 500, what a sector rotation strategy is, how I used an LSTM network to implement it, and why using stock data for machine learning purposes is sometimes thought of as a controversial topic. For now, I wanted to provide at least a loose understanding of LSTMs before jumping in.
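As a small preview of that data-pulling step, here is a sketch using the yfinance library. The sector SPDR ETF tickers and the date range are my own placeholder assumptions, not necessarily the symbols or dates used in the project:

```python
import yfinance as yf

# Hypothetical proxies for the nine S&P 500 sectors via the sector SPDR ETFs.
sector_tickers = ["XLK", "XLF", "XLV", "XLE", "XLI", "XLP", "XLY", "XLU", "XLB"]

# Download daily adjusted closing prices; the date range is illustrative.
data = yf.download(sector_tickers, start="2010-01-01", end="2019-09-01")["Adj Close"]
print(data.tail())
```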
