Foreseeing Armageddon: Could AI have Predicted the Financial Crisis? A Scenario Study using Recurrent Neural Networks


The Global Financial Crisis (hereafter, GFC) of 2007–2008 had far-reaching financial and legal consequences, affecting millions of livelihoods. Sparked by the proliferation of subprime mortgages and exemplified by the fall of Lehman Brothers, its aftershocks were widely felt around the world. Following a period of recovery and growth, the world plunged into the European Sovereign Debt Crisis (hereafter, ESDC) beginning in late 2009, the effects of which have been argued to be still ongoing today. As global markets tend to operate in a cyclical fashion, scenario planning for the next financial crisis is not a matter of if, but when. Indeed, a Google search would turn up hundreds of differing opinions on the matter, from conclusions based on quantitative metrics to others based on the prophecies of Nostradamus.

Bitcoin serves as a great analogy of the wild ride crises represent.

Analysts and researchers have attempted to anticipate the movement of markets since the days of Benjamin Graham, but the rapid, strong movements observed in crises make predictions difficult. With the advent of AI, new approaches have been publicized claiming significant returns on investment. While the theoretical foundations of neural networks had been laid by the early 1990s, it was only after 2010 that the availability of training data and sufficient computing power brought them into widespread use. Furthermore, models capable of handling the sequence inputs common in stock market data, such as recurrent neural networks, only began to gain attention around 2007 following their rapid success in Natural Language Processing (NLP).

We’ve covered RNNs over the past couple of tutorials, in both sentiment classification and text generation, and the reader is encouraged to consult those tutorials for a detailed discussion of the theoretical aspects of the architecture. Briefly, RNNs are a type of neural network designed to handle sequences: they propagate information about each previous element of a sequence in order to make a predictive decision about the next element.
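As a minimal illustration of that idea (toy dimensions and randomly initialized weights, not our actual model), a single vanilla RNN step combines the current input with the previous hidden state, so the final state summarizes the whole sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 3, 5                        # toy dimensions
W_x = rng.normal(size=(d_hidden, d_in))      # input-to-hidden weights
W_h = rng.normal(size=(d_hidden, d_hidden))  # hidden-to-hidden (recurrent) weights
b = np.zeros(d_hidden)

def rnn_step(x_t, h_prev):
    # The new hidden state mixes the current input with the previous state,
    # carrying information forward along the sequence.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(d_hidden)
for x_t in rng.normal(size=(4, d_in)):       # a length-4 input sequence
    h = rnn_step(x_t, h)                     # h now depends on every element so far
```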

Because walking in blind is never a great idea.

For this tutorial, we will be using the Kaggle “stock data” dataset to attempt to forecast stock prices. This dataset lists the performance of some of the major banks before, during, and after the financial crisis. We’ve chosen Goldman Sachs as our example here, as the bank suffered extensively during the period for its role in providing subprime mortgage-backed securities, which resulted in a $5 billion fine.


Our implementation is based on the tutorial by Xavier et al., with the model expanded and augmented with layers to prevent overfitting while increasing its capability to capture long-range relationships. As usual, our implementation was run on Google Colaboratory, and can be found on GitHub. Since this is a scenario study (and as we’ve covered RNNs/LSTMs extensively in the past), we won’t cover every aspect of the code in detail, but to summarize:

  • Import the dataset and create a dataframe of the closing price.
  • Normalize our data to lie in between [0,1] using MinMaxScaler.
  • Define the variables affecting length of predictions, in days.
  • Create input data sequences (X and Y) of appropriate lengths, as defined in previous step.
  • Split the data sequences into training, validation, and testing subsections.
  • Build and train the model over 50 epochs.
  • Plot our predictions against the actual historical data.
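The first two steps can be condensed into a short sketch. Here we use synthetic prices in place of the Goldman Sachs column from the Kaggle CSV, so the snippet is self-contained; the column name and file path in the real notebook will differ:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic closing prices stand in for the Goldman Sachs close column of the CSV.
close = 150 + np.cumsum(np.random.default_rng(1).normal(size=252))

# Normalize to [0, 1]; keep the fitted scaler around so predictions
# can later be inverse-transformed back to dollar prices.
scaler = MinMaxScaler(feature_range=(0, 1))
array = scaler.fit_transform(close.reshape(-1, 1))
```

Keeping the fitted `scaler` matters: the network's outputs live in [0, 1] and only become interpretable prices after `scaler.inverse_transform`.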

To test the robustness of our system and better understand its limitations, we devised a series of scenarios where differing amounts of data was available for our model.

  • Scenario A: January 2006-December 2006 (pre-GFC data; 252 datapoints)
  • Scenario B: January 2011 (post-GFC, pre-ESDC data; 1,260 datapoints)

In particular, I’d like to highlight the following section of our code during data preparation, where we define the lengths of our input and output sequences.

import numpy as np

look_back = 80
forward_days = 50
num_periods = 25
division = len(array) - num_periods*forward_days

array_test = array[division-look_back:]
array_train = array[:division]

#Batch data creator
def processData(data, look_back, forward_days, jump=1):
    X, Y = [], []
    for i in range(0, len(data) - look_back - forward_days + 1, jump):
        X.append(data[i:i + look_back])
        Y.append(data[i + look_back:i + look_back + forward_days])
    return np.array(X), np.array(Y)

#Prepare test data
X_test, y_test = processData(array_test, look_back, forward_days, forward_days)
y_test = np.array([list(a.ravel()) for a in y_test])
X, y = processData(array_train, look_back, forward_days)
y = np.array([list(a.ravel()) for a in y])

#Finally, let's split the leftover train set into validation as well
from sklearn.model_selection import train_test_split
X_train, X_validate, y_train, y_validate = train_test_split(X, y, test_size=0.15, random_state=42)

#Sanity check
print(X_train.shape, X_validate.shape, X_test.shape)

We first split our dataset into training and testing subsets using the division variable. The processData() function then takes our input subset and generates sequences for both the input (X) and output (Y) arrays, defined by three key variables. We’ll be adjusting these variables throughout the course of this study, so let’s go over them briefly:

  • look_back: the input sequence length. A short sequence length may lead to a more reactive model, but one incapable of learning long-range relationships in the data.
  • forward_days: the length of the predictions. As this increases, the model cannot correct itself against true test data, and must continue making new predictions based on its previous ones. As a result, any difference in performance becomes significantly more visible.
  • num_periods: the number of periods of length [forward_days] to predict.
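To make the effect of these variables concrete, here is a self-contained copy of processData applied to a toy array (the numbers are illustrative, not market data). Note how jump controls the overlap between successive windows, which is why the test set is built with jump=forward_days:

```python
import numpy as np

def processData(data, look_back, forward_days, jump=1):
    # Slide a window of length look_back over the series; each window's
    # following forward_days values become its prediction target.
    X, Y = [], []
    for i in range(0, len(data) - look_back - forward_days + 1, jump):
        X.append(data[i:i + look_back])
        Y.append(data[i + look_back:i + look_back + forward_days])
    return np.array(X), np.array(Y)

toy = np.arange(10)
X, Y = processData(toy, look_back=4, forward_days=2)              # overlapping windows
X_t, Y_t = processData(toy, look_back=4, forward_days=2, jump=2)  # non-overlapping targets
# X.shape == (5, 4), Y.shape == (5, 2); with jump=2, X_t.shape == (3, 4)
```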

With our code introduced, let’s evaluate the results of all of our scenarios.

Scenario A

Figure 1 below displays the predictive results of our network across sequence lengths of 20, 40, and 80 days, respectively, with a prediction period of 50 days.

Figure 1. Prediction results from model trained on pre-GFC data (Jan-Dec 2006)

From our results, it’s clear that feeding the network data exclusively from the pre-GFC period results in an inadequate model, unfamiliar with downturns or swinging markets. Altering the length of the input sequences has no significant effect, as the model has only been exposed to a positive bull market, with any downswings highly restricted in scale. We’ll address this in the scenario below.

Scenario B

So we know a model trained on a pure bull market is useless during a financial crisis, but could we utilize the lessons learned from the GFC to forecast future crises? We test this theory by extending the training dataset to January 2011, right before the European Sovereign Debt Crisis hit global markets.

Figure 2 below displays the results of our network across sequence lengths of 20, 40, and 80 days, respectively, with a prediction period of 50 days.

Figure 2. Prediction results from model trained on pre-ESDC data (Jan 2006–Jan 2011)

As our model has now experienced both bull and bear markets, it is able to capture some of these negative sentiments much more accurately, though with relatively poor precision. Moreover, we can finally observe some significant differences between our sequence lengths: longer input sequences appear to give a much more measured output, which can be explained by the lack of a strong unidirectional trend over the course of their inputs.

Generally, if your model performs poorly at short projection ranges, it is all but guaranteed to perform poorly at longer ones: any error generated early in the output sequence is propagated onwards, leading to increasing divergence. For the sake of completeness, let’s extend the prediction period to 125 days with sequence lengths of 40 and 80 days, respectively. These plots are shown below in Figure 3:

Figure 3. Prediction results from model trained on pre-ESDC data (Jan 2006–Jan 2011), with a 125-day forecast period.

From our results, we can see that with a longer sequence length our model attempts to propagate an initial sentiment throughout the extended projection; naturally, we still fail to capture the initial drop and the rapid gradient changes that follow.
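This compounding of early errors can be illustrated with a toy autoregressive forecaster (purely illustrative, not our LSTM): a model whose one-step prediction carries even a small 1% bias, and which feeds each prediction back in as input, drifts further from the truth at every step of the horizon:

```python
# Toy illustration of error compounding in recursive multi-step forecasting.
true_price = 100.0   # assume the real price stays flat
pred = 100.0
errors = []
for step in range(125):
    pred *= 1.01                       # biased one-step prediction, fed back in
    errors.append(abs(pred - true_price))
# The error after 125 steps dwarfs the error after the first step.
```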

In technical analysis, indicators such as moving averages are used to aggregate individual fluctuations to better model the market sentiment. As averages, they also provide a smoothing effect on our data, which may improve our modelling performance. We applied a 10-day moving average on our data and ran our network across sequence lengths of 20, 40, and 80 days, respectively, with a prediction period of 50 days.
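Computing the moving average is a one-liner with pandas; here again we use synthetic prices so the snippet is self-contained. The first nine entries of the rolling mean are NaN and are dropped before the smoothed series is fed to the network:

```python
import numpy as np
import pandas as pd

close = pd.Series(100 + np.cumsum(np.random.default_rng(2).normal(size=60)))

# 10-day simple moving average; rolling(10) needs a full window,
# so the first 9 values are NaN and get dropped.
ma10 = close.rolling(window=10).mean().dropna()
```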

Figure 4. Prediction results from model trained on pre-ESDC data (Jan 2006–Jan 2011), using moving average data.

The smoothed nature of our moving average data seems to allow for a better fit by our model, particularly at the shorter sequence lengths. This version of our model may be somewhat useful as a measure of short-term market sentiment. Naturally, as the moving average is itself a lagging indicator, its usefulness in predictive applications may be limited to sentiment analysis rather than precise price forecasting.


So have we cracked the markets? Well, no. If it were easy, we’d all be rich.

Our models certainly aren’t a crystal ball into the future. At best, one may be able to get a sense of where the market may be heading, but not with any long-term reliability. Note that we haven’t done any significant hyperparameter tuning here, but as the hyperparameters were kept consistent across all models, the comparisons made in this study remain valid.

Our results may be due to the over-representation of the positive recovery periods before and after the financial crisis, skewing our model toward a more positive bias as a result. Increasing the amount of data available does appear to improve the capabilities of our model, and perhaps we could improve it further by introducing data covering the dot-com bubble and beyond, but the fact remains that financial crises are difficult to anticipate due to their significant deviation from a stock’s normal fluctuations.

Recently, the use of attention mechanisms has been demonstrated to improve the performance of LSTM-based networks. Attention mechanisms work by learning to assign a weight to every input within the sequence, representing the relevance of each timestep to the final output. While originally designed to help NLP models find relationships between elements of a text, they have also been successfully demonstrated in financial time-series modelling. In our study, they would be particularly useful for longer input sequences, as they can place less relevance on earlier timesteps that may represent an older trend.
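As a minimal sketch of the idea (plain NumPy with dot-product scoring and a fixed query vector, not a trainable attention layer), attention turns the per-timestep hidden states of an LSTM into a single weighted summary, where the softmax weights express each timestep's relevance:

```python
import numpy as np

def attention_pool(hidden_states, query):
    # hidden_states: (timesteps, d) LSTM outputs; query: (d,) scoring vector
    # (learned in a real model, random here).
    scores = hidden_states @ query              # relevance score per timestep
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax: weights sum to 1
    context = weights @ hidden_states           # weighted summary of the sequence
    return context, weights

rng = np.random.default_rng(3)
h = rng.normal(size=(80, 16))                   # 80 timesteps of 16-dim states
context, w = attention_pool(h, rng.normal(size=16))
```

A prediction head can then operate on `context` instead of only the last hidden state, letting the model downweight timesteps belonging to an older trend.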

We hope you found this article interesting. To stay up to date with the latest publications, consider following GradientCrescent. Next up, we attempt to build an in-situ code checker for IDEs.