1. Motivation and Background
Portfolio management is the process of building and maintaining a portfolio that maximizes return for a given level of risk. Portfolio managers make trading decisions on behalf of their clients according to the clients' appetite for risk. They analyze the strengths and weaknesses of different assets before deciding which equities to hold in order to balance risk and maximize returns, which makes portfolio management a difficult process. We aim to make this process better and simpler by using predictive modeling and deep learning techniques: we generate stable portfolios from predicted stock prices for the next quarter.
In the past, many attempts have been made to predict firms' stock prices and to understand how the text of news articles, Twitter posts and other platforms influences stock prices. These attempts have involved analysing the sentiment of these sources and using it to predict stock prices. What remains largely unexplored is the effect of the sentiment of financial reports on stock prices. Harvard Business School published a working paper, ‘Lazy Prices’, claiming that these financial reports influence a firm’s stock price. The authors stated that a drastic change in the similarity (Jaccard and cosine scores) of these filings can increase or decrease a firm’s market value.
2. Problem Statement
The project required us to first explore the finance domain and gather relevant information about stock price movements, trading methods and portfolio management techniques. We learned how impactful data science has been in finance so far and identified three main questions to focus on:
- To what extent do the quarterly and annual financial reports filed by firms influence their stock prices?
- How does including the sentiment of historical financial reports change stock price predictions?
- How can we build a stable portfolio using the predicted values for a quarter?
These problems are challenging for several reasons. First, they required substantial background research on the domain, and selecting the right features and engineering them properly to obtain the best predictions involved a significant learning curve. Second, the source and format of these filings are not consistent, so scraping and parsing the data was a major challenge. Third, two different data sources are involved: data from financial reports and stock price values. To prepare a dataset for analysis and for training our prediction model, we had to explore various methods of integrating the two. They do not align directly, because OHLC data is released daily while SEC filings are released quarterly. To merge them, we had to carefully match date ranges and engineer features that map predictions to the next quarter.
3. Data Science Pipeline
3.1 Data Collection
SEC 10-Q and 10-K filings from EDGAR
The very first step in our data pipeline was to scrape the SEC EDGAR (Electronic Data Gathering, Analysis, and Retrieval) database. EDGAR is an online database maintained by the U.S. Securities and Exchange Commission (the “SEC”) that tracks all SEC filings by public companies and now contains over 12 million such filings.
Since EDGAR did not support any filtering options other than company tickers and central index keys (CIKs) at the time of writing, we had to scrape all 10-K and 10-Q filings for the S&P 500 companies, regardless of the time periods we were interested in.
We collected around 60 GB of this data for the S&P 500 tickers. Since EDGAR limits each user to 10 requests per second, we had to throttle our requests to stay within this limit. Furthermore, given the size of the dataset, we maintained a checkpoint for each downloaded ticker so that downloading could resume later after a failure without any data loss.
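The throttling and checkpointing described above can be sketched as follows. This is a minimal, standard-library-only illustration rather than the project's code: `RateLimiter`, `download_all`, the JSON checkpoint file and the `fetch` callable (assumed to make one EDGAR request per ticker) are all hypothetical. Only the limit of 10 requests per second comes from the text.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List


class RateLimiter:
    """Allow at most `max_per_second` calls in any one-second window."""

    def __init__(self, max_per_second: int = 10) -> None:
        self.max_per_second = max_per_second
        self.calls: List[float] = []

    def wait(self) -> None:
        now = time.monotonic()
        # Forget calls that are more than one second old.
        self.calls = [t for t in self.calls if now - t < 1.0]
        if len(self.calls) >= self.max_per_second:
            # Sleep until the oldest remaining call leaves the window.
            time.sleep(1.0 - (now - self.calls[0]))
        self.calls.append(time.monotonic())


def download_all(
    tickers: Iterable[str],
    fetch: Callable[[str], List[str]],
    checkpoint: Path,
    limiter: RateLimiter,
) -> Dict[str, List[str]]:
    """Download filings per ticker, skipping tickers recorded in the checkpoint.

    `fetch` is a hypothetical callable that makes one EDGAR request per ticker.
    """
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    results: Dict[str, List[str]] = {}
    for ticker in tickers:
        if ticker in done:
            continue  # already downloaded in an earlier run
        limiter.wait()
        results[ticker] = fetch(ticker)
        done.add(ticker)
        # Persist progress after every ticker so a crash loses at most one ticker.
        checkpoint.write_text(json.dumps(sorted(done)))
    return results
```

Because the checkpoint is rewritten after every ticker, rerunning with the same checkpoint path skips everything that was already downloaded.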
OHLC data from Quandl API
OHLC (open-high-low-close) data provides the open, high, low and close prices of a stock for every business day. This dataset was one of the most important inputs used to train our models later in the pipeline. We used Stocker, a Python abstraction over the Quandl API, to retrieve this OHLC data for each S&P 500 company.
3.2 Data Preparation
SEC 10-Q and 10-K filings from EDGAR
We found that the scraped 10-K and 10-Q SEC filings were highly unstructured, as they contain HTML tags, symbols and numerical tables. We cleaned these files by removing unrelated data and parsed the HTML into clean text, which we then used for sentiment analysis. For cleaning and parsing, we used regex, NumPy and BeautifulSoup.
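The cleaning step can be sketched with the standard library's `html.parser` in place of BeautifulSoup, a swap made only to keep the example dependency-free. The rule that drops lines with a low share of alphabetic characters (a rough way to discard numeric table cells) is an illustrative heuristic, not the project's actual rule, and the names `_TextExtractor` and `html_to_text` are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>/<style> and breaking lines at block tags."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "tr", "td", "th", "li", "table",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; and &nbsp; into characters
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str, min_alpha_ratio: float = 0.5) -> str:
    """Convert filing HTML to plain text, dropping mostly numeric lines."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    kept = []
    for raw in "".join(parser.parts).splitlines():
        line = re.sub(r"\s+", " ", raw).strip()  # collapse tabs, non-breaking spaces, etc.
        chars = line.replace(" ", "")
        if not chars:
            continue
        # Heuristic: lines dominated by digits/symbols are usually numeric table cells.
        if sum(c.isalpha() for c in chars) / len(chars) < min_alpha_ratio:
            continue
        kept.append(line)
    return "\n".join(kept)
```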
As part of data preparation, we also generated CikList mapping files for the 10-K and 10-Q filings of each CIK, containing mappings between dates, SEC types, filenames, CIKs and tickers. The date field is the filing date of the SEC form, the SEC type is either 10-Q or 10-K, and the filename refers to the SEC file.
Each CikList mapping file had approximately 12 rows for 10-K and 34 rows for 10-Q filings over a 10-year window. We used these mapping files to keep track of the SEC files for every CIK, and later to calculate sentiments for all SEC files corresponding to each CIK.
OHLC data from Quandl API
The Python Stocker module makes it relatively easy to get OHLC data for each ticker as a pandas DataFrame. However, since our method predicts the next quarter using a 90-day prior window (for reasons described in the Methodology section), we had to do some data wrangling before feeding the data to our deep learning models.
We used a 90-day window (treating a quarter as 90 days): each row's X contains all OHLC data for that period, and Y is the Adj. Close. The method we used to prepare this data is visually explained in Figure 1.
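A minimal NumPy sketch of this windowing follows. It assumes the daily data is already a `(days, features)` array and that Y is the Adj. Close a fixed `horizon` of days after each 90-day window; the exact offset of Y is not stated above, so `horizon` and the name `make_windows` are illustrative.

```python
import numpy as np


def make_windows(data: np.ndarray, target_col: int, window: int = 90, horizon: int = 1):
    """Split a (days, features) array into overlapping windows.

    X[i] holds `window` consecutive days of every feature; y[i] is column
    `target_col` (e.g. Adj. Close) `horizon` days after that window ends.
    """
    n_samples = len(data) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("not enough rows for this window and horizon")
    X = np.stack([data[i : i + window] for i in range(n_samples)])
    start = window + horizon - 1
    y = data[start : start + n_samples, target_col]
    return X, y
```

With `horizon=1` each sample targets the next day's Adj. Close; a larger `horizon` moves the target further past the end of the window.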
3.3 Data Integration
Combining Sentiment Scores with OHLC data
Integrating the OHLC data obtained from the Quandl API with sentiments from SEC filings by date was complicated because SEC filing dates do not line up with the stocks' quarterly OHLC dates.
For OHLC data, quarterly stock returns fall in months 3, 6, 9 and 12, whereas quarterly SEC data falls in months 2, 5, 7 and 10.
To resolve this issue, we used the pandas “forward” merge method, which joins the sentiment scores of each quarter's SEC filing with the stock's price data for the nearest following date. The purpose of this mapping was to analyse how the closing prices of tickers were affected after the release of a financial report. The first 5 rows of the joined dataframe are shown in Figure 2.
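The forward merge can be illustrated with `pandas.merge_asof` and `direction="forward"`, which matches each filing to the first price row dated on or after it. The dates and values below are toy data, not results from the project, and `attach_next_price` is a hypothetical helper.

```python
import pandas as pd


def attach_next_price(filings: pd.DataFrame, prices: pd.DataFrame) -> pd.DataFrame:
    """Join each filing to the first price row dated on or after the filing date."""
    return pd.merge_asof(
        filings.sort_values("filing_date"),
        prices.sort_values("price_date"),
        left_on="filing_date",
        right_on="price_date",
        direction="forward",  # look ahead to the nearest following price date
    )


# Toy example: filings in months 2, 5 and 7; prices at quarter ends.
filings = pd.DataFrame({
    "filing_date": pd.to_datetime(["2018-02-15", "2018-05-10", "2018-07-26"]),
    "positive": [0.08, 0.11, 0.07],
})
prices = pd.DataFrame({
    "price_date": pd.to_datetime(["2018-03-29", "2018-06-29", "2018-09-28", "2018-12-31"]),
    "close": [100.0, 105.0, 98.0, 110.0],
})
merged = attach_next_price(filings, prices)
```

Both inputs must be sorted on their keys, which is why the helper sorts them before merging.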
Standardizing OHLC data for S&P 500 stocks
We limited the stock data for the S&P 500 companies to roughly the last 10 years, defining the period as 1-Jan-2008 to 31-Dec-2018. The reason for restricting the data to a recent time window is that financial data is widely believed to be ‘non-stationary’ and prone to regime changes, which might make older data less relevant for prediction [4, 5].
After limiting the data to this period, we found that some companies went public after 1-Jan-2008; they had to be filtered out to maintain consistency, which reduced the dataset to around 300 tickers. Moreover, a few rows of OHLC data were missing for certain tickers. Further investigation did not reveal any particular event in those companies' histories, so the gaps may have been caused by a bug in the Quandl API. To fix this inconsistency, we used the pandas ‘interpolate’ method to reconstruct the missing rows.
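A pandas sketch of both fixes follows, assuming each ticker's data is a DataFrame indexed by date. Three details are assumptions of this example: the `grace_days` tolerance (so a ticker whose first trading day is 2-Jan-2008 is not dropped just because 1-Jan is a holiday), the use of the union of all observed dates as the trading calendar, and the helper name `clean_universe`.

```python
import pandas as pd


def clean_universe(frames, start="2008-01-01", grace_days=7):
    """Drop late-listed tickers and rebuild missing rows by interpolation.

    frames: {ticker: DataFrame indexed by date with numeric OHLC columns}.
    """
    start = pd.Timestamp(start)
    # Keep tickers whose history begins at the start of the window
    # (grace_days allows for holidays such as 1 January).
    kept = {
        t: df.sort_index()
        for t, df in frames.items()
        if df.index.min() <= start + pd.Timedelta(days=grace_days)
    }
    # Approximate the trading calendar by the union of all observed dates.
    calendar = pd.DatetimeIndex(sorted(set().union(*(df.index for df in kept.values()))))
    # Insert any missing dates as NaN rows, then fill interior gaps linearly.
    return {
        t: df.reindex(calendar).interpolate(limit_area="inside")
        for t, df in kept.items()
    }
```

`limit_area="inside"` only fills gaps that have valid values on both sides, so a ticker that stops trading is not extrapolated past its last row.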
We also used the fastai library to convert dates into features such as ‘day of week’, ‘month of year’ and ‘week’, allowing the model to learn any temporal patterns in stock prices.
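The project used fastai for this step; the sketch below derives comparable features with plain pandas `.dt` accessors instead, so the function and column names are illustrative rather than fastai's output.

```python
import pandas as pd


def add_date_features(df: pd.DataFrame, col: str = "date") -> pd.DataFrame:
    """Return a copy of `df` with day-of-week, month and ISO-week columns."""
    out = df.copy()
    d = pd.to_datetime(out[col])
    out["dayofweek"] = d.dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = d.dt.month
    out["week"] = d.dt.isocalendar().week.astype(int)  # ISO 8601 week number
    return out
```

ISO weeks can cross calendar years: 31-Dec-2018 falls in ISO week 1.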
Sentiment Analysis Using NLTK VADER
We used the NLTK VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analyser, a lexicon- and rule-based sentiment analysis tool. Each word in its lexicon carries a valence rating indicating whether, and how strongly, it is positive or negative; the positive, negative and neutral scores of a text are computed from these ratings together with rules for negation, intensifiers and punctuation.
The VADER lexicon does not include finance-specific vocabulary, so we extended it with the Loughran-McDonald Financial Sentiment Word Lists [6, 7].
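In NLTK, `SentimentIntensityAnalyzer().lexicon` is a plain `{word: valence}` dictionary, so extending it amounts to a dictionary update. The sketch below is hypothetical: the ±2.0 valences assigned to Loughran-McDonald words are arbitrary placeholders, and it operates on any such dictionary so that it runs without downloading NLTK data.

```python
from typing import Dict, Iterable


def extend_lexicon(
    lexicon: Dict[str, float],
    positive: Iterable[str],
    negative: Iterable[str],
    pos_valence: float = 2.0,   # placeholder strength, not a VADER or LM value
    neg_valence: float = -2.0,  # placeholder strength, not a VADER or LM value
) -> Dict[str, float]:
    """Add finance words to a VADER-style {word: valence} lexicon in place.

    Words are lower-cased because VADER looks tokens up in lower case, and
    existing entries are overwritten so the financial meaning takes precedence.
    """
    for word in positive:
        lexicon[word.lower()] = pos_valence
    for word in negative:
        lexicon[word.lower()] = neg_valence
    return lexicon
```

With NLTK installed and the `vader_lexicon` resource downloaded, the same function could be applied to an analyzer's `lexicon` attribute.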
Since the MD&A (Management's Discussion and Analysis) section of an SEC filing describes historical financial indicators from management's perspective, this section, if processed correctly, can help investors better predict a company's future returns. Initially, we decided to focus on the MD&A section and tried to extract it using regex, but the disorganised structure of SEC filings prevented us from doing so.
Instead, we computed positive, negative and neutral sentiment scores over the full text of each SEC filing (approx. 4,600 SEC files). VADER is quite slow on large texts, so we split the content into batches of 2,000 words, which reduced the sentiment calculation time. For each batch, we calculated the positive, negative and neutral sentiment scores, and the mean over all batches gave the final sentiment scores for each SEC filing.
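The batching step can be sketched as below. The scorer is passed in as a function so the example runs without NLTK; in the project's setting it would be `SentimentIntensityAnalyzer().polarity_scores`, which returns a dict with `neg`, `neu`, `pos` and `compound` keys. The function name `batched_sentiment` is hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List


def batched_sentiment(
    text: str,
    score: Callable[[str], Dict[str, float]],
    batch_words: int = 2000,
) -> Dict[str, float]:
    """Score `text` in chunks of `batch_words` words and average pos/neg/neu."""
    words = text.split()
    if not words:
        return {"pos": 0.0, "neg": 0.0, "neu": 0.0}
    batches = [" ".join(words[i : i + batch_words]) for i in range(0, len(words), batch_words)]
    scores: List[Dict[str, float]] = [score(b) for b in batches]
    # Simple (unweighted) mean over batches, as described in the text.
    return {k: mean(s[k] for s in scores) for k in ("pos", "neg", "neu")}
```

Every batch is weighted equally, so a short final batch counts as much as a full one; weighting each batch by its word count is a possible alternative.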
Stock Price Prediction
Initial ideas, failures and success
As mentioned in Section 3.3, after filtering we had around 300 tickers to work with. We started by training a single model on all 300 tickers, with the following feature engineering:
- Constructed pandas dataframes for each stock using a 90-day window, as described in Figure 1.
- Fit a separate scaler for each stock on its whole dataset to scale values between 0 and 1. Separate scalers were essential because each stock can have different minimum and maximum prices.
- Used these scalers to transform the respective train, validation and test datasets for each stock.
- After all these transformations, used the Keras TimeseriesGenerator to cascade the OHLC data for all stocks, as described in Figure 3.
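A NumPy sketch of per-stock min-max scaling follows. It deliberately differs from the description above in one respect: each scaler is fitted on the stock's training split rather than its whole dataset, which keeps test-period minima and maxima out of training (validation and test values may then fall slightly outside [0, 1]). The function names are hypothetical; scikit-learn's `MinMaxScaler` provides the same per-column transform.

```python
import numpy as np


def fit_minmax(train: np.ndarray):
    """Per-column minimum and range of one stock's training data."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns would otherwise divide by zero
    return lo, span


def scale_per_stock(splits):
    """Scale each stock's (train, val, test) arrays with that stock's own scaler.

    splits: {ticker: (train, val, test)}, each an array of shape (days, features).
    """
    scaled = {}
    for ticker, parts in splits.items():
        train, val, test = (np.asarray(p, dtype=float) for p in parts)
        lo, span = fit_minmax(train)  # fitted on the training split only
        scaled[ticker] = tuple((x - lo) / span for x in (train, val, test))
    return scaled
```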
Due to hardware constraints, we could not train a single model on data from all 300 stocks and eventually had to limit it to the first 50 stocks.
LSTM models are well studied and have proven effective on time-series data. We built neural network models in Keras with two LSTM layers, two dropout layers and a dense output layer.
We obtained very poor results after training a single model on data from all 50 stocks (constructed as described in Figure 3). To investigate further, we also trained a separate model for each of the 50 stocks without the cascading method. This approach produced much more accurate results, with trend lines that reflected the real trends of those stocks. Each of these 50 models was trained for 20 epochs without sentiment scores. Results obtained using this method are discussed in Section 5.
Training with sentiments
Sentiment scores can only be calculated once per quarter, corresponding to three 10-Q filings and one 10-K filing for each financial year of a company. They therefore had to be mapped onto each stock's daily OHLC data. We achieved this by treating 90 days of OHLC data as one quarter and replicating the prior quarter's sentiment scores across all rows of that quarter. Separate models with exactly the same architecture as described above were trained for 20 epochs for each of the 50 stocks: the first run used positive sentiment scores and the second used negative sentiment scores. Results are discussed in a later section.
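A pandas sketch of replicating quarterly scores across daily rows follows. Rather than slicing fixed 90-day windows as described above, it keys on filing dates (a simpler variant swapped in for illustration): every trading day receives the scores of the most recent filing on or before it. The helper name is hypothetical.

```python
import pandas as pd


def attach_quarterly_sentiment(daily: pd.DataFrame, quarterly: pd.DataFrame) -> pd.DataFrame:
    """Copy the most recent filing's sentiment onto every daily row.

    daily: OHLC rows indexed by trading date.
    quarterly: sentiment columns indexed by filing date.
    Days before the first filing get NaN.
    """
    q = quarterly.sort_index()
    # method="ffill" takes, for each trading day, the last filing dated on or before it.
    aligned = q.reindex(daily.index, method="ffill")
    return daily.join(aligned)
```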
Predicting Stable Portfolios
We chose to create stable portfolios because each portfolio is meant to be held for the predicted next quarter. Our strategy for building stable portfolios is to keep weakly correlated stocks in the same portfolio, so that a loss in one stock is less likely to be accompanied by losses in the others. For example, if a portfolio holds two strongly correlated stocks [AAPL, FB], a fall in AAPL is likely to coincide with a fall in FB, producing an overall loss. If the two stocks are uncorrelated, they are less likely to fall together, which helps prevent heavy losses. This yields stable portfolios with balanced risk, from which steadier earnings can be drawn over the quarter.
We started by evaluating the correlation and covariance of every pair of stocks and kept weakly correlated pairs in the same set: a pair qualified if its correlation was below 0.5 and its covariance was below the mean covariance. Then, for each set, we checked which other stocks could be added based on low correlation and covariance values.
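The pair filter can be sketched in NumPy as below, given a `(days, n_stocks)` array of (predicted) daily returns. Two details are assumptions rather than statements from the text: the mean covariance is taken over distinct pairs only, and the hypothetical `can_join` admits a new stock to a set only if it is pairable with every current member.

```python
import itertools

import numpy as np


def pairable_pairs(returns: np.ndarray, tickers, max_corr: float = 0.5):
    """Pairs with correlation below `max_corr` and covariance below the mean covariance.

    returns: array of shape (days, n_stocks). The mean covariance is taken over
    distinct pairs only (variances on the diagonal are excluded).
    """
    corr = np.corrcoef(returns, rowvar=False)
    cov = np.cov(returns, rowvar=False)
    n = len(tickers)
    mean_cov = cov[np.triu_indices(n, k=1)].mean()
    return [
        (tickers[i], tickers[j])
        for i, j in itertools.combinations(range(n), 2)
        if corr[i, j] < max_corr and cov[i, j] < mean_cov
    ]


def can_join(candidate, portfolio, pairs):
    """A stock may join a set only if it is pairable with every current member."""
    ok = set(pairs) | {(b, a) for a, b in pairs}
    return all((candidate, member) in ok for member in portfolio)
```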
The Sharpe ratio is calculated by subtracting the risk-free rate from the return of the portfolio and dividing that result by the standard deviation of the portfolio’s excess return.
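Written as a formula, with $R_p$ the portfolio return, $R_f$ the risk-free rate and $\sigma$ the standard deviation of the excess return (which equals the portfolio volatility $\sigma_p$ when $R_f$ is constant):

```latex
S = \frac{E[R_p - R_f]}{\sigma(R_p - R_f)} = \frac{E[R_p] - R_f}{\sigma_p}
```

As a worked example with illustrative numbers, a portfolio with an expected annual return of 13% and annual volatility of 5% has S = (0.13 - 0.02) / 0.05 = 2.2 at a 2% risk-free rate.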
A higher Sharpe ratio means a better risk-adjusted return. A Sharpe ratio greater than 1 is considered good, greater than 2 very good and greater than 3 excellent. We used a risk-free rate of 2%, i.e. the US risk-free rate at the time. For each portfolio, we evaluated 200 randomly generated weight vectors and kept the one with the best Sharpe ratio. Based on the Sharpe ratio obtained, we divided our portfolios into good, better and best.
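The random weight search and the good/better/best split can be sketched as follows. Annualising daily returns with 252 trading days, drawing long-only weights uniformly before normalising, and the names `best_random_weights` and `bucket` are assumptions of this sketch; the text specifies only the 200 trials, the 2% risk-free rate and the Sharpe thresholds.

```python
import numpy as np

TRADING_DAYS = 252  # annualisation convention assumed here, not stated in the text


def best_random_weights(daily_returns: np.ndarray, n_trials: int = 200,
                        risk_free: float = 0.02, seed: int = 0):
    """Try `n_trials` random long-only weightings and keep the best Sharpe ratio.

    daily_returns: array of shape (days, n_stocks) with at least two stocks.
    Returns (weights, annual_return, annual_volatility, sharpe).
    """
    rng = np.random.default_rng(seed)
    mu = daily_returns.mean(axis=0) * TRADING_DAYS
    cov = np.cov(daily_returns, rowvar=False) * TRADING_DAYS
    best = None
    for _ in range(n_trials):
        w = rng.random(daily_returns.shape[1])
        w /= w.sum()  # normalise so the weights sum to 1 (no short positions)
        ret = float(w @ mu)
        vol = float(np.sqrt(w @ cov @ w))
        sharpe = (ret - risk_free) / vol
        if best is None or sharpe > best[3]:
            best = (w, ret, vol, sharpe)
    return best


def bucket(sharpe: float):
    """Map a Sharpe ratio to the good / better / best ranges used in the text."""
    if sharpe >= 3:
        return "best"
    if sharpe >= 2:
        return "better"
    if sharpe >= 1:
        return "good"
    return None  # below the "good" threshold
```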
Evaluating the effect of SEC sentiments on the stock prices
We mapped quarterly OHLC data with the quarterly sentiment scores to analyze how positive and negative sentiments are influencing the closing prices.
Figure 4 shows that when positive sentiment increased, GWW’s closing price in the following quarter tended to follow the trend and rise. The opposite trend can be seen for negative sentiment in Figure 5.
We observed positive and negative trends between the stock’s closing price and the SEC sentiment scores, suggesting that the sentiment scores of the previous quarter had an impact on the stock’s closing price in the next quarter.
Since stock prices depend on various other factors as well, this relationship between sentiment and closing price was not visible in every quarter.
Evaluation on stock predictions
We trained each of our 50 models for three cases: no sentiments, positive sentiments as features, and negative sentiments as features. Results for one of the stocks (W W Grainger Inc, GWW) in all three cases are shown in Table 1.
Although we obtained promising results for the GWW stock, this was not always the case: for some tickers, including sentiments had little to no effect on the RMSE. This suggests that sentiments from SEC filings are not the only factors affecting stock prices and that other potential factors remain to be investigated. For example, a sudden change in energy policy can positively or negatively impact clean energy companies, and this change can be reflected in the following quarter's prices. If such a change was never addressed in a prior SEC filing, a trained model cannot pick up that information from the sentiment scores of that filing alone. Improvements and suggestions to address this are discussed in the Future Work section.
The plots in Figure 6 show Adj. Close prices for GWW. The green line shows the historical stock performance over a period of 10 years (models were trained on data from this period). The blue line shows the predicted trend without sentiments as features, and the yellow line shows the predicted trend after including negative sentiment scores.
Figure 7 shows a magnified view of the actual and predicted trends over the test window. Although both predicted trends are similar, the one that includes negative sentiments appears closer to the real values, which is confirmed by its lower RMSE in Table 1.
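For reference, the root-mean-square error over the $n$ days of a test window, with $y_t$ the actual and $\hat{y}_t$ the predicted value, is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2}
```

For example, actual values (10, 12) and predictions (11, 10) give errors of -1 and 2, squared errors of 1 and 4, a mean of 2.5 and an RMSE of about 1.58. Lower is better, and the error is in the same units as the target.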
Evaluation on Portfolio Optimization
We analyzed the pairwise correlations among the 50 stocks using a colormap in Python. Figure 8 shows the resulting colormap.
Blue corresponds to the lowest correlation and red to the highest, so stocks in the blue region are candidates for the same portfolio, for example [sbac, msci].
After identifying the stock pairs with correlation below 0.5 and covariance below the mean covariance, we analyzed the “pairable” and “unpairable” stocks using the plot in Figure 9.
The plot shows pairable stocks in blue and unpairable stocks in green. For example, [amgn, mat] qualify to be in the same portfolio based on their covariance and correlation values. We generated portfolios based on this plot and calculated the Sharpe ratio and weight distribution for each portfolio. The first few rows of the final dataframe are shown for reference.
Finally, based on Sharpe Ratio, portfolios were divided into good, better and best. The following plots show the Return and Volatility values of Portfolios in “good”, “better” and “best” range. These plots are useful to evaluate the trade-offs between Volatility, Return and Risk and select a portfolio.
Figure 11 shows the “good” portfolios. One example of a trade-off visible in this plot is between [ma, zion, mat, vno, sna, flir, aee, duk, etr, pnr, akam] and [ndaq, mat]: the latter gives a higher return of 20% with a similar Sharpe ratio of about 1.7.
Figure 12 shows portfolios with a Sharpe ratio >= 2 and < 3 (“better”). Their higher Sharpe ratios mean better risk-adjusted returns than the “good” portfolios. Similar trade-offs can be derived within this set as well; for example, [ndaq, flir] and [wfc, rtn] give almost the same return, but the latter is less risky, as reflected in its higher Sharpe ratio.
Similar conclusions can be drawn from the “best” Sharpe ratio range (Figure 13). This set of portfolios carries the least risk relative to its return.
6. Data Product
Our final product is a collection of Jupyter Notebooks. We have divided the product into four modules: SEC Scraper, Sentiment Analyzer, Stock Predictor and Portfolio Generator.
SEC Scraper: This module scrapes the SEC EDGAR website to extract 10-Q and 10-K filings for S&P 500 companies. It implements checkpointing so that failed downloads can be resumed later, and it performs all of the HTML cleaning and generates raw text files from the filings.
Sentiment Analyzer: This module contains two notebooks.
- Sentiment Analysis Notebook: Extracts scores for positive, negative and neutral sentiments from SEC files provided in a directory. It saves these extracted sentiments into CSV files which can later be used for model training, visualizations, etc.
- Sentiment Visualization Notebook: Generic notebook to plot sentiment trends together with closing prices for a stock.
Stock Predictor: This module contains three notebooks.
- LSTM Stocks with no sentiments: Performs feature engineering on stock data and trains and evaluates LSTM models for each of the given stocks. Each model is then saved in its respective directory along with plots comparing actual prices with the predicted ones. Training epochs and all directories are configurable.
- LSTM Stocks with sentiments: Performs feature engineering on stock data as well as sentiments extracted from the Sentiment Analyzer module. It then trains and saves each model along with comparison plots.
- Predictions Generator: This notebook can be used after training to load saved models and perform predictions on test data. Predicted data frames can then be exported to CSV files to be used later for various use cases.
Portfolio Generator and Optimizer: Constructs portfolios using pairwise selection of weakly correlated stocks, and optimizes each portfolio to give the highest Sharpe ratio with respect to the weights assigned to its stocks.
7. Lessons Learnt
Learning enough about the finance domain to understand what could potentially work was definitely a challenge. Working out how to manipulate time-series data, apply windowing methods and train LSTM models on the result also proved interesting. Our findings also suggested that training a single model on data from all companies is not fruitful, likely because of issues such as correlation and covariance between stocks.
Sentiments extracted from SEC filings can be significant in predicting future stock trends, although not for every stock. We also learnt to extract sentiments from very large texts (100,000 words in some cases) using NLTK VADER.
We learned how to construct portfolios by leveraging concepts such as correlation, covariance, the Sharpe ratio and volatility. Relevant visualizations such as colormaps and correlation matrices were very useful in confirming the obtained results.
Machine learning in finance is a relatively recent development, and it is hard to find the right resources. Moreover, stock price prediction is generally considered a hard problem, which is one reason most portfolio managers do not rely on it completely. Our results suggest that neural networks built with LSTMs can be reliable for this problem, and that sentiment analysis on SEC filings is a promising technique. With cleaner data and further evaluation using the tools described in Section 9, we believe the results can be improved further. Lastly, portfolio construction is hard and stock markets can be volatile; we have shown a method that can be used to mitigate risk and construct diversified portfolios based on predicted results.
9. Future Work
For sentiment analysis, we could consider embedding models such as Word2Vec, FastText and the Universal Sentence Encoder, and evaluate whether they give more accurate sentiment scores and help predict stock market trends better.
We extracted only positive, negative and neutral sentiments from SEC filings. Many other sentiment categories are relevant to the financial industry, such as uncertainty, litigious, constraining and superfluous language, and these could be extracted and analyzed further.
Stock Price Prediction
Currently, we train a separate model for each stock, because we obtained very poor results after training a single model using the method described in Section 4. However, training a separate model per stock may not scale when the number of stocks is large. To address this, we could train one model per set of highly correlated stocks instead of one per stock, which would reduce the number of models trained. Moreover, we could use transfer learning to retrain only the last layer of such a model for each new stock that is sufficiently collinear with the stocks the model was trained on.
We explained earlier that including sentiment scores from SEC filings did not always give a lower RMSE or better trend lines. To investigate this further, we could augment our features with news sentiments for every quarter for each stock and evaluate whether these scores, combined with the SEC scores, lead to better predictions.
There are many other portfolio construction methods that could be evaluated using our predictions. We could evaluate industry-standard techniques such as the ‘Global Minimum Variance Portfolio (GMV)’ and the ‘Inverse Volatility Portfolio (IVP)’ and compare their results with real-world portfolios over a particular time window.
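Both techniques have simple textbook forms, sketched here in NumPy from a covariance matrix of returns. IVP weights each stock by the inverse of its volatility; the GMV version uses the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1), which allows short positions, so a long-only GMV would need a constrained optimizer instead. The function names are illustrative.

```python
import numpy as np


def inverse_volatility_weights(cov: np.ndarray) -> np.ndarray:
    """IVP: weights proportional to 1 / sigma_i, normalised to sum to 1."""
    inv_vol = 1.0 / np.sqrt(np.diag(cov))
    return inv_vol / inv_vol.sum()


def global_minimum_variance_weights(cov: np.ndarray) -> np.ndarray:
    """GMV (shorts allowed): w = inv(Sigma) 1 / (1' inv(Sigma) 1)."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))  # solves Sigma x = 1 without inverting
    return x / x.sum()
```

For two uncorrelated stocks with volatilities of 20% and 10%, IVP gives weights of 1/3 and 2/3, while GMV gives 0.2 and 0.8 because, without correlation, it weights by inverse variance.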
Finally, portfolio rebalancing is a standard practice for investment firms. We can extend our methods to implement rebalancing of existing portfolios and compare their performance with the ones that we are constructing for each quarter.
GitHub Repository: https://github.com/mrafayaleem/equity-portfolio-prediction
- Lazy Prices http://laurenhcohen.com/wp-content/uploads/2017/09/lazyprices.pdf
- S&P 500 Companies https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
- Python Stocker https://github.com/WillKoehrsen/Data-Analysis/tree/master/stocker
- Universal features of price formation in financial markets: perspectives from Deep Learning https://arxiv.org/pdf/1803.06917.pdf
- When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks https://www.uts.edu.au/sites/default/files/ADG_Cons2015_Loughran%20McDonald%20JE%202011.pdf
- Financial lexicon from Loughran-McDonald Sentiment Word Lists https://sraf.nd.edu/textual-analysis/resources/#LM%20Sentiment%20Word%20Lists
- The Most Rewarding Portfolio Construction Techniques https://seekingalpha.com/article/1710142-the-most-rewarding-portfolio-construction-techniques-an-unbiased-evaluation