Predicting Stock Performance from Quarterly Earnings Conference Calls

Metis Data Science Bootcamp: Final Project Five

The project detailed in this post was completed as part of my experience at the Metis Data Science Bootcamp. For more information regarding my experience, please see my first blog post.

Introduction

Project Five is the passion project, the final project where all choices are made by the student.

I have an extensive background in finance. I went to business school where I studied finance and began my career with five years in the banking industry. My interest and curiosity keep my finger on the pulse of the markets and the overall state of the economy. As a tech-savvy data nerd, I also enjoy learning about the latest developments and disruptions in the financial industry.

A couple of years ago, I stumbled upon a research paper published by the S&P Global Market Intelligence Quantamental Research team which detailed their work with Natural Language Processing (“NLP”) to “unveil hidden information in earnings calls”. At the time, I was fascinated by their findings, but I didn’t have the skills to try the analysis myself. After my training at the Metis Data Science Bootcamp, I was ready.

For my final passion project, I did my own NLP analysis on earnings calls, as detailed in this blog. I presented my work and findings from this project to potential employers at Metis’s Career Day.

Background

Every quarter, public companies must report their earnings. Many companies also host a conference call in order to provide additional commentary and answer questions from Wall Street analysts.

These reports and calls have a strong influence on the stock market. Below is an example of Macy’s stock price for the past five years. The gray bars indicate the timing of Macy’s earnings releases. As you can see, on many release days, the stock price surged (in green) or plummeted (in red).

For my project, I extracted features from the transcripts of department store earnings calls with NLP and built a classifier to recommend buy, sell, and hold investment decisions.

Data

The dataset I used consists of 281 earnings call transcripts for JCPenney, Kohl’s, Macy’s, and Nordstrom as detailed below. I was particularly interested in department stores because they’re in a mature industry that has experienced disruption and volatility over the past couple of years as consumer preferences shifted towards online shopping.

I scraped the Thomson Reuters transcripts from BamSEC. The transcripts include a prepared remarks section, a question and answer section, and instructions from the call operator. I analyzed only the prepared remarks from the company’s representatives.

I took the following steps to pre-process and clean the transcripts:
• Replaced instances of “n’t” with “ not”
• Removed numbers (unless part of hashtag or mention) and punctuation
• Removed capitalization
• Removed stop words (“a”, “we”, “the”, etc.), stop phrases (“good morning”), and stop paragraphs (introductory or disclosure statements)
• Lemmatized words
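
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: the stop word and stop phrase lists are tiny made-up samples, the `preprocess` function and its `lemmatize` parameter are hypothetical names, and the hashtag/mention exception and stop-paragraph removal are left out. The lemmatizer defaults to a no-op so the sketch runs without extra downloads; a real lemmatizer (for example from NLTK) could be passed in instead.

```python
import re

# Tiny illustrative lists (assumption): the post does not publish its actual lists.
STOP_WORDS = {"a", "an", "and", "the", "we", "our", "to", "of", "in", "is", "are", "for"}
STOP_PHRASES = ["good morning", "thank you"]


def preprocess(text, lemmatize=lambda word: word):
    """Clean one block of transcript text and return a list of tokens."""
    text = text.replace("’", "'")            # normalize curly apostrophes
    text = text.lower()                      # remove capitalization
    text = re.sub(r"n't\b", " not", text)    # "didn't" -> "did not"
    for phrase in STOP_PHRASES:              # drop stop phrases before tokenizing
        text = text.replace(phrase, " ")
    text = re.sub(r"[^a-z\s]", " ", text)    # remove numbers and punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [lemmatize(t) for t in tokens]    # lemmatize what is left


print(preprocess("Good morning. We didn't grow sales in 2019!"))
```

Note that a literal “n’t” replacement turns “can’t” into “ca not”, which is a known quirk of this simple rule.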

Methodology

Overview

Given the earnings call transcripts, I extracted features using NLP and then built a Random Forest classifier to make buy, hold, or sell investment decisions.

Natural Language Processing

Topic Modeling

Below is an excerpt from Macy’s third quarter earnings call. Using TfidfVectorizer() and Non-Negative Matrix Factorization, I extracted 30 topics in total from all of the calls. For example, three topics shown here are earnings in blue, sales in purple, and website in yellow.

Macy’s Chairman and CEO talks about adjusting their earnings per share guidance at the bottom in blue, sales all throughout his introductory remarks, and in the third paragraph, the update to their website.

Sentiment Analysis

To measure the sentiment of the calls, I calculated the percentage of words in each transcript that were:
• Negative
• Positive
• Uncertain
• Litigious
• Constraining
• Interesting
• Modal

I referenced the Loughran and McDonald Master Financial Dictionary as a guide. Their dictionary is often cited as the de facto financial dictionary for NLP analysis. More information about the master dictionary can be found here.
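
A dictionary-based sentiment score of this kind can be sketched as below. The word lists here are tiny made-up samples, not the actual Loughran and McDonald dictionary, and the function name `sentiment_percentages` is hypothetical; only three of the seven categories are shown to keep the example short.

```python
from collections import Counter

# Tiny illustrative word lists (assumption), NOT the real master dictionary.
CATEGORY_WORDS = {
    "negative": {"tough", "decline", "loss", "weak"},
    "positive": {"improve", "strong", "gain", "growth"},
    "uncertain": {"may", "approximately", "uncertain", "risk"},
}


def sentiment_percentages(tokens, category_words=CATEGORY_WORDS):
    """Return the share of tokens (in percent) that fall in each category."""
    counts = Counter()
    for token in tokens:
        for category, words in category_words.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in category_words
    }


tokens = "tough quarter sales decline margin improve strong growth".split()
scores = sentiment_percentages(tokens)
print(scores)
```

Each transcript ends up with one percentage per category, which can sit alongside the topic weights as model features.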

The positive words are highlighted below in green and the negative words are highlighted in red. The Chairman and CEO begins his first paragraph with negative sentiment (Macy’s had a tough quarter) and ends it with positive sentiment (significantly improved margin compression).

Random Forest Classification

Calculating Returns

Buy, sell, and hold signals were assigned in the following manner:
• Buy: one-month return is at least 3%
• Sell or short-sell: one-month return is less than or equal to -3%
• Hold otherwise

To calculate the return, I assumed the stock would be bought at close of business on the day of the earnings release and sold at close of business 30 days later.
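
The labeling rule above can be written as a short sketch. The function names and the example prices are hypothetical; the 3% thresholds and the buy-at-close, sell-at-close assumption come from the description above.

```python
def one_month_return(buy_close, sell_close):
    """Simple return from buying at one closing price and selling at another."""
    return (sell_close - buy_close) / buy_close


def assign_signal(ret, threshold=0.03):
    """Map a one-month return to a buy, sell, or hold label."""
    if ret >= threshold:
        return "buy"       # return of at least +3%
    if ret <= -threshold:
        return "sell"      # return of -3% or worse (sell or short-sell)
    return "hold"          # anything in between


# Hypothetical closing prices on the release day and 30 days later.
ret = one_month_return(buy_close=20.0, sell_close=21.0)
print(round(ret, 4), assign_signal(ret))
```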

Random Forest Classifier

Given the topic and sentiment features, I trained a Random Forest model with n_estimators = 400 on 219 transcripts from the early 2000s through the end of 2015, and then made investment decisions on 55 transcripts from the beginning of 2016 onward.
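
As a rough sketch of this setup with scikit-learn, the example below trains on the earliest 219 rows and predicts on the latest 55. The data is random and purely synthetic, and the 37-column feature layout (30 topic weights plus 7 sentiment percentages) is an assumption for illustration, so the predictions themselves are meaningless; only `n_estimators=400` and the chronological split sizes come from the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): rows are calls in chronological order.
n_calls, n_features = 274, 37
X = rng.random((n_calls, n_features))
y = rng.choice(["buy", "hold", "sell"], size=n_calls)

# Chronological split: the earliest 219 calls train, the latest 55 test.
n_train = 219
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

model = RandomForestClassifier(n_estimators=400, random_state=0)
model.fit(X_train, y_train)
signals = model.predict(X_test)    # one buy/hold/sell decision per test call

print(signals[:5])
```

Splitting by date rather than at random keeps future calls out of the training set, which matters when the goal is to simulate real investment decisions.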

Sample Transcripts in the Training and Testing Data Sets

Results

As a benchmark for comparison, if I had bought on every earnings release day in the test set, I would’ve earned a 2.6% return. Had I instead taken the recommendations from the Random Forest model, I would’ve earned a 7.9% return, which is roughly three times my benchmark. The classification model provides value!
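
One way such a comparison could be computed is sketched below. The post does not spell out how returns were aggregated, so everything here is an assumption: returns are averaged across calls, a buy signal earns the one-month return, a sell signal earns its negative (a short position), and a hold earns nothing. The numbers are made up and unrelated to the 2.6% and 7.9% figures.

```python
def benchmark_return(returns):
    """Average return from buying on every earnings release day."""
    return sum(returns) / len(returns)


def strategy_return(returns, signals):
    """Average return from following buy/sell/hold signals (assumed payoffs)."""
    payoff = {"buy": 1.0, "sell": -1.0, "hold": 0.0}   # long, short, no position
    total = sum(payoff[s] * r for r, s in zip(returns, signals))
    return total / len(returns)


# Made-up one-month returns and model signals for four calls.
returns = [0.10, -0.08, 0.01, -0.02]
signals = ["buy", "sell", "hold", "hold"]

print(benchmark_return(returns), strategy_return(returns, signals))
```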

Additional Commentary — Quick Hits

I only tuned the Random Forest model. I may get better results if I try other classification models such as XGBoost and Logistic Regression.

I could improve the train-test split. Currently, I train once on the training set and predict on the entire test set. Instead, I could retrain before each prediction, using all transcripts dated before the transcript I’m predicting.
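
The idea of training on everything before each prediction is often called an expanding-window or walk-forward split. A minimal sketch, assuming the transcripts are already sorted by date and using a hypothetical helper name, could look like this:

```python
def expanding_window_splits(n_samples, first_test_index):
    """Yield (train_indices, test_index) pairs in chronological order.

    Every sample from first_test_index onward is predicted once, using
    all earlier samples as training data.
    """
    for test_index in range(first_test_index, n_samples):
        yield list(range(test_index)), test_index


# Toy example with 6 chronologically sorted transcripts.
splits = list(expanding_window_splits(n_samples=6, first_test_index=3))
for train_idx, test_idx in splits:
    print(f"train on {train_idx}, predict {test_idx}")
```

The trade-off is cost: the model is refit once per test transcript instead of once overall.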

I’m very interested in analyzing the Q&A section of the call. The Q&A section could provide additional insight because the company representatives are off-script and the conversation tends to be more candid as analysts ask pointed questions.

Acknowledgements

Thank you for reading my blog posts. If you have any questions about my Metis experience or my projects, feel free to leave a comment or reach out to me on LinkedIn. Cheers! 🥂

Data Scientist with a Finance Background
