FDA drug approvals, legal verdicts, mergers, share buybacks, and the occasional CEO podcast appearance are all examples of events that move stock prices. Though not as quantifiable as technical indicators, real-life events clearly affect prices.
In an attempt to further explore the relationship between events and stock prices, I gathered historical price data from the IEX API and scraped events data from popular financial news sites. This post will go through the process of gathering and cleaning this data followed by an exploratory analysis examining price trends and the impact of events on prices.
The next two sections on getting and cleaning the data are fairly long and slightly technical. Those interested in just the exploratory analysis can jump to that section further down.
Getting The Data
As with most data analysis projects, obtaining and cleaning the data tends to be the most time-intensive step, especially if the data does not already exist in a machine-readable format. This was the case for much of the data in this project, so I wrote several functions to scrape or gather it. The functions are fairly long, so only a brief overview of their returned objects is given below. More details, usage examples, and CSVs of the compiled data are available on GitHub.
The function for scraping events uses Beautiful Soup and scrapes event descriptions from Benzinga, specifically their “Movers” series, which highlights recent stock movements and potential catalysts. The format, title, and publication time of these articles have varied over the last several years, so the final scraper function is a combination of several scripts accounting for these variations. The function can reliably pull events dating back to October 2015. The scraped events in a dataframe look like this:
The function for obtaining historical prices relies heavily on the Python package pandas-datareader, which allows users to pull financial data into pandas dataframes from various financial sources. I decided to go with The Investors Exchange (IEX) since the returned dataframes were well formatted and easy to work with. Taking in a list of tickers and a start date, the function returns a dataframe of daily closing prices for each ticker in the specified date range. An example for Apple is below:
Earnings Filing Dates
The reason for also gathering earnings dates will be explained further in the exploratory analysis section below. Using Beautiful Soup, I wrote a function to scrape MarketWatch to obtain the dates on which 10-K and 10-Q reports were filed. Provided with a list of tickers and how far back to look, the function returns a dataframe of filing dates for each stock. Another example using Apple looks like this:
As mentioned before, the section above provides a very high-level overview of the functions used in gathering this data. More details are on GitHub.
Cleaning And Combining The Data
Joining Prices to Event Descriptions
Since we are interested in the relationship between events and price movements, an obvious first step would be to join the price and events dataframes to get the closing price on the day of the event and the closing price one day prior. Getting the closing price on the date of the event is straightforward as we just need to merge on ticker and date.
To get the closing price one day prior, we will use a variation of the pandas merge function, merge_asof, which joins on a nearest key as opposed to an exact key. In our case, we want to join closing price on the nearest date one day before an event date:
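A minimal sketch of such a call, using toy data. The dataframe and column names (`events`, `prices`, `price_date`, `prior_close`) and the price values are illustrative assumptions, not the ones used in the original code:

```python
import pandas as pd

# Toy inputs; names and values are hypothetical.
events = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT"],
    "date": pd.to_datetime(["2019-04-29", "2019-05-01", "2019-04-30"]),
})
prices = pd.DataFrame({
    "ticker": ["AAPL"] * 4 + ["MSFT"] * 2,
    "price_date": pd.to_datetime([
        "2019-04-26", "2019-04-29", "2019-04-30", "2019-05-01",
        "2019-04-29", "2019-04-30",
    ]),
    "prior_close": [100.0, 101.0, 102.0, 103.0, 50.0, 51.0],
})

# merge_asof requires both frames to be sorted on their date keys.
events = events.sort_values("date")
prices = prices.sort_values("price_date")

prior = pd.merge_asof(
    events,
    prices,
    by="ticker",                # match tickers exactly first
    left_on="date",             # event date
    right_on="price_date",      # price date
    direction="backward",       # look back in time for the nearest price
    allow_exact_matches=False,  # skip the event day itself
)
print(prior)
```

Note that the Monday event (2019-04-29) picks up the previous Friday's close, which is the behavior an exact join on "date minus one day" would miss.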
The arguments used are explained below:
- The first two arguments specify dataframes to join (dates sorted in both)
- The third specifies which column to match on before merging (tickers)
- The fourth and fifth specify which columns the nearest join operation should occur on (dates)
- The sixth argument specifies which direction to look in for the nearest join (backwards since we want the price one day prior)
- The last argument specifies whether we want exact matches (no, since we do not want to match on the current day but the prior day)
We now have a dataframe of events joined with closing prices for the days on and one day before events. Lastly, we can calculate the percentage price change by using the price on the day of the event and the previous day’s price.
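As a small illustration, the percentage change is (close / prior close - 1) x 100. The column names below (`close`, `prior_close`, `pct_move`) and the values are assumptions made for this sketch:

```python
import pandas as pd

# Toy events frame with the event-day close and the prior day's close.
events = pd.DataFrame({
    "ticker": ["AAPL", "MSFT"],
    "close": [110.0, 45.0],
    "prior_close": [100.0, 50.0],
})

# Percentage move from the prior close to the event-day close.
events["pct_move"] = (events["close"] / events["prior_close"] - 1) * 100
print(events)
```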
Calculating Moving Averages
In the exploratory analysis section further below, I do some analysis with moving averages. The next bit of data prep will show how to calculate these.
The averages we will calculate are the 50 and 200 day moving averages, though the procedure outlined below works for any range of days:
We first sort the price dataframe by date in ascending order. Next we use the pandas groupby function to group tickers since we want to calculate moving averages separately for each stock. Finally, we use the pandas rolling function to perform a rolling calculation, in this case a mean, over specified windows on the dataframe. Below is an example with Apple showing 5 and 10 DMA:
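The steps above can be sketched as follows, using 5 and 10 day windows on synthetic prices. The column names (`ticker`, `date`, `close`, `ma_5`, `ma_10`) are assumptions for illustration; swapping in 50 and 200 gives the longer averages:

```python
import pandas as pd

# Synthetic prices for two tickers over 12 business days.
dates = pd.bdate_range("2019-04-01", periods=12)
prices = pd.DataFrame({
    "ticker": ["AAPL"] * 12 + ["MSFT"] * 12,
    "date": list(dates) * 2,
    "close": [float(i) for i in range(1, 13)] + [float(2 * i) for i in range(1, 13)],
})

# Sort by date so each rolling window covers consecutive trading days.
prices = prices.sort_values("date")

for window in (5, 10):
    # Group by ticker so a window never mixes prices from different stocks.
    prices[f"ma_{window}"] = (
        prices.groupby("ticker")["close"]
        .transform(lambda s: s.rolling(window).mean())
    )

print(prices[prices["ticker"] == "AAPL"])
```

The first `window - 1` rows of each ticker are NaN because a full window of prices is not yet available.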
To merge the moving averages to the events dataframe, all we have to do is join on ticker and date to get moving averages on each event date.
Remove Events With Less Than 4 Weeks of Trailing Price Data
In the exploratory analysis section later, one piece of analysis looks at price behavior in the 19 business days after an event or large price movement (20 days, or 4 weeks, counting the starting price). Thus, we need to remove rows in our events dataframe with fewer than 19 business days of price data after the event.
To do this, we find the max date for each stock on which there is price data and check if the event is less than 19 business days from the max date:
We import BDay from pandas, which lets us offset by business days, and use pivot_table to group tickers in the prices dataframe and find the max date for each ticker. We previously used the groupby function to group, but this is another way to do it. We then merge these max dates to the events dataframe and remove events that are within 19 business days of the max date.
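A rough sketch of this filter on toy data. The column names and the `max_date` helper column are assumptions, and `BDay` skips weekends only, so market holidays are not accounted for here:

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# Toy inputs; names and values are hypothetical.
prices = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "date": pd.to_datetime(["2019-04-01", "2019-05-02", "2019-04-01", "2019-05-02"]),
    "close": [100.0, 105.0, 50.0, 52.0],
})
events = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT"],
    "date": pd.to_datetime(["2019-03-01", "2019-04-26", "2019-04-05"]),
})

# Latest date with price data for each ticker.
max_dates = (
    prices.pivot_table(index="ticker", values="date", aggfunc="max")
    .rename(columns={"date": "max_date"})
    .reset_index()
)

events = events.merge(max_dates, on="ticker", how="left")

# Date 19 business days after each event (applied row by row).
window_end = events["date"].apply(lambda d: d + BDay(19))

# Keep only events whose full 19-business-day window is covered by price data.
events = events[window_end <= events["max_date"]].drop(columns="max_date")
print(events)
```

In this toy data, the 2019-04-26 event is dropped because its window would end after the last available price date.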
Remove Events Within 4 Weeks of Next Earnings Release
In addition to removing events with less than 4 weeks of trailing price data, for some of the analyses below I also chose to remove events that occurred within 4 weeks of companies’ next earnings releases since earnings can cause large price swings. To accomplish this, we will again use merge_asof, this time to find the nearest upcoming earnings release date after an event.
We set up the merge_asof similarly to how we got the previous day’s price, except now the direction argument looks forward since we want the date of the next earnings release. Because MarketWatch does not have earnings records for all stocks, we fill missing filing dates with an arbitrary old date. Lastly, we calculate the difference in business days between each event date and the next filing date, removing rows where the difference is fewer than 19 days. This essentially assumes that stocks missing from MarketWatch have no events falling within 4 weeks of their next earnings release, which is not ideal, but it is an assumption we will have to work with for now.
In summary, our combined and cleaned dataframe now contains the following fields: event date, ticker, event description, price, price one day prior, percentage price change, and moving averages for price. Additionally, we showed how to remove events that have less than 4 weeks of trailing price data or that fall within 4 weeks of the next earnings report.
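A sketch of the forward-looking join on toy data. Column names such as `filing_date` and `bdays_to_filing` are assumptions. One deliberate difference from the description above: instead of filling missing filing dates with a placeholder date, this sketch keeps rows without a filing date explicitly, which has the same effect of retaining them. The business-day count uses NumPy's `busday_count`, which ignores market holidays by default:

```python
import numpy as np
import pandas as pd

# Toy inputs; "XYZ" stands in for a ticker with no filing records.
events = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "XYZ"],
    "date": pd.to_datetime(["2019-03-01", "2019-04-22", "2019-03-15"]),
}).sort_values("date")
filings = pd.DataFrame({
    "ticker": ["AAPL", "AAPL"],
    "filing_date": pd.to_datetime(["2019-01-30", "2019-05-01"]),
}).sort_values("filing_date")

# Nearest filing date on or after each event date, per ticker.
merged = pd.merge_asof(
    events,
    filings,
    by="ticker",
    left_on="date",
    right_on="filing_date",
    direction="forward",
)

# Business days from the event to the next filing (NaN when no filing is known).
has_filing = merged["filing_date"].notna()
bdays = pd.Series(np.nan, index=merged.index)
bdays.loc[has_filing] = np.busday_count(
    merged.loc[has_filing, "date"].to_numpy().astype("datetime64[D]"),
    merged.loc[has_filing, "filing_date"].to_numpy().astype("datetime64[D]"),
)
merged["bdays_to_filing"] = bdays

# Keep events at least 19 business days before the next filing, or with no filing data.
clean = merged[~has_filing | (merged["bdays_to_filing"] >= 19)]
print(clean)
```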
Finally, The Fun Stuff: Exploratory Analysis
Our dataset of events and prices spans October 12, 2015 to May 2, 2019, with 28,089 unique events. Let’s explore them a little bit:
Average Behavior By Initial Price Movement
Using the cleaned events dataframe and historical prices, we can look at price behavior for different levels of initial price movements. The graphs below show the average price behavior in the 20 days after an event grouped by magnitude of day 0–1 price change:
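The grouping behind charts like these can be sketched with `pd.cut`. Everything here is illustrative: the bucket edges, the `event_id`, `day`, and `rel_pct` columns, and the three-day toy paths (the real analysis follows 20 days) are all assumptions, and only upward moves are shown:

```python
import pandas as pd

# Toy inputs: one row per event, plus closing prices on the days around it.
# Day 0 is the close before the event; day 1 is the event-day close.
events = pd.DataFrame({
    "event_id": [1, 2, 3],
    "pct_move": [7.0, 8.0, 35.0],
})
paths = pd.DataFrame({
    "event_id": [1] * 3 + [2] * 3 + [3] * 3,
    "day": [0, 1, 2] * 3,
    "close": [100.0, 107.0, 108.0, 50.0, 54.0, 53.0, 10.0, 13.5, 12.0],
})

# Assign each event to a bucket by the size of its initial move.
bins = [0, 5, 10, 15, 20, 30, float("inf")]
labels = ["<=5%", "5-10%", "10-15%", "15-20%", "20-30%", ">30%"]
events["bucket"] = pd.cut(events["pct_move"], bins=bins, labels=labels)

# Express every price as a percentage change from that event's day 0 price.
paths = paths.sort_values(["event_id", "day"]).merge(
    events[["event_id", "bucket"]], on="event_id"
)
day0 = paths.groupby("event_id")["close"].transform("first")
paths["rel_pct"] = (paths["close"] / day0 - 1) * 100

# Average path per bucket: rows are buckets, columns are days since day 0.
avg_path = (
    paths.groupby(["bucket", "day"], observed=True)["rel_pct"]
    .mean()
    .unstack("day")
)
print(avg_path)
```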
After the initial pop or drop in prices, most groups tend to maintain their new price level for the following 20 days. The exception is the >30% price increase group, whose average price increase slid downwards over the next 20 days. A possible explanation is that stocks in this group with extreme initial price increases, for example those with >50% increases, are more likely to give back some of their initial gains in the following days. The chart below, showing average price behavior for stocks with 30–50% and >50% initial price movements, provides some evidence for this theory:
The price behavior after day 1 for the 30–50% group is relatively flat whereas the >50% group exhibits the declining behavior we saw in the previous chart.
A common technical indicator used by traders is the golden cross. A golden cross occurs when a shorter-term moving average crosses above a longer-term moving average, signaling a potential sustained upward price movement. Using the moving averages we previously calculated, we can explore whether the golden cross is an indicator of sustained price growth after a price-moving event. If it is, we should expect stocks that successfully enter a golden cross to continue rising or sustain their new price level, while those that do not should decline in price. A popular moving average combination for golden crosses is the 50 day crossing over the 200 day moving average. Grouped by initial price change, the charts below show average price behavior in the 20 days after an event for stocks with successful and unsuccessful 50–200 day golden crosses:
As seen above, the expected golden cross behavior holds to some extent. For every price movement group except the ≤5% group, stocks that successfully entered golden crosses tended to perform better on average over the 20 days. Even in the >30% group, in which both successful and unsuccessful golden crosses saw price declines, the successful golden cross category saw a smaller average decline by day 20.
However, stocks that did not enter golden crosses did not always revert their initial gains. Stocks with unsuccessful golden crosses in the 15–20% and 20–30% groups, despite underperforming those that successfully entered golden crosses, were able to maintain their new price levels after the initial increase.
It appears a golden cross provides some indication for whether a stock will sustain upward momentum after a price increase. Further examination of other golden crosses (e.g. 5–20 day) may yield different results. Additionally, more observations might smooth out curves and result in more expected golden cross behavior.
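For reference, detecting the day on which a golden cross occurs can be sketched as below for a single stock. The `ma_short` and `ma_long` columns stand in for the 50 and 200 day averages and the values are made up. How an event was classified as a "successful" golden cross in the analysis above is not shown here; this only flags the crossing itself:

```python
import pandas as pd

# Toy moving averages for one ticker; values are hypothetical.
ma = pd.DataFrame({
    "date": pd.bdate_range("2019-04-01", periods=6),
    "ma_short": [9.0, 9.5, 10.0, 10.6, 11.0, 10.8],
    "ma_long": [10.0, 10.1, 10.2, 10.3, 10.4, 10.5],
})

# True on days when the short average sits above the long average.
above = ma["ma_short"] > ma["ma_long"]

# A golden cross is the first day "above" turns True after being False.
# fill_value=True keeps the first row from counting as a cross.
ma["golden_cross"] = above & ~above.shift(1, fill_value=True)
print(ma)
```

With several tickers in one dataframe, the same comparison would need to be applied per ticker (for example with groupby) so that a shift never crosses from one stock into another.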
Recall that the events dataframe contains descriptions of the events that caused price movements. By parsing the contents of these descriptions using regular expressions, we can group events into categories. For example, the following regular expressions are used to categorize events related to legal, blockchain, and share buyback/dividend announcements:
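A sketch of this kind of categorization. The patterns, category names, and sample descriptions below are hypothetical stand-ins rather than the ones used in the analysis:

```python
import re

import pandas as pd

# Hypothetical patterns; the first matching category wins.
patterns = {
    "legal": r"lawsuit|court|verdict|settlement|litigation",
    "crypto": r"blockchain|bitcoin|crypto",
    "buyback_dividend": r"buyback|repurchase|dividend",
}

events = pd.DataFrame({"description": [
    "Shares fell after a jury verdict in a patent lawsuit",
    "Company announced a $1B share repurchase program",
    "Stock surged after the company rebranded around blockchain",
    "Shares rose on no apparent news",
]})

events["category"] = "other"
for name, pattern in patterns.items():
    matches = events["description"].str.contains(
        pattern, flags=re.IGNORECASE, regex=True
    )
    # Only label rows that have not been categorized yet.
    events.loc[matches & (events["category"] == "other"), "category"] = name

print(events)
```

Simple keyword patterns like these will misfire on some descriptions, so spot-checking a sample from each category is worthwhile.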
In total, I created 12 event categories that either had a good number of observations or I thought were interesting.
- Executive changes: changes in senior executives (CEO, COO, etc.)
- Drug approval/trial announcements
- Legal: any legal verdict, lawsuit, or other legal event
- Amazon’s impact on other companies, e.g. the effect of the Whole Foods acquisition on grocery stores
- Analyst rating change
- Merger or acquisition announcement
- Buyback/dividend announcement
- Deals/agreements: signing of deals, partnerships, contracts, etc.
- Investment stake: companies investing in other companies
- Stock offerings excluding IPOs
The graphs below show average price behavior for both upwards and downwards movements for each category in the 20 days after an event:
Since most of the categorized events will likely have material impacts on the companies affected, it is not surprising to see that stocks in most categories maintained their new higher or lower price levels in the ensuing 20 days. For example, an unsuccessful clinical trial will likely have negative financial implications for a company due to wasted resources and loss of future profits. As a result, we should expect to see that company’s stock not only go down but stay down. That expected behavior is observed in the Drug Approval/Trial category chart above, as well as in the majority of other event categories.
The three event categories that deviate slightly from this are crypto, legal, and Amazon’s effect on other companies. For crypto, prices tend to slide regardless of whether the crypto-related event drove prices up or down. For legal, it appears that while the effects of positive events linger, the downward pressure of negative events abates over time as stock prices recover somewhat. Lastly, stocks in the Amazon category tend to give back some of their initial gains or recover some of their initial losses. These three categories have relatively small sample sizes, so the trends seen here may be driven by noise.
Also of note is that certain events have larger impacts on price than others. For example, events related to drug approvals/trials move prices roughly +/-25% on average, while analyst rating changes result in more moderate +/-8% moves. From an investor’s perspective, knowing these ranges could be useful in certain trading strategies, such as options straddles or strangles, when the timing of events such as FDA approvals, earnings releases, and some legal verdicts is known or can be reasonably inferred.
Crypto Rally and Bust
2017 was the year of crypto. With cryptocurrency prices rising exponentially, companies sought ways to capitalize on the mania. Some companies started blockchain ventures such as Overstock.com which invested in a securities trading platform built on blockchain technology. Others, like Long Island Iced Tea Corp., took a more heavy handed approach by simply renaming itself Long Blockchain Corp.
As shown in the graph below, the frequency of price moving events related to cryptocurrencies or blockchain closely followed the price of bitcoin:
The height of the bitcoin bubble in Q4 2017 and Q1 2018 is when the most crypto related price moving events occurred. The rise and fall of bitcoin is also mirrored by the rise and fall of crypto related events in this time period. Based on these observations, we should expect to see an uptick in crypto driven price movements during the next crypto rally especially as more companies begin incorporating blockchain technology into their businesses (even if it’s just changing a name).
Another facet to explore in our events data is how the magnitude of price movements has changed over time. In other words, how have price movements in our events data fluctuated during the period of observation? Have price swings become more or less extreme? How do they compare to conventional measures of market volatility?
The following graph shows the average absolute percentage price change of stocks in our events data on the left axis. The right axis shows the average value of VIX, an index measuring volatility of the U.S. stock market.
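The events side of such a chart can be sketched as an average of absolute percentage moves per period. Quarterly grouping and the column names (`date`, `pct_move`, `abs_move`) are assumptions for this sketch, and the VIX series is left out:

```python
import pandas as pd

# Toy events with signed percentage moves; values are hypothetical.
events = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-15", "2018-02-20", "2018-04-03", "2018-05-10"]),
    "pct_move": [10.0, -20.0, 5.0, -7.0],
})

# Average absolute move per calendar quarter.
quarterly = (
    events.assign(abs_move=events["pct_move"].abs())
    .groupby(events["date"].dt.to_period("Q"))["abs_move"]
    .mean()
)
print(quarterly)
```

Taking the absolute value first matters: averaging signed moves would let large gains and losses cancel out and understate how much prices moved.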
Though not an exact match, the average price change of stocks in our scraped events dataset closely follows the movement and shape of VIX. As VIX dropped from Q4 2015 to Q3 2017, so did the average change in price. And as VIX picked back up and produced somewhat of an “M” shape from Q4 2017 onwards, so too did the average change in price.
Though close, the shapes do not match for several reasons. The first is that the scraped data depends on which companies Benzinga chooses to highlight in its “Movers” series, whereas VIX is based on a more fixed set of stocks, the S&P 500. The second is that VIX’s universe of stocks consists of relatively large companies, whereas Benzinga covers companies of varying sizes, including small cap stocks, which might be more volatile. Lastly, the definitions of volatility used in the graph above aren’t the same: in the events data, volatility is roughly defined as the average absolute price change, whereas VIX calculates volatility using quotes of put and call options on the S&P 500.
Despite these differences, it appears that the population of stocks in our scraped events data behaves similarly to the wider stock market from a volatility perspective.
This post provided an overview of how to pull, clean, and perform some analysis on a relatively messy and disparate dataset. Without a doubt, there are additional insights that can be gleaned from this data. Sentiment analysis on event descriptions, other technical indicators, and the impact of market cap are a few that come to mind. It would also be interesting to see how the observed trends hold up as additional data is added. Finally, as mentioned before, the data and code used for this post can be found on GitHub for those interested in doing their own analysis.