How Valuable Are Your Sentiments?

Zeng Hou Lim
Nov 22, 2017
Photo: Apple

This article is the product of a final project for the course NM3239 (Retrieving, Exploring, and Analysing Data) at the National University of Singapore. Since this is a Communications and New Media class, we hope our readers will extend some grace when it comes to the data science process. All analyses and plots were generated in RStudio. You can check out my GitHub repo here.

The Power of Apple

On 12th September 2017, the Apple Event caused the share price of Energous, a wireless charging company, to drop by as much as 21%. The decline was triggered by Apple’s announcement that it would use the Qi wireless standard instead. Conversely, Apple’s manufacturing partners have seen their stock prices rise after their partnerships were announced during the event. It is no coincidence that Apple has such great influence over the share prices of its partner companies. After all, Apple is a prominent consumer brand, and the Apple Event is a highly anticipated spectacle among industry experts, the public, and possibly the financial sector too.

Given Apple’s influence over the share prices of its partner companies, does Apple have any influence over its own stock prices (AAPL) during the Apple event? And, if there is any influence at all, how?

What Influences What?

Before we begin our analysis, we need to know which factors to examine to establish a relationship. The first is Apple’s stock price during the event. As mentioned earlier, the Apple Event is closely followed by multiple stakeholders. It was broadcast live throughout the world, making it accessible to anyone with an Internet connection. Technology publications such as The Verge have dedicated teams live-tweeting throughout the event, and they are not the only ones: many viewers also live-tweet their opinions and sentiments as the event unfolds. As such, the second factor of interest is the tweets posted during the event. More specifically, we are interested in the sentiment scores of the tweets as a reflection of the audience’s reaction to the event.

Several studies suggest that Twitter has an influence on stock prices. In 2010, a study conducted at Indiana University found that general “mood states” of Twitter users, such as happiness and calmness, were correlated with whether the Dow Jones Industrial Average finished the day up or down. Therefore, there is value in exploring the relationship between sentiment scores and stock price during the Apple Event.

Data Retrieval and Manipulation

To analyze the sentiments of tweets and their relation to AAPL, we first needed to retrieve the data. We manually retrieved 120 open prices from Yahoo Finance, one for each minute of the two-hour event. We also sat through the entire Apple Event, noting down the timings of individual segments and any notable moments that might be helpful for our analysis later.

For our tweet retrieval process, there is good news and bad news. Bad news: Twitter’s Application Programming Interface (API) only allowed for the retrieval of tweets that are no older than seven days. We came up with this inquiry on the eighth day, which meant the earliest tweet we could retrieve was posted one day after the Apple Event. Not so helpful. Good news: developers like Jefferson Henrique understood this limitation and have developed an application to retrieve, according to him, even the “deepest oldest tweets”. Instead of using the API, Henrique mimicked the scrolling of a web browser, which allowed for the retrieval of tweets through a JSON provider as though users were scrolling through a Twitter page. With the help of Henrique’s program, we were able to retrieve approximately 120,000 tweets with the hashtag #AppleEvent for the entire duration of the Apple Event on 12th September.

Before the raw data can be used, it has to be processed. Using RStudio, we removed hashtags, URLs, and encodings since they cannot be used in the computation of a sentiment score. The package sentimentr was then used to calculate the sentiment score at the sentence level. sentimentr was designed to quickly calculate text polarity sentiment at the sentence level by considering valence shifters: negators, amplifiers (intensifiers), de-amplifiers (downtoners), and adversative conjunctions. According to the developer of sentimentr, this nuanced approach gives a more accurate result than a simple dictionary lookup.
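To make the cleaning step concrete, here is a minimal Python sketch of the kind of preprocessing described above. It is not our original R code. The regular expressions, the choice to drop whole hashtags rather than just the `#` symbol, and the treatment of "encodings" as non-ASCII characters (emoji and similar) are all assumptions made for illustration.

```python
import re

# Illustrative patterns only (assumptions, not the rules used in the project).
URL_RE = re.compile(r"https?://\S+|www\.\S+|pic\.twitter\.com/\S+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")  # stand-in for "encodings"
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs, hashtags, and non-ASCII characters, then tidy whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_ASCII_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    # Made-up example tweet.
    print(clean_tweet("Loving the new watch! #AppleEvent https://t.co/abc123"))
```

The cleaned text would then be passed to a sentiment scorer; in our case that was sentimentr in R.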

After getting the sentiment score of each tweet, we calculated an average sentiment score for each minute by summing the scores within that minute and dividing by the number of tweets posted in it. Using an average helps to negate the bias brought about by differences in the absolute number of tweets per minute.
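The per-minute averaging can be sketched in a few lines of Python. This is an illustration rather than our R code: the function name, the `(timestamp, score)` input format, and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def average_sentiment_by_minute(scored_tweets):
    """Return {minute: mean sentiment} from (timestamp, score) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, score in scored_tweets:
        # Floor the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        totals[minute] += score
        counts[minute] += 1
    return {minute: totals[minute] / counts[minute] for minute in sorted(totals)}


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    sample = [
        (datetime(2017, 9, 12, 10, 0, 5), 0.5),
        (datetime(2017, 9, 12, 10, 0, 40), -0.1),
        (datetime(2017, 9, 12, 10, 1, 2), 0.3),
    ]
    for minute, mean_score in average_sentiment_by_minute(sample).items():
        print(minute, round(mean_score, 3))
```

Dividing by the tweet count is what keeps a busy minute from dominating a quiet one.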

Figure 1.0: Formula for calculating sentiment score
Figure 2.0: Sentiment Scores throughout the Apple Event (by segments)

In Figure 2.0, we can see that audience sentiment varies by segment of the Apple Event. Apple Watch and Apple TV fared well with the viewers, but iPhone X did not appeal to the audience at all. With this observation in mind, we proceeded to investigate whether the stock prices reflect a similar pattern to the sentiment scores.

Let the data speak

We chose to look at the rate of return to analyse stock prices. This allows us to conduct time series analysis, which is the standard approach (1, 2) for investigating the relationship between Twitter sentiment and stock prices.

The formula used is shown in Figure 3.0 below.

Figure 3.0: Formula for calculating rate of return
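For readers who cannot see the figure, here is a short Python sketch of a rate-of-return calculation. It assumes the common simple-return definition, (current price − previous price) / previous price, applied to consecutive one-minute open prices. That definition is an assumption on this sketch's part, and the prices are made up.

```python
def rate_of_return(prices):
    """Simple return between consecutive prices: (p_t - p_prev) / p_prev."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


if __name__ == "__main__":
    # Made-up one-minute open prices, for illustration only.
    open_prices = [100.0, 101.0, 99.99]
    print(rate_of_return(open_prices))
```

Note that a series of n prices yields n − 1 returns, since the first price has nothing to be compared against.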

We chose to look at the rate of return because it reflects the magnitude and direction of change in the price of AAPL’s stock from one minute to the next. After plotting the rate of return against the average sentiment scores, we observe a coherent pattern in Figure 4.0.

Figure 4.0: Average Sentiment Score against AAPL’s Rate of Return with no time lag

Although the peaks and troughs of the two line graphs in Figure 4.0 correspond with each other, there seems to be a slight lag between them. Out of curiosity, we decided to calculate the R² values for lags of up to three minutes in the rate of return.

Figure 5.0: R² values for Sentiment Score against AAPL Rate of Return at different time lags

From Figure 5.0, when we lag the rate of return by two minutes, we get a statistically significant result (p < 0.05) that explains the highest amount of variability in the rate of return (R² = 0.128). We have built an interactive chart that you can explore here, since Medium does not allow the embedding of Shiny.io web applications.
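The lag comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not our R code: it takes R² to be the squared Pearson correlation (which equals the R² of a simple linear regression with one predictor), it pairs the sentiment at minute t with the return at minute t + lag, and the function names and sample numbers are hypothetical. It does not compute p-values.

```python
def r_squared(xs, ys):
    """R² of a one-predictor linear regression (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def lagged_r_squared(sentiment, returns, max_lag=3):
    """Return {lag: R²}, pairing sentiment[t] with returns[t + lag]."""
    results = {}
    for lag in range(max_lag + 1):
        shifted = returns[lag:]
        n = min(len(sentiment), len(shifted))  # keep only overlapping minutes
        results[lag] = r_squared(sentiment[:n], shifted[:n])
    return results


if __name__ == "__main__":
    # Made-up series in which returns follow sentiment two minutes later.
    sentiment = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
    returns = [0.0, 0.5, 2.0, 6.0, 4.0, 10.0, 8.0, 12.0]
    for lag, value in lagged_r_squared(sentiment, returns).items():
        print(lag, round(value, 3))
```

In the made-up data the two-minute lag gives a perfect fit by construction; real data, as in Figure 5.0, is far noisier.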

What are three main insights that readers might take away?

1. Reactions to sentiment changes are not immediate

The time lag could be attributed to investors’ decision-making process after getting the sentiment data. Time is needed for humans to interpret these data, since they could be skewed or misrepresented. For example, Twitter bots may be used to generate overwhelmingly positive tweets in a bid to boost the stock price. Investors are therefore likely to make decisions only after exercising their own judgment, which would account for the time lag.

2. The time lag might be shorter than you think

Figure 6.0: Breakdown of the two-minute lag from Twitter sentiment to Rate of Return

As noted above, our post-hoc analysis shows a full two-minute lag between sentiment score and rate of return. However, because we use the open price for our analysis, the actual lag is shorter than two minutes. When calculating the sentiment score, one entire minute is used for collection and calculation, as seen in Figure 6.0. In essence, the sentiment score at t-zero only becomes available sixty seconds later, at t-one. Therefore, there is in fact only one minute of lag before the rate of return catches up at t-two.

3. When it comes to the stock market, the glass is half-empty

Seven out of the ten highest sentiment scores come from the Apple TV and Apple Watch segments. On the other hand, the ten lowest sentiment scores all come from the iPhone X segment. However, when we calculated the R² values between rate of return and sentiment score for the respective segments, we found no significant R² values during the Apple TV and Apple Watch segments, but a higher-than-average R² value (0.13) during the iPhone X segment. In other words, variations in the rate of return are better explained during a negative segment than during a positive one.

This disparity can be attributed to negativity bias, loosely described as:

“When of equal intensity, things of a more negative nature have a greater effect on one’s psychological state and processes than neutral or positive things.”

Viewed through the lens of negativity bias, investors are more sensitive to fumbles than to positive events, and their investing behaviour reflects this. In other words, given positive and negative sentiments of the same magnitude, investors react more strongly to bad news than to good news.

Limitations in our analysis

Conducting real-time analysis while we were new to R meant we were restricted by our technical knowledge. This resulted in several limitations that affect the accuracy of our work.

  • Opaque tweet collection procedure: While we certainly got a large number of tweets using Henrique’s program, we do not know how many tweets were not included in our data. Hence we were unable to estimate whether our sample is a representative one.
  • Discarded tweets not in English or consisting of media links: Due to the limitations of language-specific sentiment analysis, we lost valuable information, and our sample is Western-centric. Furthermore, the exclusion of media (likely reaction photos or memes) also means the loss of sentiment data.
  • Did not take retweets into account: In our analysis, we did not account for the number of retweets each tweet received. This would skew our data, since we counted each retweeted tweet only once rather than as separate tweets, as they ought to be.
  • Bot accounts: Given the rise in Twitter bots, there is a real possibility that some tweets were artificially created by these bots. As we cannot determine how many tweets originated from bot accounts, the accuracy of our analysis may be undermined.
  • Granular scale: Conducting sentiment analysis on a minute scale leaves very little room for error and flexibility. For example, people could still be tweeting about Apple TV when Cook has moved on to talk about Apple Watch. This phenomenon can be observed from the word cloud of top words generated for each segment of the Apple Event (once again, link to interactive app here). Hence some tweets might get miscategorised, adversely affecting the accuracy of our project.
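As an illustration of how the retweet limitation could be addressed, the Python sketch below weights each tweet by one plus its retweet count, so that a retweeted tweet counts as many times as it was posted. This is a hypothetical remedy that we did not implement; the weighting scheme, function name, and numbers are assumptions.

```python
def retweet_weighted_mean(scores_and_retweets):
    """Mean sentiment where each tweet counts once plus once per retweet."""
    weighted_sum = 0.0
    total_weight = 0
    for score, retweets in scores_and_retweets:
        weight = 1 + retweets  # the original tweet plus each retweet
        weighted_sum += score * weight
        total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up (score, retweet_count) pairs for a single minute.
    minute_tweets = [(0.5, 3), (-0.5, 0)]
    print(retweet_weighted_mean(minute_tweets))
```

With these made-up values the unweighted mean would be 0, whereas the weighted mean leans positive because the positive tweet was retweeted three times.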

Conclusion

In conclusion, using Twitter sentiment as a reflection of the Apple Event gave us insight into how the audience reacted to the event. By lagging the rate of return by two minutes, we obtained an R² value of 0.128. While this value might seem small, being able to account for roughly 13% of the variability in the rate of return is valuable information that could help firms and individuals refine their trading strategies. This knowledge could translate into significant profits if used judiciously in conjunction with other sources of information (e.g. technical and fundamental analysis).

However, it is important to note that our current analysis is just in its preliminary stages, and (disclaimer!!) should not be used independently for any stock analysis. More in-depth study of the real-time relationship between Twitter sentiment and stock price is needed.

Suggestions for Future Work

Now that we have established a preliminary relationship between Twitter sentiment and rate of return, there are several paths to take in order to gain a deeper understanding of the relationship.

Future projects may consider identifying opinion leaders in specific fields and adjusting the weightage of sentiment scores accordingly. This action would help reduce the contributions of tweets generated by bot accounts. Furthermore, while we are all born equal, the words of some have greater influence over the general public.

Additionally, our current analysis is relatively granular (i.e. on a one-minute scale). Future projects may look into analysing different time scales (e.g. two-minute frames, five-minute frames) in order to determine if the relationship applies when a longer timeframe is employed.
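A minimal Python sketch of how per-minute data could be pooled into longer frames is shown below. It assumes each minute is summarised as a hypothetical `(score_sum, tweet_count)` pair, so that the pooled average weights every tweet equally instead of averaging the per-minute averages. The function name and numbers are made up for illustration.

```python
def pool_into_frames(minute_totals, frame_size):
    """Pool per-minute (score_sum, tweet_count) pairs into wider frames.

    Returns one mean sentiment per frame, or None for a frame with no tweets.
    """
    pooled = []
    for start in range(0, len(minute_totals), frame_size):
        chunk = minute_totals[start:start + frame_size]
        score_sum = sum(s for s, _ in chunk)
        tweet_count = sum(c for _, c in chunk)
        pooled.append(score_sum / tweet_count if tweet_count else None)
    return pooled


if __name__ == "__main__":
    # Made-up per-minute totals: (sum of sentiment scores, number of tweets).
    minutes = [(2.0, 10), (1.0, 10), (-3.0, 20), (0.0, 0)]
    print(pool_into_frames(minutes, 2))  # two-minute frames
```

The rate of return would need to be recomputed on the same frames (for example, from the open price at the start of each frame) before comparing the two series.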

Contributors

Lim Zeng Hou

Chiang Hai Xuan

Mao Yulin

Special thanks to Jude Yew (Instructor) and Dennis Ang (Teaching Assistant) for their help throughout this process.
