Tweets: You can’t hit unsend (sentiment analysis and web scraping)

Apurva Misra
May 23 · 4 min read

“Tesla shares tank after Elon Musk tweets the stock price is ‘too high’” was one of the recent headlines, even though an earlier court order required him to get a company lawyer’s approval before issuing any written communications regarding Tesla’s finances. In this article, we scrape Elon Musk’s tweets and Tesla’s stock prices (from Yahoo Finance), run sentiment analysis on the tweets, and analyze their relationship with the variation in Tesla’s stock price.

All the code and *.csv files can be found in my GitHub repository (https://github.com/ApurvaMisra/tweet_analysis.git); only snippets of the code are provided here.

Extracting tweets and sentiment analysis

The GetOldTweets3 library was used to retrieve his tweets from the time he first started tweeting. Two advantages of using this library are:

  1. There is no need to create an app with Twitter.
  2. There is no limit on the number of tweets that can be extracted for an individual.

All of Elon Musk’s tweets up to the date the code was run were extracted.

The extracted tweets can be saved to a *.csv file using the pandas “.to_csv” method.
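The extraction and saving steps above can be sketched as follows. This is a minimal illustration, not the code from the repository. `TweetCriteria`, `setUsername`, `setSince`, `setUntil` and `TweetManager.getTweets` are GetOldTweets3 calls. The helper names (`fetch_tweets`, `tweets_to_frame`, `save_tweets_csv`) are hypothetical. GetOldTweets3 depended on an old Twitter search endpoint and may no longer work, so it is imported lazily inside the fetch function.

```python
from datetime import date

import pandas as pd


def fetch_tweets(username, since, until=None):
    """Fetch a user's tweets between two "YYYY-MM-DD" dates with GetOldTweets3."""
    # Imported lazily so the helpers below work without the library installed.
    # Note: GetOldTweets3 relied on an old Twitter endpoint and may no longer work.
    import GetOldTweets3 as got

    until = until or date.today().isoformat()  # default: up to the day the code is run
    criteria = (
        got.manager.TweetCriteria()
        .setUsername(username)
        .setSince(since)
        .setUntil(until)
    )
    return got.manager.TweetManager.getTweets(criteria)


def tweets_to_frame(tweets):
    """Keep only the fields used later: when each tweet was posted and its text."""
    return pd.DataFrame(
        {"date": [t.date for t in tweets], "text": [t.text for t in tweets]}
    )


def save_tweets_csv(username, since, path):
    """Fetch the tweets, tabulate them and write them out with pandas' to_csv."""
    tweets_to_frame(fetch_tweets(username, since)).to_csv(path, index=False)
```

Usage would look like `save_tweets_csv("elonmusk", "2010-01-01", "elon_tweets.csv")`, where the file name is only an example. Keeping `tweets_to_frame` separate from the network call makes the tabulation easy to check on its own.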

For sentiment analysis, VADER (Valence Aware Dictionary and sEntiment Reasoner), a rule-based library, was used. Its lexicon assigns a score to each token; at the time of writing, it included about 7,500 words and their corresponding scores. Essentially, it splits a sentence into words, looks up the score of each one and combines them into a compound score between -1 and 1. This works well for short texts such as tweets, but as sentences get longer, the order of the words has a large impact on their meaning.

VADER used to get sentiment scores for tweets
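The lexicon-lookup idea described above can be illustrated with a simplified scorer. This is not VADER itself: VADER also applies rules for negation, capitalization, punctuation and intensifiers, and the word scores in `TOY_LEXICON` are made up for illustration. The one piece taken from VADER is the form of its normalization, x / sqrt(x² + α) with α = 15, which maps the summed score into (-1, 1).

```python
import math

# Hypothetical toy lexicon: made-up valence scores, NOT VADER's real values.
TOY_LEXICON = {"great": 3.0, "love": 3.0, "bad": -2.5, "sad": -2.0}


def normalize(score, alpha=15.0):
    """Map an unbounded sum into (-1, 1), in the form VADER uses for its compound score."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(sentence, lexicon=TOY_LEXICON):
    """Sum word-level scores and normalize; words missing from the lexicon contribute 0."""
    words = sentence.lower().split()
    total = sum(lexicon.get(w.strip(".,!?"), 0.0) for w in words)
    return normalize(total)
```

With this toy lexicon, “Tesla stock price is too high imo” scores exactly 0 because none of its words carry a score. The same effect explains why a purely word-level lexicon misses phrases like “too high”. With the real library, the equivalent call is `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from `vaderSentiment.vaderSentiment`.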

Extracting stock price

Beautiful Soup, along with Selenium, was used to get Tesla’s share prices.

Searching for ‘TSLA’ on the Yahoo Finance website brings up Tesla’s stock price. We need prices from 2010 onwards, since that is when Elon Musk made his first tweet on the platform, so we go to the ‘Historical Data’ tab and change the time period accordingly. The page contains a table; right-clicking on it and selecting “Inspect” shows the HTML code, and scrolling through it we can find the “table” tag and its class, as shown below.

Inspect elements
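Opening the ‘Historical Data’ page for a chosen time period with Selenium might look roughly like this. Both helper names are hypothetical. The URL pattern, which passes the date range as `period1`/`period2` Unix timestamps, is an assumption about how Yahoo Finance encoded the range and may have changed. The scrolling loop assumes the table loads more rows as the page is scrolled, and the default of 20 scrolls is arbitrary.

```python
import calendar
import time
from datetime import date


def yahoo_history_url(ticker: str, start: date, end: date) -> str:
    """Build a 'Historical Data' URL for a date range (hypothetical helper).

    Assumption: the page takes the range as period1/period2 Unix timestamps
    (seconds since 1970-01-01 UTC); Yahoo may change this scheme at any time.
    """
    period1 = calendar.timegm(start.timetuple())
    period2 = calendar.timegm(end.timetuple())
    return (
        f"https://finance.yahoo.com/quote/{ticker}/history"
        f"?period1={period1}&period2={period2}&interval=1d"
    )


def fetch_rendered_html(url: str, scrolls: int = 20, pause: float = 1.0) -> str:
    """Open the page in Chrome, scroll to trigger lazily loaded rows, return the HTML."""
    # Imported lazily: needs the selenium package plus Chrome and a matching driver.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        for _ in range(scrolls):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause)  # give the page time to append more rows
        return driver.page_source
    finally:
        driver.quit()
```

For example, `fetch_rendered_html(yahoo_history_url("TSLA", date(2010, 1, 1), date.today()))` would return the rendered HTML, ready to be parsed. A real browser is used instead of a plain HTTP request because rows added by JavaScript only appear in the rendered page.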

Beautiful Soup’s “.find” method is used to locate the table tag, and the nested “tr” and “td” tags are then parsed to get the contents of each row. The data from each row is stored in a list and written to a *.csv file.

Extracting stock price table from url
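The row-by-row parsing described above can be sketched with Beautiful Soup’s `find`, `find_all` and `get_text`, plus Python’s `csv` module. The table class in the test is a placeholder, since the real class name is whatever “Inspect” shows at the time. Dropping rows whose width differs from the header is an extra assumption: it is meant to skip note rows that span the whole table.

```python
import csv

from bs4 import BeautifulSoup


def parse_price_table(html, table_class=None):
    """Return each table row as a list of cell strings, header row first."""
    soup = BeautifulSoup(html, "html.parser")
    # The class name comes from the browser's "Inspect" view; it is not fixed.
    table = soup.find("table", class_=table_class) if table_class else soup.find("table")
    if table is None:
        raise ValueError("no matching <table> found")
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    if not rows:
        return rows
    width = len(rows[0])
    # Drop rows whose width differs from the header (e.g. note rows spanning the table).
    return [row for row in rows if len(row) == width]


def write_rows_csv(rows, path):
    """Write the parsed rows (lists of strings) to a CSV file."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

The parsed rows keep the header as the first entry, so the resulting CSV can be loaded directly with `pandas.read_csv`.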

Relationship between stock prices and sentiment

When Elon Musk posts a tweet, we want to look at the change in that day’s closing price compared with the previous day’s. The plot below shows the closing stock price by year, along with the sentiment values of the tweets. The stock price was scaled to [0, 1] so that it can be compared with the sentiment values, which lie in [-1, 1].

Scaled “Close” stock value and sentiments vs year.
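The [0, 1] scaling mentioned above is ordinary min-max scaling, (x − min) / (max − min). The sketch below assumes the prices are held in a pandas Series. The function name is hypothetical, and a constant series would divide by zero.

```python
import pandas as pd


def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)
```

Applied to a closing-price column (for example `prices["Close"]`, assuming that column name), this keeps the shape of the curve while putting it on the same vertical range as the sentiment values.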

No clear pattern can be seen in the figure above, although the drop in the stock price is visible. Since we only want to see whether increases and decreases in the stock price correlate with the sentiment of his tweets, we plot the difference between each day’s closing price and the previous day’s.

Change in stock price vs year.
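One way to compute the day-over-day change and line it up with the tweets is sketched below. It uses pandas’ `Series.diff()`, which subtracts the previous row. The column names (`date`, `close`, `compound`) and the choice to average sentiment over all tweets on the same day are assumptions. The sketch also assumes `date` already holds calendar dates in a common time zone, and days with tweets but no trading (weekends, holidays) are simply dropped.

```python
import pandas as pd


def daily_change_with_sentiment(prices: pd.DataFrame, tweets: pd.DataFrame) -> pd.DataFrame:
    """Join each tweet day's mean sentiment with that day's close-to-close change.

    prices: columns 'date' and 'close' (one row per trading day)
    tweets: columns 'date' and 'compound' (one row per tweet)
    """
    prices = prices.sort_values("date").copy()
    # Change from the previous trading day's close (NaN for the first day).
    prices["change"] = prices["close"].diff()
    # Average the sentiment of all tweets posted on the same day.
    daily_sentiment = tweets.groupby("date", as_index=False)["compound"].mean()
    # Inner join keeps only days that have both tweets and a trading session.
    return daily_sentiment.merge(prices[["date", "change"]], on="date", how="inner")
```

Plotting the `change` and `compound` columns of the result against the date gives a comparison like the one in the figure.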

There are a few places where drops in sentiment and in the stock price appear to overlap. Concentrating only on negative tweets, the Pearson correlation coefficient between sentiment and the change in stock price is -0.22. This suggests that more negative sentiment tends to coincide with an increase in the stock price, which goes against our hypothesis, although a coefficient of this size indicates only a weak relationship. One possible explanation is that the sentiment values did not match the sentiment he actually tried to convey. For example, the tweet “Tesla stock price is too high imo” was given a neutral sentiment value of 0, even though it would have received a negative score if the two-gram “too high” had been taken into account. Tokens such as “imo”, “lmao” and “lol” should also be assigned sentiment values, which was not the case with VADER.

Code for finding the Pearson correlation coefficient
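The coefficient described above can be computed directly from its definition: the covariance divided by the product of the standard deviations. The sketch below does that in plain Python and first keeps only the days with negative sentiment. The function names are hypothetical; with pandas, `df["compound"].corr(df["change"])` gives the same Pearson value, since `"pearson"` is the default method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def negative_tweet_correlation(sentiments, changes):
    """Correlate sentiment with price change, keeping only negative-sentiment days."""
    pairs = [(s, c) for s, c in zip(sentiments, changes) if s < 0]
    return pearson([s for s, _ in pairs], [c for _, c in pairs])
```

Because the retained sentiments all lie in [-1, 0), a negative coefficient means that more negative tweets go together with larger price changes, which is how a value such as -0.22 should be read.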

In future work, a more advanced sentiment analysis technique will be used to investigate this relationship.

Apurva Misra

Written by

Masters at University of Waterloo| Data Science | ML| Statistics

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

