Behavioral Finance

What’s UP? Microsoft, or Amazon?

Nishant Sahoo
Jul 21, 2018

In this article, I’ll talk about how Harshini and I tried to predict stock market behavior by performing sentiment analysis on tweets about NASDAQ 100 companies. We used TextBlob (a Python library) to perform the sentiment analysis and sklearn to build the classifier model, and we collected the tweet data set from the official followthehashtag website.

Many stock market investors try to predict the market behavior by analyzing previous trends, reading news articles, and understanding how well certain products of a company are performing in terms of revenue. While there are many other factors that influence market behavior, this article will focus on how the study of behavioral finance can help us gain insights into the stock market. Here are two famous incidents which show why the study of behavioral finance is important and how Twitter is a powerful platform that deeply influences the market -

1. Snapchat shares dropped about 7% on 22nd Feb 2018 after Kylie Jenner tweeted the following, seen by her 25 million followers. Here’s an article that covers the incident.

2. Tesla shares rose 2.4%, which is an exceptional increase in the market, after Elon Musk replied to The Economist’s tweet with the following tweet.

These observations inspired me to work on a model that tries to predict stock market behavior by performing sentiment analysis on tweets about a particular company.

Sentiment analysis

4 basic steps to perform sentiment analysis -
1. Data collection: We collected data for this model from this data set.
2. Text preparation/Text Preprocessing: We performed text preprocessing using the following code -

import re

# Emoticon pattern, written for re.VERBOSE so the inline comments are ignored
emoticons_str = r"""
    (?:
        [:=;] # Eyes
        [oO\-]? # Nose (optional)
        [D\)\]\(\]/\\OpP] # Mouth
    )"""

regex_str = [
    emoticons_str, # emoticons first, so they are kept as single tokens
    r'<[^>]+>', # HTML tags
    r'(?:@[\w_]+)', # @-mentions
    r"(?:\#+[\w_]+[\w\'_\-]*[\w_]+)", # hash-tags
    r'http[s]?://(?:[a-z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-f][0-9a-f]))+', # URLs
    r'(?:(?:\d+,?)+(?:\.?\d+)?)', # numbers
    r"(?:[a-z][a-z'\-_]+[a-z])", # words with - and '
    r'(?:[\w_]+)', # other words
    r'(?:\S)' # anything else
]

tokens_re = re.compile(r'('+'|'.join(regex_str)+')', re.VERBOSE | re.IGNORECASE)
emoticon_re = re.compile(r'^'+emoticons_str+'$', re.VERBOSE | re.IGNORECASE)

def tokenize(s):
    # Split the text into tokens, trying the patterns above in order
    return tokens_re.findall(s)

def preprocess(s, lowercase=False):
    tokens = tokenize(s)
    if lowercase:
        # Lowercase everything except emoticons, where case matters (e.g. :D)
        tokens = [token if emoticon_re.search(token) else token.lower() for token in tokens]
    return tokens

The above code splits a tweet into tokens and recognizes emoticons, HTML tags, mentions of other Twitter accounts using “@”, hashtags, URLs, and numbers as separate tokens. These can then be dropped, leaving only a sequence of simple English tokens/words. This text can now be analyzed to calculate the sentiment score.
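The filtering step itself is not part of the snippet above. Here is a minimal sketch of how it could look, assuming the tokens come from `preprocess` and that a "simple word" means letters with optional inner apostrophes or hyphens. The helper name `keep_plain_words` and that rule are illustrative assumptions, not the code we used in the project.

```python
import re

# Hypothetical rule: a plain word is letters, optionally joined by ' or -
PLAIN_WORD_RE = re.compile(r"^[a-z]+(?:['\-][a-z]+)*$", re.IGNORECASE)

def keep_plain_words(tokens):
    """Drop mentions, hashtags, URLs, numbers, emoticons and punctuation."""
    return [t for t in tokens if PLAIN_WORD_RE.match(t)]

# Hand-written token list standing in for the output of preprocess()
tokens = ["@amazon", "Alexa", "is", "amazing", ":)", "#alexa",
          "https://example.com", "100", "don't", "!"]
print(keep_plain_words(tokens))  # ['Alexa', 'is', 'amazing', "don't"]
```

Filtering after tokenizing, rather than with one big substitution, keeps each rule easy to change if you later decide that hashtags or emoticons carry sentiment worth keeping.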

3. Sentiment calculation: Using TextBlob, we can find the sentiment score of a given text by using the following code -

from textblob import TextBlob

text_str = '''
Alexa is an amazing product by Amazon. Amazon is the best. Now I don't have to answer those stupid questions my kids ask me.
'''

blob = TextBlob(text_str)
# polarity is TextBlob's sentiment score, a float between -1.0 and 1.0
sentiment_score = blob.sentiment.polarity
# prints the sentiment score for the text above
print(sentiment_score)

In our project, we calculated the sentiment score for all tweets about a particular company on a given day. This sentiment score is then used for sentiment classification.
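How the per-tweet scores are combined into one value per company and day is not spelled out here. A minimal sketch, assuming a simple average and assuming each tweet has already been scored; the function name `daily_sentiment`, the tuple layout, and the numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def daily_sentiment(scored_tweets):
    """Average per-tweet scores into one score per (company, date)."""
    buckets = defaultdict(list)
    for company, date, score in scored_tweets:
        buckets[(company, date)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

# Made-up scores for illustration only
scored_tweets = [
    ("AMZN", "2016-04-01", 0.5),
    ("AMZN", "2016-04-01", -0.25),
    ("MSFT", "2016-04-01", 0.75),
]
daily = daily_sentiment(scored_tweets)
print(daily[("AMZN", "2016-04-01")])  # 0.125
```

An average keeps the score in the same range as a single tweet's polarity, so a day with many tweets does not dominate a day with few. A sum or a count-weighted score would be reasonable alternatives.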

4. Sentiment classification: To predict whether the stock will increase or decrease, we’ve used a Decision Tree Classifier. The end goal of our model is to classify whether the stock price for a given company on a given day will rise or fall. To do so, first, we calculate the public sentiment value associated with a company for a given day as shown in step 3. Then, we calculate the change in stock price for a given day, by subtracting the previous day’s closing stock price from the current day’s closing stock price. These two values, calculated for a given day, are used as training data inputs for our classification model.
A) Training: We train the classification model by using the following input values -

day_1 = {
    'sentiment': sentiment_1,
    'stock_change': change_value_1
}
day_2 = {
    'sentiment': sentiment_2,
    'stock_change': change_value_2
}
day_3 = {
    'sentiment': sentiment_3,
    'stock_change': change_value_3
}

If the change in stock price > 0 (increase): stock_change is set to 1, and if the change in stock price ≤ 0 (decrease or no change): stock_change is set to 0.
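The labeling rule can be sketched in a few lines. This assumes a plain list of daily closing prices in date order; the function name `label_stock_changes` and the prices are made up for illustration.

```python
def label_stock_changes(closing_prices):
    """Return 1 where the close rose versus the previous day, else 0."""
    labels = []
    for previous, current in zip(closing_prices, closing_prices[1:]):
        change = current - previous
        labels.append(1 if change > 0 else 0)
    return labels

closes = [100.0, 101.5, 101.5, 99.0, 102.0]  # made-up closing prices
print(label_stock_changes(closes))  # [1, 0, 0, 1]
```

Note that the first day gets no label, because it has no previous close to compare against.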

B) Prediction: The sentiment score of all tweets about a company for a given day is used as input to the model, which is used to predict whether the stock price will increase or decrease. If the classification result is -
i) 1: Increase in stock price
ii) 0: Decrease (or no change) in stock price

The following code can be used to predict the stock market behavior for day_x -

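from sklearn import tree

clf = tree.DecisionTreeClassifier()
# X_Sentiment is an assumed name (the original argument was lost): one daily sentiment score per row
clf = clf.fit(X_Sentiment, Y_StockVal)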
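To see training and prediction end to end, here is a small self-contained sketch. The sentiment scores and labels are made up, and the variable names are illustrative; only `DecisionTreeClassifier`, `fit`, and `predict` come from sklearn.

```python
from sklearn import tree

# Made-up training data: one daily sentiment score per row, one label per day
X_sentiment = [[-0.5], [-0.2], [0.1], [0.4], [0.6]]
y_stock_change = [0, 0, 1, 1, 1]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_sentiment, y_stock_change)

# Predict for a new day (day_x) from its sentiment score alone
day_x_sentiment = [[0.3]]
prediction = clf.predict(day_x_sentiment)[0]
print("increase" if prediction == 1 else "decrease or no change")
```

With a single input feature, the tree can only learn thresholds on the sentiment score, so on real data it is worth checking accuracy on held-out days before trusting the predictions.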

The source code for the complete project can be found in this repository -

I hope you found this article useful. :3
Give me a clap, or two, or forty (👏) if you want to read more such stuff from me.