AI vs. Momentum: Predicting Stock Prices Using Social Media Sentiment

accentedge
Artificial Intelligence vs Momentum
Nov 19, 2020

As the corporate landscape evolves and social media continues to shape the business world, we at accentedge ran a study using Artificial Intelligence (AI) to understand the relationship between a company’s stock price and the public’s perception of that company, as measured by the company’s social media “sentiment” on Twitter.

To look for a relationship between stock prices and sentiment, we used AI technology to analyze the rate of change in a company’s stock price alongside the sentiment of Tweets mentioning the company. By correlating the company’s stock price with its sentiment on Twitter, we saw patterns indicating that social media sentiment can be used as a tool to evaluate the public’s perception of a brand. You can watch the full video at this link “Predicting Stock Prices Using Social Media Sentiment”.

Existing Research

To determine whether social media sentiment can be related to stock price trends, we started by looking at earlier studies that report a correlation between Twitter sentiment and stock price trends.

The research suggests that social media trends are perhaps the most important repository of public sentiment. It reports a strong correlation between the rise or fall of a company’s stock price and the public opinion or emotion about that company expressed in Tweets.

There is also evidence of causation between public sentiment and stock market movements, specifically between mood (based on the average daily mood on Twitter) and the closing price. In addition, the sentiment polarity of Twitter peaks implies the direction of cumulative abnormal returns.

Our Model

To see if we could find a connection, our model used the Twitter sentiment of a company and correlated that data with stock price returns.

To study the stock price return we used the price rate of change: a company’s stock closing price on one day, minus the closing price on the previous day, divided by the closing price on the previous day. The price rate of change is the proportion by which a company’s stock price increases or decreases from one trading day’s close to the next.
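
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study; the function name and the closing prices are made up for the example.

```python
def price_rate_of_change(closes):
    """Return the day-over-day rate of change for a list of closing prices.

    Element i of the result is (closes[i + 1] - closes[i]) / closes[i],
    so the output is one item shorter than the input.
    """
    return [
        (today - yesterday) / yesterday
        for yesterday, today in zip(closes, closes[1:])
    ]


# Hypothetical closing prices for three consecutive trading days.
closes = [100.0, 102.0, 99.96]
roc = price_rate_of_change(closes)
print(roc)  # roughly a 2% rise followed by a 2% fall
```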

Applying Machine Learning

Our Machine Learning models used five input features, or data points, to predict the target variable (the stock price rate of change).

For a given day, the input features were:

  • The number of positive Tweets issued today
  • The number of negative Tweets issued today
  • The number of positive Tweets issued yesterday
  • The number of negative Tweets issued yesterday
  • Today’s stock price rate of change
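
As a rough sketch, the five features above could be assembled per trading day as follows. The function name and the sample numbers are hypothetical, and the sketch assumes the target is the next trading day's rate of change, which is how the model is described as being used.

```python
def build_rows(pos_counts, neg_counts, roc):
    """Pair each day's five features with the next day's rate of change.

    All three lists are aligned by trading day. Row t uses days t-1 and t
    as inputs and day t+1 as the target, so the first and last days
    produce no row.
    """
    rows = []
    for t in range(1, len(roc) - 1):
        features = [
            pos_counts[t],      # positive Tweets today
            neg_counts[t],      # negative Tweets today
            pos_counts[t - 1],  # positive Tweets yesterday
            neg_counts[t - 1],  # negative Tweets yesterday
            roc[t],             # today's price rate of change
        ]
        rows.append((features, roc[t + 1]))  # assumed target: tomorrow's ROC
    return rows


# Made-up values for four trading days, for illustration only.
pos = [120, 150, 90, 200]
neg = [80, 60, 110, 70]
roc = [0.010, -0.004, 0.021, -0.013]
rows = build_rows(pos, neg, roc)
```

With four days of data this yields two rows, because each row needs both a previous day and a following day.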

Machine Learning Model

With Machine Learning we can take these input features and train the model. The features can be built from millions of underlying data points (in our case, Tweets), and from them the model learns to forecast what the target variable will be.

Control Model

A Control Model is one that predicts today’s stock price Rate of Change (ROC) based on yesterday’s stock price Rate of Change.

We compare our model with the Control Model, which is essentially the same process minus the Twitter sentiments. This is important because if we receive good results from our model, we want to make sure that the Twitter sentiment is contributing meaningful information. If the Control Model performs better than our model, it suggests that Twitter sentiments do not correlate with the stock price rate of change.
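
To make the idea of a control concrete, here is a minimal sketch that predicts today's ROC from yesterday's ROC alone. It uses an ordinary least-squares line as a stand-in, which is an assumption for illustration and not necessarily the algorithm used in the study; the function names and numbers are hypothetical.

```python
def fit_control(roc):
    """Least-squares fit of today's ROC on yesterday's ROC.

    Returns (slope, intercept) for roc[t] ~ slope * roc[t - 1] + intercept.
    """
    x = roc[:-1]  # yesterday's rate of change
    y = roc[1:]   # today's rate of change
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_control(slope, intercept, yesterday_roc):
    """Predict today's ROC from yesterday's ROC using the fitted line."""
    return slope * yesterday_roc + intercept


# Made-up series in which each value is exactly 0.5 * previous + 0.01.
history = [0.04, 0.03, 0.025, 0.0225]
slope, intercept = fit_control(history)
next_roc = predict_control(slope, intercept, history[-1])
```

Because the control sees no Tweets at all, any accuracy the sentiment model gains over it can be attributed to the sentiment features.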

Tools Used In Our Model

We used the following programs to run our model:

Twitterscraper is a Python script available free online. It is a solid option for quickly collecting a large number of Tweets. We mined our Tweets based on hashtags and cashtags. Cashtags were introduced by Twitter a few years ago. When a Twitter user wants to talk directly about a company’s financial situation or its stock, they use a cashtag ($). This data is important for our specific model because a cashtag signals that people are talking about the stock directly.
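
As an illustration of cashtag-based filtering, the sketch below picks out Tweets that mention a given ticker. The regular expression is an assumption made for this example (a dollar sign followed by one to five letters), not Twitter's official matching rule, and the sample Tweets are invented.

```python
import re

# Assumed pattern: "$" followed by 1 to 5 letters, not preceded by a
# word character or another "$". Amounts such as "$20" do not match.
CASHTAG = re.compile(r"(?<![A-Za-z0-9_$])\$([A-Za-z]{1,5})\b")


def cashtags(text):
    """Return the set of upper-cased ticker symbols mentioned as cashtags."""
    return {symbol.upper() for symbol in CASHTAG.findall(text)}


def mentions(tweets, ticker):
    """Keep only the Tweets whose cashtags include the given ticker."""
    ticker = ticker.upper()
    return [text for text in tweets if ticker in cashtags(text)]


# Invented example Tweets.
tweets = [
    "Big delivery numbers for $TSLA this quarter",
    "I paid $20 for lunch",
    "Watching $aapl and $TSLA today",
]
tesla_tweets = mentions(tweets, "TSLA")
```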

Amazon SageMaker is Amazon Web Services’ managed Machine Learning platform. It hosts server instances on which we can train pre-implemented Machine Learning algorithms. We built our model using the logistic regression algorithm in its Linear Learner tool.

TextBlob is a natural language tool that is available free online. It is a Python library for processing textual data that provides a simple API for common Natural Language Processing (NLP) tasks. We used TextBlob to analyze the polarity of Tweets, that is, whether their sentiment is positive or negative.

To analyze our Tweets, we created a TextBlob object, a specific type of Python object, from the text of each Tweet. The object then outputs a number ranging from -1 to +1, which is the polarity of the Tweet. The more negative the number, the more negative the Tweet; the more positive the number, the more positive the Tweet.
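
A minimal sketch of this step is shown below. `TextBlob(text).sentiment.polarity` is TextBlob's documented way to read polarity; the helper names, the sample scores, and the choice to treat a polarity of exactly 0 as neutral are assumptions for illustration, since the article does not state a threshold.

```python
def tweet_polarity(text):
    """Return TextBlob's polarity for a Tweet, a float in [-1.0, 1.0].

    Requires the third-party package (pip install textblob). It is imported
    here so the counting helper below can be used without it.
    """
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def count_sentiment(polarities):
    """Count positive and negative scores; exactly 0.0 is treated as neutral."""
    positive = sum(1 for p in polarities if p > 0)
    negative = sum(1 for p in polarities if p < 0)
    return positive, negative


# Hypothetical polarity scores for one day's Tweets.
scores = [0.8, -0.3, 0.0, 0.25, -0.6]
counts = count_sentiment(scores)
print(counts)  # (2, 2)
```

Counts like these are what feed the "number of positive Tweets" and "number of negative Tweets" features for each day.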

IEX Developer Platform is a web-based API supplying quoting and trading data. It allows you to access the stock price of a company in real time. Since we were more interested in historical stock data, we used the iexfinance Python module to access the stock closing prices of Apple and Tesla over a two-year period from 2016 to 2018.

Training the Machine Learning Model

In Machine Learning, a training data set is used to fit an algorithm so that it learns to produce results. The data set includes both the input data and the expected output.

Our model data is organized by trading day. For each trading day, we have one data point. Each of these data points includes five input features that help predict the target variable. Training, in this Machine Learning model, means giving the algorithm each day’s input features together with the known value of the target variable.

The goal of Machine Learning training is that, once we have supplied enough of these examples, the model learns to predict the target variable. Say we are using the model today: we can mine today’s Twitter sentiments, yesterday’s Twitter sentiments, and today’s price Rate of Change (ROC). Once we have a solid and robust model, we can input these features and the model will tell us what it thinks tomorrow’s price ROC will be.

For our model, we used historic stock data from 2016 to 2018. For three-quarters of that period, we used the data to train our model and for the final quarter of the period, we tested the model.
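
A time-ordered split like this can be sketched as follows. The function name and the toy data are hypothetical; the key point is that the rows stay in date order, so the test period always comes after the training period.

```python
def chronological_split(rows, train_fraction=0.75):
    """Split date-ordered rows into an earlier training part and a later test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Eight stand-in trading days, oldest first.
days = list(range(8))
train_days, test_days = chronological_split(days)
print(train_days, test_days)  # [0, 1, 2, 3, 4, 5] [6, 7]
```

Splitting by time rather than at random avoids training on days that come after the ones being predicted.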

Model Results

For the car company Tesla, we ran three models using different numbers of Tweets to see how accurate our model could be. In the chart, blue represents the Rate of Change (ROC) predicted by the control and red represents the actual ROC based on Tweets.

Tesla 270,000 Tweets Model

The Tesla 270K Tweets Model is extremely inaccurate: it shows no correlation with actual values, whereas the control model is more accurate. Our model also has a high mean absolute percentage error. Mean absolute percentage error is the average proportion by which a prediction differs from the actual value, so lower values are better.
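
For reference, mean absolute percentage error can be computed as in the sketch below. The function name and the sample values are illustrative, and skipping days where the actual value is zero is an assumption of this sketch, since the ratio is undefined there.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, returned as a proportion.

    Days where the actual value is 0 are skipped (an assumption of this
    sketch), because the ratio is undefined for them.
    """
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("no non-zero actual values to score against")
    return sum(ratios) / len(ratios)


# Made-up actual and predicted rates of change for three days.
actual = [0.02, -0.01, 0.04]
predicted = [0.01, -0.02, 0.04]
mape = mean_absolute_percentage_error(actual, predicted)
print(mape)  # about 0.5: predictions were off by 50% of the actual value on average
```

Because daily rates of change are often close to zero, this metric can become very large even when the absolute miss is small.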

Tesla 560,000 Tweets Model

The Tesla 560K model is highly accurate by comparison, because it draws on more Tweets than the 270K model. It shows how increasing the amount of data (Tweets) can produce more accurate results.

Tesla 1.2M Tweets Model

Using 1.2 million Tweets, the model almost matched the accuracy of the control model. That means our model is learning and getting better. The more Twitter sentiments we are able to mine, the better the model can be. This result indicates that Twitter sentiments may be an important factor in helping to determine stock prices.

Apple 1.7M Tweets Model

For this model, we applied the same process used for Tesla to data for Apple Inc. With 1.7 million Tweets, the model’s results are inaccurate.

Apple 2.7M Tweets Model

By increasing the number of Tweets measured to 2.7 million, our model comes close to matching the control model. This result again shows a possible correlation between Twitter sentiments and stock prices.

How Do We Improve Our Model?

While we saw some interesting results with our Machine Learning model, there is more work needed to see if we can generate a predictable correlation between social media sentiment and actual stock prices.

One area to improve is the model’s sentiment analysis itself. One way to do this is by improving the way text is preprocessed, for example by recognizing the different spellings, abbreviations, and emoticons that people use and assigning sentiment to those indicators.

Next, we would like to explore other types of Machine Learning algorithms. So far, we don’t know if the relationship can even be modeled linearly.

Another approach we would like to try is layering a social media model on top of a quantitative trading model to see what kind of results that would produce. Used this way, the combined model would be grounded in current stock price trends, and the sentiment analysis could add pertinent information that improves the quantitative model.

Conclusion

In this research, “Predicting Stock Prices Using Social Media Sentiment”, we looked at whether social media sentiment can help us predict the stock price trends of a company. We used AI Machine Learning models and a large-scale collection of Tweet data to understand the relationship between the Twitter-based “sentiment” of two particular companies and their stock prices.

We have also investigated how the number of Tweets affects the accuracy of the models. Our model was extremely inaccurate with a lower number of Tweets. But once we increased the number of Tweets, our model came close to the accuracy of the control model. That means the more Twitter sentiments we are able to mine, the better our model can be, which suggests that Twitter sentiments can be an important factor in helping predict stock prices.

Our results suggest that the public’s negative and positive Tweets are related to the price movements of individual stocks. While sentiment analysis might not be that useful on its own, perhaps its best use case would be to give quantitative traders an edge, as a tool that competitors relying only on momentum analysis don’t have. One of the reasons stock prices follow a random walk and are so hard to predict accurately is that stocks are so acutely affected by the news. How can we tap into the news? The best way is through social media sentiment.

We believe that with further development this tool can be used to evaluate the public’s perception of a brand — and thereby better predict a stock price utilizing consumer sentiment.

For further details visit our website https://www.accentedge.com/

About the author: Ali Saeed used his summer internship at accentedge to research “AI vs. Momentum: Predicting Stock Prices Using Social Media Sentiment.” Saeed is from suburban Chicago and is currently a student majoring in Computer Science at Stanford University.
