“Sentimentum Investing” — Combining Sentiment Analysis and Systematic Trading

Utilising 250,000+ Tweets to backtest “sentimentum” trading strategies

David Woroniuk
The Startup
4 min read · Oct 1, 2020


A Backtest of a “Sentimentum” systematic strategy. Image created by Author.

TL;DR: Data, Code, GitHub.

Vast volumes of data are generated every day (Google alone processes over 3.5 billion searches per day), and much of it is relevant to financial markets. This data comes in many forms: factual news, scheduled economic releases, company filings and investor opinions.

Because new information and opinions are generated continually, traders struggle to keep up manually and often prefer to automate the analysis, using its outputs to drive systematic trading strategies. This article walks through how to augment Simple Moving Average (SMA) trading strategies with Valence Aware Dictionary and sEntiment Reasoner (VADER) sentiment analysis, producing “Sentimentum” trading strategies.

What is Sentiment Analysis?

Sentiment Analysis is a sub-field of Natural Language Processing (NLP) that aims to detect polarity (i.e. positive and negative opinions) within a given text. In essence, Sentiment Analysis measures the attitude, sentiment and emotions expressed in a text sample, returning continuous scores for positive, negative and neutral sentiment.

What is VADER?

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis model that is sensitive to both polarity (i.e. positive and negative opinions) and intensity (i.e. the strength of those opinions). As such, it can be thought of as a feature-extraction step: it turns unstructured text into continuous numerical scores that can feed a trading model.

Notably, the VADER sentiment analysis toolkit is specifically tuned for social media sentiment, as opposed to more formal documents, accounting for textual phenomena such as ‘!!!’ or ‘CAPS’, which increase sentiment intensity.

Given these features, this article uses VADER to quantify the sentiment expressed in 250,000+ tweets containing the terms ‘Tesla’, ‘TESLA’ or ‘TSLA’ over a one-week period, exploring the role sentiment analysis can play in generating profitable trading strategies.

Implementation

Now that we understand the underlying model, we can begin to backtest some trading strategies.

The first step is to install the libraries, packages and modules we shall use.

Secondly, we need to obtain the tweets to analyse. For brevity, this article provides pre-cleaned Twitter data, which is available in the accompanying GitHub repo. We import this data into a DataFrame (dataset) and display its first 10 rows.
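
A minimal sketch of the import step, assuming the data is a CSV file with hypothetical `timestamp` and `text` columns. Check the actual column names in the repository's file before using it. Two made-up rows stand in for the real file so the example runs on its own.

```python
import io

import pandas as pd


def load_tweets(source) -> pd.DataFrame:
    """Read tweets from a CSV path or file-like object, oldest first.

    The 'timestamp' and 'text' column names are assumptions; rename them to
    match the actual file in the repository.
    """
    tweets = pd.read_csv(source, parse_dates=["timestamp"])
    return tweets.sort_values("timestamp").reset_index(drop=True)


# Two made-up rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "timestamp,text\n"
    "2020-09-24 09:31:12,TSLA to the moon!!!\n"
    "2020-09-24 09:30:05,Not sure about Tesla today\n"
)
dataset = load_tweets(sample_csv)
print(dataset.head(10))
```

Parsing the timestamps on load lets them be aligned with minute-level market data later.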

Following this, we need to obtain the corresponding ‘TSLA’ market data. This walkthrough uses Alpha Vantage’s great free API to obtain minute-frequency market data. We access the API, manipulate the returned data into a DataFrame (market_data) and display its first 5 rows.
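
One way to request and reshape the data is sketched below. It uses Alpha Vantage's `TIME_SERIES_INTRADAY` endpoint with `interval=1min`, whose JSON response nests bars under a `"Time Series (1min)"` key with fields such as `"1. open"`. The parsing is kept separate from the HTTP call so that it can be checked without a network connection or API key. The function names are illustrative, not the author's.

```python
import pandas as pd


def intraday_url(symbol: str, api_key: str) -> str:
    """Build a TIME_SERIES_INTRADAY request for 1-minute bars."""
    return (
        "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY"
        f"&symbol={symbol}&interval=1min&outputsize=full&apikey={api_key}"
    )


def parse_intraday(payload: dict) -> pd.DataFrame:
    """Turn the decoded JSON response into a float DataFrame indexed by time."""
    bars = payload["Time Series (1min)"]
    market_data = pd.DataFrame.from_dict(bars, orient="index").astype(float)
    # Alpha Vantage keys look like "1. open"; keep only the field name.
    market_data.columns = [col.split(". ", 1)[1] for col in market_data.columns]
    market_data.index = pd.to_datetime(market_data.index)
    # Bars are typically listed newest first; sort chronologically.
    return market_data.sort_index()
```

In practice you would pass `requests.get(intraday_url("TSLA", "YOUR_API_KEY"), timeout=30).json()` to `parse_intraday` and inspect the result with `market_data.head()`.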

After this, we can combine the financial data held within market_data with the Twitter data held within dataset, and re-label the columns.
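
A minimal sketch of the join, assuming tweets carry a `timestamp` column and `market_data` is indexed by minute with a `close` column. Each tweet is matched to the bar of the minute in which it was posted. The output column names (`price`, `tweet`) are hypothetical choices for the re-labelling.

```python
import pandas as pd


def combine(tweets: pd.DataFrame, market_data: pd.DataFrame) -> pd.DataFrame:
    """Attach the closing price of the minute in which each tweet was posted.

    The inner join drops tweets posted outside the minutes covered by
    market_data (e.g. overnight or at weekends).
    """
    tweets = tweets.copy()
    # Truncate tweet timestamps to the minute so they line up with 1-minute bars.
    tweets["minute"] = tweets["timestamp"].dt.floor("min")
    prices = market_data[["close"]].rename(columns={"close": "price"})
    combined = tweets.merge(prices, left_on="minute", right_index=True, how="inner")
    # Hypothetical re-labelling of columns.
    return combined.rename(columns={"text": "tweet"}).reset_index(drop=True)
```

An inner join keeps only tweets that have a matching price. A left join would instead keep every tweet, with a missing price outside trading hours.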

Next, we clean the tweets by using a regular expression (regex) to remove whitespace control characters: line feeds (\n), carriage returns (\r), tabs (\t) and no-break spaces (\xa0).
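
A short sketch of this cleaning step using pandas' vectorised string methods. The character class matches the four characters listed above. Replacing them with a space, rather than deleting them, is a design choice of this sketch and may differ from the original code.

```python
import pandas as pd

# Line feed, carriage return, tab and no-break space (U+00A0).
CONTROL_CHARS = r"[\n\r\t\xa0]+"


def clean_text(tweets: pd.Series) -> pd.Series:
    """Replace runs of the characters above with one space, then trim the ends."""
    return tweets.str.replace(CONTROL_CHARS, " ", regex=True).str.strip()
```

Substituting a space avoids gluing together words that were separated only by a line break, e.g. `"Hello\nworld"` becomes `"Hello world"` rather than `"Helloworld"`.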

Following this, we define and apply some functions that extract from each tweet the Twitter handles of retweeted accounts and mentions, along with any hashtags. This data could be analysed further with Named Entity Recognition (NER) techniques or a ‘Bag of Words’ approach focussing on specific search terms.
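
The extraction can be sketched with three regular expressions. The patterns are assumptions rather than the author's exact ones: a retweet is taken to start with `RT @handle`, and handles and hashtags are runs of word characters.

```python
import re

# Hypothetical patterns: a leading "RT @handle" marks a retweet.
RETWEET = re.compile(r"^RT @(\w+):?")
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def retweeted_handle(tweet: str):
    """Return the handle being retweeted, or None for an original tweet."""
    match = RETWEET.match(tweet)
    return match.group(1) if match else None


def mentioned_handles(tweet: str) -> list:
    """Handles mentioned in the body, excluding the retweet prefix."""
    return MENTION.findall(RETWEET.sub("", tweet))


def hashtags(tweet: str) -> list:
    """Hashtags without the leading '#'."""
    return HASHTAG.findall(tweet)
```

Each function can be applied column-wise, e.g. `dataset["hashtags"] = dataset["tweet"].apply(hashtags)`, assuming the tweet text lives in a `tweet` column.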

Now that the dataset has been combined and cleaned, we can apply VADER sentiment analysis. We initialise empty lists, then iterate through each row of the dataset, appending the compound, positive, neutral and negative sentiment scores to the corresponding lists. These lists are then added to the DataFrame as new columns.
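
The same list-building loop is sketched below as a reusable function. The analyzer is passed in as a parameter, defaulting to VADER's `SentimentIntensityAnalyzer` from the `vaderSentiment` package. This means any object with a `polarity_scores(text)` method can stand in, for example when testing. The column names on the right-hand side of `SCORE_COLUMNS` are hypothetical.

```python
import pandas as pd

# Map VADER's keys to (hypothetical) DataFrame column names.
SCORE_COLUMNS = {"compound": "compound", "pos": "positive", "neu": "neutral", "neg": "negative"}


def add_sentiment(df: pd.DataFrame, analyzer=None, text_col: str = "tweet") -> pd.DataFrame:
    """Return a copy of df with one column per VADER score.

    analyzer: any object with a polarity_scores(text) -> dict method; defaults
    to VADER's SentimentIntensityAnalyzer from the vaderSentiment package.
    """
    if analyzer is None:
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
    # Score each tweet once, then spread the dict entries into columns.
    scores = [analyzer.polarity_scores(text) for text in df[text_col]]
    out = df.copy()
    for key, column in SCORE_COLUMNS.items():
        out[column] = [score[key] for score in scores]
    return out
```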

Backtesting

The backtest is long only. All trading strategies are provided with $10,000 of capital, allowed to buy or sell one share per minute, and prevented from short-selling. Each strategy is accompanied by a figure showing the timestamps at which buy and sell decisions were made.
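
A minimal sketch of a backtest under these constraints. It assumes a signal convention of +1 (buy one share), -1 (sell one share) and 0 (hold). The reported results for Strategies 2 to 4 are consistent with Consolidated Position = $10,000 + Realised Gains + Inventory × final price, with a final price of roughly $418.30 in each case. The sketch therefore treats "Realised Gains" as the net cash flow from trading and values leftover shares at the last price. This is an inference, not the author's code.

```python
import pandas as pd


def backtest_long_only(prices: pd.Series, signals: pd.Series, capital: float = 10_000.0):
    """Long-only backtest trading at most one share per minute.

    signals: +1 = buy one share, -1 = sell one share, 0 = hold (assumed convention).
    Returns (consolidated_position, realised_gains, inventory), where
    realised_gains is the net cash flow from trading.
    """
    cash, inventory = capital, 0
    for price, signal in zip(prices, signals):
        if signal > 0 and cash >= price:
            cash -= price
            inventory += 1
        elif signal < 0 and inventory > 0:  # no short-selling
            cash += price
            inventory -= 1
    realised_gains = cash - capital
    # Value any remaining shares at the final price.
    consolidated_position = cash + inventory * prices.iloc[-1]
    return consolidated_position, realised_gains, inventory
```

Sell signals are ignored when no shares are held. This is how the long-only constraint is enforced without short positions.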

Systematic Trading Strategies

Strategy 1: A Simple Moving Average (SMA) strategy. This strategy uses the 21-minute and 50-minute MAs of price, buying when the 21-minute MA is above the 50-minute MA and selling when it is below. In the backtest, the strategy is constrained to be long only.
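
The crossover rule can be sketched as a signal series. This assumes a state-based reading of the rule: a buy signal on every minute the fast MA is above the slow MA, which fits the one-share-per-minute backtest. No trade is signalled until the 50-minute window has filled.

```python
import numpy as np
import pandas as pd


def sma_signal(series: pd.Series, fast: int = 21, slow: int = 50) -> pd.Series:
    """+1 while the fast SMA is above the slow SMA, -1 while below, 0 otherwise."""
    fast_ma = series.rolling(fast).mean()
    slow_ma = series.rolling(slow).mean()
    # np.sign gives NaN during the warm-up window; treat that as "no trade".
    return np.sign(fast_ma - slow_ma).fillna(0).astype(int)
```

For the price strategy this would be applied to the closing prices, e.g. `sma_signal(market_data["close"])`. The same function works unchanged on any other minute-level series.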

Results:

Consolidated Position: $9,885.13, Realised Gains: -$114.88, Inventory: 0

Strategy 2: An SMA strategy applied to total sentiment. This strategy uses the 21-minute and 50-minute MAs of total_sentiment, buying when the 21-minute MA is above the 50-minute MA and selling when it is below. Despite being long only in a falling market, it finishes above its initial $10,000 and outperforms a buy-and-hold strategy, assuming the remaining inventory is sold at the end of the sample.

Results:

Consolidated Position: $10,379.12, Realised Gains: -$9,660.08, Inventory: 24

Strategy 3: An SMA strategy applied to the total, positive and negative sentiment scores produced by VADER. This strategy uses the 21-minute and 50-minute MAs of total_sentiment, positive and negative. It buys when the 21-minute MAs of total_sentiment and positive are above their 50-minute MAs and the 21-minute MA of negative is below its 50-minute MA, and sells when the opposite is true. Whilst this strategy underperforms Strategy 2, requiring three conditions to hold at once generates far fewer trading signals, so it would incur far lower transaction costs.
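
A sketch of the three-condition rule, assuming minute-level columns named `total_sentiment`, `positive` and `negative`. "The opposite" is read here as all three conditions reversed at once, which is an assumption. Any mixed state produces no trade, which is why such a rule trades less often than a single crossover.

```python
import pandas as pd

COLUMNS = ["total_sentiment", "positive", "negative"]


def sentiment_signal(df: pd.DataFrame, fast: int = 21, slow: int = 50) -> pd.Series:
    """+1 = buy, -1 = sell, 0 = hold, from three SMA crossover conditions."""
    fast_ma = df[COLUMNS].rolling(fast).mean()
    slow_ma = df[COLUMNS].rolling(slow).mean()
    above = fast_ma > slow_ma  # comparisons with NaN (warm-up) are False
    below = fast_ma < slow_ma
    buy = above["total_sentiment"] & above["positive"] & below["negative"]
    # Assumed reading of "the opposite": all three conditions reversed.
    sell = below["total_sentiment"] & below["positive"] & above["negative"]
    return buy.astype(int) - sell.astype(int)
```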

Results:

Consolidated Position: $9,772.55, Realised Gains: -$4,828.74, Inventory: 11

Strategy 4: A combination of Strategies 1 and 3. We use the SMAs of price to generate a momentum signal and combine it with the sentiment-based momentum signal from Strategy 3, producing a “Sentimentum” strategy.
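
The combination rule is not spelled out here, so the sketch below assumes one plausible choice: an agreement filter that trades only when the price and sentiment signals point the same way and holds otherwise. Both inputs are assumed to use the +1/-1/0 convention and to share the same minute index.

```python
import pandas as pd


def sentimentum_signal(price_signal: pd.Series, sentiment_signal: pd.Series) -> pd.Series:
    """Trade only when the price and sentiment signals agree; otherwise hold."""
    agree = price_signal == sentiment_signal
    return price_signal.where(agree, 0)
```

Other combinations, such as requiring only one signal to fire or weighting them, would change how often the strategy trades.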

Results:

Consolidated Position: $9,949.29, Realised Gains: -$2,560.53, Inventory: 6

This walkthrough outlines a few potential systematic trading strategies; however, it doesn’t account for short-selling or volatility, both of which are interesting areas for further in-sample optimisation. Furthermore, adding entry conditions may reduce the impact of transaction costs on strategy profitability.

Feel free to access the data and code within the GitHub repo and develop your own strategies. Happy Coding!

David Woroniuk

Current PhD student in Economics, with interests in Machine Learning, Deep Learning and Systematic Trading.