Using Sentiment Analysis as a Trading Signal? Beat the market with Transformers!

Chedy Smaoui
10 min read · Jan 21, 2024


The emotions of investors drive the financial market, and those emotions are shaped by news from companies and the press. In the golden days of Wall Street, a trader would typically start his day by reading the latest edition of the Financial Times and use it to decide whether to keep (or dump) his newly acquired stocks. Nowadays, machines can process hundreds of companies’ press releases, annual financial reports, and even social media comments in seconds to form a clear picture of public opinion.

What if we used the sentiment of today’s financial news articles (scored between -1 and 1) when deciding whether or not to enter the market? In this article, I will explore how Natural Language Processing can be used as a buy signal by backtesting a simple sentiment-analysis-driven strategy in Python.

Some results obtained at the end of this experiment.

What is Backtesting?

Backtesting is a method of testing your trading strategy on historical market data, to evaluate how your strategy would have performed in the past. Obviously, past performance does not guarantee any future performance. In fact, the stock market is known for being volatile, dynamic, and non-linear. While backtesting a strategy does not predict exactly how it will perform in the future, it can help assess the utility of a strategy.
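To make the idea concrete, here is a minimal sketch of a backtest in plain Python. The price list and the rule (buy after a down day, sell at the next close) are invented purely for illustration; this is not the strategy tested later in the article.

```python
def toy_backtest(prices, cash=1000.0):
    """Replay a toy rule over historical closes and return the final cash.

    Hypothetical rule: if today's close is lower than yesterday's,
    buy one share at today's close and sell it at the next close.
    """
    for i in range(1, len(prices) - 1):
        if prices[i] < prices[i - 1]:
            cash -= prices[i]       # buy one share at today's close
            cash += prices[i + 1]   # sell it at the next close
    return cash


# Made-up closing prices, purely for illustration
closes = [10.0, 9.0, 11.0, 10.5, 12.0]
final_cash = toy_backtest(closes)
print(f"Final cash: {final_cash:.2f}")
```

A library such as Backtrader does the same replay, but also handles orders, positions, commissions and reporting.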

The Plan…

  1. Getting started: I will go over which Python libraries to install and import for this article’s code.
  2. Fetching the data: Then, I will quickly explain how the data was acquired and the type of data I will be using.
  3. The strategy: Next, I will show you how to write a Backtrader Strategy Class and explain my simple sentiment analysis driven strategy.
  4. Running the backtest: I will show you how to use Backtrader’s Cerebro Engine to evaluate the strategy against historical data.
  5. Strategy evaluation: Finally, I will evaluate the strategy, comparing it against a classic buy and hold, and suggest possible improvements.

Getting Started…

To backtest my sentiment-analysis-driven strategy, I will be using Backtrader. This Python library is great for writing reusable trading strategies, indicators, and analyzers without having to spend time building infrastructure.

import backtrader as bt

I also install and import Pandas, a Python package that provides fast, flexible, and expressive data structures. The datetime library is needed as well, as I will be manipulating time series data.

import pandas as pd
from datetime import datetime

Finally, import quantstats. It is a Python library that performs portfolio profiling, allowing quants and portfolio managers to understand their performance better by providing them with in-depth analytics and risk metrics.

import quantstats

Fetching the Data…

First, backtesting a strategy requires a good amount of historical data. While gathering accurate historical prices and volumes for a stock is quite easy nowadays, acquiring accurate alternative data (such as daily financial news articles) for free is not. Therefore, I have uploaded a CSV file containing the data used in this article to my GitHub page.

Screenshot of the GOOG.csv file containing historical data on Google stock

The daily Open, High, Low, Close, Adjusted Close and Volume were obtained using the yfinance library.

  • Open: the price at which a stock began trading when the market opens
  • High: the maximum price of a stock in a given trading day
  • Low: the lowest price of a stock in a given trading day
  • Close: the price at which a stock ended trading when the market closes
  • Volume: the total number of shares traded on a given trading day
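As a rough illustration of these definitions, the sketch below builds one daily bar from a list of trades. The trades are made up, and the function name daily_bar is hypothetical; real exchanges set official open and close prices through their own mechanisms, so treat this as a simplification.

```python
def daily_bar(trades):
    """Summarise one day's trades, given as (price, shares) in time order."""
    prices = [price for price, _ in trades]
    return {
        "Open": prices[0],                              # first traded price
        "High": max(prices),                            # highest traded price
        "Low": min(prices),                             # lowest traded price
        "Close": prices[-1],                            # last traded price
        "Volume": sum(shares for _, shares in trades),  # total shares traded
    }


# Hypothetical trades for a single day
trades = [(100.0, 50), (101.5, 20), (99.0, 30), (100.5, 10)]
bar = daily_bar(trades)
print(bar)
```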

The daily sentiment score was obtained by running sentiment analysis with Financial BERT (FinBERT) on the headlines of daily financial news articles discussing Google. I presented a way to perform financial sentiment analysis in my previous article, “How I code a Python Stock Screener & A.I. Sentiment Analysis to pick stocks” (Medium, December 2023).

A sentiment score of 1 indicates that the daily news articles’ headlines are very positive towards the stock, whilst a score of -1 indicates a very negative sentiment. A score of zero means that the headlines are neutral, or that no articles were recovered on that day — missing sentiment score values are replaced by a neutral score of 0.
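The neutral-fill rule can be sketched as follows. This is not the DataGenerator code, which is not shown in this article; the function name, dates and scores are all hypothetical.

```python
def fill_missing_sentiment(trading_days, scores_by_day, neutral=0.0):
    """Return one score per trading day, using `neutral` when no score exists."""
    return [scores_by_day.get(day, neutral) for day in trading_days]


# Hypothetical data: no headlines were found for 2020-03-03
trading_days = ["2020-03-02", "2020-03-03", "2020-03-04"]
scores_by_day = {"2020-03-02": 0.72, "2020-03-04": -0.35}

filled = fill_missing_sentiment(trading_days, scores_by_day)
print(filled)
```

With a Pandas column, the same effect is usually achieved with fillna(0).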

I have coded a Python class to help obtain historical data with daily sentiment scores on over 6,000 different tickers. I used it to acquire the CSV file containing historical data on Google’s stock:

#data_raw = DataGenerator.get_data('GOOG')
#data_raw.to_csv('GOOG.csv')

data = pd.read_csv('GOOG.csv')

# Dataframe editing
data['Date'] = pd.to_datetime(data['Date'])
data = data.rename(columns={'finbert_sentiment_score': 'sentiment'})
data.set_index('Date', inplace=True)

# Extend the PandasData feed with an extra 'sentiment' line.
# -1 asks Backtrader to auto-detect the column by its name.
class PandasSent(bt.feeds.PandasData):
    lines = ('sentiment',)
    params = (('sentiment', -1),)

# Pass the data into the Backtrader data feeder
data = PandasSent(dataname=data)

Given the focus of this piece, I won’t go into more detail about the DataGenerator class.

The Strategy…

First, define the strategy class, starting with its log function. Think of the log function as the print() function of your strategy class. It allows you to pass textual data via the txt variable and output it to the screen when called.

class SentimentStrat(bt.Strategy):
    params = (
        ('exitbars', 3),
    )

    def log(self, txt, dt=None):
        """
        Logging function for this strategy.

        Args:
            txt (str): text to output to the screen.
            dt (datetime.date, optional): date to print alongside the text. Defaults to None, in which case the date of the most recent data point is used.
        """
        dt = dt or self.datas[0].datetime.date(0)
        print(f'{dt.isoformat()} {txt}')  # Print the date followed by the message

Now that the printing function is defined, I override the initialization function. Inside, define the data that the strategy object will be using. Here, I am only interested in the daily close prices and sentiment scores:

    def __init__(self):
        # Keep a reference to the "close" line in the data[0] dataseries
        self.dataclose = self.datas[0].close
        self.datasentiment = self.datas[0].sentiment

        # To keep track of pending orders and buy price/commission
        self.order = None
        self.buyprice = None
        self.buycomm = None

The next function to override is the notify_order function. This is where everything related to trade orders gets processed. By using the log function defined earlier, it will notify you whenever a trade order (buying or selling stocks) is executed:

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # Buy/Sell order submitted/accepted to/by broker - Nothing to do
            return

        # Check if an order has been completed
        # Attention: broker could reject order if not enough cash
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log(
                    'BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                    (order.executed.price,
                     order.executed.value,
                     order.executed.comm)
                )
                self.buyprice = order.executed.price
                self.buycomm = order.executed.comm
            else:  # Sell
                self.log(
                    'SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                    (order.executed.price,
                     order.executed.value,
                     order.executed.comm)
                )

            # Remember the bar on which the order was executed
            self.bar_executed = len(self)

        elif order.status in [order.Canceled, order.Margin, order.Rejected]:
            self.log('Order Canceled/Margin/Rejected')

        # No order is pending any more
        self.order = None

Similarly, override the notify_trade function to print out information related to each closed trade:

    def notify_trade(self, trade):
        # Only report trades that have been fully closed
        if not trade.isclosed:
            return

        self.log('OPERATION PROFIT, GROSS %.2f, NET %.2f' %
                 (trade.pnl, trade.pnlcomm))
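The GROSS figure is the profit before commission and the NET figure is the profit after it. The sketch below shows the arithmetic with made-up prices. The percentage commission charged on both legs is my assumption for illustration; no commission is set on the broker in this article, so I would expect GROSS and NET to be equal in the logs.

```python
def trade_pnl(buy_price, sell_price, size, commission_rate=0.0):
    """Return (gross, net) profit for one round-trip trade.

    Assumes a percentage commission charged on both the buy and the sell.
    """
    gross = (sell_price - buy_price) * size
    commission = commission_rate * size * (buy_price + sell_price)
    return gross, gross - commission


# Hypothetical round trip on 1,000 shares, without and with a 0.1% commission
gross, net = trade_pnl(60.0, 61.5, 1000)
gross_c, net_c = trade_pnl(60.0, 61.5, 1000, commission_rate=0.001)
print(f"No commission:   gross={gross:.2f}, net={net:.2f}")
print(f"0.1% commission: gross={gross_c:.2f}, net={net_c:.2f}")
```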

Finally, I implement all of the strategy’s logic inside the next function. The idea is simple: if today’s sentiment score is very high (above 0.6), then I will buy 1,000 shares of Google and sell all of them three days later. While this strategy could be improved, this article’s purpose is just to get an idea of what can be achieved when using a simple sentiment analysis driven strategy.

    def next(self):
        # Simply log the closing price of the series from the reference
        self.log('Close, %.2f' % self.dataclose[0])

        # Check if an order is pending ... if yes we cannot send a 2nd one
        if self.order:
            return

        # Check if we are in the market
        if not self.position:

            # If the sentiment score is over 0.6, we buy
            if self.datasentiment[0] > 0.6:

                self.log('BUY CREATE, %.2f' % self.dataclose[0])

                # Keep track of the created order to avoid a 2nd order
                self.order = self.buy(size=1000)

        else:
            # Already in the market, we sell three days (bars) after buying:
            if len(self) >= (self.bar_executed + self.params.exitbars):

                self.log('SELL CREATE, %.2f' % self.dataclose[0])

                # Keep track of the created order to avoid a 2nd order
                self.order = self.sell(size=1000)
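To see when orders are created and filled, the rule can be traced by hand with a small stand-alone simulation. This is a simplified sketch and not Backtrader itself: it assumes every market order fills on the bar after it is created, ignores cash and rejections, and uses made-up sentiment scores. The function name simulate_signals is hypothetical.

```python
def simulate_signals(sentiment, threshold=0.6, exit_bars=3):
    """Return (bar, event) pairs for the buy/sell rule, with 1-based bars."""
    events = []
    in_position = False
    pending = None        # 'buy' or 'sell' order waiting to be filled
    bar_executed = None   # bar on which the last order was filled

    for bar, score in enumerate(sentiment, start=1):
        # An order created on the previous bar is filled on this bar
        if pending is not None:
            in_position = (pending == 'buy')
            bar_executed = bar
            events.append((bar, pending.upper() + ' EXECUTED'))
            pending = None

        if not in_position:
            # Out of the market: buy when sentiment is very positive
            if score > threshold:
                events.append((bar, 'BUY CREATE'))
                pending = 'buy'
        elif bar >= bar_executed + exit_bars:
            # In the market: exit once exit_bars bars have passed since the fill
            events.append((bar, 'SELL CREATE'))
            pending = 'sell'

    return events


# Hypothetical daily sentiment scores
scores = [0.1, 0.8, 0.2, 0.9, 0.0, 0.0, 0.7, 0.0]
events = simulate_signals(scores)
for bar, event in events:
    print(bar, event)
```

Under these assumptions, a buy signal on bar 2 is filled on bar 3, the sell order is created on bar 6 and filled on bar 7. The 0.9 score on bar 4 is ignored because a position is already open.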

Running the Backtest…

Now that the strategy class is defined, it can be used inside the main script and evaluated against historical data. First, instantiate the Cerebro class. It is the cornerstone of Backtrader: it serves as the central point for gathering all inputs (Data Feeds), actors (Strategies), and critics (Analyzers). It also enables backtesting or live data feeding and trading, and can return and plot the results.

# Instantiate the Cerebro engine
cerebro = bt.Cerebro()

# Add the strategy to Cerebro
cerebro.addstrategy(SentimentStrat)

# Add the data feed to cerebro
cerebro.adddata(data)

# Add an analyzer to get the return data
cerebro.addanalyzer(bt.analyzers.PyFolio, _name='PyFolio')

# Set the initial portfolio value to $100,000
cerebro.broker.setcash(100000.0)
start_value = cerebro.broker.getvalue()  # should be $100,000

# Run the Backtest
results = cerebro.run()

# Print out the final result
print('Starting Portfolio Value: ', start_value)
print('Final Portfolio Value: %.2f' % cerebro.broker.getvalue())

Now, run the program and see how much money could have been made using this strategy on Google stock from December 2018 to July 2020. In the terminal, you can see the starting and final portfolio values:

Starting and Final Portfolio Values

It made $10,108, not bad! Now, use the Cerebro engine’s plotting function to get more detailed feedback on the backtest:

# Plot the results using cerebro's plotting facilities
cerebro.plot()

Results from cerebro.plot()
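To put the $10,108 profit in context, it helps to express it relative to the starting capital. A quick sketch of that arithmetic, using only the two figures reported above:

```python
start_value = 100_000.0            # initial cash set on the broker
profit = 10_108.0                  # profit reported by the backtest
final_value = start_value + profit

# Total return over the whole backtest period, in percent
total_return_pct = (final_value - start_value) / start_value * 100
print(f"Total return: {total_return_pct:.2f}%")
```

This is the total return over the whole period, not an annualized figure.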

Most of the trades were profitable. A plausible explanation is that the algorithm bought shares of GOOG when news coverage of the company and its stock was very positive. Other investors presumably bought at the same time, and the increased demand pushed the price up. The algorithm then sold all of its shares three days later, making a profit the majority of the time.

On an interesting side note, using alternative data such as sentiment scores made the algorithm “dodge” the huge price fall caused by the Covid crisis around the start of 2020. This is probably because the sentiment on GOOG was negative (or at least below 0.6) during that period. The algorithm only re-entered the market after the price fall, once coverage of GOOG was positive again. This matters: although the pandemic was ‘unforeseen’ to many, the sentiment filter kept the algorithm out of the market and it avoided a major financial loss.

Back to the job at hand, use quantstats to obtain more detailed feedback:

import webbrowser

ticker = 'GOOG'  # used in the report title
strat = results[0]

# Use the PyFolio analyzer to obtain the returns from the backtest
portfolio_stats = strat.analyzers.getbyname('PyFolio')
returns, positions, transactions, gross_lev = portfolio_stats.get_pf_items()
returns.index = returns.index.tz_convert(None)

# Feed the returns to quantstats to obtain a more complete report on the backtest
quantstats.reports.html(returns, output='stats.html', title=f'{ticker} Sentiment')
webbrowser.open('stats.html')

Cumulative Returns and Key Performance Metrics

Monthly Returns and Return Quantiles

The strategy was only deployed from December 2018 to July 2020, explaining the 0.0% returns in 2018.

Strategy Evaluation…

Would this strategy have outperformed buying 1,000 shares of Google in 2018 and selling them in July 2020? No, a simple buy and hold would have made at least double the amount made using this strategy. But remember, this is a very simple strategy that leaves a lot of room for improvement. It only looks at the daily sentiment of a few news articles; to expand it, you could add other indicators, whether fundamental or technical, such as Moving Averages or RSI, or other alternative data such as Tweets (for cryptocurrencies). Furthermore, this strategy spends relatively little time in the market, which is a desirable property in its own right.
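Both comparisons mentioned here, the buy-and-hold benchmark and the time spent in the market, are simple to compute. The sketch below uses made-up prices and positions rather than the GOOG data, and both function names are hypothetical.

```python
def buy_and_hold_pnl(closes, shares):
    """Profit from buying at the first close and selling at the last one."""
    return (closes[-1] - closes[0]) * shares


def time_in_market(position_by_bar):
    """Fraction of bars on which a position was held."""
    return sum(1 for held in position_by_bar if held) / len(position_by_bar)


# Hypothetical closes and strategy positions, one entry per bar
closes = [50.0, 52.0, 51.0, 55.0, 58.0, 60.0]
held = [False, True, True, True, False, False]

benchmark = buy_and_hold_pnl(closes, shares=1000)
exposure = time_in_market(held)
print(f"Buy and hold profit: {benchmark:.2f}")
print(f"Time in market: {exposure:.0%}")
```

A buy and hold is in the market 100% of the time by construction, so the profit of the two approaches should be read alongside their exposure.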

However, buy and hold was only more profitable on Google stock because the stock followed an uptrend. When combined with a volume volatility indicator as a buying signal, the sentiment-analysis-driven strategy can actually beat a buy and hold on “steady decliners,” or companies that have not fully recovered from the Covid crisis, such as Citigroup:

Backtest results of applying the strategy on Citigroup stock from December 2018 to June 2020
The strategy made $15,690 in profit and outperformed a buy and hold over the same time period

With this, I have completed my goals. Including sentiment analysis as a live trading indicator is definitely worth looking into when implementing your own strategy. Processing financial news articles as soon as they are published and placing trades accordingly (buying the stocks with very positive coverage) may be profitable. Trying to live-deploy this strategy with an online broker could be the subject of a future article.

Thank you for taking the time to read the article, and to explore this subject with me,
Chedy Smaoui
(20/01/2024)

Disclaimer: I am not a professional financial advisor (yet). I am a university student with a passion for this subject. If you have any questions, concerns, or suggestions to expand on it, please do not hesitate to reach out; my LinkedIn messages are checked daily.


Chedy Smaoui

Aspiring machine learning quant that loves coding and modelling data!