Using News Sentiment Data for Investment Decisions

A step-by-step guide to transforming raw data into an AAPL trading signal in Python

Alex Pieper
YukkaLab
8 min read · Jul 29, 2021

source: PIX1861 via pixabay

The amount of data that is generated in this world is exponentially increasing. In fact, according to statista.com, it doubles about every 2–3 years. But as Seth Godin points out:

“Data is not useful until it becomes Information.”

And that is exactly what this article is about: using a relatively new form of data, news sentiment, to derive Buy and Sell signals for stocks.

We will start by investigating the raw data before transforming it into actionable insights in the form of Investment signals for the AAPL stock. In the end, we will evaluate the quality of these signals. This article will be accompanied by a full implementation of the mentioned signal using Python and the base libraries pandas, numpy and matplotlib.

Data introduction

The underlying data for this signal comes from YUKKA Lab, which analyzes over 500,000 news articles a day using Natural Language Processing (NLP). With this technique, every article is analyzed with respect to its sentiment and the entities involved, such as companies, countries, indices, and persons.

Since we are only interested in the news sentiment for Apple, we start by looking at the number of articles that mention the company. These article counts are grouped by date and sentiment, where the sentiment is always one of positive, neutral or negative. The whole dataset reaches back to 2015-01-01, but I added the latest five values to the following code snippet to give a general idea of what the data look like.

To give an example, there were 2362 positive, 1083 neutral and 444 negative news mentions for Apple Inc. on 2021-07-13, and the closing price of its stock was 145.64 USD.
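As a rough illustration of that layout, the following minimal sketch builds a one-row pandas DataFrame from the example above. The column names (`positive`, `neutral`, `negative`, `close`) are assumptions made for this sketch, not the actual schema of the YUKKA Lab data; only the numbers in the row are taken from the text.

```python
import pandas as pd

# Hypothetical column names; only the values of this single row come from the article.
df = pd.DataFrame(
    {
        "positive": [2362],
        "neutral": [1083],
        "negative": [444],
        "close": [145.64],  # closing price in USD
    },
    index=pd.to_datetime(["2021-07-13"]),
)
df.index.name = "date"

# Total number of Apple mentions on that day, across all three sentiment classes.
total_mentions = df[["positive", "neutral", "negative"]].sum(axis=1)

print(df)
print(total_mentions)
```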

This dataset needs no further pre-processing as the only missing values are the price data for weekends and holidays. Now let’s get to the main part of this article, where we gradually transform these data points into Buy and Sell signals.

Feature Engineering and Visualisation

A simplified version of the following plan would be:

“Buy/hold the AAPL stock when the news sentiment for Apple is high, and sell when it is low.”

Now, let’s turn this plan into reality.

The first step is to create a stationary metric for the overall sentiment, as opposed to the three integer values. There are multiple ways to do so, one of which would be the following equation:

This yields a higher sentiment value for more positive news coverage of Apple and a lower value for more negative coverage. Before looking at a plot, we create smoothed versions, because a signal based on raw daily data may be too volatile to trade once transaction fees are taken into account. We do this by simply taking a rolling mean over the last 30 and 90 days, as shown in the code snippet below.
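One possible form of such a metric, assumed here for illustration rather than taken from the article, is the net share of positive mentions, (positive - negative) / (positive + negative), which lies between -1 and 1. The sketch below applies that assumed ratio to synthetic counts (random numbers, not real data) and then computes the 30-day and 90-day rolling means described above. All column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic daily article counts, used only so the example runs; not real data.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=400, freq="D")
counts = pd.DataFrame(
    {
        "positive": rng.integers(500, 3000, len(dates)),
        "neutral": rng.integers(500, 2000, len(dates)),
        "negative": rng.integers(100, 1500, len(dates)),
    },
    index=dates,
)

def sentiment_ratio(frame):
    """Assumed metric: (pos - neg) / (pos + neg), bounded to [-1, 1]."""
    pos, neg = frame["positive"], frame["negative"]
    return (pos - neg) / (pos + neg)

counts["sentiment"] = sentiment_ratio(counts)

# Short-term and long-term smoothing via simple rolling means.
counts["sent_30"] = counts["sentiment"].rolling(30).mean()
counts["sent_90"] = counts["sentiment"].rolling(90).mean()

print(counts[["sentiment", "sent_30", "sent_90"]].tail())
```

The first 29 and 89 rows of the smoothed columns are NaN, because a full window is not yet available there.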

These two smoothed sentiment values represent the short-term and the long-term sentiment and will soon be used to create the signal. They are shown in the upper of the two plots below, with Apple’s stock price in the lower plot. On close inspection, downtrends in the stock price are well reflected in the sentiment, as we can see in 2016, at the end of 2018, and in early 2020, when the Covid crisis hit the financial markets.

Github Link: Code for this plot

The last step to getting a clean and executable signal lies in translating these sharp downturns in sentiment into a sell signal, while not missing out on profits through false signals.

Signal Building

The goal of this article is to have a signal for each day with just one of two options: be invested or don’t be invested. To achieve that, we create a Z-score from the short-term and long-term sentiment values. For more information on the Z-score, see Wikipedia. Essentially, it is the difference between the short-term and the long-term sentiment, divided by the long-term standard deviation to normalize it. The calculation is shown in the snippet below.

These 6 lines of code cover the creation, rescaling, and smoothing of the sentiment Z-score as well as the final Buy/Sell signal, which we will look at in a second. The operation in line 4 bounds the score between 0 and 100, mostly for visual and interpretation purposes. A high Z-score means that the news coverage of Apple was more positive in the short term than over the longer news history. Afterward, we smooth the score in line 5 with a double exponentially weighted rolling mean to reduce jitter. Not doing so could cause daily buy-sell-buy oscillation, which unnecessarily drives up transaction costs. The last line creates the signal: every day on which the Z-score is above 15 is marked as Buy/fully invested (1), and all other days are marked as Sell/not invested (0).
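The steps described above can be sketched as follows. This is not the author’s original six-line snippet, so the line numbers mentioned in the text do not map onto it. The clipping range `z_cap`, the smoothing parameter `ewm_span`, and the linear rescaling to 0–100 are assumptions chosen for illustration; only the 30/90-day windows and the threshold of 15 come from the article. The input series is synthetic.

```python
import numpy as np
import pandas as pd

def sentiment_signal(sentiment, short=30, long=90, z_cap=3.0,
                     ewm_span=10, threshold=15.0):
    """Sketch of a sentiment Z-score signal (z_cap and ewm_span are assumed values)."""
    short_mean = sentiment.rolling(short).mean()
    long_mean = sentiment.rolling(long).mean()
    long_std = sentiment.rolling(long).std()

    # Z-score: short-term minus long-term sentiment, normalized by long-term std.
    z = (short_mean - long_mean) / long_std

    # Assumed rescaling: clip to [-z_cap, z_cap], then map linearly to [0, 100].
    score = (z.clip(-z_cap, z_cap) + z_cap) / (2 * z_cap) * 100

    # Double exponentially weighted mean to reduce day-to-day jitter.
    smooth = score.ewm(span=ewm_span).mean().ewm(span=ewm_span).mean()

    # 1 = Buy/invested, 0 = Sell/not invested. NaN scores (warm-up) compare as False -> 0.
    signal = (smooth > threshold).astype(int)
    return pd.DataFrame({"z": z, "score": smooth, "signal": signal})

# Synthetic daily sentiment values, used only so the example runs.
rng = np.random.default_rng(1)
dates = pd.date_range("2015-01-01", periods=500, freq="D")
sentiment = pd.Series(rng.normal(0.3, 0.1, len(dates)), index=dates)

out = sentiment_signal(sentiment)
print(out.dropna().tail())
```

During the first 89 days the long window is incomplete, so the score is NaN and the sketch stays out of the market; whether that is the desired warm-up behaviour is a design choice.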

Finally, we can have a look at the result of this exercise: a trading signal for the Apple stock from 2015-01-01 to 2021-07-14 based purely on news sentiment data. The first plot contains the final sentiment Z-score and the threshold of 15 as a horizontal line. The second plot shows the stock price on the left axis and the signal on the right axis, where 1 means Buy/invested and 0 means Sell/not invested.

Github Link: Code for this plot

Recall that we want the investment signal to be 0 when the stock price is falling. We can already see some periods where this worked very well, such as early 2020 and 2018. On the other hand, we can also see the signal being a bit late at the end of 2018, as well as some smaller false sell signals, for example in 2019. The next and last main part is about precisely evaluating a simulated investment that uses this sentiment-based trading strategy against a simple Buy and Hold strategy.

Evaluation

NOTE: A train/test split was not performed for simplicity’s sake. The point of this exercise is to illustrate a concept.

This is going to be fairly simple, as we only have to calculate the returns of the stock for the Buy and Hold strategy. For the sentiment-based portfolio, we have to exclude the days on which the signal was 0 in order to simulate the stock being sold/not held. Before looking at the code for that, we have to account for the fact that a signal is produced from end-of-day values, which we only have after the markets close. Therefore, we have to shift every signal to the next day, since that is the first day on which we can act on it.

This shifting is done in line 1, followed by the calculation of the log-returns as log_ret(t) = log(1 + r(t)), where r(t) is the return on day t. With these returns, we can calculate the cumulative return of both portfolios in lines 3 & 4, where unmanaged is the Buy and Hold portfolio and managed is the sentiment-based portfolio. Line 5 computes the difference between these two portfolios by looking at a portfolio whose position is signal - 1, which is 0 when the original signal is 1 and -1 (i.e. short) when the original signal is 0. This works because the two portfolios differ only on days when the signal is 0: there the managed portfolio earns nothing while Buy and Hold earns the day’s return, so the outperformance equals exactly the performance of shorting the stock, i.e. return * -1. When the signal is at “Buy”, both portfolios earn the same and the outperformance equals 0, i.e. return * 0.
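A minimal sketch of this backtest logic, under the assumption that cumulative returns are tracked in log space, could look like the following. It is not the author’s snippet: the function name `backtest` and the four toy prices are made up for illustration, while the names `unmanaged`, `managed` and the `signal - 1` construction follow the description above.

```python
import numpy as np
import pandas as pd

def backtest(close, signal):
    """Compare Buy and Hold with a signal-managed portfolio (cumulative log-returns)."""
    # A signal computed after the close can only be acted on the next day.
    position = signal.shift(1).fillna(0)

    # log_ret(t) = log(1 + r(t)) = log(close_t / close_{t-1})
    log_ret = np.log(close / close.shift(1)).fillna(0.0)

    unmanaged = log_ret.cumsum()                      # Buy and Hold
    managed = (position * log_ret).cumsum()           # invested only when position == 1
    outperformance = ((position - 1) * log_ret).cumsum()  # short leg on "Sell" days
    return pd.DataFrame(
        {"unmanaged": unmanaged, "managed": managed, "outperformance": outperformance}
    )

# Toy prices and signals, invented for illustration only.
close = pd.Series([100.0, 110.0, 99.0, 108.9])
signal = pd.Series([1, 0, 1, 1])

result = backtest(close, signal)
print(result)
```

In this toy run the signal turns to 0 on the second day, so the portfolio sits out the third day’s loss; `managed - unmanaged` matches `outperformance` on every row. Applying `np.exp(x) - 1` to a cumulative log-return converts it to a simple return.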

But as always, a picture is worth a thousand words:

Github Link: Code for this plot

In the top section of the plot, we again have the stock price with the investment signal. Below that, we have the compound return of the Buy and Hold portfolio versus the sentiment portfolio. In the bottom section of the plot, we can see the outperformance of the sentiment-based strategy. Simply by not holding the stock when the news sentiment is very low, we obtained a total return that is significantly higher than that of the underlying asset itself.

For the last part, we are going to look at quantitative metrics for these two strategies and their performance. Here, standard metrics like Volatility (Vola), Total Return (TR) and the annualized Sharpe Ratio (SR) are exactly what we are looking for. The calculations and results can be found in the code snippet below.

Here again we see the strong outperformance of this strategy, yielding almost 200% more return over more than 6 years of backtesting while having slightly less volatility than the Buy and Hold strategy. As a side note, when “Sell” is interpreted as not being invested, the volatility will typically be no higher than that of the base asset, since days out of the market contribute zero return. That would not be the case if you opened a short position on the stock whenever the signal is “Sell”. The last metric here is the annualized Sharpe Ratio, which rewards high returns but penalizes higher volatility. At 1.043, the sentiment portfolio’s SR is more than 48% higher than that of the base asset.
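The article’s exact metric definitions are not shown here, so the following sketch uses common conventions as assumptions: 252 trading days per year for annualization, a risk-free rate of zero for the Sharpe Ratio, and daily log-returns as input. The helper name `performance_metrics` and the toy returns are hypothetical.

```python
import numpy as np
import pandas as pd

def performance_metrics(log_ret, periods_per_year=252):
    """Vola, TR and SR from daily log-returns (assumes a zero risk-free rate)."""
    daily_std = log_ret.std()
    return {
        # Annualized volatility of daily log-returns.
        "Vola": daily_std * np.sqrt(periods_per_year),
        # Total simple return over the whole period.
        "TR": np.exp(log_ret.sum()) - 1,
        # Annualized Sharpe Ratio: mean over std, scaled by sqrt(periods per year).
        "SR": log_ret.mean() / daily_std * np.sqrt(periods_per_year),
    }

# Toy daily log-returns (+10%, -10%, +10% in simple terms), for illustration only.
toy_log_ret = pd.Series(np.log([1.10, 0.90, 1.10]))

metrics = performance_metrics(toy_log_ret)
print(metrics)
```

Calling the helper once on the Buy and Hold log-returns and once on the position-weighted log-returns would give the two sets of figures to compare.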

Summary and Outlook

What did we do in this article? We went from having raw News Sentiment data to having a daily executable investment signal for the Apple stock in only 14 lines of code (excluding backtesting (16 lines) and plots (45 lines)). A portfolio using this signal is outperforming the stock by only disinvesting in bearish times. These bearish times are detected using only information on how Apple is currently represented in the news landscape.

How can this signal be adjusted to personalize it? The signal depends on several parameters: the rolling time frames for the sentiment (30 & 90 days), the scaling and smoothing of the Z-score, and the interpretation of that Z-score for the signal. Risk-tolerant investors could make the signal react faster by shortening the rolling time frames and reducing the smoothing. One could also short the stock if Z-Score < 15, stay out of the market if 15 <= Z-Score < 35, and go long otherwise. This would probably yield higher returns, but would also be riskier.
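The three-state variant mentioned above can be sketched with `numpy.select`. The thresholds 15 and 35 come from the text; the function name, the encoding (-1 short, 0 out of the market, 1 long), and the choice to stay out of the market when the score is missing are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

def three_state_signal(score, short_below=15.0, long_from=35.0):
    """Map a 0-100 sentiment score to -1 (short), 0 (out of market) or 1 (long)."""
    conditions = [
        score.isna(),          # no score yet (warm-up): stay out of the market
        score < short_below,   # very low sentiment: short
        score < long_from,     # between the thresholds: out of the market
    ]
    choices = [0, -1, 0]
    # np.select takes the first matching condition; everything else goes long.
    return pd.Series(np.select(conditions, choices, default=1), index=score.index)

# Hand-picked scores around the thresholds, for illustration only.
score = pd.Series([5.0, 15.0, 34.9, 35.0, 80.0, np.nan])
states = three_state_signal(score)
print(states.tolist())
```

As with the two-state signal, these positions would still need to be shifted by one day before being multiplied with the returns in a backtest.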

Will this strategy work for every stock? That is a harder question to answer. In my analysis, it works well for a large number of stocks but might also underperform for other stocks, as stock-independent market timing is a notoriously hard challenge. This article serves as a proof of concept for the robustness of using sentiment data to augment trading decisions.

Interested in more information about news analytics data, samples, or trading inspiration? Feel free to visit us at yukkalab.com or write me at api@yukkalab.de

Thanks for Reading!
