# How to build a Bitcoin Sentiment Analysis Strategy

TL;DR: We built a profitable Bitcoin sentiment strategy yielding 2400% returns over 24 months. Adding trading fees made the strategy more realistic while finding optimal sentiment combinations and window sizes increased returns dramatically.

In the previous article, we described how to build a strategy based on Augmento *Bullish* and *Bearish* Bitcoin sentiment, and backtested it on BitMEX XBTUSD.

The signal was created by

- computing a ratio of Bullish/Bearish sentiment
- smoothing this signal by applying a 7 day MA (moving average)
- creating a second signal by applying a second 7 day MA on the smooth signal
- computing the difference between the two

This resulted in a stationary signal which we translated into a strategy with a PnL (Profit and Loss) of circa 40x over a period of two years.
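The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code from the previous article: the function names `rolling_mean` and `sentiment_signal` are hypothetical, the inputs are assumed to be daily sentiment counts with no zeros in the denominator, and windows are expressed in samples.

```python
import numpy as np


def rolling_mean(x, window):
    """Trailing moving average; the first window-1 values are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if window <= len(x):
        out[window - 1:] = np.convolve(x, np.ones(window) / window, mode="valid")
    return out


def sentiment_signal(bullish, bearish, win_1=7, win_2=7):
    """Smoothed sentiment ratio minus its own moving average."""
    ratio = np.asarray(bullish, dtype=float) / np.asarray(bearish, dtype=float)
    smooth = rolling_mean(ratio, win_1)   # first MA on the raw ratio
    slow = rolling_mean(smooth, win_2)    # second MA on the smoothed ratio
    return smooth - slow                  # NaN until both windows are filled
```

With daily data, the defaults correspond to the two 7-day MAs. The first 12 values of the result are NaN, because both windows have to fill before the difference is defined.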

In this article, using Bitcoin sentiment data from Twitter, we will discuss how to simulate live trading conditions more realistically and how we can optimize the strategy further. We will do so by adding trading fees, selecting other sentiment pairs, and testing various window size parameters.

# Factoring in fees

The backtest in our previous article ignored fees, which led to overoptimistic results. To simulate realistic trading costs, we assume a taker fee of 0.75% (as on BitMEX). Each time a long or a short position is executed, a fee of 0.75% of the trade is subtracted from the PnL. This is shown in the last two lines of the code below:

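```python
import numpy as np

# p: price series, s: signal, pnl: running PnL, trade_fee: e.g. 0.0075
for i in steps:
    if s[i-1] > 0.0:       # long: PnL follows the price
        pnl[i] = (p[i] / p[i-1]) * pnl[i-1]
    elif s[i-1] < 0.0:     # short: PnL follows the inverse price
        pnl[i] = (p[i-1] / p[i]) * pnl[i-1]
    else:                  # flat: PnL unchanged
        pnl[i] = pnl[i-1]
    if np.sign(s[i-1]) != np.sign(s[i-2]):
        pnl[i] = pnl[i] - (pnl[i] * trade_fee)
```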

Adding fees to a strategy changes the PnL drastically. Though the *Bullish/Bearish* strategy in the last article achieved a PnL of above 30, adding 0.75% fees for every trade reduced the PnL to 2.5. In the following sections, we will look at how we can optimize the parameters of the strategy to perform well, even in more realistic market conditions. That is, a) finding optimal combinations of Bitcoin sentiments and b) optimizing window sizes of the moving averages.
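To get a feel for why fees bite so hard, note that the backtest is multiplicative: every position change multiplies the PnL by `(1 - trade_fee)`. The sketch below uses two hypothetical helpers, `fee_drag` and `implied_trades`, and assumes that fees are the only difference between the two backtests and that the fee-free PnL is exactly 30 (the text only says "above 30").

```python
import math


def fee_drag(n_trades, trade_fee):
    """Fraction of the PnL left after paying a proportional fee n_trades times."""
    return (1.0 - trade_fee) ** n_trades


def implied_trades(pnl_gross, pnl_net, trade_fee):
    """Number of fee events that would turn pnl_gross into pnl_net."""
    return math.log(pnl_net / pnl_gross) / math.log(1.0 - trade_fee)


print(fee_drag(100, 0.0075))               # share of PnL left after 100 trades
print(implied_trades(30.0, 2.5, 0.0075))   # trades implied by a 30 -> 2.5 drop
```

With these inputs, the first call gives roughly 0.47 and the second roughly 330. Under the stated assumptions, a drop from 30 to 2.5 is about what 330 position changes would cost at this fee level, which suggests that reducing the number of signal flips matters as much as the fee itself.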

# Finding top performing Bitcoin sentiment combinations

The Augmento API currently provides data on 93 Bitcoin sentiments and topics, equating to 8649 possible combinations of topic and sentiment pairs. There are good reasons to test them all. For example, *Bearish* sentiment could surge temporarily due to an expected correction, but may not indicate a long term *Negative* outlook. Also, combining sentiments (e.g. *Negative* or *Optimistic*) with topics (e.g. *Hacks* or *Technology*) could lead to trading signals that are able to pick up the Bitcoin community’s emotions in the context of topics that matter to them.

The goal is to find the optimal sentiment/topic pair. That’s why we ran the entire process (see the last article), from signal building to backtesting, on all 8649 possible combinations of Bitcoin sentiments and topics. For this test, we kept the window size for the MAs constant at 7 days in order to create the first list of possible top performers. The outcome is a long list of PnL values.
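A brute-force scan of this kind could look roughly like the sketch below. It is an illustration under several assumptions, not the authors' code: `moving_average`, `backtest` and `scan_pairs` are hypothetical names, the ratio is taken as the second series over the first (our reading of the pair ordering in this article), counts are assumed to be strictly positive, and the data is synthetic.

```python
from itertools import product

import numpy as np


def moving_average(x, window):
    """Trailing moving average; NaN until the window is filled."""
    out = np.full(len(x), np.nan)
    out[window - 1:] = np.convolve(x, np.ones(window) / window, mode="valid")
    return out


def backtest(signal, price, trade_fee=0.0075):
    """Final PnL (starting from 1.0) of the long/short rule with fees."""
    pnl = 1.0
    for i in range(2, len(price)):
        if signal[i - 1] > 0.0:        # long
            pnl *= price[i] / price[i - 1]
        elif signal[i - 1] < 0.0:      # short
            pnl *= price[i - 1] / price[i]
        if np.sign(signal[i - 1]) != np.sign(signal[i - 2]):
            pnl *= 1.0 - trade_fee     # fee on every position change
    return pnl


def scan_pairs(counts, price, window=7):
    """Backtest every ordered (first, second) pair and rank by final PnL."""
    results = []
    for first, second in product(counts, repeat=2):
        ratio = counts[second] / counts[first]   # assumed orientation
        smooth = moving_average(ratio, window)
        signal = smooth - moving_average(smooth, window)
        signal = np.nan_to_num(signal)           # warm-up NaNs become "flat"
        results.append((first, second, backtest(signal, price)))
    return sorted(results, key=lambda row: row[2], reverse=True)


# Synthetic stand-in data, only to show the call pattern.
rng = np.random.default_rng(0)
counts = {name: rng.integers(1, 100, size=200).astype(float)
          for name in ("Bearish", "Bullish", "Positive")}
price = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.02, size=200)))
ranking = scan_pairs(counts, price)
print(len(ranking))   # number of ordered pairs tested
```

`product(counts, repeat=2)` enumerates ordered pairs including a series paired with itself, which is how 93 series give 93 × 93 = 8649 combinations.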

Top PnL sentiment pairs:

| topic/sentiment 1 | topic/sentiment 2 | PnL |
|---|---|---|
| Scaling | (De-)centralisation | 2.972788 |
| Bearish | Bullish | 3.008512 |
| Scaling | Bullish | 3.095835 |
| Scam_Fraud | Launch | 3.163351 |
| Rebranding | Risk | 3.330282 |
| Bearish | Positive | 3.541391 |
| Panicking | Bots | 3.624959 |
| Bug | Whales | 3.750890 |
| Pessimistic_Doubtful | Whitepaper | 3.813242 |
| Whales | FOMO_theme | 3.869889 |
| Shilling | Team | 3.869968 |
| Leverage | ETF | 3.981470 |
| Rebranding | Marketcap | 4.003318 |
| Bots | Wallet | 4.348451 |
| FUD_theme | Open_source | 4.698155 |
| Bearish | Announcements | 6.329139 |
| Open_source | Community | 6.670214 |
| Whitepaper | Bots | 14.288472 |

Here are the bottom pairs:

| topic/sentiment 1 | topic/sentiment 2 | PnL |
|---|---|---|
| Investing/Trading | Bearish | 0.000422 |
| (De-)centralisation | Price | 0.000424 |
| Positive | Selling | 0.000434 |
| Learning | Bearish | 0.000571 |
| Advice/Support | Bearish | 0.000692 |
| Euphoric/Excited | Long_term_investing | 0.000718 |
| Technical_analysis | Short_term_trading | 0.000743 |
| Problems_and_issues | Short_term_trading | 0.000836 |
| Learning | Good_news | 0.000877 |
| Euphoric/Excited | Short_term_trading | 0.000885 |
| Scam/Fraud | Token_economics | 0.000941 |
| Listing | Token_economics | 0.000953 |
| Problems_and_issues | Due_diligence | 0.000978 |
| Positive | Hopeful | 0.001021 |
| Problems_and_issues | Fearful/Concerned | 0.001069 |
| Use_case/Applications | Short_term_trading | 0.001078 |
| Prediction | Going_short | 0.001093 |
| Uncertain | Short_term_trading | 0.001124 |
| Technology | Short_term_trading | 0.001135 |
| Learning | Adoption | 0.001171 |

Interestingly, many of the top performing pairs have “negative” connotations for topic/sentiment 1 (*Pessimistic_Doubtful*, *Bug*, *Shilling*, *Bearish*), while many topics/sentiments with “positive” connotations lie under topic/sentiment 2 (*Bullish*, *Positive*, *Open_source*).

The next step in the search for the top performing pair is to plot the PnL of the selected top 20 topic/sentiment pairs against different window sizes, with the long and short window parameters sharing a single value. This gives some idea of how each pair behaves across a range of window parameters. Here we’re looking for pairs that respond well over a wide range of parameters (wide flat lines) rather than pairs with the highest peaks, since pairs that perform well across a range of parameters are more likely to be robust to changing market conditions.
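The preference for wide flat lines over high peaks can also be expressed numerically. The sketch below ranks pairs by the median PnL across window sizes instead of the maximum. Both the choice of the median and the helper name `rank_by_robustness` are our assumptions (the selection described above was done by looking at a plot), and the example numbers are made up.

```python
import numpy as np


def rank_by_robustness(pnl_by_window):
    """Rank pairs by median PnL across windows; also report min and max."""
    rows = []
    for pair, pnls in pnl_by_window.items():
        pnls = np.asarray(pnls, dtype=float)
        rows.append((pair, float(np.median(pnls)),
                     float(pnls.min()), float(pnls.max())))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up PnL curves over five window sizes, purely illustrative.
example = {
    "pair_A": [1.0, 1.1, 12.0, 1.2, 0.9],   # one sharp peak
    "pair_B": [3.0, 3.2, 3.1, 3.3, 3.0],    # wide, flat response
}
for row in rank_by_robustness(example):
    print(row)
```

Ranking by maximum would put `pair_A` first because of its single spike. Ranking by median favours `pair_B`, which is the behaviour we are after.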

There is no single optimal window size for all pairs of topics, but bigger windows tend to yield a bigger PnL. One explanation might be that a longer window is a better fit for the data, though we must be aware that larger window sizes are more likely to overfit the data.

The relationship between a sentiment/topic pair and its PnL is not always intuitive. For example, *Whitepaper*/*Bots* yielded the highest PnL, but there is no obvious reason why a high ratio of mentions of *Bots* relative to *Whitepaper* should produce a signal to hold a long position. Though *Bearish*/*Positive* was not the best performing pair (giving a PnL of 3.54), it aligns best with our intuition, and so we will use this pair for further analysis of window parameters.

# Optimizing the window parameters

Last time, we smoothed the sentiment data by taking an SMA (simple moving average) over the past 7 days. To generate the signal, we then calculated a rolling mean of that smoothed sentiment, also using a 7-day window. The choice of parameters was arbitrary, so it is worth checking how our strategy would have performed for other window parameter combinations.

In this test, we ran the strategy above using the *Bearish/Positive* pair for all possible combinations of long and short window sizes between 1 and 60 days. The resulting PnLs are plotted on the heatmap below:
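A grid like this can be filled with two nested loops. In the sketch below, `window_grid` is a hypothetical helper and `pnl_fn` stands for whatever function runs the backtest for a given pair of window sizes. The toy function used in the example is only a stand-in with a known peak, not a real backtest.

```python
import numpy as np


def window_grid(pnl_fn, max_window=60):
    """grid[i, j] holds the PnL for first window i+1 and second window j+1."""
    grid = np.empty((max_window, max_window))
    for w1 in range(1, max_window + 1):
        for w2 in range(1, max_window + 1):
            grid[w1 - 1, w2 - 1] = pnl_fn(w1, w2)
    return grid


def toy_pnl(w1, w2):
    """Stand-in for a real backtest, peaking at windows (10, 5)."""
    return -((w1 - 10) ** 2) - (w2 - 5) ** 2


grid = window_grid(toy_pnl, max_window=20)
i, j = np.unravel_index(np.argmax(grid), grid.shape)
print(i + 1, j + 1)   # window sizes of the best cell
```

The resulting array can be passed directly to a plotting routine such as Matplotlib's `imshow` to draw the heatmap.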

The graph gives the performance of the strategy across window parameters, with high PnLs in green, and low PnLs in red. There are some “islands” where PnL is higher than in the rest of the graph. These islands are usually located in the areas where the first moving average is longer than the second one. Since we want PnL to be similar over a range of parameter values, we want to be within areas where PnL is high but at the same time not fluctuating too much as a function of the window parameters. These areas can be seen as “stable.” A good example would be the areas circled on the graph. We also plotted the performances of the chosen points. The strategy with the highest PnL uses 26 as the first and 7 as the second parameter for the moving windows.
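One way to make the idea of a “stable” area concrete is to score each cell by the worst PnL in its neighbourhood, so that isolated spikes score poorly and plateaus score well. This is only one possible formalisation, assumed here for illustration (the areas discussed above were chosen by eye). `stability_score` and the `radius` parameter are hypothetical names, and the example grid is synthetic.

```python
import numpy as np


def stability_score(grid, radius=2):
    """Lowest PnL within `radius` cells of each grid point (edges are clipped)."""
    n_rows, n_cols = grid.shape
    score = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            block = grid[max(i - radius, 0): i + radius + 1,
                         max(j - radius, 0): j + radius + 1]
            score[i, j] = block.min()
    return score


# Synthetic grid: one isolated spike and one broad plateau.
demo = np.ones((12, 12))
demo[2, 2] = 10.0          # high but unstable peak
demo[6:11, 6:11] = 5.0     # lower but stable region
score = stability_score(demo)
best = np.unravel_index(np.argmax(score), score.shape)
print(best)                # lands inside the plateau, not on the spike
```

A larger `radius` demands a wider plateau, trading peak PnL for robustness to small changes in the window parameters.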

All four strategies perform well both in the bull market of 2017 and the bear market of 2018. Though strategy A appears to outperform B, C, and D, it also appears to be less stable, resulting in large up-swings and draw-downs. Strategy D looks significantly more stable but underperforms the other three. B and C appear to be similarly stable to D while performing slightly better. Referring back to the heat map, B and C are also in what appears to be a wider, flatter area of reasonably high PnL. For this reason, we would select the parameters from C (28, 14) for a live strategy, which returned ≈24 BTC on a starting wallet of 1 BTC (2400%).

# Python on steroids

Running 8649 backtests using NumPy and Python without any optimization takes a while: our first run would have taken 6 hours. To boost the speed, we used Numba, a JIT (just-in-time) compiler that translates Python functions into optimized machine code. After implementing Numba, it took us no more than two minutes to get an array with all 8649 PnLs.
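As a rough idea of what this looks like in practice, the backtest loop can be wrapped in a function and decorated with Numba's `@njit`. The sketch below is an assumption about how such a function might be written, not the authors' implementation. `backtest_pnl` is a hypothetical name, and the fallback decorator is there only so the example still runs, uncompiled, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(func):
        """Fallback: run as plain Python if Numba is unavailable."""
        return func


@njit
def backtest_pnl(signal, price, trade_fee):
    """Final PnL, starting from 1.0, with a fee on every position change."""
    pnl = 1.0
    for i in range(2, price.shape[0]):
        s1 = signal[i - 1]
        s2 = signal[i - 2]
        # Position implied by each signal value: +1 long, -1 short, 0 flat.
        if s1 > 0.0:
            pos1 = 1
            pnl *= price[i] / price[i - 1]
        elif s1 < 0.0:
            pos1 = -1
            pnl *= price[i - 1] / price[i]
        else:
            pos1 = 0
        if s2 > 0.0:
            pos2 = 1
        elif s2 < 0.0:
            pos2 = -1
        else:
            pos2 = 0
        if pos1 != pos2:
            pnl *= 1.0 - trade_fee
    return pnl


signal = np.array([1.0, 1.0, 1.0, 1.0])
price = np.array([100.0, 100.0, 110.0, 121.0])
print(backtest_pnl(signal, price, 0.0075))
```

The first call triggers compilation, so the speed-up only shows once the function is called many times, as it is when looping over thousands of pairs.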

# Conclusion

We made modifications and added fees to the backtest. Moreover, we also showed how other Augmento topics can be used to generate a strategy. Among all pairs of topics, we identified the top 20 signals that would yield a profitable strategy. Even though some of them are not easily interpretable, some provide a good intuitive interpretation. We gave an example of a signal based on *Bearish/Positive* Bitcoin sentiment but other interesting ones might also be *Pessimistic_Doubtful/Whitepaper* or *Bearish/Launches*, all of which yield positive and relatively high PnL while providing us a natural (easy) interpretation.

The backtest presented can still be improved. Modelling additional factors such as slippage and market volume, among others, could make the backtest more robust. Furthermore, we could pick window sizes randomly at each step, which would show how stable our strategy is. We will consider all these topics in our next articles.

Check out the complete code and the historical Augmento sentiment data here.

*This article was produced by **augmento.ai** as part of a series of getting started guides for using their data, and does not constitute investment or trading advice.*