An experiment in sentiment analysis for the crypto markets
Sometime at the beginning of 2019, I had a thesis to test: the prices of bitcoin and ethereum are sometimes correlated with perceived sentiment in the market.
For example: a news article would announce that an exchange was hacked, so the price would fall, or a developer would announce on Twitter that they’ve open sourced a scaling solution, so the price would rise. I’m not a big believer in the magic of Technical Analysis (TA), so did not incorporate any TA signals or TA sentiment in my experiment.
I found a few services that provided APIs for sentiment analysis of this kind. However, I thought to myself: ‘If you found a way to predict the market, why would you create a SaaS business out of it instead of becoming the next Bridgewater or Berkshire Hathaway?’ These services lacked ‘skin in the game’: they kept all the potential upside, with no downside. So I decided to investigate this myself by building an automated sentiment analysis experiment and deploying it to production with real funds.
TL;DR: The experiment traded based on sentiment analysis for 4 months and was unprofitable. This could mean a few things: the thesis is disproven, the experiment didn’t run for long enough, or sentiment signalling had too much noise during the experimental time (i.e. in a bear market). I’ll leave it up to the reader to make an informed decision 😎
FYI: The experiment and all code is open source, link at the bottom of this article.
In the early stages I gathered some data from various sources and correlated the price / sentiment data with the following:
Since the point was not to be right all the time, but to be right when it mattered, I considered a score above 0.4 to be ‘good enough’. As you can see, the correlations higher than 0.4 (using the Pearson correlation coefficient) were with Reddit sentiment. As expected, news sentiment had a very low correlation.
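For readers who want to reproduce this kind of check, here is a minimal sketch of a Pearson correlation between a price series and a sentiment series, compared against the 0.4 cut-off. The function name, the variable names, and the sample numbers are made up for illustration; they are not data or code from the experiment.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

GOOD_ENOUGH = 0.4  # the 'good enough' threshold mentioned above

# Made-up numbers purely for illustration, NOT data from the experiment.
prices = [5200.0, 5300.0, 5250.0, 5400.0, 5600.0]
reddit_sentiment = [0.10, 0.30, 0.20, 0.35, 0.50]

r = pearson(prices, reddit_sentiment)
is_good_enough = r > GOOD_ENOUGH
print(f"r = {r:.2f}, good enough: {is_good_enough}")
```

Keep in mind that a high score over a short window says nothing about whether the relationship will hold later.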
Even more interesting was plotting this visually, firstly as a standard overlay:
Then using logarithmic scales:
Twitter logarithmic scores had the most interesting visual:
With this early analysis in hand, I set out to build what I would call Futura…
Based on Futurama mythology, I built an automated infrastructure on GCP. If you’re interested in the technical details, see the open source repo at the bottom of this article.
This included a way to continually ingest and process the sentiment of ‘the market’ (for the experimental phase, I restricted this to Twitter, News, and Reddit sources), ingest pricing information, continually backtest (3–6 times per day) to find the optimal parameters, and to make automated buy/sell orders based on this data. It turned out to be quite a large project for a 1-person team, but I enjoyed the learnings 😇
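To give a feel for what ‘backtest to find the optimal parameters’ can mean, here is a deliberately simplified sketch: replay historical prices and sentiment scores against a threshold strategy, then pick the threshold pair with the best final value. The strategy, the `buy_above` / `sell_below` parameters, and the sample numbers are all assumptions for illustration; the actual logic lives in the repo and may differ.

```python
from itertools import product

def backtest(prices, sentiment, buy_above, sell_below, cash=1000.0):
    """Replay history with a hypothetical rule: go all-in when sentiment
    rises above buy_above, sell everything when it drops below sell_below.
    Returns the final portfolio value."""
    units = 0.0
    last_price = 0.0
    for price, score in zip(prices, sentiment):
        last_price = price
        if units == 0 and score > buy_above:
            units, cash = cash / price, 0.0
        elif units > 0 and score < sell_below:
            cash, units = units * price, 0.0
    # Value any open position at the last seen price.
    return cash + units * last_price

def best_parameters(prices, sentiment, buy_grid, sell_grid):
    """Grid search: try every (buy_above, sell_below) pair, keep the best."""
    candidates = [(b, s) for b, s in product(buy_grid, sell_grid) if s < b]
    if not candidates:
        raise ValueError("no valid (buy_above, sell_below) combinations")
    return max(candidates, key=lambda p: backtest(prices, sentiment, *p))

# Made-up history purely for illustration.
example_prices = [100.0, 110.0, 120.0, 90.0, 95.0, 130.0]
example_sentiment = [0.5, 0.2, -0.4, -0.1, 0.6, 0.1]

best = best_parameters(example_prices, example_sentiment,
                       buy_grid=[0.1, 0.3, 0.5],
                       sell_grid=[-0.3, -0.1, 0.0])
print(best, backtest(example_prices, example_sentiment, *best))
```

Note that a search like this always finds parameters that look good on the history it was given, which is exactly how overfitting creeps in.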
Futura started collecting price and sentiment data on 2019-04-25, submitted production buy/sell orders for BTC on 2019-07-23, did the same for ETH on 2019-09-19, then I killed it on 2019-11-18. Total production running time was 4 months, but the total project running time was less than 7 months.
I started with a relatively safe and small amount of 1,759.62 EUR and lost 703.73 EUR, although the total capital recycled through trades amounted to 57,562.90 EUR. Trades ranged from 100 EUR+ (for ETH) to 1,000 EUR+ (for BTC).
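To put those figures in proportion, here is the arithmetic as a short sketch. It assumes that ‘capital recycling’ means the total value traded over the period; that is my reading of the term, so treat the turnover-based ratios as indicative only.

```python
# Figures quoted above, in EUR.
starting_capital = 1759.62
loss = 703.73
turnover = 57562.90  # assumption: 'capital recycling' = total value traded

loss_pct_of_capital = loss / starting_capital * 100
turnover_multiple = turnover / starting_capital
loss_pct_of_turnover = loss / turnover * 100

print(f"Loss as % of starting capital: {loss_pct_of_capital:.2f}%")
print(f"Turnover as multiple of capital: {turnover_multiple:.2f}x")
print(f"Loss as % of turnover: {loss_pct_of_turnover:.2f}%")
```

That works out to a loss of roughly 40% of the starting capital, with the capital turned over about 32.7 times, or a loss of about 1.2% of everything traded.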
With my monitoring systems (i.e. Slack) continually notifying me of Profit/Loss statuses, it became a stressful time. Sometimes it would make a large profit, but most of the time it would make a loss.
Bear market trends
As you can see from the graphs in the previous sections, the original 11-day period of data collection occurred when the price was trending up. When I deployed buy/sell orders to production, the price was trending down, as shown below:
A (favourable) interpretation of the results is that I deployed it at the wrong time, so every iteration and improvement to the system compounded losses. Maybe in a bull market the iterations and improvements I made would have compounded profits instead?
Another aspect I consider important is that during a bear market, most of the news is negative, with positive news not having a large magnitude when compared with the net score. This could have also affected the outcome, as during the initial data gathering period, sentiment was generally positive and ‘up and to the right’.
An (unfavourable) interpretation is that I was trying to find correlation where there was none, and managed to overfit the data to the result I wanted. After reading through most of Nassim Taleb’s books during this experiment, I (mostly) became a believer in being ‘fooled by randomness’.
Naive sentiment analysis method
The method of sentiment analysis used was based on a simple and naive solution: AFINN and emoji word list scoring (more info on this in the GitHub repo). The goal was to prove or disprove the thesis, then iterate and optimise the sentiment analysis method later, maybe with some machine learning techniques.
The method was naive as it only scored the words individually, then summed them into a single score for the text it was fed. It didn’t capture nuances in language, such as ‘OMFG buy the dip or GTFO and sell’ (which I perceive as positive sentiment, but the scoring would have it closer to neutral due to the positive and negative words cancelling each other out).
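The cancelling effect is easy to see in a minimal sketch of word list scoring. The lexicon below is a tiny stand-in in the style of AFINN (one integer valence per word); the words and values are assumptions chosen for illustration, not the real AFINN or emoji lists, and `naive_score` is a hypothetical name.

```python
import re

# Tiny illustrative lexicon: word -> valence. These entries are assumptions
# for this sketch, NOT the real AFINN values.
LEXICON = {"buy": 2, "moon": 3, "hacked": -3, "sell": -2, "scam": -3}

def naive_score(text, lexicon=LEXICON):
    """Score each word on its own and sum the results.
    Words missing from the lexicon count as 0 (neutral)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

# 'buy' (+2) and 'sell' (-2) cancel out, so the text looks neutral
# even though a human would read it as bullish.
mixed = naive_score("OMFG buy the dip or GTFO and sell")
negative = naive_score("Exchange hacked, total scam")
print(mixed, negative)
```

Because every word is scored in isolation, word order, sarcasm, and slang such as ‘buy the dip’ never influence the result.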
A (favourable) interpretation of this is that the naive method worked as the market was trending upwards. However when it trended downwards, people’s expressions became more nuanced, incorporating both positive and negative words in their text.
An (unfavourable) interpretation is that the method’s results were random and just happened to be correlated in the early stages, but were never actually correlated with the price.
Naive sentiment sources
The sentiment scores were calculated by ingesting various sources — Twitter, News, and Reddit. Similar to the above, the goal was to prove/disprove the thesis, then iterate on the sources later.
A (favourable) interpretation is that the sources were good in the early stages, as progress in the industry or major events would be primarily ‘posted’ to these sources first. However as the price trended downwards, these sources became ‘secondary’ sources, with price action happening before the information was available to the wider community.
An (unfavourable) interpretation is that the sources were always ‘secondary’ sources, with sentiment scores being a trailing indicator of news and feelings in the industry.
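One way to probe the leading versus trailing question is to shift one series against the other and compare correlations at each lag. The sketch below is not something the experiment did; the function names and sample series are hypothetical, and the sample is constructed so that sentiment simply copies the previous period’s price.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

def lagged_correlation(sentiment, prices, lag):
    """Correlate sentiment[t] with prices[t + lag].
    lag > 0 asks whether sentiment lines up with LATER prices (leading);
    lag < 0 asks whether it lines up with EARLIER prices (trailing)."""
    n = min(len(sentiment), len(prices))
    if lag >= 0:
        s, p = sentiment[: n - lag], prices[lag:n]
    else:
        s, p = sentiment[-lag:n], prices[: n + lag]
    return pearson(s, p)

# Made-up series: sentiment[t] equals prices[t - 1], i.e. a pure trailing signal.
sample_prices = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
sample_sentiment = [0.0] + sample_prices[:-1]

trailing = lagged_correlation(sample_sentiment, sample_prices, -1)
leading = lagged_correlation(sample_sentiment, sample_prices, 1)
print(f"lag -1: {trailing:.2f}, lag +1: {leading:.2f}")
```

If the correlation peaks at a negative lag, as it does in this constructed sample, the sentiment source is reacting to price moves, not anticipating them.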
When I started researching sentiment analysis, I couldn’t find many good sources of information with practical, real world, and ‘in production’ examples (although there are plenty of academic ones). The general culture of finance (and maybe #DeFi tooling) is secretive, as a slight edge can compound to millions of dollars.
With this article and the open source repo, I hope to add to the growing space in some small way. I believe there is value in sharing results (whether positive or negative), learning together, and helping to grow the space.
Open source repo (with technical overview): https://github.com/mrdavey/Futura-os
If you’d like to contribute or bounce ideas around, or if you found Futura’s code valuable, reach out to me on Twitter: @daveytea