The Eye of Heimdall

Analyzing cryptocurrency users’ sentiment through Twitter data

Anthony Bet
Heimdall Research

--

In this article I will show how the Heimdall team developed an AI system to identify sentiment in cryptocurrency-related tweets.

Our crawlers collect more than twenty thousand tweets related to the crypto market every day. Explored well, this huge amount of data can yield valuable information on how people feel about the market or about a specific project. Twitter today has more than 330 million active accounts, where users share their hopes, desires and frustrations day and night, and this holds true for the Twitter crypto community as well.

Extracting sentiment from text is a particular instance of the more general problem of Natural Language Processing (NLP). Our objective was to create an algorithm capable of analyzing the text of tweets written in English and classifying each tweet into one of three sentiment categories: positive, neutral or negative.

The use of tweets makes the project very challenging, because the kind of text we find on Twitter is full of slang, typos and emojis, and many of these texts express more than one category of sentiment. Besides that, there is a vast number of active bots whose tweets are extremely repetitive, which can ruin our training data.

To deal with these problems we had to create our training data from scratch, since the data we found available online was not suitable for us. Our first step in creating the dataset was to define what each sentiment should communicate. Negative sentiment represents tweets that communicate a sentiment against (not in favor of, unhappy with, bearish on) some project or item. Here are examples of tweets that we aimed to classify as negative:

  • If rumours are true, if you’re in $BAND protocol then get the frig out right now.
  • @JJcycles @LilMoonLambo Another leg down lol $link
  • @udiWertheimer Don’t be jelous because BTC is boring lol
  • This is the one time I want $BTC to actually crap itself downwards about $1,000. So many Short opportunities :x Dare I say it, down we go please.

In contrast, positive tweets communicate favorable expectation regarding some item. Examples of tweets that we aimed to classify as positives:

  • So I bought a bitcoin yesterday. Almost 1K in 1 day aint bad lol https://t.co/Niv3IVbtEE
  • @benbuiz Got my ticket already and my bag is full of $UTK
  • hehe my bitcoin making me money 🥺✨✨👉🏼👈🏼💰💰💰
  • It’s Friday and I’m very happy and I’m bout to load up on some BTC you know the vibes.
  • @BrukeFasil Yerp $VET, $XLM, $XRP, $XTZ for the long hodl

Neutral sentiment represents tweets that do not explicitly communicate positive or negative sentiment. Here are some examples:

  • The latest The Bitcoin Daily! paper.li/f-1508960225?e… #bitcoin #blockchain
  • Bitcoin Block Generation Speed Falls to 2017 Lows whatsgoon.com/tech/technolog…
  • Man, I could literally just retweet every single thing on my bitcoin twitter follow list today.
  • $XLM is now worth $0.0581 (-0.09%) and 0.00000664 BTC (-0.30%) #XLM ➡️ coinvalue.xyz/coin/xlm/
  • Synapse is 85% complete! Phore synapse update is here. medium.com/@phoreblockcha… $PHR #Phore #BTC

You may read some of these neutral tweets, like the second one, and disagree about their neutrality, but it is important to keep in mind that we are trying to capture the genuine sentiment of users, and at the end of the day tweets like news headlines are just communicating facts that we already know.

After deciding the meaning of each class we selected a good sample from our database to be categorized by our team.

Each tweet was classified by three people to improve the consistency of the results. To select a good sample from the database we ruled out tweets from the 500 most active users, since we noticed that the majority of those users were bots. Furthermore, we only selected “small” tweets with fewer than 150 characters, given that longer tweets tend to express more than one sentiment, and according to our data those “small” tweets represent about half of the total. Here we are making the assumption that observing “small” tweets is enough to identify the overall sentiment about the market. In the end we were able to classify thousands of tweets, of which 10% belong to the negative class, 20% to the positive and 70% to the neutral.
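The sampling rules above can be sketched with pandas. The dataframe and its columns here are hypothetical stand-ins for our tweet database, and the toy data uses a cutoff of one “most active user” instead of 500:

```python
import pandas as pd

# Hypothetical dataframe standing in for the tweet database.
tweets = pd.DataFrame({
    "user": ["alice", "bot1", "bob", "bot1", "carol"],
    "text": ["btc to the moon", "spam", "selling all my link", "spam", "hodl"],
})

# Rule out the most active users (1 here for the toy data; 500 in practice),
# since the majority of them turned out to be bots.
top_users = tweets["user"].value_counts().head(1).index
filtered = tweets[~tweets["user"].isin(top_users)]

# Keep only "small" tweets, under 150 characters.
filtered = filtered[filtered["text"].str.len() < 150]
print(len(filtered))
```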

With this in hand we were able to identify the main issues to be solved: the structure of the text, the number of characters, the number of classes, and the fact that the data is not balanced.

Exploring the Data

Our first step was to analyze the most frequent terms in each class; this allows us to observe beforehand whether there is a distinction between the classes and gives us hints for feature extraction. To do so we generated a word cloud for each class in the training data. A pre-processing step was applied to eliminate the retweet pattern, user mentions, broker names and coin symbols. This is the final result:

We can see that the most frequent terms associated with negative sentiment include dump, sell, down, short, scam, bearish and crash, while for positive sentiment the terms are buy, bullish, pump, moon, good, long and buying.
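Under the hood a word cloud is driven by term frequencies, which can be computed with the standard library alone. The tweets below are made up for illustration; the real clouds were built from the full training data:

```python
from collections import Counter
import re

# Toy stand-in for the cleaned negative-class tweets.
negative_tweets = [
    "dump it and sell now",
    "bearish crash incoming sell",
    "another scam dump",
]

# Tokenize on letters only and count term frequencies across the class.
words = re.findall(r"[a-z]+", " ".join(negative_tweets).lower())
top = Counter(words).most_common(3)
print(top)  # most frequent terms first
```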

We were expecting these terms to be associated with their respective sentiments, but our goal was to find more complex relations between them, so we could develop a system that goes beyond a simple word counter.

Text Cleaner

Our second step was to clean and normalize the text. This is essential to reveal the patterns for the machine learning algorithms.

The first part of the cleaning is given by the following transformations:

  • Clean the Twitter retweet pattern: ‘RT @username’;
  • Replace the ‘@username’ pattern by a word tag that we called ‘usertag’;
  • Lowercase all the text;
  • Replace links by the word tag ‘linktag’;

The second part consists of replacing numbers, monetary quantities (e.g. $500.00), symbols and names of fiat and crypto currencies, and brokers, by tags:

  • Money quantities → ‘numbertag’;
  • Brokers, e.g. Binance, Binfinity → ‘corretag’
  • Fiduciary symbols, e.g. USD, BRL → ‘fiatag’
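A minimal sketch of this second pass is shown below. The money regex and the broker/fiat word lists are illustrative, not the full dictionaries used in production:

```python
import re

# Illustrative word lists; the real cleaner uses much larger dictionaries.
BROKERS = {"binance", "bitfinex"}
FIAT = {"usd", "brl", "eur"}

def clean_part_two(text: str) -> str:
    # Monetary quantities like $500.00 or $1,000 -> 'numbertag'.
    text = re.sub(r"\$\d+(?:[.,]\d+)*k?", "numbertag", text)
    tokens = []
    for tok in text.split():
        if tok in BROKERS:
            tokens.append("corretag")   # broker names -> 'corretag'
        elif tok in FIAT:
            tokens.append("fiatag")     # fiat symbols -> 'fiatag'
        else:
            tokens.append(tok)
    return " ".join(tokens)

print(clean_part_two("bought $500.00 of btc on binance with usd"))
```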

The third part is related to English contractions and the use of emojis, stopwords and punctuation symbols. To deal with contractions we built a simple dictionary mapping contractions to their respective expansions. We chose to keep emojis in the text based on the hypothesis that an emoji is an efficient way to communicate emotions on social networks. To keep emojis we made use of the emoji package to replace each emoji by its name plus the word ‘tag’; for example, we have the following transformation:

👍 →(thumbsup + tag) → ‘thumbsuptag’ .
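A standard-library approximation of this step is sketched below. Note that our cleaner uses the emoji package, which produces short names like ‘thumbsup’; Unicode’s official names are longer (‘thumbs up sign’), so the tag below differs from the one above:

```python
import unicodedata

def emoji_to_tag(text: str) -> str:
    out = []
    for ch in text:
        # Crude emoji-range check, good enough for illustration.
        if ord(ch) > 0x1F000:
            # Replace the emoji by its Unicode name + 'tag'.
            name = unicodedata.name(ch, "emoji").lower().replace(" ", "")
            out.append(name + "tag")
        else:
            out.append(ch)
    return " ".join("".join(out).split())

print(emoji_to_tag("nice 👍"))
```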

To decide whether or not to use stopwords and punctuation we adopted the following strategy: by turning the sets of allowed stopwords and punctuation symbols into parameters, we were able to evaluate which combination would lead to the best performance. The chosen combination keeps all the stopwords in the text but rules out all the punctuation symbols. To finish the cleaning steps we applied lemmatization to the words in the text using spaCy.

Let’s see the cleaner in action using some of the tweets shown earlier:
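As an illustration, here is a simplified end-to-end cleaner combining the passes described above on one of the earlier tweets. The tag dictionaries (coins, brokers, fiat), emoji handling and lemmatization are omitted for brevity, so the output is only an approximation of the real cleaner’s:

```python
import re

def clean(text: str) -> str:
    text = re.sub(r"\bRT @\w+:?\s*", "", text)       # drop 'RT @username'
    text = re.sub(r"@\w+", "usertag", text)          # '@username' -> usertag
    text = text.lower()                              # lowercase everything
    text = re.sub(r"https?://\S+", "linktag", text)  # links -> linktag
    text = re.sub(r"[^\w\s]", "", text)              # rule out punctuation
    return " ".join(text.split())

print(clean("@JJcycles @LilMoonLambo Another leg down lol $link"))
```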

Feature Extraction

Since our data is composed of text, we need a tool to represent it as a numerical structure, while preserving important information about the text, so that we can apply machine learning models to it. The technique we decided to use is TF-IDF (Term Frequency–Inverse Document Frequency), which creates a vocabulary from the words in the data and measures the frequency and the importance of each word in the vocabulary. To implement this we used the scikit-learn module TfidfVectorizer.

The TfidfVectorizer parameters that control n-grams, minimum frequency and maximum frequency were optimized along with the chosen models.
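A minimal sketch of the vectorization step, on a made-up corpus of cleaned tweets; the parameter values shown are illustrative, since the real ones were tuned together with the models:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus of already-cleaned tweets.
corpus = [
    "bought bitcoin yesterday feeling bullish",
    "another leg down sell everything",
    "bitcoin block speed falls",
]

# ngram_range, min_df and max_df are the parameters tuned in the article.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1, max_df=0.9)
X = vectorizer.fit_transform(corpus)
print(X.shape)  # (number of tweets, vocabulary size of unigrams + bigrams)
```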

Model Selection

Altogether we tested and compared four models: Random Forest, multinomial Naive Bayes, multinomial Logistic Regression and Xgboost. The metric we used in the optimization is recall, meaning that we want to maximize true positives (TP) and minimize false negatives (FN) during classification; that is, we maximize the number of TP of a class at the cost of increasing the number of false positives (FP).
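For a multiclass problem like ours, recall is averaged over the classes. A quick illustration with scikit-learn on made-up labels:

```python
from sklearn.metrics import recall_score

# Made-up true and predicted labels over the three sentiment classes.
y_true = ["neg", "pos", "neu", "neu", "pos", "neg"]
y_pred = ["neg", "pos", "neu", "pos", "pos", "neu"]

# Weighted recall averages per-class recall, weighted by class support;
# here per-class recall is neg=0.5, pos=1.0, neu=0.5.
r = recall_score(y_true, y_pred, average="weighted")
print(r)
```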

The tested models that showed the best results are Logistic Regression and Xgboost; let’s see their performance:

Logistic Regression:

  • Accuracy: 0.798
  • Precision: 0.789
  • Recall: 0.798
  • F1: 0.788

Xgboost:

  • Accuracy: 0.818
  • Precision: 0.808
  • Recall: 0.818
  • F1: 0.804

These confusion matrices are normalized by prediction, i.e. for a given predicted class (column) the values indicate the proportion of instances belonging to each true class. For example, looking at the first column of the Xgboost matrix we see that of the tweets classified as negative, 8% are actually positive, 1.9% are neutral and 75% are really negative.
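This prediction-wise normalization corresponds to `normalize="pred"` in scikit-learn, illustrated below on made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration; normalize="pred" divides each column
# by the number of tweets predicted as that class, so columns sum to 1.
y_true = ["neg", "neg", "pos", "neu", "neg", "pos"]
y_pred = ["neg", "neg", "neg", "neu", "neu", "pos"]
cm = confusion_matrix(
    y_true, y_pred, labels=["neg", "neu", "pos"], normalize="pred"
)
print(cm)  # rows are true classes, columns are predicted classes
```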

Comparing both models we see that Logistic Regression presents better results for the positive class; however, Xgboost performs better for all the other classes and shows better results in general, especially if we analyze the accuracy, precision, recall and F1 score, where Xgboost performs better in all of them.

After these considerations we decided to use Xgboost as the final model to classify our tweets.

Final Product

The sentiment that you can analyze on our platform is built by aggregating the sentiment for each coin every 30 minutes, i.e. we calculate the proportion of positive and negative sentiment for each coin in 30-minute intervals.
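This kind of windowed aggregation can be sketched with pandas. The dataframe below is a hypothetical stand-in for the classified tweets of a single coin:

```python
import pandas as pd

# Hypothetical classified tweets for one coin, indexed by timestamp.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-01-01 00:05", "2021-01-01 00:20",
        "2021-01-01 00:40", "2021-01-01 00:50",
    ]),
    "sentiment": ["positive", "negative", "positive", "positive"],
}).set_index("time")

# Proportion of each sentiment class in every 30-minute interval.
props = (
    df.groupby(pd.Grouper(freq="30min"))["sentiment"]
      .value_counts(normalize=True)
      .unstack(fill_value=0)
)
print(props)
```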

Bitcoin price between July 2020 and January 2021. Green and red lines indicate the proportion of positive and negative sentiment respectively. The sentiment and price here are aggregated in 24-hour intervals.

You can check the time evolution of your favorite projects on our platform, heimadall.land.
