How to build a Bitcoin Sentiment Analysis using Python and Twitter

Edoardo Ganetti
Published in Analytics Vidhya · Nov 19, 2020

Sentiment Analysis is trending right now: there are already several products around that analyse one or more social networks to extract the sentiment on a certain financial asset. Some of these models are valid, others less so.

Why create your own Sentiment Analysis?

The point is that generalist models made by others usually lack the detail we need, and they are not malleable enough to adapt to our needs.

Since the resources needed to create a Sentiment Analysis are completely free, I thought of writing a short DIY guide, so that everyone can choose how to implement their own analysis and draw inspiration from it.

Prerequisites

There are some resources you will need to subscribe to and some libraries you will need to install for the Python script to run correctly. Furthermore, the resources listed below were installed on a Mac with Python 3.9, so the result may need small tweaks if you use a different machine or environment. In any case, for any obstacle there is always the "Holy Stack Overflow" to come to our aid.

  • Python 3 installed with the following libraries:
      • Tweepy (pip install tweepy)
      • Google Cloud Language (pip install --upgrade google-cloud-language)
      • Pandas (pip install pandas openpyxl)
  • Twitter developer subscription and the related OAuth credentials
  • Google Cloud subscription and the related OAuth credentials for the NLP library

Once you’ve installed those libraries you’ll be able to import them in the first part of the code:

import tweepy
import os
from google.cloud import language_v1
import pandas as pd

Scraping from Twitter

Twitter has a public API that returns, in JSON format, the tweets matching a search query. Since what we are interested in is information on Bitcoin, we will search for #Bitcoin, as per the code below. Modifying this script to use another coin or other related hashtags (for example #BTC) would let you extend or refine the search.

After changing the access tokens (insert-your-twitter-access-token-here), you can modify the search parameters to determine how many tweets to retrieve per query and the maximum number of tweets (tweetsPerQry and maxTweets respectively). I strongly recommend doing some test queries to check the results and see if the tweets are relevant.

Other changes that can be made to the code are those aimed at refining the tweets, for example by eliminating retweets. Have a look at the Tweepy API documentation to understand the available options.
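For instance, here is a minimal sketch of two ways to drop retweets; the -filter:retweets operator is part of Twitter's standard search syntax, and retweeted_status is the attribute Tweepy sets on retweet objects (newTweets refers to the result list in the script below):

# Option 1: exclude retweets at query time with Twitter's
# standard search operator.
hashtag = "#Bitcoin -filter:retweets"

# Option 2: filter client-side after fetching; retweets carry a
# "retweeted_status" attribute on the Tweepy status object.
originals = [t for t in newTweets if not hasattr(t, "retweeted_status")]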

From here I extract only the text of the tweets and put it in a list (listposts), which will then be processed by Google NLP for the actual analysis.

consumer_key = 'insert-your-twitter-access-token-here'
consumer_secret = 'insert-your-twitter-access-token-here'
access_token = 'insert-your-twitter-access-token-here'
access_token_secret = 'insert-your-twitter-access-token-here'

tweetsPerQry = 100  # tweets fetched per API call (100 is the maximum)
maxTweets = 100     # total tweets to collect
hashtag = "#Bitcoin"

authentication = tweepy.OAuthHandler(consumer_key, consumer_secret)
authentication.set_access_token(access_token, access_token_secret)
api = tweepy.API(authentication, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

maxId = -1     # id used to page backwards through the results
tweetCount = 0
listposts = []

while tweetCount < maxTweets:
    if maxId <= 0:
        # First page of results.
        newTweets = api.search(q=hashtag, count=tweetsPerQry, result_type="recent", tweet_mode="extended")
    else:
        # Subsequent pages: only tweets older than the last one seen.
        newTweets = api.search(q=hashtag, count=tweetsPerQry, max_id=str(maxId - 1), result_type="recent", tweet_mode="extended")

    if not newTweets:
        print("No more tweets found")
        break

    for tweet in newTweets:
        d = {}
        # Keep the text as a str: Google NLP expects a string,
        # so don't encode it to bytes here.
        d["text"] = tweet.full_text
        print(d["text"])
        listposts.append(d)

    tweetCount += len(newTweets)
    maxId = newTweets[-1].id

print(listposts)

You can find the Twitter scraping script I used here: https://gist.github.com/DeaVenditama/40ed30cb4bc793ab1764fc3105258d8a

Give the Tweets to Google NLP

Google NLP is the library we are going to use for the Natural Language Processing of the text we have extracted from Twitter. I chose to try this one because others I tried in the past didn't convince me very much. Given that it is quite difficult to gauge sentiment from a few words, I found this Google library quite sophisticated, as it offers two measures: score and magnitude.

In particular, the score corresponds to the overall emotional leaning of the text, which can be positive, negative or neutral, with a value ranging from -1 to 1 (-1 indicates negative, 0 neutral and 1 positive sentiment).
The magnitude instead indicates the strength of the score. In other words, the strength of the emotion. Values close to 0 reveal little emotional content, while higher values indicate strong emotion (magnitude is unbounded and grows with the length and intensity of the text).
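To make the two measures concrete, here is a rough illustrative helper; the 0.25 and 1.0 thresholds are arbitrary choices of mine, not values recommended by Google:

def label_sentiment(score, magnitude):
    # Little emotion in either direction: genuinely neutral text.
    if magnitude < 1.0 and -0.25 < score < 0.25:
        return "neutral"
    # Strong but opposing emotions cancel out in the score.
    if magnitude >= 1.0 and -0.25 < score < 0.25:
        return "mixed"
    return "positive" if score > 0 else "negative"

print(label_sentiment(0.8, 3.2))   # positive
print(label_sentiment(0.0, 0.1))   # neutral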

Before launching the script, you need a Google Cloud account; follow the steps at this URL (https://cloud.google.com/natural-language/docs/reference/libraries#cloud-console) to install the necessary libraries and download the JSON file that contains the credentials for authentication. Once you have downloaded the file on your machine, put the complete path of the file in place of (insert-the-path-to-your-json-key-here).

In this part of the code we simply loop over the list of tweets obtained before and call the Google NLP function that analyses the sentiment, returning a pair of values for each tweet (score, magnitude).

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "insert-the-path-to-your-json-key-here"
client = language_v1.LanguageServiceClient()

for x in range(len(listposts)):
    try:
        # Wrap the tweet text in a Document and ask Google NLP for
        # the sentiment of the document as a whole.
        document = language_v1.Document(content=listposts[x]["text"], type_=language_v1.Document.Type.PLAIN_TEXT)
        sentiment = client.analyze_sentiment(document=document).document_sentiment
        sscore = round(sentiment.score, 4)
        smag = round(sentiment.magnitude, 4)

        listposts[x]["score"] = sscore
        listposts[x]["magnitude"] = smag

    except Exception as e:
        # If the API call fails, log the error and fall back to neutral values.
        print(e)
        listposts[x]["score"] = 0
        listposts[x]["magnitude"] = 0

The Google NLP script was provided to me by Daniel Heredia, who has also published a generic example of the same analysis starting from Facebook posts.

Analysing the results in an Excel file

In the last two lines, the list of tweets with their scores and magnitudes is processed with the Pandas library and written to an Excel file of your choice (insert-your-excel-file-path-here).

df = pd.DataFrame(listposts)
df.to_excel('insert-your-excel-file-path-here', header=True, index=False)

When the script has finished running, you will have an Excel file with one row per tweet and columns for its text, score and magnitude.

The results show that a subsequent refinement of the extracted tweets is necessary, as it is easy to run into tweets that are barely relevant or, even worse, promotional. I believe the final value of this analysis is only as accurate as the refinement work done in advance.
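As a starting point for that refinement, a naive pre-filter could drop obviously promotional tweets before they reach Google NLP; the keyword list below is only an illustrative guess, not a tested spam filter:

# Hypothetical markers of promotional tweets; tune this list
# against your own test queries.
SPAM_MARKERS = ["giveaway", "airdrop", "referral", "sign up", "promo"]

def looks_promotional(text):
    lowered = text.lower()
    return any(marker in lowered for marker in SPAM_MARKERS)

listposts = [d for d in listposts if not looks_promotional(d["text"])]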

In my case, out of 100 tweets the average score was -0.025, with an average magnitude of 0.0517.
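Those averages come straight out of the DataFrame built in the previous step:

print(df["score"].mean())      # average sentiment score
print(df["magnitude"].mean())  # average magnitude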

Further developments

There are a number of actions that can increase the value of this type of analysis beyond tweet refinement, for example:

  • Extract Bitcoin prices (most exchanges have free APIs) and correlate them with the score, as sketched after this list.
  • Run different searches by trying different hashtags.
  • Use a smaller cryptocurrency, with a less populated hashtag on Twitter, to see whether the outcome is more accurate because it is less spammed.
  • Understand which timeframes are correct. In our example we are taking the most recent tweets, but in theory sentiment could correlate very well in the short term and less in the long term, or the opposite.
  • Add further metrics from the tweets (number of retweets, likes, etc.) that can contribute to weighting or filtering tweets.
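As a minimal sketch of the first point, assuming CoinGecko's free market_chart endpoint (any exchange API would do) and a placeholder series standing in for the hourly sentiment you would build from the Excel output:

import requests
import pandas as pd

# Fetch 24 hours of BTC prices; CoinGecko returns them as
# [timestamp_ms, price] pairs under the "prices" key.
url = "https://api.coingecko.com/api/v3/coins/bitcoin/market_chart"
data = requests.get(url, params={"vs_currency": "usd", "days": 1}, timeout=10).json()

prices = pd.DataFrame(data["prices"], columns=["ts", "price"])
prices["ts"] = pd.to_datetime(prices["ts"], unit="ms")
hourly_price = prices.set_index("ts")["price"].resample("1h").mean()

# Placeholder: replace with your real hourly average scores
# (a constant series like this one yields a NaN correlation).
sentiment = pd.Series(0.0, index=hourly_price.index)

print(hourly_price.corr(sentiment))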

Edoardo Ganetti

Computer Science Engineer and Web Marketer. Interested in Bitcoin, Cryptocurrencies, Finance and the like.