Building a microservice for Twitter analysis - Weekend of a Data Scientist

Alexander Osipenko
Published in Cindicator
3 min read · Jul 13, 2018
Example of real-time Twitter sentiment analysis for the ‘Bitcoin’ topic

Weekend of a Data Scientist is a series of articles about cool stuff I care about. The idea is to spend a weekend learning something new, reading, and coding.

Building a microservice for real-time Twitter data collection and sentiment analysis.

First of all, I would like to point out that being able to build MVPs and microservices is an extremely useful skill for a Data Scientist! When you can build a prototype and test it in a working environment, it just feels so much better, and it gives you a deeper understanding of your final product.

There is a famous Venn diagram by Drew Conway in which “hacking skills” refers to programming skills. According to this diagram, programming skills are important for Data Science in general and for a Data Scientist in particular.

A disclaimer: I don’t think a Data Scientist should build production-ready systems from the ground up; that is work for professional backend developers.

Right, so programming skills are important. What next?

The first thing a data scientist needs is data. Twitter is a treasure trove of publicly available data, a unique data set that anyone can access. Plenty of studies have explored the predictive power of sentiment, on Twitter and in other text sources.

Here are some examples:

Bollen, J., Mao, H. and Zeng, X., 2011. Twitter mood predicts the stock market. Journal of Computational Science, 2(1), pp.1–8.

Pang, B. and Lee, L., 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2), pp.1–135.

Wen, M., Yang, D. and Rose, C., 2014, July. Sentiment analysis in MOOC discussion forums: What does it tell us? In Educational Data Mining 2014.

Practical project

The practical project for this weekend is to build a microservice that collects Twitter data and performs some basic sentiment analysis.

The source code can be found in my repository.

What it does:

  1. It receives a POST request specifying which tweets it needs to collect.
  2. It then opens a socket connection to the Twitter API and starts receiving real-time Twitter data.
  3. Each tweet is assigned a sentiment score with the help of TextBlob.
  4. All the data then goes into the PostgreSQL database.
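
Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function names, the tweet dictionary shape (`id`, `text`), and the table layout are all assumptions, and `sqlite3` is used as a stand-in for PostgreSQL so the snippet runs without a database server. The only TextBlob call used is `TextBlob(text).sentiment.polarity`, which returns a float between -1.0 and 1.0.

```python
import sqlite3
from typing import Callable


def textblob_polarity(text: str) -> float:
    """Return TextBlob's polarity score, a float in [-1.0, 1.0]."""
    # Imported lazily so the rest of the sketch works without TextBlob installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def open_store() -> sqlite3.Connection:
    """Open an in-memory SQLite database (stand-in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per tweet with its sentiment score.
    conn.execute(
        "CREATE TABLE tweets (tweet_id TEXT PRIMARY KEY, text TEXT, polarity REAL)"
    )
    return conn


def score_and_store(
    conn: sqlite3.Connection,
    tweet: dict,
    scorer: Callable[[str], float] = textblob_polarity,
) -> float:
    """Score one tweet and insert it together with its score."""
    polarity = scorer(tweet["text"])
    conn.execute(
        "INSERT INTO tweets (tweet_id, text, polarity) VALUES (?, ?, ?)",
        (tweet["id"], tweet["text"], polarity),
    )
    conn.commit()
    return polarity
```

Passing the scorer as an argument keeps the storage step independent of the sentiment library, so another tool could be swapped in without touching the database code.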

Main features:

  • Because it uses Nameko and RabbitMQ, you can run many different tasks asynchronously and collect different topics.
  • Because it is based on Flask, you can easily connect it to other services, which communicate with it through POST requests.
  • For each tweet, the service assigns a sentiment score using TextBlob and stores it in PostgreSQL, so you can analyze it later.
  • It is scalable and expandable: because it is based on Flask, you can easily add other sentiment analysis tools, a creative visualization, or a web interface.
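
To make the Flask part more concrete, here is a rough sketch of what an endpoint accepting such POST requests could look like. It is an assumption-heavy illustration rather than the repository's code: the `/collect` route, `accept_job`, and the `dispatch` callable are hypothetical names. In the real service the hand-off presumably goes to a Nameko worker over RabbitMQ; here it is just a function you pass in.

```python
from typing import Callable, Tuple


def accept_job(payload: dict, dispatch: Callable[[dict], None]) -> Tuple[dict, int]:
    """Check the payload, hand it to a worker, and return (body, HTTP status)."""
    if not isinstance(payload, dict) or not payload.get("query"):
        return {"error": "'query' must be a non-empty list"}, 400
    # Hand-off point: the collection itself runs elsewhere, not in the request.
    dispatch(payload)
    return {"status": "accepted", "topics": payload["query"]}, 202


def create_app(dispatch: Callable[[dict], None]):
    """Build a Flask app exposing one POST endpoint (hypothetical route name)."""
    # Imported lazily so accept_job can be used and tested without Flask.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/collect", methods=["POST"])
    def collect():
        # silent=True returns None instead of raising on a malformed body.
        payload = request.get_json(silent=True) or {}
        body, status = accept_job(payload, dispatch)
        return jsonify(body), status

    return app
```

Responding with 202 right away and doing the collection in a separate worker is what lets several topics be collected at the same time.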

Example of the POST request JSON:

{
"duration": 60,
"query": ["Bitcoin", "BTC", "Cryptocurrency", "Crypto"],
"translate": false
}

Here, duration is how many minutes you want to collect streaming Twitter data, and query is a list of search queries that the Twitter API will use to send you relevant tweets. translate is a boolean: if it is false, the service collects only English tweets; otherwise, it tries machine translation with TextBlob.
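
From the client side, building such a request could look like the sketch below. It uses only the standard library; the function name and the URL in the usage note are hypothetical, and the checks (a positive whole number of minutes, a non-empty list of strings) are my assumptions about what the service expects rather than documented rules.

```python
import json
import urllib.request


def build_collect_request(url, duration, query, translate=False):
    """Build (but do not send) a POST request carrying the JSON body."""
    # Assumption: duration is a positive whole number of minutes.
    if isinstance(duration, bool) or not isinstance(duration, int) or duration <= 0:
        raise ValueError("duration must be a positive number of minutes")
    # Assumption: query is a non-empty list of non-empty strings.
    if not query or not all(isinstance(q, str) and q for q in query):
        raise ValueError("query must be a non-empty list of non-empty strings")

    body = json.dumps(
        {"duration": duration, "query": list(query), "translate": bool(translate)}
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the service running, the request would be sent with `urllib.request.urlopen(build_collect_request("http://localhost:5000/collect", 60, ["Bitcoin", "BTC"]))`, where the address is a placeholder for wherever your instance listens.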

What can be done next

  • You can build a real-time visualization system (like the one in the cover picture)
  • You can build models for predicting asset movements (for example, the Bitcoin price)
  • You can try different sentiment analysis tools, like VADER and Watson
  • You can collect enough data for a deep exploratory analysis and find some cool insights, like who is “an opinion leader” on some topic
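
As a possible first step toward the visualization idea, the stored scores could be averaged per minute before plotting. This is only a sketch: it assumes you have already read rows out of the database as `(created_at, polarity)` pairs, which is a hypothetical shape, and the function name is mine.

```python
from collections import defaultdict
from datetime import datetime


def average_polarity_per_minute(rows):
    """Group (created_at, polarity) pairs by minute and average each group."""
    buckets = defaultdict(list)
    for created_at, polarity in rows:
        # Truncate the timestamp to the start of its minute.
        minute = created_at.replace(second=0, microsecond=0)
        buckets[minute].append(polarity)
    # Return the minutes in chronological order.
    return {
        minute: sum(values) / len(values)
        for minute, values in sorted(buckets.items())
    }
```

The resulting dictionary maps each minute to its mean polarity, which is a convenient shape for a line chart.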

Are you trying to build your MVP or learn something about sentiment analysis? Let me know in the comments, maybe I could help you!

Previous articles:

1. Weekend of a Data Scientist — July 6th 2018 — about Interpreting Model Predictions

2. Weekend of a Data Scientist — May 25th 2018 — some interesting articles

3. Podcasts for data scientist

Alexander Osipenko (Cindicator): leading, coaching, and building Data Science teams from scratch