Building a Live Tweet Dashboard using Tweepy, MySQL and Streamlit

Keep up-to-date with public sentiment using live tweets and an easy-to-build Streamlit web app.

Jonathan Readshaw
Towards Data Science


Streamlit makes it easy to turn Data Science projects into web apps and dashboards (Image by Author)

Twitter can be used as a data source for various data science projects, including geospatial analysis (where are users tweeting about certain subjects?) and sentiment analysis (how do users feel about certain subjects?).

I decided to build a dashboard that combines the two questions using the example of two UK political leaders: Boris Johnson and Keir Starmer. However, it's possible to use any combination of keywords to analyse political figures, companies etc.

We can break down what needs to be done into three distinct areas:

  1. Database set-up: This can be done directly in the RDBMS of your choice; however, we'll use SQLAlchemy in this example.
  2. Tweepy: Credentials are required to interact with the Twitter API. Once these have been obtained we can set up a stream with keyword filters.
  3. Streamlit: Once we have our data stream working we'll need to set up our web app using Streamlit. This is surprisingly simple and can be done within a single Python file!

This article will work through each point above and show how they interact.

All code is available on GitHub: https://github.com/jonathanreadshaw/streamlit-twitter-stream

Database Set-up

We need a MySQL server up and running before we can create our database and tables. This can be done either locally or on a hosted server; however, I would recommend setting it up locally first. Your app can then be moved to hosted infrastructure at a later date.

First let’s create our database through the following SQL statement:

CREATE DATABASE twitter_sqlalc;

Creation of the table and database transactions will be handled using SQLAlchemy. This is a Python library most commonly used as an Object Relational Mapper (ORM), handling the communication between our Python code and the database. SQLAlchemy allows database tables to be represented as Python classes and the use of functions to automatically execute SQL statements.

The class below represents the table we will use to store tweets:
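A minimal sketch of such a model is shown below. The column names and types are my assumptions for illustration (Base is the declarative base defined in database.py, covered next); see the repository for the actual definition.

# models.py -- a sketch of the tweet table as a SQLAlchemy model.
from sqlalchemy import Column, DateTime, Float, Integer, String, Text

from database import Base  # declarative base defined in database.py (below)


class Tweet(Base):
    __tablename__ = "tweets"

    id = Column(Integer, primary_key=True, autoincrement=True)
    tweet_id = Column(String(32), unique=True)        # Twitter's ID for the tweet
    user_name = Column(String(100))                   # author's screen name
    followers = Column(Integer)                       # author's follower count
    text = Column(Text)                               # full (untruncated) tweet text
    keyword = Column(String(100))                     # which filter keyword matched
    sentiment = Column(Float)                         # VADER compound score
    coordinates = Column(String(100), nullable=True)  # geo-tag, stored as a string
    created_at = Column(DateTime)                     # tweet timestamp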

In a separate file database.py we define various variables that are required to perform our database operations using SQLAlchemy:
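A sketch of what database.py might contain, with a placeholder connection string:

# database.py -- engine, declarative base and session handling.
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

# Placeholder credentials -- substitute your own MySQL user and password.
engine = create_engine("mysql+pymysql://user:password@localhost/twitter_sqlalc")
Base = declarative_base()
Session = sessionmaker(bind=engine)


@contextmanager
def session_scope():
    """Provide a transactional scope: commit on success, roll back on error."""
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


def init_db():
    """Create the tables defined by our models if they don't already exist."""
    import models  # noqa: F401 -- registers the models on Base.metadata
    Base.metadata.create_all(bind=engine)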

Details on engines, base and sessions can be found in the SQLAlchemy documentation. There are two key functions within this file:

  • session_scope(): this function uses the contextmanager decorator to provide a SQLAlchemy session object for performing transactions. The decorator allows it to be used with the with…as… syntax, and it will commit or roll back depending on success before closing the session.
  • init_db(): this function will create the tables defined by our SQLAlchemy models if they don’t exist.

Tweepy Stream

Tweepy is a Python library for accessing the Twitter API. You'll need to set up a Twitter application at dev.twitter.com to obtain a set of authentication keys to use with the API. Streaming with Tweepy involves three objects: Stream, StreamListener and OAuthHandler. The latter simply handles API authentication and requires the unique keys from the creation of your Twitter app.

The StreamListener class is used to define how each incoming tweet should be handled. There are two methods of the StreamListener class that we will override in our child class TweetListener:

  • on_status(): Called when a new tweet is present in the stream. Tweets are defined as Status objects in Tweepy and passed as an argument to this method.
  • on_error(): Called when there is an error in the streaming process. The associated HTTP error code is passed as an argument.

Our on_error() implementation can be kept simple. The code below logs a warning for all status codes except 420. That code is returned when the stream has hit its rate limit, in which case we need to close the stream.
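A minimal sketch of this, assuming a TweetListener class subclassing Tweepy's StreamListener (returning False from on_error disconnects the stream):

import logging

import tweepy

logger = logging.getLogger(__name__)


class TweetListener(tweepy.StreamListener):

    def on_error(self, status_code):
        if status_code == 420:
            # Rate limited: returning False disconnects the stream.
            logger.warning("Rate limit reached (420); closing stream.")
            return False
        logger.warning("Stream error: %s", status_code)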

on_status() is more involved and will handle the processing of individual tweets. First, let's look at our TweetListener class's __init__:
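A sketch of what that __init__ might look like (same class as above):

import tweepy
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


class TweetListener(tweepy.StreamListener):

    def __init__(self, keywords):
        # Call the base StreamListener __init__ first.
        super().__init__()
        # Keywords used to tag tweets when storing them in the database.
        self.keywords = keywords
        # VADER sentiment scorer for assigning a score to each tweet.
        self.analyser = SentimentIntensityAnalyzer()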

Apart from calling the base __init__, we define two additional attributes. The first is keywords, simply the list of keywords we use to filter the stream. Although the listener doesn't directly handle filtering the stream, we will use these when storing tweets in our database. The second attribute is an instance of SentimentIntensityAnalyzer from vaderSentiment, which will be used to assign a sentiment score to each tweet. We could use a more complex model for the sentiment analysis (e.g. BERT), but VADER is perfect for demonstration purposes: it works well on social media text and doesn't require complex dependencies or a large amount of memory.

Let’s take a step-by-step look at our on_status() implementation:

  1. If the tweet is a retweet, stop processing to avoid storing duplicates.
  2. We want to store the full text, so we need to handle truncated tweets by retrieving the full text.
  3. If the tweet is geo-tagged, convert the coordinates to a string so it is easy to handle with SQLAlchemy.
  4. Call the check_keyword() method. This simply checks for the presence of each of our keywords in the tweet and returns the one that matches.
  5. Assign a sentiment score to the tweet. If the score is zero we stop processing and don't store the tweet. This avoids storing lots of tweets with little sentimental value, which would skew our daily/hourly metrics towards neutral.
  6. Finally we create an instance of the Tweet SQLAlchemy model we defined earlier, and insert the row through our insert_tweet() method.
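Putting those steps together, a sketch of on_status() and the two helper methods might look like this (attribute handling and helper implementations are my assumptions for illustration; see the repository for the actual code):

import tweepy

from database import session_scope
from models import Tweet


class TweetListener(tweepy.StreamListener):
    # ... __init__ and on_error as shown above ...

    def on_status(self, status):
        # 1. Skip retweets to avoid storing duplicates.
        if hasattr(status, "retweeted_status"):
            return

        # 2. Truncated tweets carry the full text in extended_tweet.
        if getattr(status, "truncated", False):
            text = status.extended_tweet["full_text"]
        else:
            text = status.text

        # 3. Stringify geo-tag coordinates for easy storage via SQLAlchemy.
        coordinates = None
        if status.coordinates is not None:
            coordinates = str(status.coordinates["coordinates"])

        # 4. Find which of our filter keywords the tweet matches.
        keyword = self.check_keyword(text)

        # 5. Score sentiment; drop neutral tweets (compound score of zero).
        sentiment = self.analyser.polarity_scores(text)["compound"]
        if sentiment == 0:
            return

        # 6. Build the model instance and persist it.
        tweet = Tweet(
            tweet_id=status.id_str,
            user_name=status.user.screen_name,
            followers=status.user.followers_count,
            text=text,
            keyword=keyword,
            sentiment=sentiment,
            coordinates=coordinates,
            created_at=status.created_at,
        )
        self.insert_tweet(tweet)

    def check_keyword(self, text):
        """Return the first of our keywords present in the tweet text."""
        for keyword in self.keywords:
            if keyword.lower() in text.lower():
                return keyword
        return None

    def insert_tweet(self, tweet):
        """Insert a tweet row inside a managed session."""
        with session_scope() as session:
            session.add(tweet)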

All that is left is to create an instance of Tweepy's Stream class using our custom listener and auth objects. The code below shows how we'll start the stream, including parsing the keywords as command-line arguments.
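A sketch of that start-up script, with placeholder keys and an assumed module layout:

# run_stream.py -- a sketch of starting the stream; the key values are
# placeholders and the module names are assumptions.
import sys

import tweepy

from database import init_db
from listener import TweetListener  # the listener class sketched above

if __name__ == "__main__":
    # Keywords are passed on the command line, e.g.:
    #   python run_stream.py "Boris Johnson" "Keir Starmer"
    keywords = sys.argv[1:]

    # Create the database tables if they don't already exist.
    init_db()

    # Authenticate with the keys from your Twitter app.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

    # Attach our custom listener and filter the stream by keyword.
    listener = TweetListener(keywords)
    stream = tweepy.Stream(auth=auth, listener=listener)
    stream.filter(track=keywords)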

Streamlit App

Once our stream is up and running we need a way to view our data in real time. For this we'll use Streamlit, which allows us to build an interactive web app from a single Python script. There are great examples on the Streamlit website and the documentation is easy to follow.

I won't go through all the code for the app in this article, as it can be viewed on GitHub. The app will display the following information:

  1. Influential Tweets: Tweets from users with the largest number of followers.
  2. Recent Tweets: The most recent tweets for the keywords in question.
  3. Hourly/Daily volume by keyword
  4. Hourly/Daily sentiment by keyword
  5. Location of tweets

These metrics/visualisations will allow users to keep track of trends in near real-time. We make use of the following streamlit features:

  • Caching: this is a key feature of Streamlit. We can cache computationally expensive operations simply by using the @st.cache decorator, and cached functions can be configured to expire after a set time (e.g. for loading fresh data). An example follows this list.
  • Widgets: Streamlit makes it easy to add interactivity to your app with buttons, drop-downs etc.
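For instance, a cached data-loading function with an expiry, combined with a simple widget, might look like the sketch below (the query, table and function names are illustrative):

import pandas as pd
import streamlit as st

from database import engine


# Cache the query results; ttl makes the cache expire after 60 seconds
# so fresh tweets are picked up on the next run.
@st.cache(ttl=60)
def load_tweets(keyword):
    query = "SELECT * FROM tweets WHERE keyword = %(keyword)s"
    return pd.read_sql(query, engine, params={"keyword": keyword})


# A drop-down widget to switch between keywords.
keyword = st.selectbox("Keyword", ["Boris Johnson", "Keir Starmer"])
st.dataframe(load_tweets(keyword))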

Tweepy is easy to use and can be a powerful tool for tracking public sentiment towards key topics. Coupled with Streamlit, you can get an interactive web app up and running in an afternoon.

Additional work could include:

  • Named Entity Recognition: what are users tweeting about in addition to our keywords (places, companies etc.)?
  • Incorporate additional data sources, such as a news API
  • Using the streamed tweets and sentiment as input to a downstream model, e.g. election predictions
