How to Plot Timeseries Data in Python and Plotly

A simple tutorial on handling time series data in Python, from extracting dates and months to plotting them on charts.

Published in Nerd For Tech, Feb 28, 2021

Handling time series data can be a bit tricky. When I first had to deal with time-series data in Python and put them into charts, I was really frustrated. I probably spent a whole day just trying to figure out how to extract the dates or the months from a series of timestamp data, and at the end of the day, I still didn't understand a lot of things.

Then, when I had to turn these data into charts using libraries like matplotlib or plotly, it only added to my confusion. Luckily, I have finally learned how to solve these problems.

I am going to show you some of the code I have learned recently. Hopefully, my future self or anyone looking for clues on time series visualization will find this helpful.

The Data

I don't want to use dummy data for our examples here, so I am going to use real data instead. I collected all tweets from 1 January 2020 to 31 January 2021 (13 months of data) that had the word “malioboro” in them.

Malioboro is a street in Yogyakarta, the city I have been living in for the past two years, and it is known as the largest tourist attraction in the city. It’s famous not only among local tourists but also among international tourists (at least before the COVID-19 pandemic, this place used to be swarmed by tourists from countries around the world). Hence, I believed it must be mentioned quite often on Twitter on a day-to-day basis.

I used a library called Twint to collect the data from Twitter. You can download the data here (JSON format) and take a look at the code I used to collect the data here on my GitHub page. However, you can use any data you own (as long as they have dates in them) to follow this tutorial.

People in Malioboro Street. (Photo by arialqadri on Unsplash)

Loading the Data

Let's start by importing some important packages and the data themselves.

Since the data collected are in JSON format, I need to make Python read them line by line and convert them to pandas data frame format.

import pandas as pd
import json

tweets = []
# Each line of the file holds one tweet as a JSON object.
with open('data/keyword-malioboro.json', 'r', encoding='UTF-8') as f:
    for line in f:
        tweets.append(json.loads(line))
df = pd.json_normalize(tweets)
len(df)
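To see what pd.json_normalize does with line-delimited JSON, here is a minimal sketch on two made-up records. The field names (id, tweet, user.name) are only illustrative assumptions and may not match what Twint actually writes.

```python
import io
import json

import pandas as pd

# Hypothetical sample: two tweets in JSON Lines form (one JSON object per line).
# The field names are illustrative and may differ from Twint's real output.
sample = io.StringIO(
    '{"id": 1, "tweet": "malioboro at night", "user": {"name": "a"}}\n'
    '{"id": 2, "tweet": "jalan malioboro", "user": {"name": "b"}}\n'
)

# Parse each non-empty line into a dict.
records = [json.loads(line) for line in sample if line.strip()]

# json_normalize flattens nested dicts into dotted column names.
df_sample = pd.json_normalize(records)

print(df_sample.columns.tolist())
print(len(df_sample))
```

The nested user object ends up as a flat user.name column, which is the main reason to prefer json_normalize over a plain pd.DataFrame(records) for this kind of data.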

There are exactly 88,035 tweets collected.

For roughly a year of data, that’s pretty awesome, right? Malioboro is indeed a famous place; otherwise, Twitter wouldn't be talking about it this much.

Extracting Dates

The time data aren’t in a standard format yet. Each value in the created_at column has the timezone name appended at the end. My laptop’s timezone is set to GMT+07 (Bangkok/Jakarta, South East Asia), and Twint probably followed that setting.

Depending on the timezone set on your laptop, you’ll probably see something similar. We are going to remove this substring and store only the date-time part in a newly created column.

from dateutil.parser import parse

df['created'] = [x.replace(" SE Asia Standard Time", "") for x in df['created_at']]
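If your timezone name is different, or your data mix several, one option is to keep only the leading date-time part instead of removing a specific substring. This is a minimal sketch; the sample strings and the "YYYY-MM-DD HH:MM:SS" layout are assumptions, so check them against your own created_at values first.

```python
import pandas as pd

# Hypothetical raw values; the exact layout of created_at is assumed here.
raw = pd.Series([
    "2020-10-08 19:45:10 SE Asia Standard Time",
    "2020-12-31 23:59:01 Pacific Standard Time",
])

# Keep only the leading "YYYY-MM-DD HH:MM:SS" part, whatever follows it.
created = raw.str.extract(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", expand=False)

print(created.tolist())
```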

Now we have the date-time data stored in the created column. However, since we only need the dates and months data, we are going to parse those things using the following code.

df['date'] = [parse(date).date() for date in df['created']]
df['monthyear'] = pd.to_datetime(df['date']).dt.to_period('M')
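The same extraction can also be done without a Python loop by parsing the whole column once with pd.to_datetime and using the .dt accessor. A small sketch on made-up timestamps (the values are assumptions, not taken from the dataset):

```python
import pandas as pd

# Hypothetical cleaned date-time strings, like those in the created column.
created = pd.Series([
    "2020-10-04 08:15:00",
    "2020-10-08 19:45:10",
    "2021-01-01 00:03:22",
])

# Parse once, then derive both columns from the parsed timestamps.
ts = pd.to_datetime(created)
dates = ts.dt.date                 # plain datetime.date objects
monthyear = ts.dt.to_period("M")   # monthly periods such as 2020-10

print(dates.tolist())
print(monthyear.astype(str).tolist())
```

On a column with tens of thousands of rows, this is usually noticeably faster than calling parse on each value.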

To plot the data we need the numbers of tweets per their respective time unit (months or days).

Plotting by Month

First, we are going to plot the data by month. We do that by making a new data frame consisting of each month and its number of tweets.

by_month = pd.to_datetime(df['date']).dt.to_period('M').value_counts().sort_index()
by_month.index = pd.PeriodIndex(by_month.index)
df_month = by_month.rename_axis('month').reset_index(name='counts')
df_month
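One thing to keep in mind: value_counts only returns months that actually appear in the data. That is not a problem for the Malioboro tweets, but with sparser data a month with zero rows would silently disappear from the chart. A minimal sketch of filling such gaps, using made-up dates:

```python
import pandas as pd

# Hypothetical dates: nothing falls in February 2020.
dates = pd.to_datetime(pd.Series(["2020-01-05", "2020-01-20", "2020-03-02"]))

by_month = dates.dt.to_period("M").value_counts().sort_index()
by_month.index = pd.PeriodIndex(by_month.index)

# Build the full month range and fill the missing months with 0.
full_range = pd.period_range(by_month.index.min(), by_month.index.max(), freq="M")
by_month = by_month.reindex(full_range, fill_value=0)

df_month = by_month.rename_axis("month").reset_index(name="counts")
print(df_month)
```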

There are 13 months from Jan 2020 to Jan 2021.

These data are now ready to be plotted.

Plotting by Month in a Line Chart

import plotly.express as px
import plotly.graph_objs as go

fig = go.Figure(data=go.Scatter(x=df_month['month'].astype(dtype=str),
                        y=df_month['counts'],
                        marker_color='indianred', text="counts"))
fig.update_layout({"title": 'Tweets about Malioboro from Jan 2020 to Jan 2021',
                   "xaxis": {"title":"Months"},
                   "yaxis": {"title":"Total tweets"},
                   "showlegend": False})
# Static image export needs the kaleido package installed.
fig.write_image("by-month.png",format="png", width=1000, height=600, scale=3)
fig.show()

Plotting by Month in a Bar Chart

fig = go.Figure(data=go.Bar(x=df_month['month'].astype(dtype=str),
                        y=df_month['counts'],
                        marker_color='indianred', text="counts"))
fig.update_layout({"title": 'Tweets about Malioboro from Jan 2020 to Jan 2021',
                   "xaxis": {"title":"Months"},
                   "yaxis": {"title":"Total tweets"},
                   "showlegend": False})

fig.show()

Plotting by Day

You can also make a more detailed chart that shows the trend day by day, but I don't suggest this for longer time spans (say, three years or more) because the lines would become cluttered and hard to read.
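For those longer spans, one possible middle ground between months and days is to aggregate daily counts into weeks with pandas' resample. A minimal sketch on made-up daily counts:

```python
import pandas as pd

# Hypothetical daily counts for 14 days starting Wednesday, 1 January 2020.
idx = pd.date_range("2020-01-01", periods=14, freq="D")
by_date_sample = pd.Series(range(1, 15), index=idx)

# "W" groups days into weeks ending on Sunday and labels each by that Sunday.
by_week = by_date_sample.resample("W").sum()

print(by_week)
```

Note that the first and last weeks may be partial, so their totals can look artificially low.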

We do this by creating a data frame that stores the number of tweets day by day.

by_date = pd.Series(df['date']).value_counts().sort_index()
by_date.index = pd.DatetimeIndex(by_date.index)
df_date = by_date.rename_axis('date').reset_index(name='counts')
df_date

fig = go.Figure(data=go.Scatter(x=df_date['date'].astype(dtype=str),
                                y=df_date['counts'],
                                marker_color='black', text="counts"))
fig.update_layout({"title": 'Tweets about Malioboro from Jan 2020 to Jan 2021 Day by Day',
                   "xaxis": {"title":"Time"},
                   "yaxis": {"title":"Total tweets"},
                   "showlegend": False})
fig.show()

This chart looks very interesting because now you can see the peaks on certain dates. There must be something important occurring during those days that caused Twitter to talk about Malioboro more than usual.

I am going to highlight these important dates by putting some dots to show their significance in the data.

Here I am going to get the top three peak dates and store them in a data frame.

top_dates = df_date.sort_values(by=['counts'],ascending=False).head(3)
vals = []
for tgl, tot in zip(top_dates["date"], top_dates["counts"]):
    tgl = tgl.strftime("%d %B")
    val = "%d (%s)"%(tot, tgl)
    vals.append(val)
top_dates['tgl'] = vals
top_dates
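The same labels can be built without an explicit loop. This sketch uses nlargest and the .dt.strftime accessor on a made-up df_date; the counts are invented for illustration and are not the real figures.

```python
import pandas as pd

# Hypothetical daily counts; the numbers are invented for illustration.
df_date = pd.DataFrame({
    "date": pd.to_datetime(["2020-10-04", "2020-10-08", "2020-11-15", "2020-12-31"]),
    "counts": [900, 1200, 150, 700],
})

# Take the three rows with the highest counts; copy so we can add a column safely.
top_dates = df_date.nlargest(3, "counts").copy()

# Build labels such as "1200 (08 October)".
top_dates["tgl"] = (
    top_dates["counts"].astype(str)
    + " ("
    + top_dates["date"].dt.strftime("%d %B")
    + ")"
)

print(top_dates["tgl"].tolist())
```

The month names produced by %B depend on the active locale, so they may not be in English on every machine.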

Then I use this data frame to put notes on the chart. It’s pretty similar to the previous chart we have made, but it has some points that highlight these dates.

fig = go.Figure(data=go.Scatter(x=df_date['date'].astype(dtype=str), 
                                y=df_date['counts'],
                                marker_color='black', text="counts"))
fig.update_layout({"title": 'Tweets about Malioboro from Jan 2020 to Jan 2021 Day by Day',
                   "xaxis": {"title":"Time"},
                   "yaxis": {"title":"Total tweets"},
                   "showlegend": False})
fig.add_traces(go.Scatter(x=top_dates['date'], y=top_dates['counts'],
                          textposition='top left',
                          textfont=dict(color='#233a77'),
                          mode='markers+text',
                          marker=dict(color='red', size=6),
                          text = top_dates["tgl"]))
fig.show()

You can replace the text shown on the points with anything you want. For example, you can use it to write short notes about the events that caused the peaks.

These charts are interesting, but without a story to accompany them, the readers wouldn't get many clues.

In my case, Oct 4 and Oct 8 became the peak points in the chart because a huge demonstration took place in Malioboro on those dates. Twitter users were quick to post updates on the situation, which caused the term Malioboro to trend and become the talk of the news.

On Dec 31, another peak occurred because the local government decided not to lock down or restrict Malioboro on New Year's Eve despite the increasing number of COVID-19 cases in Yogyakarta. Twitter users immediately joined the controversy by tweeting their opinions.

These charts and context have hopefully created some kind of story about our data.

In short, plotting time series data using Plotly is actually pretty simple and straightforward. If you still find some things confusing, that's okay. You don't have to get everything on the first try, because sometimes it takes a while to get used to it.

Hopefully, you found this tutorial easy to follow. If not, leave me a comment!

You can look at the whole code I used in this article in my GitHub repository below. Thanks for reading!

catris25/timeline-plotly-examples
