Covid-19: Genetic Mutation or Engineering
Motion Chart Visualisation in Python
Using long-term interest rate data and other sources to study the economic impact of 12 pandemics in history, researchers found “significant macroeconomic after-effects” from pandemics that persisted for 40 years.
In this article, I let the data speak for itself.
A motion chart that tracks the trend of the outbreak in different countries from three perspectives has a lot to say:
- Number of confirmed cases
- Number of recovered cases
- Number of deaths
Data Source
I forked the data from the repository operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
The full Python code for this work is available on my GitHub, so I keep the walkthrough here more generic.
The motion chart above has five dimensions:
- x-axis: number of confirmed cases
- y-axis: number of recovered cases
- Circle colour: country
- Circle size: number of deaths
- Time
Let’s get data in motion
Since the data is forked from the JHU CSSE COVID-19 repository into my GitHub, I use the os library to look into the GitHub folder and print its contents.
Among all the contents, I use read_csv() to load the three marked files into three separate pandas data frames, one of which I have sample-printed below.
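A minimal sketch of this listing-and-loading step, assuming a local clone of the forked repository. The folder path and the three file names are assumptions based on the JHU CSSE repository layout (its global time-series files), not necessarily the exact files marked in the original screenshot; load_time_series is a hypothetical helper name.

```python
import os

import pandas as pd

# Assumed path to a local clone of the forked JHU CSSE repository; adjust as needed.
TIME_SERIES_DIR = "COVID-19/csse_covid_19_data/csse_covid_19_time_series"

# Assumed file names of the three global time series.
FILES = {
    "confirmed": "time_series_covid19_confirmed_global.csv",
    "recovered": "time_series_covid19_recovered_global.csv",
    "deaths": "time_series_covid19_deaths_global.csv",
}


def load_time_series(folder):
    """Print the folder contents, then read the three time-series files into DataFrames."""
    contents = sorted(os.listdir(folder))
    print(contents)  # inspect what the folder holds
    frames = {}
    for name, filename in FILES.items():
        if filename not in contents:
            raise FileNotFoundError(f"{filename} not found in {folder}")
        frames[name] = pd.read_csv(os.path.join(folder, filename))
    return frames
```

With the frames loaded, frames["confirmed"] plays the role of data_conf in the rest of this walkthrough.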
Preprocessing data
As usual, we need to prepare the data before using it. Our desired data frame should look like the one below, with five columns: Date, Country/Region, Confirmed_Cases, Recovered_Cases, and Deaths.
I will explain the process for one of the three data frames, since all three are processed in exactly the same way.
Comparing the two data structures, it’s clear that some preparation is needed. First, let’s drop the columns we don’t need from the original data frame.
new_data_conf = data_conf.drop(columns=['Province/State','Lat', 'Long'])
Next, we need to reshape the data. There are numerous ways to do that; my approach starts with groupby() on the country name, so that each country is collapsed into a single row (remember that in the original data set, because of the “Province/State” field, the same country was repeated in multiple rows).
A quick note on groupby(): it requires an aggregation function for the numerical columns. I used sum(), so that the provincial counts add up to a single national total.
new_data_conf = new_data_conf.groupby('Country/Region').sum()  # one row per country
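The choice of aggregation matters here. A toy example with illustrative numbers (not real counts) shows why sum() is the right function for case counts: it adds the provincial rows up to a national total, whereas mean() would report a per-province average.

```python
import pandas as pd

# Toy data in the JHU layout after Province/State, Lat and Long are dropped.
# Numbers are illustrative only, not real counts.
data = pd.DataFrame({
    "Country/Region": ["Australia", "Australia", "Italy"],
    "4/14/20": [100, 50, 300],
    "4/15/20": [120, 60, 350],
})

# sum() adds the provincial rows up to one national total per country.
summed = data.groupby("Country/Region").sum()
# mean() would instead report the average per province, understating the total.
averaged = data.groupby("Country/Region").mean()

print(summed.loc["Australia", "4/15/20"])    # 180
print(averaged.loc["Australia", "4/15/20"])  # 90.0
```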
Now, to turn the dates into a column, I used transpose(), reset_index(), and finally rename().
new_data_conf = new_data_conf.transpose()  # dates become rows, countries become columns
new_data_conf.reset_index(level=0, inplace=True)  # date index becomes a column named 'index'
new_data_conf.rename(columns={'index': 'Date'}, inplace=True)
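On a toy frame (illustrative numbers), the three calls move the dates from the column headers into the values of a Date column. This sketch chains the calls instead of using inplace=True; the result is the same.

```python
import pandas as pd

# Toy frame shaped like the groupby output: countries as the index, one column per date.
# Numbers are illustrative only.
by_country = pd.DataFrame(
    {"4/14/20": [150, 300], "4/15/20": [180, 350]},
    index=pd.Index(["Australia", "Italy"], name="Country/Region"),
)

# Dates become the row index and countries become the columns.
wide = by_country.transpose()
# The unnamed date index turns into a column called "index"; rename it to "Date".
wide = wide.reset_index().rename(columns={"index": "Date"})

print(wide.columns.tolist())  # ['Date', 'Australia', 'Italy']
```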
Next, we need to melt the data frame so that the country names move into a column of their own. That lets us add the ‘confirmed’, ‘recovered’, and ‘death’ values as three separate columns and arrive at the desired structure.
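The melt step can be sketched on a toy wide frame (illustrative numbers). Passing var_name and value_name names the two new columns directly, so the country column is called Country/Region and the counts column already carries the measure's name, ready for merging.

```python
import pandas as pd

# Toy wide frame as produced by the transpose step (illustrative numbers).
wide = pd.DataFrame({
    "Date": ["4/14/20", "4/15/20"],
    "Australia": [150, 180],
    "Italy": [300, 350],
})

# Every column after "Date" holds one country's counts.
country_cols = wide.columns.tolist()[1:]
long_df = wide.melt(
    id_vars="Date",
    value_vars=country_cols,
    var_name="Country/Region",     # former column names (countries) go here
    value_name="Confirmed_Cases",  # the counts go here
)

print(long_df)
```

Omitting value_vars would give the same result here, since melt then uses every column that is not in id_vars.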
1. Melt each data frame separately:
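cols = new_data_conf.columns.tolist()  # cols[0] is 'Date', the rest are countries
new_data_conf = new_data_conf.melt(id_vars='Date', value_vars=cols[1:], var_name='Country/Region', value_name='Confirmed_Cases')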
2. Join them one by one using merge():
new_df = pd.merge(new_data_conf, new_data_reco, how='left', on=['Date', 'Country/Region'])
final_df = pd.merge(new_df, new_data_death, how='left', on=['Date', 'Country/Region'])
We now have the desired data frame 💪
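Since all three frames go through exactly the same steps, the preprocessing can be wrapped in one function and applied to each file. This is a minimal sketch, assuming the JHU column names used throughout this article; tidy_time_series and build_final_df are hypothetical helper names. The left joins keep every row of the confirmed frame, so a country missing from the recovered or deaths file ends up with NaN in that column.

```python
import pandas as pd

KEYS = ["Date", "Country/Region"]


def tidy_time_series(raw, value_name):
    """Reshape one wide JHU-style frame into Date / Country/Region / value rows."""
    df = raw.drop(columns=["Province/State", "Lat", "Long"])
    # One row per country: add up the provincial rows.
    df = df.groupby("Country/Region").sum()
    # Dates become rows, then move into a "Date" column.
    df = df.transpose().reset_index().rename(columns={"index": "Date"})
    # Long format: one row per (Date, country) pair.
    return df.melt(id_vars="Date", var_name="Country/Region", value_name=value_name)


def build_final_df(conf_raw, reco_raw, death_raw):
    """Tidy the three frames and left-join them on Date and Country/Region."""
    final = tidy_time_series(conf_raw, "Confirmed_Cases")
    final = final.merge(tidy_time_series(reco_raw, "Recovered_Cases"), how="left", on=KEYS)
    final = final.merge(tidy_time_series(death_raw, "Deaths"), how="left", on=KEYS)
    return final
```

For example, with data_conf plus two frames assumed to be called data_reco and data_death, build_final_df(data_conf, data_reco, data_death) returns a frame with the five target columns.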
One last step remains: drawing the actual motion chart. Before that, to get a cleaner result, I filter the data down to the countries with the highest number of confirmed cases, here those with more than 15,000.
top_countries = final_df.loc[final_df['Confirmed_Cases'] > 15000]
top_countries_set = set(top_countries['Country/Region'])
The resulting set reflects the countries above that threshold as of the time of writing (April 15th).
3. Select rows: with the selected countries stored in a set(), I used .loc[] together with isin() to keep only the rows for those countries and saved them in the top_countries_df data frame.
top_countries_df = final_df.loc[final_df['Country/Region'].isin(top_countries_set)]
top_countries_df
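The filter-then-select pattern can be checked on a toy frame with placeholder countries A, B and C (illustrative numbers). The threshold only decides which countries to keep; the isin() selection then keeps all of their dates, so the animation starts from the beginning of each country's series rather than from the day it crossed 15,000.

```python
import pandas as pd

# Toy merged frame with placeholder countries (illustrative numbers).
final_df = pd.DataFrame({
    "Date": ["4/14/20", "4/15/20"] * 3,
    "Country/Region": ["A", "A", "B", "B", "C", "C"],
    "Confirmed_Cases": [14000, 16000, 500, 600, 20000, 21000],
    "Recovered_Cases": [1, 2, 3, 4, 5, 6],
    "Deaths": [1, 1, 1, 1, 1, 1],
})

THRESHOLD = 15000

# Countries that exceeded the threshold on at least one date.
top_countries_set = set(
    final_df.loc[final_df["Confirmed_Cases"] > THRESHOLD, "Country/Region"]
)

# Keep every date for those countries, including dates below the threshold.
top_countries_df = final_df.loc[final_df["Country/Region"].isin(top_countries_set)]
```

Because the JHU series are cumulative totals, a country that crosses the threshold on any date will normally still be above it on the latest date, so checking all dates or only the latest one should select the same countries.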
Motion Chart
To draw the motion chart, we only need to fill in the parameters listed below; the MotionChart library we import takes care of the rest.
from motionchart.motionchart import MotionChart

mChart = MotionChart(df=top_countries_df, key='Date',
                     x='Confirmed_Cases', y='Recovered_Cases',
                     xscale='linear', yscale='linear',
                     size='Deaths', color='Country/Region',
                     category='Country/Region')
- key: ‘Date’ drives the animation of the chart
- x axis: ‘confirmed’ cases
- y axis: ‘recovered’ cases
- bubble size: the number of ‘deaths’ in each scenario
- color and category: the country list
These were my preferences; anyone can change them to suit their taste.
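Before passing the frame to MotionChart, a quick sanity check may be worthwhile. This is a minimal sketch with a hypothetical helper, prepare_chart_input: it confirms that every column named in the MotionChart call exists and fills the NaN values that the left joins can leave behind. Filling them with 0 is an assumption; dropping those rows would be an equally reasonable choice.

```python
import pandas as pd

# Columns named in the MotionChart call: key, x, y, size and color/category.
CHART_COLUMNS = ["Date", "Confirmed_Cases", "Recovered_Cases", "Deaths", "Country/Region"]


def prepare_chart_input(df, columns=CHART_COLUMNS):
    """Hypothetical helper: check the chart columns exist and fill gaps left by the left joins."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns for the chart: {missing}")
    out = df[list(columns)].copy()
    # Assumption: a missing recovered/death count is treated as 0.
    numeric = out.select_dtypes("number").columns
    out[numeric] = out[numeric].fillna(0)
    return out
```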
Publish
How you publish the chart depends on your environment. As I was using a Jupyter Notebook, I used .to_notebook(). To display the same result in a web browser, use .to_browser().
mChart.to_notebook()
And here is the final result:
Please comment below should you have any questions/feedback/comments.