Plotting the Pandemic with Python and Plotly
Create and deploy your own interactive dashboard to monitor the progress of the COVID-19 pandemic with Python, Plotly and Elastic Beanstalk.
Visualizations are a powerful tool for understanding and communicating the meaning within data. The current search for truth and meaning in the daily diet of data on the progress of the COVID-19 pandemic has inspired a number of awesome examples of data visualizations.
This blog details how you can build your own eye-catching interactive charts using the COVID-19 data, wrap them into a dashboard and deploy them to the internet, using open source software. An example can be found here. All of the code can be found on GitHub.
The blog is structured into 5 sections, taking you through setup and configuration, to coding of one example visualization — an animated map of COVID-19 cases over time — to creating a dash-app ready for deployment:
- Setup for coding
- Reading-in and wrangling the data
- Creating interactive charts with plotly
- Creating a dashboard with dash
- Deploying your app to AWS Elastic Beanstalk
1. Setup for coding
Software
In the world of data science there are many different permutations of software and operating systems that can be used, depending on resources and taste. As a minimum you will need Python 3.5+ and git for version control. A code editor and a package manager are also recommended. I use the following:
- Python 3.7
- Package Manager: Anaconda 4.8.3
- Version control: git
- Editor: Spyder, Jupyter Lab and Notepad++
- Operating system: Windows 10
If you have a different setup, the instructions below won’t vary hugely, but bear in mind that some commands in the terminal may be different and you may need to install additional dependencies.
Version Control
It’s good practice to use version control and you’ll need it when you come to deploy. Assuming you’ve got git installed you can do this from the terminal.
Open a conda terminal (from the Start menu in Windows) and make a new directory. Note that on Linux or macOS you will need to use a forward slash rather than a backslash.
mkdir .\covid-app
Go into that directory.
cd .\covid-app
Make a readme file (with a title “Covid-19 app”).
echo "# covid-19 app" >> readme.md
Start version control.
git init
Add the readme file into version control.
git add readme.md
Commit the file.
git commit -m "first-commit"
(Optional) If you have a GitHub account you can now set up a repository there and link to it.
git remote add origin [repository url]
git remote -v
git push origin master
Voilà. You are set up for version control.
Create a new environment
People forget to do this. It’s important because you can end up in a mess when installing packages which conflict with each other. If you’ve done it inside an environment and things go wrong, it’s easy to close the environment and start again. If you haven’t, you can end up with a configuration/uninstall nightmare to unpick.
In the conda terminal create a new environment with a name you’ll remember.
conda create --name [environment-name]
Then enter the environment (you’ll have to do this every time you open a terminal).
conda activate [environment-name]
Configuration
Inside your environment you’ll need to install several additional packages:
conda install jupyterlab plotly dash palettable
You will also need pandas and numpy. If they are not already present in your environment, install them separately.
conda install pandas numpy
To style the app you’ll need to install dash_bootstrap_components with pip (it’s not available through conda).
pip install dash_bootstrap_components
To get plotly charts to display in a jupyter notebook you’ll also need a few add-ons. Follow the instructions in the plotly documentation.
2. Reading-in and wrangling the data
In your covid-app folder, create a new .py or .ipynb file, and import the data using pandas.
The data comes from the European Centre for Disease Prevention and Control, which (at the time of writing) releases daily updates on cases and deaths by country. As with all data you should understand what you are plotting, along with its caveats and limitations. You can read more on their website.
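Here is a minimal sketch of the import step. The URL is an assumption (it is the ECDC download address that was in use at the time of writing and may have moved since), `load_data` is a hypothetical helper name, and the column names other than ‘dateRep’ are assumed from the ECDC file layout. The inline sample contains made-up numbers so that the snippet also runs offline.

```python
import io

import pandas as pd

# Assumed location of the ECDC daily case distribution file; check the
# ECDC website for the current address before relying on it.
ECDC_URL = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv"


def load_data(source=ECDC_URL):
    """Read the case data from a URL, a local path or a file-like object."""
    return pd.read_csv(source)


# Tiny made-up sample in the same shape as the ECDC file (latest row first).
sample_csv = io.StringIO(
    "dateRep,cases,deaths,countriesAndTerritories,countryterritoryCode,continentExp\n"
    "02/04/2020,12,1,Italy,ITA,Europe\n"
    "01/04/2020,10,0,Italy,ITA,Europe\n"
)

data = load_data(sample_csv)   # use load_data() to fetch the live file instead
print(data.head())
```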
If you take a glance, the data are generally very clean, so luckily there’s not much to do. However, the ‘dateRep’ column (which holds the date) is not recognised as a datetime object and is ordered from latest to earliest. If not corrected, this will cause us problems later on.
Date Formats
To correct the problem:
- Convert the ‘dateRep’ column to date-time
- Re-sort the data to be in time order for each country
- Re-index the data to prevent errors when creating aggregates
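The three steps above can be sketched as follows. I am assuming the dates arrive as day/month/year strings and that the country column is called ‘countriesAndTerritories’; the rows are made-up sample values.

```python
import pandas as pd

# Made-up rows, latest first, mimicking the raw file.
data = pd.DataFrame({
    "dateRep": ["02/04/2020", "01/04/2020", "02/04/2020", "01/04/2020"],
    "countriesAndTerritories": ["Italy", "Italy", "Spain", "Spain"],
    "cases": [12, 10, 7, 5],
    "deaths": [1, 0, 1, 0],
})

# 1. Convert the day/month/year strings to datetime.
data["dateRep"] = pd.to_datetime(data["dateRep"], format="%d/%m/%Y")

# 2. Sort into time order within each country.
data = data.sort_values(["countriesAndTerritories", "dateRep"])

# 3. Rebuild the index so it runs 0..n-1 in the new order.
data = data.reset_index(drop=True)
```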
Create a ‘global’/‘world’ aggregate of the data
We are also interested in a global total of cases and deaths by day, but you’ll see that it isn’t present in the dataset, so we need to create it.
To create the entry for the whole world:
- Create a table of figures aggregated across all countries.
- Add values for the new ‘World’ series for each of the remaining columns in data.
- Add the values for ‘World’ into the original dataset.
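A sketch of those three steps, using made-up numbers. The column names are assumed from the ECDC layout, and the ‘WLD’ code is just a placeholder I have chosen for the new series, not an official ISO code.

```python
import pandas as pd

data = pd.DataFrame({
    "dateRep": pd.to_datetime(["2020-04-01", "2020-04-02"] * 2),
    "countriesAndTerritories": ["Italy", "Italy", "Spain", "Spain"],
    "countryterritoryCode": ["ITA", "ITA", "ESP", "ESP"],
    "continentExp": ["Europe"] * 4,
    "cases": [10, 12, 5, 7],
    "deaths": [0, 1, 0, 1],
})

# 1. Aggregate the daily figures across all countries.
world = data.groupby("dateRep", as_index=False)[["cases", "deaths"]].sum()

# 2. Fill the remaining columns for the new 'World' series.
world["countriesAndTerritories"] = "World"
world["countryterritoryCode"] = "WLD"   # placeholder, not an ISO-3 code
world["continentExp"] = "World"

# 3. Append the 'World' rows to the original data set.
data = pd.concat([data, world], ignore_index=True, sort=False)
```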
Create cumulative totals for cases and deaths by country
The data set only contains daily numbers, but we want to plot how cases and deaths accumulate over time.
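One way to do this, as a sketch, is a grouped cumulative sum. The new column names ‘total_cases’ and ‘total_deaths’ are my own choice, and the figures are made up.

```python
import pandas as pd

data = pd.DataFrame({
    "dateRep": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"] * 2),
    "countriesAndTerritories": ["Italy"] * 3 + ["Spain"] * 3,
    "cases": [10, 12, 8, 5, 7, 9],
    "deaths": [0, 1, 2, 0, 1, 1],
})

# Running totals only make sense if rows are in time order within each country.
data = data.sort_values(["countriesAndTerritories", "dateRep"]).reset_index(drop=True)

# cumsum() restarts for every country because of the groupby.
grouped = data.groupby("countriesAndTerritories")
data["total_cases"] = grouped["cases"].cumsum()
data["total_deaths"] = grouped["deaths"].cumsum()
```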
Create variables to give a colour according to continent
First, read-in a file which matches countries to continents via their three letter ISO-3 codes (present in both datasets), then merge it into data. Finally create a colour dictionary to map continents to a colour, which will be used later on.
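A sketch of the merge and the colour dictionary. The lookup table here is a hypothetical stand-in built inline (in practice it would be read from a file with pd.read_csv), its column names are assumptions, and the hex colours are arbitrary picks.

```python
import pandas as pd

data = pd.DataFrame({
    "countriesAndTerritories": ["Italy", "Brazil", "Kenya"],
    "countryterritoryCode": ["ITA", "BRA", "KEN"],
    "cases": [10, 4, 1],
})

# Hypothetical country-to-continent lookup keyed on ISO-3 codes.
continents = pd.DataFrame({
    "code": ["ITA", "BRA", "KEN"],
    "continent": ["Europe", "South America", "Africa"],
})

# A left merge keeps every row of data, even if a code has no match.
data = data.merge(
    continents, how="left", left_on="countryterritoryCode", right_on="code"
).drop(columns="code")

# One colour per continent, looked up later when building the traces.
colours = {
    "Africa": "#1b9e77",
    "Europe": "#d95f02",
    "South America": "#7570b3",
}
```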
Create a list of dates as strings
For the slider, having the dates as strings makes things easier than working with the datetime format.
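For example, a minimal sketch (the ‘date_str’ column and the `days` list are names I have made up):

```python
import pandas as pd

data = pd.DataFrame({
    "dateRep": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-01", "2020-04-02"]),
    "countriesAndTerritories": ["Italy", "Italy", "Spain", "Spain"],
})

# String version of each date, e.g. '2020-04-01'.
data["date_str"] = data["dateRep"].dt.strftime("%Y-%m-%d")

# Year-month-day strings sort chronologically, so a plain sort is enough.
days = sorted(data["date_str"].unique())
```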
3. Creating interactive charts with plotly
We’re now ready to start making some charts.
As with many packages, there are multiple ways to do things in plotly, and I am only going to show you one. The high level architecture of a plotly plot is shown below. Once you’ve imported the plotly.graph_objects library, you build a ‘figure’ dictionary which contains:
- A ‘data’ element. A list of dictionaries. Each dictionary in the list defines the type (e.g. scatter / bar) of chart and the data to be plotted. Where you want to show multiple series on the same chart (also referred to as traces), you can add further dictionaries to the list.
- (Optional) ‘layout’ parameters, to control virtually every aspect of the layout of the chart. You can read more in the plotly documentation
- (Optional) If you are adding any form of animation you will also need to pass a list of dictionaries into the ‘frames’ element, which defines what the chart should show at each frame of the animation.
You then call go.Figure() to create the chart.
It should look something like this:
Now we are ready to build something a bit more complicated, which looks like this, with a slider showing snapshots through time, and a ‘play’ button which starts an animation showing cases over time.
Building an animated map of total COVID-19 cases
First, initialize the figure dictionary.
‘data’
To create the ‘data’ element select the day you want to be the default view when the chart is loaded — I’ve set it to the most recent day. Then subset the data set to only include data from that day.
As we want each continent to have its own series (colour) on the map, we will loop through the continents, creating a separate trace for each; these will all appear together on the map. In each loop, a colour is assigned to the continent based on the ‘colours’ dictionary created above. The data are then further subset by continent name (‘cont’), before the trace dictionary is created. At the end of each loop the data_dict created for each continent is appended into the data element of the figure dictionary.
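A sketch of that loop. I am assuming a ‘scattergeo’ bubble map keyed on ISO-3 codes, which may differ from the trace type used in the finished app; the column names ‘date_str’, ‘continent’ and ‘total_cases’ and all of the numbers are made up for illustration.

```python
import pandas as pd

# Made-up cumulative totals for three countries over two days.
data = pd.DataFrame({
    "date_str": ["2020-04-01", "2020-04-02"] * 3,
    "countriesAndTerritories": ["Italy", "Italy", "Spain", "Spain", "Kenya", "Kenya"],
    "countryterritoryCode": ["ITA", "ITA", "ESP", "ESP", "KEN", "KEN"],
    "continent": ["Europe"] * 4 + ["Africa"] * 2,
    "total_cases": [10, 22, 5, 12, 1, 3],
})
colours = {"Africa": "#1b9e77", "Europe": "#d95f02"}

fig_dict = {"data": [], "layout": {}, "frames": []}

# Default view: the most recent day in the data.
day = data["date_str"].max()
data_day = data[data["date_str"] == day]

for cont in sorted(data_day["continent"].unique()):
    colour = colours[cont]
    data_cont = data_day[data_day["continent"] == cont]
    data_dict = {
        "type": "scattergeo",
        "locations": data_cont["countryterritoryCode"].tolist(),
        "locationmode": "ISO-3",
        "text": data_cont["countriesAndTerritories"].tolist(),
        "marker": {
            "size": data_cont["total_cases"].tolist(),
            "sizemode": "area",
            "color": colour,
        },
        "name": cont,
    }
    fig_dict["data"].append(data_dict)
```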
As we want to animate this chart over time, we need to define frames. These will be the same chart defined above for the ‘data’ element of the figure, but re-plotted with data for each date in the data set. We will do this by taking the same code as above, but adding an additional loop in order to loop through days as well.
At the same time we will need to capture a list of steps. Steps are layout parameters, which link the frames to the days on the slider and define how you transition between frames.
Before we enter the loop we create empty lists to capture frames and steps. Inside the outer loop we create a ‘frame’ dictionary that will gather all of the traces for each day. At the end of the inner loop this is appended to the ‘frames’ element of the figure dictionary. Each time the outer loop starts again with a new day, the frame dictionary is re-created.
In each outer loop (representing one day) a ‘step’ dictionary is also defined. This is then appended to the ‘steps’ list at the end of each loop.
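The nested loop might look like the sketch below. As before, the ‘scattergeo’ trace type, the column names, the numbers and the animation timings are my assumptions. Note that the inner loop walks the continents in a fixed order, so each trace keeps the same position in every frame.

```python
import pandas as pd

# Made-up cumulative totals for three countries over two days.
data = pd.DataFrame({
    "date_str": ["2020-04-01", "2020-04-02"] * 3,
    "countryterritoryCode": ["ITA", "ITA", "ESP", "ESP", "KEN", "KEN"],
    "continent": ["Europe"] * 4 + ["Africa"] * 2,
    "total_cases": [10, 22, 5, 12, 1, 3],
})
colours = {"Africa": "#1b9e77", "Europe": "#d95f02"}
days = sorted(data["date_str"].unique())

fig_dict = {"data": [], "layout": {}, "frames": []}
steps = []

for day in days:                            # outer loop: one frame per day
    frame = {"data": [], "name": day}
    data_day = data[data["date_str"] == day]

    for cont, colour in colours.items():    # inner loop: one trace per continent
        data_cont = data_day[data_day["continent"] == cont]
        frame["data"].append({
            "type": "scattergeo",
            "locations": data_cont["countryterritoryCode"].tolist(),
            "locationmode": "ISO-3",
            "marker": {"size": data_cont["total_cases"].tolist(),
                       "sizemode": "area", "color": colour},
            "name": cont,
        })

    fig_dict["frames"].append(frame)

    # The step links this slider position to the frame with the same name.
    step = {
        "method": "animate",
        "label": day,
        "args": [
            [day],
            {"frame": {"duration": 300, "redraw": True},
             "mode": "immediate",
             "transition": {"duration": 300}},
        ],
    }
    steps.append(step)
```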
Now we’ve got the steps we need to add a slider. This is done by creating a dictionary inside a list, which is an argument of the ‘layout’ dictionary for the whole figure. The ‘steps’ list is added as an argument to the sliders dictionary.
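A sketch of the slider definition, assuming a ‘steps’ list like the one described above. The dates, padding and position values are arbitrary choices for illustration.

```python
days = ["2020-04-01", "2020-04-02", "2020-04-03"]

# Stand-in for the steps list built while looping through the days.
steps = [
    {"method": "animate", "label": day,
     "args": [[day], {"frame": {"duration": 300, "redraw": True}, "mode": "immediate"}]}
    for day in days
]

sliders_dict = {
    "active": len(days) - 1,                 # start on the most recent day
    "currentvalue": {"prefix": "Date: "},    # text shown above the slider
    "pad": {"t": 50},
    "len": 0.9,
    "x": 0.1,
    "steps": steps,
}

# 'sliders' expects a list, so the dictionary is wrapped in one.
layout = {"sliders": [sliders_dict]}
```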
Next we need to define the layout.
Finally we can pull the whole figure together and store it as ‘map1’.
The whole code for the map should look like this:
So far we’ve read-in the data, added some additional features to it and created an interactive and animated chart. But it’s all still running only in our code editor/terminal. Now we need to prepare for deployment.
4. Creating a dashboard with dash
Once you’ve created one or more figures, you can use dash to pull them together into a dashboard. Dash, which is built by plotly, offers a relatively straightforward way of doing this.
After importing the needed packages, you can create your first browser-based app with just a few lines of code.
You should see the following message after running the code above. This means your dashboard is up and running locally. If you paste http://127.0.0.1:8050/ into your browser you should see the map appear and be fully interactive.
Creating more complex dashboards
You can add more charts to your dashboard, by including further ‘html.Div’ terms.
You can also style your dashboard, by including css styling parameters as shown above in the “style” dictionaries. You can also import ready made style sheets by updating the initial app definition.
More complex styling of your app will lead you into HTML and CSS. If you’re not familiar with these, there are plenty of good explanations out there, such as W3Schools, and it’s pretty easy to add elements incrementally in the dash framework.
5. Deploying your app to AWS Elastic Beanstalk
Now that you’ve created your app you want other people to be able to see the fruits of your creative genius. You want to make it available via the internet. Again there are many routes to do this. I chose Elastic Beanstalk because of its flexibility and scalability. With a simple dashboard (like this one) you can quite happily deploy to Heroku or Amazon S3. However, if you are thinking of building something more complex that takes user input (and so requires callbacks) then Elastic Beanstalk is going to be better, particularly if you’re expecting lots of traffic.
Before deployment you will need to create an ‘app’ subfolder in the covid-app directory which contains the following:
- application.py: contains the dash app code. Note this must be called ‘application.py’, not app.py or anything else.
- figures.py: contains the code for the charts. The code for creating charts can be put at the top of the application.py file, but I find it neater to separate them out, and it makes debugging easier.
- requirements.txt: contains the python packages that will need to be installed on the Elastic Beanstalk instance.
- assets/: a folder containing any images, css files and supplementary data.
Update the application.py file
The application code needs to look a bit different for deployment.
Note the addition of application = app.server and the changes to the last two lines.
If you have separated out your charts into figures.py you will also need to import them into the application.py file.
Create a requirements.txt file
The file should look like this:
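The exact contents depend on what your app imports. As a sketch, the snippet below builds the lines by pinning each package to whatever version is installed locally. The package list is an assumption based on the installs earlier in this post, `requirement_lines` is a hypothetical helper, and it needs Python 3.8+ for importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed package list; adjust it to match the imports in your own app.
PACKAGES = ["dash", "dash-bootstrap-components", "pandas", "numpy", "plotly", "palettable"]


def requirement_lines(packages):
    """Return one 'name==version' line per package, or just the name if not installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(name)
    return lines


requirements = "\n".join(requirement_lines(PACKAGES)) + "\n"
print(requirements)
```

Saving the printed text as requirements.txt in the app folder gives Elastic Beanstalk the list of packages to install.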
Make an AWS account
If you don’t have an AWS account, then you need to make one at https://aws.amazon.com/. I suggest following the instructions in this blog, from step 27 onwards.
Install the EB CLI
Next you will need to install the Elastic Beanstalk command line interface (EB CLI). From your terminal, run:
pip install awsebcli --upgrade --user
Make sure you are in your covid-app/app folder, then initialize an Elastic Beanstalk environment.
eb init
When prompted, choose a region near you. Accept the defaults, including the Python version: even though I run Python 3.7 locally, I needed to choose 3.6 for Elastic Beanstalk. When asked if you want to set up ssh, select ‘no’. When it asks if you want to use ‘CodeCommit’, also select ‘no’.
Next ensure all folders and files in the app folder are committed.
git add .
git commit -m "commit in preparation for deploy"
Create an instance of your app in the environment
eb create
You can accept the defaults, and then wait a while for the app to be created. Once it is done, you should see a ‘CNAME’ link. Paste that link into your browser, and you should see your app appear.
CNAME:http://covid-app-dev.eu-west-2.elasticbeanstalk.com/
If you make changes to the app code, simply git add, git commit and run
eb deploy
And your app will update.
If you see an error rather than your app, you can check the logs.
eb logs
The logs will point to errors in the code (often missing commas or unmatched parentheses!) and help you debug.
That’s it, you are up and running. Now there’s time to spend endless hours styling and creating new charts.
References:
* https://medium.com/@miloharper/a-beginner-s-guide-to-creating-your-first-python-website-using-flask-aws-ec2-elastic-beanstalk-6a82b9be25e0
* https://medium.com/@austinlasseter/plotly-dash-and-the-elastic-beanstalk-command-line-89fb6b67bb79