Creating an animated bar chart in Stata

Mike Smet
Published in The Stata Gallery
6 min read · Jun 22, 2022

In this guide I will show how to create the following animated bar chart, which is a time-lapse of the share of hospital ICU beds occupied by COVID-19 patients in Belgian provinces:

Before starting, it is always a good idea to set up a decent folder structure and to store data, graphs and other objects in different folders: see The Stata workflow guide for an excellent overview.

To keep the focus on creating the animated graph, I will simply store all files in a single folder: a hypothetical folder called ‘COVID19animation’, located in the root of my C: drive. So first I set the working directory to this folder with the command below (change the path to whatever combination of folders and subfolders you want to use).

cd "C:/COVID19animation"
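Note that cd fails if the folder does not exist yet. A minimal sketch, assuming you are happy to let Stata create the folder for you; the local name workdir is only illustrative and not part of the original workflow:

```stata
local workdir "C:/COVID19animation"   // hypothetical local; adjust the path to your system
capture mkdir "`workdir'"             // create the folder; capture suppresses the error if it already exists
cd "`workdir'"                        // make it the working directory
```

Locals only live for the duration of one do-file run, so these three lines should be executed together.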

Import data and basic data manipulation

This example uses data on Covid patients in Belgian hospitals. We first import the data, change some of the labels and create a new variable called ‘Share_ICU_beds’ that holds the share of ICU beds occupied by Covid patients, as a percentage of the total number of ICU beds in every province (e.g. 301 in Antwerpen, 23 in Brabant Wallon, etc.).

import delimited "https://epistat.sciensano.be/Data/COVID19BE_HOSP.csv", encoding("utf-8")

encode province, generate(province2)
label define province2 2 "Brabant Wallon", modify
label define province2 9 "Oost-Vlaanderen", modify
label define province2 10 "Vlaams-Brabant", modify
label define province2 11 "West-Vlaanderen", modify
gen Share_ICU_beds=0
replace Share_ICU_beds = total_in_icu/301 if province=="Antwerpen"
replace Share_ICU_beds = total_in_icu/23 if province=="BrabantWallon"
replace Share_ICU_beds = total_in_icu/278 if province=="Brussels"
replace Share_ICU_beds = total_in_icu/259 if province=="Hainaut"
replace Share_ICU_beds = total_in_icu/230 if province=="Liège"
replace Share_ICU_beds = total_in_icu/145 if province=="Limburg"
replace Share_ICU_beds = total_in_icu/43 if province=="Luxembourg"
replace Share_ICU_beds = total_in_icu/97 if province=="Namur"
replace Share_ICU_beds = total_in_icu/265 if province=="OostVlaanderen"
replace Share_ICU_beds = total_in_icu/139 if province=="VlaamsBrabant"
replace Share_ICU_beds = total_in_icu/221 if province=="WestVlaanderen"
replace Share_ICU_beds = Share_ICU_beds*100
gen max_share_ICU_beds=0 // To keep track of the maximum share of occupied beds up to a given day
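Before moving on, it can help to check the result. The sketch below first works through the formula for a made-up number of patients (150 is purely an illustration, not a figure from the data), and then inspects the codes and spellings the commands above rely on:

```stata
* Worked example of the formula: 150 ICU patients (hypothetical) in Antwerpen (301 beds)
display %4.1f 150/301*100        // displays 49.8

* Show which integer code encode assigned to each province,
* to verify the numbers used in the label define commands
label list province2

* List the province spellings exactly as they appear in the data,
* to compare them with the strings used in the replace commands
tabulate province, missing

* Quick look at the range of the new variable (in percent)
summarize Share_ICU_beds
```

Rows whose province string matches none of the replace conditions keep the initial value 0, so a misspelled province name would silently show up as an empty bar.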

Next, we convert the original date variable (which is in string format) to a Stata date and create some locals to store the current date, the start date from which we want to create graphs, and the number of days between the two. 15 March 2020 is the first day for which data are available. If you only want to update the animated graph, you can fill in a more recent date in the local date, e.g. local date = td(15jun2022); this avoids having to recreate all the graphs from 15 March 2020 when you just want to add a limited number of new ones.

rename date date_string
gen date=date(date_string,"YMD")
save "Hospital data.dta", replace
local today: display date(c(current_date),"DMY")
local date = td(15mar2020) // Create new graphs from this date
local number_of_days =`today'-`date'
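Stata stores daily dates as the number of days since 1 January 1960, which is why the graph files created further on carry five-digit numbers in their names. A small sketch to make this visible, assuming it is run in the same do-file as the lines above so that the local number_of_days is still defined:

```stata
display td(15mar2020)      // 21989: days elapsed since 01jan1960
display %td 21989          // 15mar2020: the same number shown as a date
format date %td            // display the numeric date variable as readable dates
display "Graphs to create: `number_of_days'"
```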

Create the graphs

Next we create a new graph for every single day from the start of data availability until today. We use a forvalues loop and combine a horizontal bar chart with a scatterplot. The former creates the horizontal bars, the latter overlays this bar chart with a pipe symbol (|) indicating the maximum occupancy rate that was reached from the start of data availability up to that particular day: in the animation you will see that on some days the | is pushed further right, indicating that a new record occupancy rate was reached that day. Finally, every single graph is exported and assigned a name indicating the chronological order.

forvalues i=1/`number_of_days' {
    // Format the current date for use in the graph title
    local date2 = `date'
    local date2: display %tdDD/NN/CCYY `date2'

    // Maximum share per province reached up to the current date
    bysort province: egen test=max(Share_ICU_beds) if date<=`date'
    replace max_share_ICU_beds=test
    drop test

    // Bars: share on the current date; pipe symbol: maximum reached so far
    twoway (bar Share_ICU_beds province2 , horizontal barwidth(0.5) ) ///
        (scatter province2 max_share_ICU_beds, msymbol(pipe)) ///
        if date==`date' ///
        , xscale(range(0 100)) yscale(reverse) ytitle("") ylabel(#10, labels angle(horizontal) format(%9.0g) ///
        valuelabel noticks nogrid) xlabel(25 "25%" 50 "50%" 75 "75%" 100 "100%") ///
        title("Percent of ICU occupied by Covid patients on `date2'", justification(left) span) ///
        xline(25 50 75 100) ///
        note("Data: Sciensano" "Note: ICU capacity refers to regular number of ICU beds") legend(off) graphregion(color(white))

    // Export with the numeric date in the file name, then move on one day
    graph export "Fig`date'.png", replace
    local date = `date'+1
}
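The maximum up to a given day is a running maximum, and it can also be computed once for all days instead of being recalculated inside the loop. The following is a sketch of that alternative, not the approach used above; it assumes one observation per province and date, and the variable name running_max is hypothetical:

```stata
* Sketch: running maximum per province, computed once for all days
sort province date
* First day of each province: the running maximum is that day's share
by province: gen running_max = Share_ICU_beds if _n==1
* Later days: the larger of the previous running maximum and the current share
* (replace works through the observations in order, so the previous value is already filled in)
by province: replace running_max = max(running_max[_n-1], Share_ICU_beds) if _n>1
```

With such a variable in place, the scatter plot could use running_max instead of max_share_ICU_beds, and the egen, replace and drop lines inside the loop would no longer be needed.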

These commands should generate a series of graphs looking like this one: the bars show the share on 14/06/2022 and the | symbol indicates the maximum that was reached in the time span ranging from 15/03/2020 up to 14/06/2022.

In addition, I want to add the occupancy rate (i.e. the actual percentage) next to the bar of the last day on the graph, and I want these numbers to remain visible for a short amount of time in the animation. So I add the appropriate labels and make 15 copies of this last graph.

local date=`date'-1

twoway  (bar Share_ICU_beds province2 , horizontal barwidth(0.5) ) ///
(scatter province2 max_share_ICU_beds, msymbol(pipe)) ///
(scatter province2 Share_ICU_beds, msymbol(none) mlabel(Share_ICU_beds) mlabformat(%2.0f) xtitle("")) ///
if date==`date' ///
, xscale(range(0 100)) yscale(reverse) ytitle("") ylabel(#10, labels angle(horizontal) format(%9.0g) ///
valuelabel noticks nogrid) xlabel(25 "25%" 50 "50%" 75 "75%" 100 "100%") ///
title("Percent of ICU occupied by Covid patients on `date2'", justification(left) span) ///
xline(25 50 75 100) ///
note("Data: Sciensano" "Note: ICU capacity refers to regular number of ICU beds") legend(off) graphregion(color(white))

forvalues i=1/15 {
graph export "Fig`date'.png", replace
local date = `date'+1
}

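This loop exports the same graph 15 times under consecutive file numbers, so the labelled final frame stays on screen for about one second at 15 frames per second.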

Create animated version

All previous commands generate several hundred separate graphs, which we need to stitch together. I use ffmpeg, which can be executed from within your Stata do-file. Detailed descriptions of how to set up and use ffmpeg can be found, for example, in the guides written by Jesse Wursten, Asjad Naqvi and Chuck Huber.

ffmpeg requires that you specify the folder where ffmpeg is located as well as the location of the source graphs and the final name and destination of your animated graph. The exact locations will of course depend on your specific folder structure, so please replace the ‘C:/COVID19animation’ below with the appropriate folder(s) on your system.

shell "C:/Program Files/ffmpeg/bin/ffmpeg.exe" -y -framerate 15 -start_number 21989 -i "C:/COVID19animation/Fig%02d.png" -c:v libx264 -crf 12 -r 15   "C:/COVID19animation/ICU occupancy rates.mp4"
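The value passed to -start_number has to match the number in the name of the first exported graph, which is the numeric Stata date of the start date. A minimal sketch that derives this number instead of typing it in, and checks that the first frame exists before calling ffmpeg; the local name first is only illustrative, and the paths are the same assumed locations as above:

```stata
local first = td(15mar2020)   // must equal the start date used when creating the graphs
capture confirm file "C:/COVID19animation/Fig`first'.png"
if _rc {
    // The first frame is missing: ffmpeg would fail, so stop with a message instead
    display as error "Fig`first'.png not found: check the start date and the folder"
}
else {
    shell "C:/Program Files/ffmpeg/bin/ffmpeg.exe" -y -framerate 15 -start_number `first' -i "C:/COVID19animation/Fig%02d.png" -c:v libx264 -crf 12 -r 15 "C:/COVID19animation/ICU occupancy rates.mp4"
}
```

If you changed the local date to a more recent start date while still keeping the older graphs in the folder, keep td(15mar2020) here so that the animation starts from the first graph.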

For some reason (yet unknown to me) the resulting mp4 file cannot be played in my default Windows video player. However, it does play in the VLC media player and when you upload it to YouTube. The final result would be this:

The mp4 file generated by ffmpeg also fails to play on Twitter. Therefore, as a final step, I convert it with HandBrake (https://handbrake.fr/) using the default settings, after which it plays both in the default Windows video player and on Twitter.
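One possible explanation for the playback problems, not verified for this particular file, is the pixel format: when libx264 encodes PNG images, ffmpeg may pick a 4:4:4 pixel format that many players do not support. A commonly suggested remedy is to request yuv420p explicitly. The sketch below is a variant of the command above under that assumption; the scale filter rounds the width and height down to even numbers, which yuv420p requires:

```stata
* Assumption: the playback issue is caused by the pixel format chosen by ffmpeg
shell "C:/Program Files/ffmpeg/bin/ffmpeg.exe" -y -framerate 15 -start_number 21989 -i "C:/COVID19animation/Fig%02d.png" -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" -c:v libx264 -pix_fmt yuv420p -crf 12 -r 15 "C:/COVID19animation/ICU occupancy rates.mp4"
```

If the resulting file plays in the default player, the extra conversion step may no longer be needed; if not, the HandBrake route described above remains available.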

That’s it for this guide. I hope you enjoyed reading it.

Finally, I’m thankful to Jesse Wursten and Asjad Naqvi for the valuable tips they provided for polishing and finetuning the graphs.

About the author

I’m an economist from KU Leuven teaching microeconomics and econometrics, but interested in a broad range of topics and domains. Occasionally posting something on Twitter.
