COVID19 Graphs Using ggplot2

--

Many data scientists use R for its strong statistical analysis and visualization features. The ggplot2 package, part of the tidyverse collection of R packages, has a reputation as a powerful visualization tool. In the last few weeks, I have seen different types of graphs related to COVID19 trends around the world. I decided to explore what graphs I could create from the available COVID19 data using ggplot2.

Here, I am sharing the steps I took to create my graphs. If you have a general understanding of R and the ggplot2 grammar, you should have no trouble reproducing the same graphs. If you have never heard of R, this post can still be useful for understanding how the visualization process works in R. Please note that data on COVID19 changes rapidly; my charts below illustrate the trends as of April 6th, 2020.

Let’s get started!

First, I downloaded a CSV file from the COVID-19 Data Center of Johns Hopkins University, which tracks cases and trends. Then, I loaded the data into the RStudio environment. The dataset had columns for country, date, cases, new cases, deaths, and recovered. I screened the data for potential problems and noticed that China was given the name “Mainland China” and Iran was called “Iran (Islamic Republic of)”. I decided to rename these countries to China and Iran using the tidyverse (the recode function from dplyr). Additionally, I loaded the lubridate package to format the date column in the dataset.
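The screening and cleaning steps described above can be sketched on a few toy rows. This is only an illustration under stated assumptions: the data frame `toy` and its numbers are made up, and the column names simply mirror the ones mentioned in this post.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy rows (invented values) that mimic the columns described above
toy <- data.frame(
  country = c("Mainland China", "Iran (Islamic Republic of)", "Italy", "Italy"),
  date    = c("4/5/2020", "4/5/2020", "4/5/2020", "4/6/2020"),
  cases   = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

# List every distinct country name with its row count to spot odd labels
name_counts <- count(toy, country)
print(name_counts)

# Recode the long names, then convert month/day/year strings to Date
toy_clean <- toy %>%
  mutate(
    country = recode(country,
                     "Mainland China" = "China",
                     "Iran (Islamic Republic of)" = "Iran"),
    date = mdy(date)
  )
print(toy_clean)
```

Counting the distinct names first makes unexpected labels easy to see before deciding which ones to recode.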

covid<-read.csv("cov.csv")
covid
library(tidyverse)
library(lubridate)
### This code renames two countries and keeps only 10 countries
covid_edit <- covid %>%
  mutate(country = recode(country,
                          "Mainland China" = "China",
                          "Iran (Islamic Republic of)" = "Iran")) %>%
  filter(country %in% c("USA", "China", "Italy", "Germany", "Spain",
                        "France", "UK", "Iran", "Netherlands", "Belgium"))
### Use the lubridate package to format the date column
covid_edit$date <- mdy(covid_edit$date)
### I am now ready to plot my first graph
covid_10 <- ggplot(data = covid_edit, aes(x = date, y = cases, col = country)) +
  geom_line(size = 1.6) +
  ggtitle("COVID19: Cumulative Confirmed Cases in 10 Top Countries") +
  xlab("") + ylab("") + theme_minimal()
covid_10

This is the output of my code:

I liked the graph; however, I wanted to add a few final touches: remove the legend title and move the legend to the bottom. This is how I did it.

### Remove the legend title and move the legend to the bottom
covid_final <- covid_10 + theme(legend.position = "bottom",
legend.text = element_text(size=14, face="bold"),
plot.title = element_text(size=26),
legend.title = element_blank(),
axis.text.x = element_text(size=16),
axis.text.y = element_text(size=9))
covid_final

After making changes, I got the following output.

Next, I wanted to highlight the countries that have more than 100,000 cases and display their names. I used the same code but also loaded an additional package called gghighlight. My code and the output it produced are below.

library(gghighlight)
covid_highlight <- ggplot(data = covid_edit, aes(x = date, y = cases, col = country)) +
geom_line(size=1.6)+ggtitle("COVID19:Countries with more than 100,000 cases")+
gghighlight(max(cases) > 100000,
label_key = country, label_params = list(size=6))+
xlab("")+ylab("")+theme_minimal()+
theme(plot.title = element_text(size=26),
axis.text.x = element_text(size=16),
axis.text.y = element_text(size=12))
covid_highlight

Now the names of the countries appear in my graph.
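As I understand it, gghighlight evaluates the condition max(cases) > 100000 once per line, that is, once per country. A rough way to preview which countries would pass is to apply the same condition with dplyr. The sketch below uses a made-up data frame with two placeholder countries, "A" and "B"; the numbers are invented for illustration only.

```r
library(dplyr)

# Hypothetical cumulative counts for two placeholder countries
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  cases   = c(50000, 90000, 120000, 10000, 20000, 30000)
)

# Keep only the groups whose maximum case count exceeds the threshold,
# mirroring the per-country condition passed to gghighlight
highlighted <- toy %>%
  group_by(country) %>%
  filter(max(cases) > 100000) %>%
  ungroup()

print(unique(highlighted$country))
```

Only country "A" passes here, because its largest value is above 100,000 while the largest value for "B" is not.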

Over the last two weeks, much attention has been given to the USA, Italy, Spain, and Germany. My original dataset has data on new cases, recovered, and deaths. I wanted to plot these variables in one graph. First, I chose these four countries; for the time period, I decided to look at the month from March 6 to April 6.

### I am going to use the data frame that I already created, covid_edit
covid_4 <- covid_edit %>%
filter(country == "USA" |
country == "Italy" |
country == "Germany"|
country == "Spain")
covid_4_final<-ggplot(data=covid_4)+
geom_line(aes(x=date,y=new_cases, colour="new_cases"), size=2)+
geom_line(aes(x=date, y=recovered, colour="recovered"), size=2)+
geom_line(aes(x=date, y=deaths, colour="deaths"), size=2)+
scale_color_manual(name="covid", values=c("black", "blue", "red"),
labels=c("Deaths", "New cases", "Recovered"))+
scale_x_date(limits = as.Date(c("2020-03-06", "2020-04-06")))+
facet_wrap(country~.)+theme_minimal(base_size=20)+
ylab("")+xlab("")+
scale_y_continuous(position = "right")+
ggtitle("COVID19: 4 TOP COUNTRIES")+theme(plot.title = element_text(hjust=0.5))

I moved the y-axis to the right side (with scale_y_continuous in the code above) and then moved the legend to the bottom.

covid_4_final+theme(
plot.title = element_text(face = "bold", size = 26),
legend.background = element_rect(fill = "white", size = 10, colour = "white"),
legend.position = "bottom",
axis.ticks = element_line(colour = "grey70", size = 0.2),
panel.grid.major = element_line(colour = "grey70", size = 0.2),
panel.grid.minor = element_blank(),
axis.text.x = element_text(size=16),
axis.text.y = element_text(size=14))
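An alternative to writing one geom_line per variable is to reshape the data to long format first, so that a single geom_line can map the variable name to colour. The sketch below is not the approach used above; it assumes tidyr 1.0.0 or later for pivot_longer, and the data frame `toy` with its numbers is invented for illustration.

```r
library(tidyr)
library(ggplot2)

# Hypothetical toy data with the same three measures as in the post
toy <- data.frame(
  country   = rep(c("Italy", "Spain"), each = 2),
  date      = rep(as.Date(c("2020-04-05", "2020-04-06")), times = 2),
  new_cases = c(10, 12, 8, 9),
  recovered = c(3, 4, 2, 5),
  deaths    = c(1, 2, 1, 1)
)

# Stack the three measure columns into a name column and a value column
toy_long <- pivot_longer(
  toy,
  cols = c(new_cases, recovered, deaths),
  names_to = "metric",
  values_to = "count"
)

# One geom_line is enough once the measure name is a column
p_long <- ggplot(toy_long, aes(x = date, y = count, colour = metric)) +
  geom_line() +
  facet_wrap(~ country) +
  theme_minimal()
p_long
```

With the long format, the legend entries come from the metric column, so adding a fourth measure would only mean adding one more column name to cols.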

You can experiment further with the COVID19 data and create your own graphs by replacing geom_line with geom_col (or geom_bar with stat = "identity") or geom_area. I am sure you will be able to create awesome visualizations. Thank you for reading this. I hope you find it useful.
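As a starting point for that kind of experiment, here is a minimal sketch on made-up numbers. Because the counts are already in the data, geom_col is the bar geom that takes a y value directly, whereas geom_bar counts rows by default. The data frame `toy` is hypothetical.

```r
library(ggplot2)

# Hypothetical daily counts for a single country
toy <- data.frame(
  date      = as.Date("2020-04-01") + 0:4,
  new_cases = c(5, 8, 6, 10, 12)
)

# Shared base plot; only the geom changes between the two charts
base <- ggplot(toy, aes(x = date, y = new_cases))

bars <- base + geom_col() + theme_minimal()              # bar chart
area <- base + geom_area(alpha = 0.5) + theme_minimal()  # filled area chart

bars
area
```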
