Srijit Mukherjee
Published in

Srijit Mukherjee

COVID19’s effect on airport traffic: A Data Analysis in R

Download the data from here.

Load your data (csv format)

data = read.csv("location/filename.csv")

Exploration of Data

Observing the data


Variable Types
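The screenshots for these two steps are not available here. As a minimal sketch of how one might look at the rows and the variable types with base R: the small `sample_data` frame below is a made-up stand-in for the real CSV (the values and airport labels are invented), and only some of the dataset's columns are mimicked.

```r
# Hypothetical stand-in for the real CSV (all values invented for illustration)
sample_data <- data.frame(
  Date = c("2020-03-16", "2020-03-17", "2020-03-18"),
  AirportName = c("Airport A", "Airport A", "Airport B"),
  PercentOfBaseline = c(64L, 58L, 71L),
  Country = c("Chile", "Chile", "Canada"),
  stringsAsFactors = FALSE
)

head(sample_data, 2)  # first rows: a quick look at the raw values
str(sample_data)      # column names and storage types in one view

# Named character vector holding the class of every column
col_types <- sapply(sample_data, class)
print(col_types)

# Date arrives as text, so convert it before any time-based work
sample_data$Date <- as.Date(sample_data$Date, format = "%Y-%m-%d")
class(sample_data$Date)
```

On the real data, the same calls would be `head(data)` and `str(data)`.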


Understanding the Data

Important Variables

Numerical Variables

  • PercentOfBaseline

Categorical Variables

  • Date
  • AirportName
  • Centroid (of airports)
  • City
  • State
  • Country
  • Geography (of countries)


  • This is time-series data.
  • The centroid and geography will be only helpful if we want to visualize the data on the world map.
  • Country → State → City → AirportName are nested factor variables: each airport belongs to one city, each city to one state, and each state to one country.
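One way to check the nesting claim is to count how many distinct parents each child level has. This is only a sketch: the `airports` frame and the helper `parents_per_child` are hypothetical, and the labels are invented rather than taken from the dataset.

```r
# Hypothetical rows (labels invented) to illustrate the nesting
airports <- data.frame(
  Country = c("X", "X", "X", "Y"),
  State = c("S1", "S1", "S2", "S3"),
  City = c("C1", "C1", "C2", "C3"),
  AirportName = c("A1", "A2", "A3", "A4"),
  stringsAsFactors = FALSE
)

# For every child level, count the distinct parent values it appears with
parents_per_child <- function(child, parent) {
  tapply(parent, child, function(p) length(unique(p)))
}

# The hierarchy is properly nested if every child has exactly one parent
nested <- all(parents_per_child(airports$AirportName, airports$City) == 1) &&
  all(parents_per_child(airports$City, airports$State) == 1) &&
  all(parents_per_child(airports$State, airports$Country) == 1)
print(nested)
```

Running the same check on the real columns would show whether any city or airport name is shared across parents.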


  • Data Visualization is the main focus of the project.
  • Because of the Country → State → City → AirportName structure, the best way to visualize the data is in an app with dropdown options (R Shiny, Tableau, Power BI, etc.).
  • Analyze each individual time series and forecast the next “n” days.
  • Visualizing the categorical variables and their relationships is not important, because they are nested within one another and give no valuable insight.

Exploratory Data Analysis

Univariate (Kernel Density of PercentOfBaseline)

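library(ggplot2) # provides ggplot() and geom_density()
ggplot(data, aes(x = PercentOfBaseline)) + geom_density(fill = "red", alpha = 0.2) # fill is set outside aes() so the curve is actually red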

Exercise: Plot individual kernel density of each factor of Country, State, City, and AirportName.


Numerical vs Categorical

- PercentOfBaseline vs Country

ggplot(data, aes(x = PercentOfBaseline, fill = Country))+geom_density(alpha = 0.2)


  • The density for every country looks like a mixture of normal distributions with multiple modes.
  • Australia's density has two distinct modes. (Why?)
  • Canada has a higher PercentOfBaseline than most of the other countries.

Exercise: Plot and compare kernel density of PercentOfBaseline of factors of Country, State, City, and AirportName.

- PercentOfBaseline vs Date (Time Series)

# as.Date() gives a continuous date axis (read.csv loads Date as text)
ggplot(data, aes(x = as.Date(Date), y = PercentOfBaseline, group = Country, colour = Country)) + geom_line() + facet_wrap(~ Country) + xlab("Date")


  • Except for Australia, the other countries have an increasing trend.
  • Australia's density has two distinct modes because its PercentOfBaseline shifts sharply over time.

Exercise: Plot and compare the time series of PercentOfBaseline of Country, State, City, and AirportName.

Time Series Forecast

Time Series is a specific data structure in R.

We have to convert data into time series data structures to apply time series algorithms.

Chile Data

library(dplyr) # provides %>% and filter()
data_chile = data %>% filter(Country == "Chile")
chile = data_chile[, c("Date", "PercentOfBaseline")] # columns 2 and 5 of the data

Convert into time-series data structure*

  • 1,2,3,4,… as time index with no NA values.

library(zoo) # provides read.zoo()
z <- read.zoo(chile, format = "%Y-%m-%d") # zoo series indexed by date
time(z) <- seq_along(time(z)) # replace the dates with 1,2,3,4,...
ts_chile = as.ts(z) # conversion into time series data structure

  • Date as a time index with NA values to be removed.

library(zoo) # provides read.zoo()
library(tseries) # provides na.remove() for ts objects
z <- read.zoo(chile, format = "%Y-%m-%d") # zoo series indexed by date
ts_chile = as.ts(z) # gaps between dates show up as NA
ts_chile = na.remove(ts_chile) # remove NA from the time series

If you convert with the dates as the index, as in the second option, the time axis shows up as large day counts rather than readable dates. I prefer the 1,2,3,4,… time index.
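To see where the odd numbers on a date-indexed series come from: R stores a `Date` as the number of days since 1970-01-01, and that numeric value is what ends up as the time index. A small illustration with three sample dates (chosen here for the example, not taken from the dataset):

```r
# Three sample dates, chosen only for illustration
dates <- as.Date(c("2020-03-16", "2020-03-17", "2020-03-18"))

# Underlying numeric value of a Date: days since 1970-01-01
day_counts <- as.numeric(dates)
print(day_counts)   # 18337 18338 18339

# The sequential alternative used in the first option
step_index <- seq_along(dates)
print(step_index)   # 1 2 3
```

The sequential index loses the calendar dates, so it is worth keeping the original `Date` column around if forecasts need to be mapped back to real days.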

Plot the time series


The two parts of time series analysis we’ll discuss are

  • Decomposition (Trend + Seasonal + Error)
  • Forecasting



This happens because the data has no seasonal pattern.
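The output this remark refers to is missing here. One plausible reading, stated as an assumption, is that `decompose()` was called on a series with frequency 1 and stopped with an error. The sketch below reproduces that behaviour on simulated data (not the Chile series) and shows that the call goes through once a period is declared; the weekly period of 7 is only an illustrative guess.

```r
set.seed(1)

# Simulated random walk as a stand-in for a daily series (not the real data)
x <- ts(cumsum(rnorm(100)))  # frequency = 1, i.e. no seasonal period declared

# decompose() needs at least two full periods, so this call fails
msg <- tryCatch(
  {
    decompose(x)
    "decomposed"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Declaring a period (7 = weekly, an assumption) lets decompose() run
weekly <- ts(as.numeric(x), frequency = 7)
parts <- decompose(weekly)
print(names(parts))  # includes the trend, seasonal and random components
```

A declared period only makes the function run; whether the seasonal component it estimates is meaningful is a separate question.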


Model (ARIMA model)

library(forecast) # provides auto.arima(), forecast(), accuracy()
model = auto.arima(ts_chile)

ACF Plot

acf(model$residuals, main = "Correlogram")


pacf(model$residuals, main = "Partial Correlogram")

Ljung Box Test

The test checks whether the residuals are independent (i.e. white noise) or whether some structure is left in them. The null hypothesis is that the autocorrelations of the residuals are zero; a small p-value means they are not.

Box.test(model$residuals, type = "Ljung-Box")

We do not reject the null hypothesis of no autocorrelation in the residuals, so the model shows a good fit.
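To get a feel for how the test behaves, here is a small sketch on simulated series rather than the model residuals: one is white noise, the other is an AR(1) process with strong autocorrelation. The choice of `lag = 10` is an assumption for this illustration; the call above uses the default `lag = 1`.

```r
set.seed(42)

# White noise: independent draws, so no autocorrelation is expected
white_noise <- rnorm(200)

# AR(1) series with coefficient 0.8: strongly autocorrelated by construction
ar_series <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))

p_white <- Box.test(white_noise, lag = 10, type = "Ljung-Box")$p.value
p_ar <- Box.test(ar_series, lag = 10, type = "Ljung-Box")$p.value

print(p_white)  # typically large: no evidence against white noise
print(p_ar)     # very small: the autocorrelation is detected
```

Residuals from a well-fitting model should behave like the first series, not the second.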

Normality of Residuals

hist(model$residuals, freq = FALSE)
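A histogram gives only a visual impression. As a possible complement (an addition here, not part of the original analysis), the Shapiro-Wilk test in base R gives a formal check. The sketch uses simulated values in place of `model$residuals`, with a skewed sample for contrast.

```r
set.seed(7)

# Simulated stand-ins for residuals (not the model's actual residuals)
normal_like <- rnorm(150)  # drawn from a normal distribution
skewed <- rexp(150)        # exponential draws: clearly non-normal

# Null hypothesis of the Shapiro-Wilk test: the sample is normally distributed
p_normal <- shapiro.test(normal_like)$p.value
p_skewed <- shapiro.test(skewed)$p.value

print(p_normal)  # typically well above 0.05 for normal draws
print(p_skewed)  # very small: normality is rejected
```

On the fitted model, the equivalent call would be `shapiro.test(model$residuals)`.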


fc = forecast(model, h = 4) # h = number of time steps to forecast ahead
autoplot(fc) # plot of the forecast
accuracy(fc) # performance of the model on the training data
fc # forecast values

Exercise: Apply it to other countries as well.

All the best. :D


