What’s going on with corruption and inequity?

Feb 27, 2019

I’m just back from winter break down in Peru. It was a great time with the family, with that strange feeling (for me, at this time) of being at home. It was also a politically convulsed moment: during the holidays, the ‘magnificent’ politicians tried to demolish the small efforts to combat the Odebrecht corruption scandal, which is now erupting in Peru.

So I thought it was a worthy example to explore this kind of phenomenon and its ‘price’. This example comes with ABSOLUTELY NO WARRANTY regarding critical thinking about the mechanistic processes driving corruption, inequality, or the relationship between them. It is just a good way to do some data exploration.

These are the R packages that we’re going to use in this post:

install.packages(c("maptools", "rgeos","tidyverse", "gpclib", 
                   "mapproj","readxl","ggsci","viridis",
                   "ggridges","gridExtra", "scico", "sf",
                   "gganimate", "transformr","gifski","png",
                   "devtools"), 
                 type="source")

DATA

We’re going to use the Corruption Perceptions Index 2017 from Transparency International (TI). The TI data repository is here. Data on the most recent Gini index per country were obtained from the World Bank. This is not a perfect measure of inequity, but we are going to use it as a proxy in this example. The file can be found here.

# Load vector data (maps)
library(maptools)
library(rgeos)
data(wrld_simpl)
# Remove Antarctica and Greenland
wrld_simpl <- wrld_simpl[wrld_simpl$ISO3 != "ATA",]
wrld_simpl <- wrld_simpl[wrld_simpl$ISO3 != "GRL",]
# Transform vector data into dataframe
library(tidyverse)
library(gpclib)
gpclibPermit()
world.df = fortify(wrld_simpl, region="ISO3")
# Data wrangling
WB_data<-read.csv("~/API_SI.POV.GINI_DS2_en_csv_v2_10224868.csv", skip = 3) %>%
  gather(Year, Gini, X1960:X2017) %>%
  mutate(Year=as.numeric(substr(Year,2,5))) %>%
  filter(!is.na(Gini)) %>%
  group_by(Country.Code) %>%
  top_n(wt = Year,n=1) %>%
  dplyr::select(Country.Code, Year, Gini) %>%
  rename(id=Country.Code)
library(readxl)
TI_dat <- read_excel("~/CPI2017_FullDataSet.xlsx", skip = 2) %>%
  slice(1:180) %>%
  dplyr::select(Country:`CPI Score 2017`) %>%
  rename(id=ISO3, CPI=`CPI Score 2017`)
dat <- TI_dat %>%
  full_join(WB_data, by="id") %>%
  mutate(Reg_COD=as.numeric(as.factor(Region)))
# Merge with spatial data
merge.world <- merge(world.df, dat, by="id", all=T)
final.plot<-merge.world[order(merge.world$order), ]
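The wrangling step above keeps, for each country, the most recent year with a non-missing Gini value, and then joins the two sources on ISO3 codes. A minimal sketch of that logic on made-up numbers can help check that it behaves as expected. The `toy_wb` and `toy_ti` data frames are hypothetical, not the real World Bank or TI files. The sketch assumes dplyr 1.0.0 or later and tidyr 1.0.0 or later, so that `slice_max()` and `pivot_longer()` are available as alternatives to `top_n()` and `gather()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table mimicking the World Bank layout (values are made up)
toy_wb <- data.frame(
  Country.Code = c("AAA", "BBB", "CCC"),
  X2015 = c(40.1, NA, NA),
  X2016 = c(NA, 35.2, NA),
  X2017 = c(41.0, NA, NA)
)

# Hypothetical CPI table (values are made up)
toy_ti <- data.frame(id = c("AAA", "BBB", "DDD"), CPI = c(37, 60, 82))

latest_gini <- toy_wb %>%
  pivot_longer(X2015:X2017, names_to = "Year", values_to = "Gini") %>%
  mutate(Year = as.numeric(substr(Year, 2, 5))) %>%
  filter(!is.na(Gini)) %>%
  group_by(Country.Code) %>%
  slice_max(Year, n = 1) %>%  # most recent year that has a Gini value
  ungroup() %>%
  rename(id = Country.Code)

latest_gini

# Codes present in one table but missing from the other
anti_join(toy_ti, latest_gini, by = "id")  # CPI without Gini
anti_join(latest_gini, toy_ti, by = "id")  # Gini without CPI
```

Because the post uses `full_join()`, unmatched countries stay in `dat` with `NA` in the missing index, so a check like this shows how many points will be dropped from the scatter plots.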

VISUALS

We’re going to use some visual adjustments described in this post. Additionally, I found a good compilation of palettes by Emil Hvitfeldt here. For these maps, we’re using the scico library.

library(scico)
final.plot %>% 
  ggplot(aes(x = long, y = lat, group = group, fill=CPI)) +
  geom_polygon(color = "black", size = 0.25) + 
  #labs(fill = var1) +
  coord_map() +
  scale_fill_scico(palette = 'lapaz') +
  theme_bw()

final.plot %>% 
  ggplot(aes(x = long, y = lat, group = group, fill=Gini)) +
  geom_polygon(color = "black", size = 0.25) + 
  #labs(fill = var1) +
  coord_map() +
  scale_fill_scico(palette = 'lapaz', direction=-1) +
  theme_bw()

TRENDS

To clarify the trends in the maps, we’re going to plot both indexes against each other and explore the trends per region. The x-axis is reversed because a lower CPI score means more perceived corruption. We’re going to use the gganimate library by Thomas Pedersen to show these trends dynamically.

library(gganimate)
library(transformr)
library(ggsci)
p<-dat %>% 
  filter(!is.na(Region)) %>%
  ggplot(aes(x = CPI, y = Gini, col=Region)) +
  geom_point() +
  geom_smooth(aes(fill = Region), method = lm, alpha=.2) +
  scale_x_reverse() +
  guides(fill=FALSE) +
  xlab(expression(
    paste("CPI (" %->% " More Corrupt )"))) +
  ylab(expression(
    paste("Gini Index (" %->% " More Unequal )"))) +
  scale_color_lancet() +
  scale_fill_lancet() +
  theme_bw() +
  # gganimate-specific layers
  transition_time(Reg_COD) +
  ease_aes('cubic-in-out') +
  shadow_mark(past = TRUE, future = FALSE)
# Render 100 frames at 10 fps, then save that rendered animation
anim <- animate(p, nframes = 100, fps = 10)
anim
anim_save("~/Fig1.gif", animation = anim)
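To put a number on the trends that the smoothers draw, one option is to compute the correlation and the least-squares slope per region. This is only a sketch on made-up data: the `toy` data frame and its values are hypothetical, and with the real data `dat` from above would take its place. It assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)

# Hypothetical data: two regions with opposite trends (values are made up)
toy <- data.frame(
  Region = rep(c("R1", "R2"), each = 4),
  CPI    = c(20, 40, 60, 80, 20, 40, 60, 80),
  Gini   = c(50, 45, 40, 35, 30, 33, 36, 39)
)

trend_by_region <- toy %>%
  filter(!is.na(CPI), !is.na(Gini)) %>%
  group_by(Region) %>%
  summarise(
    n     = n(),
    r     = cor(CPI, Gini),             # Pearson correlation
    slope = cov(CPI, Gini) / var(CPI),  # least-squares slope of Gini on CPI
    .groups = "drop"
  )

trend_by_region
```

A negative slope means that Gini falls as the CPI score rises, that is, less perceived corruption goes together with less inequality in that region. It describes an association only, in line with the no-warranty note at the start of the post.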

SUMMARIZE

It’s fascinating how differently the two indexes relate to each other across geographic regions. As I mentioned at the beginning of this post, I have no intention of explaining the underlying mechanics, but it is a nice data exploration exercise. Here is the code to put everything in a single panel.

a<-final.plot %>% 
  filter(!is.na(Region)) %>%
  ggplot(aes(x = long, y = lat, group = group, fill=Region)) +
  geom_polygon(color = "black", size = 0.25) + 
  coord_map() +
  scale_fill_lancet() +
  theme_bw()
b<-dat %>% 
  filter(!is.na(Region)) %>%
  ggplot(aes(x = CPI, y = Gini, col=Region)) +
  geom_point() +
  geom_smooth(aes(fill = Region), method = lm, alpha=.2) +
  scale_x_reverse() +
  guides(fill=FALSE) +
  xlab(expression(
    paste("CPI (" %->% " More Corrupt )"))) +
  ylab(expression(
    paste("Gini Index (" %->% " More Unequal )"))) +
  scale_color_lancet() +
  scale_fill_lancet() +
  theme_bw() +
  facet_wrap(.~Region, ncol=3, scales = "free")
library(gridExtra)
(fig<-grid.arrange(a,b, nrow = 2))
ggsave("~/Fig2.png", fig, device="png", dpi = 300)

Happy learning and data exploration! You can find other stuff or simply say hi on Twitter @gabc91.

Written by Gabriel Carrasco Escobar