Government AI Readiness Index 2023: visualizing governance insights using ggplot2

Kavengik
7 min read · Jun 5, 2024


In the previous article, we used choropleth maps in ggplot2 to visualize AI readiness scores from the 2023 Government AI Readiness Index by Oxford Insights. In this article, we extend our focus to the governance pillar and use ggplot2 visualizations to draw insights on the following:

  1. Global governance scores with respect to AI readiness
  2. Global preparedness with respect to AI strategy
  3. Released AI strategies by income classification
Photo by Andrzej Kryszpiniuk on Unsplash

This article also doubles as a data cleaning manual, since much of it is devoted to data preprocessing: renaming columns, performing joins, creating new variables, filtering observations and summarizing data. To do this we rely on common dplyr verbs and functions: select, filter, mutate, group_by, summarise, left_join and rename. Now to the work!
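To make these verbs concrete before applying them to the real data, here is a minimal sketch on two hypothetical toy data frames (the country names, scores and groups are made up). It chains the verbs in roughly the order this article uses them; note that rename() takes new_name = old_name.

```r
library(dplyr)

# Hypothetical toy data (made-up values, not from the index)
scores <- data.frame(Country = c("A", "B", "C"),
                     Score   = c(40, 75, 90),
                     Empty   = NA)           # an all-missing column, like a spacer
groups <- data.frame(country = c("A", "B", "C"),
                     group   = c("low", "high", "high"))

result <- scores %>%
  select(Country, Score) %>%                   # keep only the columns we need
  rename(country = Country, score = Score) %>% # rename: new_name = old_name
  left_join(groups, by = "country") %>%        # add 'group' by matching 'country'
  mutate(passed = score >= 50) %>%             # derive a new logical variable
  filter(passed) %>%                           # keep rows where passed is TRUE
  group_by(group) %>%                          # one group per value of 'group'
  summarise(n = n(), mean_score = mean(score)) # one summary row per group

result
```

Only countries B and C pass the threshold, and both are in the "high" group, so the summary has a single row.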

Load libraries

We shall use five libraries in R for this tutorial: dplyr, ggplot2, countrycode, readxl and viridis.

library(dplyr)       # data manipulation
library(ggplot2)     # visualizations
library(countrycode) # converting country names to country codes
library(readxl)      # reading Excel files into R
library(viridis)     # viridis color palette

Load data

We shall use two datasets for this tutorial: 1) the 2023 Government AI Readiness Index data from Oxford Insights and 2) the World Bank income classification data. I worked in a Kaggle notebook, so the file paths below point to Kaggle input folders; adjust them to wherever the files live on your machine. To derive governance insights, we use the sheet labelled ‘Pillar & dimension scores’.

# 1. Government AI Readiness Index data: governance pillar
# Read rows 2-195 only; the first row read (row 2) supplies the column names
governance <- read_excel('/kaggle/input/government-ai-readiness-index/2023-Government-AI-Readiness-Index-Public-Indicator-Data.xlsx',
                         sheet = 'Pillar & dimension scores',
                         range = cell_rows(2:195))

# 2. World Bank income classification data
world_bank_df <- read_excel('/kaggle/input/wb-income-classifcation/CLASS.xlsx')

Data Overview

  1. Governance data frame

An overview of the governance data frame, using the code block below, shows that there are 17 columns and 193 observations. As the column names show, several columns are not relevant to our objective, so we shall drop them shortly and retain only the relevant ones.

dim(governance)   #data frame dimension
str(governance) #data frame structure
Table 1: Governance data frame

2. World Bank income classification data frame

An overview of the World Bank country classification by income data frame shows that there are 5 columns and 267 rows.

head(world_bank_df, n = 3)  # first three observations
str(world_bank_df)          # data frame structure
dim(world_bank_df)          # data frame dimensions
Table 2: World Bank Income classification data frame

Data cleaning

  1. Governance data frame

Our goals when preprocessing the governance data frame are to: 1) retain the governance-related columns, 2) drop empty columns and 3) rename columns to follow a consistent naming convention (lower-case snake_case).

# i. Retain the relevant governance columns, i.e. from Country to Adaptability
governance <- governance %>% select(Country:Adaptability)

# ii. Drop the empty columns (positions 3 and 7 after step i)
governance <- governance %>% select(-c(3, 7))

# iii. Confirm they are dropped
head(governance, n = 2)

# iv. Rename columns to follow a consistent naming convention (new_name = old_name)
governance <- governance %>%
  rename('country' = 'Country',
         'total_score' = 'Total score',
         'government' = 'Government',
         'technology_sector' = 'Technology Sector',
         'data_infrastructure' = 'Data and Infrastructure',
         'vision' = 'Vision',
         'governance_ethics' = 'Governance and Ethics',
         'digital_capacity' = 'Digital Capacity',
         'adaptability' = 'Adaptability')

# v. Confirm names have changed
head(governance, n = 2)
Table 3: Cleaned Governance data frame
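Dropping columns by position (select(-c(3, 7))) works for this particular sheet but would silently remove the wrong columns if the layout changed. As an alternative, not the approach used in the original notebook, here is a sketch that drops any column that is entirely missing, using dplyr's where() helper (dplyr 1.0.0 or later). It assumes the columns to drop are completely empty; the frame below is a hypothetical stand-in for the sheet with made-up values.

```r
library(dplyr)

# Hypothetical stand-in for the sheet: 'spacer' mimics an empty column
raw <- data.frame(Country       = c("A", "B"),
                  `Total score` = c(50, 60),
                  spacer        = NA,
                  Government    = c(45, 70),
                  check.names   = FALSE)    # keep the space in 'Total score'

# Keep only the columns that contain at least one non-missing value
cleaned <- raw %>% select(where(~ !all(is.na(.x))))

names(cleaned)
```

Selecting on content rather than position means the step still does the right thing if the spacer columns move, although it would also keep a spacer that happens to contain a stray value.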

2. World Bank data frame

Our interest with the World Bank country income classification data frame is to rename the columns to reflect good naming convention.

# i. Rename columns
world_bank_df <- world_bank_df %>%
  rename('economy' = 'Economy',
         'code' = 'Code',
         'region' = 'Region',
         'income_group' = 'Income group',
         'lending_category' = 'Lending category')

# ii. Confirm name change
head(world_bank_df)
Table 4: Cleaned World Bank country income classification data frame
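Since every new World Bank name here is just the old name in lower case with spaces replaced by underscores, the renaming can also be sketched programmatically with dplyr's rename_with() (dplyr 1.0.0 or later). This rule would not reproduce every governance name (for example, ‘Data and Infrastructure’ and ‘Governance and Ethics’ were shortened by hand), so the explicit rename() remains clearer there. The one-row frame below is hypothetical, with placeholder values and the same column names as the World Bank file.

```r
library(dplyr)

# Hypothetical one-row stand-in with the World Bank column names
wb <- data.frame(Economy            = "Country X",
                 Code               = "XXX",
                 Region             = "Region Y",
                 `Income group`     = "Group Z",
                 `Lending category` = "Category W",
                 check.names        = FALSE)

# Lower-case every name and replace spaces with underscores
wb_clean <- wb %>% rename_with(~ tolower(gsub(" ", "_", .x)))

names(wb_clean)
```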

Merging

We merge the two data frames because Objective 3 (released AI strategies by income classification) needs each country's income group alongside its governance scores. As we do not yet have a common variable to merge on, we proceed as follows:

1) We first create a country code variable, ‘code’, in the governance data frame using the countryname() function from the countrycode library. It converts each country name on the governance list into its ISO 3166-1 alpha-3 code (e.g. KEN for Kenya). Once ‘code’ exists, both data frames share a common variable to merge on.

2) We then merge the governance and World Bank data frames on ‘code’. We use a left join because we want to retain all governance observations.

# i. Create a 'code' variable (ISO3 codes) with countryname() from countrycode
governance <- governance %>%
  mutate(code = countryname(country, destination = 'iso3c'))

# ii. Confirm creation of the 'code' variable
head(governance, n = 1)

# iii. Merge the two data frames
governance <- left_join(governance, world_bank_df, by = 'code')

# iv. Make a copy of the governance data frame before adding the geospatial data
governance_copy <- governance

# v. Confirm merging is successful
head(governance, n = 1)
dim(governance)
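Two things can quietly go wrong in this merge. countryname() returns NA (with a warning) for any name it cannot match, and dplyr joins treat two NA keys as equal by default (na_matches = "na"), so an uncoded country could pick up an income-table row whose code is also missing, if any such row exists. Here is a minimal sketch on hypothetical toy data (‘Atlantis’ is fictional and the income groups are placeholders) that sets na_matches = "never" and uses anti_join() to list the governance rows that found no match.

```r
library(dplyr)
library(countrycode)

# Hypothetical governance rows; "Atlantis" is fictional and cannot be coded
gov <- data.frame(country = c("Kenya", "United States of America", "Atlantis"))
gov <- gov %>%
  mutate(code = suppressWarnings(countryname(country, destination = "iso3c")))

# Hypothetical income table; the last row stands in for a row with no code
wb <- data.frame(code = c("KEN", "USA", NA),
                 income_group = c("Group 1", "Group 2", "notes"))

# na_matches = "never" stops NA codes from pairing with each other
merged <- left_join(gov, wb, by = "code", na_matches = "never")

# Governance rows that found no partner in the income table
unmatched <- anti_join(gov, wb, by = "code", na_matches = "never")
unmatched$country
```

Running the same anti_join() on the real data is a cheap way to see which country names need manual attention before plotting.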

Exploratory Data Analysis

Objective 1: Visualizing global governance scores

We shall create a choropleth map to visualize global governance scores. Before creating the map, we take the following steps:

  1. Load the world map data into R: map_data('world') from ggplot2 returns a data frame, which we call mapData, of the longitude/latitude points that outline each country. This geospatial data is essential to drawing the map (map_data() needs the maps package to be installed).
  2. Create a ‘code’ variable in mapData, again with countryname(), applied to its ‘region’ column (the country name). Using this common variable we merge the governance and mapData data frames, again with a left join to retain all the observations in the governance data frame. After that we are ready to create the choropleth map. The syntax is as below.
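# i. Load the world map data frame (one row per boundary point)
mapData <- map_data('world')

# ii. Create a country code variable in the mapData data frame
mapData <- mapData %>% mutate(code = countryname(region, destination = 'iso3c'))

# iii. Merge with the governance data frame (left join)
# Both frames have a 'region' column, so dplyr suffixes them region.x (World Bank) and region.y (map)
governance <- governance %>%
  left_join(mapData, by = 'code')

# iv. Check the dimensions: each country now spans many rows, one per boundary point
dim(governance)

# Set figure dimensions (Jupyter/Kaggle notebooks)
options(repr.plot.width = 15, repr.plot.height = 8)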

Then we proceed to create the map with the syntax below.

# i. Choropleth map
ggplot(governance, aes(x = long, y = lat, fill = government, group = group)) +
  geom_polygon() +
  labs(fill = 'Governance scores',
       title = '2023 Government AI readiness Index: Governance Scores') +
  scale_fill_gradient(low = 'yellow', high = 'red') +
  theme_bw() +
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20, hjust = 0.5))

Let us break down the syntax, layer by layer.

  1. ggplot: in this layer we specify the data frame (governance); the x and y columns (long for longitude, lat for latitude); the fill, which here is the continuous variable ‘government’ holding the governance scores; and group, which tells ggplot which points belong to the same outline so that boundaries are connected correctly (in map_data, each group is one polygon, so a country with islands has several groups).
  2. geom_polygon: draws each group as a filled polygon, which produces the choropleth.
  3. labs: sets the legend title and the plot title through the ‘fill’ and ‘title’ arguments.
  4. scale_fill_gradient: specifies the color gradient, from ‘yellow’ for low scores to ‘red’ for high scores.
  5. theme: we pass element_blank() to most arguments to remove the axis text, ticks and titles (x and y), the grid lines and the panel border; for the plot title we use element_text() to set its size and centre it (hjust = 0.5).
Figure 1: Visualizing governance scores
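To see how group and fill work together in geom_polygon(), here is a minimal sketch on a hypothetical two-square "map" with made-up scores. It swaps the theme_bw() plus element_blank() combination for theme_void(), a built-in ggplot2 theme that removes axes, grid lines and background in one call; layer_data() then shows what ggplot computed for the polygon layer.

```r
library(ggplot2)

# Hypothetical toy "map": two unit squares standing in for two countries.
# Each row is one vertex; 'group' says which outline the vertex belongs to.
toy_map <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,   0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  score = rep(c(30, 80), each = 4)   # made-up scores, one per square
)

p <- ggplot(toy_map, aes(x = long, y = lat, fill = score, group = group)) +
  geom_polygon() +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme_void()   # drops axes, ticks, grid and background in one call

# Inspect what ggplot computed for the polygon layer
built <- layer_data(p)
length(unique(built$group))   # one group per square
unique(built$fill)            # one colour per square
```

Dropping group from aes() would make ggplot treat all eight vertices as a single outline, which is why map polygons join up incorrectly without it.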

Objective 2: Global preparedness with respect to AI strategy

Here we want to understand how complete each country's AI strategy is, which is captured by the ‘vision’ variable. To do this we cast ‘vision’ as a factor and relabel the legend entries as follows:

  1. No strategy = 0
  2. Partial strategy = 50
  3. Complete strategy = 100

Apart from the change of variable (vision instead of government), the replacement of scale_fill_gradient() with scale_fill_discrete() to set the legend labels, and larger legend text, the syntax is the same as for the first choropleth map.

ggplot(governance, aes(x = long, y = lat, fill = factor(vision), group = group)) +
  geom_polygon() +
  labs(fill = 'Vision completeness',
       title = 'AI strategy : degree of completeness') +
  theme_bw() +
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20, hjust = 0.5),
        legend.text = element_text(size = 14),
        legend.title = element_text(size = 16)) +
  scale_fill_discrete(labels = c('No strategy', 'Partial strategy', 'Complete strategy'))
Figure 2 : Visualization of AI strategy completeness
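scale_fill_discrete(labels = ...) assigns the labels by position, so it relies on the levels of factor(vision) sorting as 0, 50, 100. A slightly more explicit alternative, sketched here on hypothetical values, attaches the labels to the factor itself with factor(levels = , labels = ); the result could then be added as a column with mutate() and mapped to fill without the labels argument.

```r
# Hypothetical vision scores (the index uses 0, 50 and 100); NA = missing
vision <- c(100, 0, 50, 100, NA)

# Attach each label to its score explicitly rather than by level position
vision_label <- factor(vision,
                       levels = c(0, 50, 100),
                       labels = c("No strategy", "Partial strategy",
                                  "Complete strategy"))

levels(vision_label)
table(vision_label, useNA = "ifany")
```

Because each label is tied to a specific score, the legend stays correct even if the order of the underlying values changes.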

Objective 3: Released AI strategies by income classification

We create a data frame, ‘strategy_summary’, that counts the countries with a complete AI strategy in each income classification. We do this in five steps. First, we select ‘vision’ and ‘income_group’ from the governance_copy data frame (the copy made before the map data was joined, so each country appears only once). Second, with mutate we create a new indicator variable ‘vision_cat’ that is 1 for countries with a complete strategy (vision = 100) and 0 otherwise. Third, we keep only the countries with a complete strategy using filter. Fourth, we group the observations by income classification with group_by. Finally, we count the countries in each group with summarise. The code block below carries out all of these steps.

strategy_summary <- governance_copy %>%
  select(vision, income_group) %>%
  mutate(vision_cat = ifelse(vision == 100, 1, 0)) %>%
  filter(vision_cat == 1) %>%
  group_by(income_group) %>%
  summarise(strategy_count = sum(vision_cat))
strategy_summary
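Because filter() keeps only the rows where vision_cat is 1, summing vision_cat simply counts those rows. A shorter equivalent, sketched below on hypothetical toy rows, filters on vision == 100 directly and uses dplyr's count(); both versions are shown so their outputs can be compared.

```r
library(dplyr)

# Hypothetical toy rows standing in for governance_copy
toy <- data.frame(vision = c(100, 50, 100, 0, 100),
                  income_group = c("High income", "High income",
                                   "Low income", "Low income", "High income"))

# The article's approach: indicator, filter, group, sum
long_way <- toy %>%
  mutate(vision_cat = ifelse(vision == 100, 1, 0)) %>%
  filter(vision_cat == 1) %>%
  group_by(income_group) %>%
  summarise(strategy_count = sum(vision_cat))

# Shorter equivalent: filter on the score, then count rows per group
short_way <- toy %>%
  filter(vision == 100) %>%
  count(income_group, name = "strategy_count")

long_way
short_way
```

The only difference is the column type: sum() of the 1/0 indicator gives doubles, while count() gives integers.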

Then we create a pie chart using the syntax below.

strategy_pie <- strategy_summary %>%
  ggplot(aes(x = '', y = strategy_count, fill = income_group)) +
  geom_col(color = 'white') +
  coord_polar('y') +
  geom_text(aes(label = strategy_count),
            position = position_stack(vjust = 0.5)) +
  labs(fill = 'Income Group',
       title = 'AI strategy by income classification') +
  theme(plot.title = element_text(size = 20, hjust = 0.5),
        panel.grid = element_blank(),
        panel.background = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()) +
  scale_fill_viridis(discrete = TRUE)
strategy_pie
Figure 3: AI strategy count by income classification
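ggplot2 has no dedicated pie geom, so the code above draws a single stacked bar (x = '') whose segments are the income groups, and coord_polar('y') maps each segment's height to an angle; position_stack(vjust = 0.5) places every label halfway along its segment. The sketch below uses a hypothetical two-row summary and layer_data() to check that the labels sit at the segment midpoints.

```r
library(ggplot2)

# Hypothetical summary table with the same shape as strategy_summary
toy_summary <- data.frame(income_group = c("High income", "Low income"),
                          strategy_count = c(2, 1))

# One stacked bar (x = "") drawn in polar coordinates: height becomes angle
pie <- ggplot(toy_summary, aes(x = "", y = strategy_count, fill = income_group)) +
  geom_col(color = "white") +
  coord_polar("y") +
  geom_text(aes(label = strategy_count), position = position_stack(vjust = 0.5))

bars   <- layer_data(pie, 1)   # stacked segments: ymin/ymax per slice
labels <- layer_data(pie, 2)   # label positions after stacking

# With vjust = 0.5 each label sits halfway along its slice
(bars$ymin + bars$ymax) / 2
labels$y
```

Because the data are stacked before the polar transform, the slice angles are proportional to the counts.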

End!

Link: Kaggle notebook



Kavengik

Multi-hyphenate: Econometrics| Machine Learning| Unsupervised learning| Data analysis| Visual artist| Writer| cat mum Monday: Introspection Friday: Coding