The human factor: explaining the error in the 2014 and 2015 Corruption Perceptions Index results

Transparency Int’l
Sep 13, 2017

By Santhosh Srinivasan

[Image: The Corruption Perceptions Index map from 2016]

The Corruption Perceptions Index (CPI) is one of Transparency International’s signature measurement tools and arguably the most quoted corruption index. Policy-makers, businesses, journalists, academics and activists alike use the CPI, but it is no stranger to controversy and criticism. Those of us who have worked on the CPI know its strengths and weaknesses, what it can and cannot do, and are always keen to think about how it could be better.

In fact, the methodology that underpins the CPI has been carefully and cautiously refined over the years with the help of an expert group composed of some of the leading thinkers in this field. Despite all the care taken in refining the CPI method, we discovered errors in the manual data aggregation of the 2014 and 2015 editions of the CPI. How did this happen?

The CPI is built from a number of independent data sources (currently thirteen). TI receives data directly from some of these sources when the data is proprietary or not yet publicly released; where the data is publicly available, we access it from the Internet.

The first task of a CPI researcher is to create a single meta data-set containing the corruption-relevant parts of these sources for all countries covered. In the case of micro-level data (surveys of individual business people), this involves aggregating the data into macro, country-level results. Because the underlying data-sets arrive in many formats, ranging from Excel files and Word documents to JPG images, a substantial amount of manual work is involved.
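For illustration, here is a minimal sketch of that micro-to-macro aggregation step, assuming a hypothetical survey table with one row per respondent; the column names are invented and not the actual CPI schema.

```python
import pandas as pd

# Hypothetical micro-level data: one row per surveyed business person.
survey = pd.DataFrame({
    "iso3": ["KOR", "KOR", "SVN", "SVN", "SVN"],
    "corruption_score": [55, 57, 60, 62, 61],
})

# Aggregate to a single macro, country-level score per country.
country_scores = survey.groupby("iso3")["corruption_score"].mean()
print(country_scores)
```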

The errors occurred during this step of creating a single data-set. In 2015, it happened as the researcher copied and pasted the results by hand from the Bertelsmann Sustainable Governance Indicators (SGI) into the data-set. The matching was done by country name and score rather than by a formula keyed to unequivocal country identifiers, such as ISO codes.

In the meta data-set the term “Korea, South” was used, but the original Bertelsmann SGI file used the term “South Korea”. The copy-and-paste approach did not detect this difference, so the Bertelsmann SGI scores for the countries that sort alphabetically between the two names were pasted one row off. As a result, all scores between Korea (South) and Slovenia shifted by one country (see the annex table below). This changed the results for 11 countries.
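For comparison, here is a minimal sketch, not the actual CPI pipeline, of a key-based merge on ISO 3166-1 alpha-3 codes; the scores are invented. A naming mismatch surfaces as a missing value instead of silently shifting neighbouring rows.

```python
import pandas as pd

meta = pd.DataFrame({
    "iso3": ["KOR", "LVA", "SVN"],
    "country": ["Korea, South", "Latvia", "Slovenia"],
})

sgi = pd.DataFrame({
    "iso3": ["KOR", "LVA", "SVN"],
    "country": ["South Korea", "Latvia", "Slovenia"],  # different naming
    "sgi_score": [62, 55, 71],                         # invented values
})

# Merging on the unequivocal identifier aligns rows regardless of naming.
merged = meta.merge(sgi[["iso3", "sgi_score"]], on="iso3", how="left")

# Any country that failed to match would show up as a missing score here,
# rather than shifting its neighbours' scores by one row.
assert merged["sgi_score"].notna().all(), "unmatched countries detected"
print(merged)
```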

In 2014, the copy-paste error happened when transferring the data from the Economist Intelligence Unit (EIU) for two countries. Since these countries were not scored by the EIU in 2014, their 2013 scores were copied and pasted into the meta data file. A copy-paste error similar to the 2015 one occurred: the values were pasted into the wrong cells.

This changed the index scores for three countries. In the case of Saint Vincent and the Grenadines, the score had to be revised from 67 to 62. In the case of Samoa, the country must now be excluded because it no longer has the minimum of three data sources; in the case of Saint Lucia, the country can now be included because it does have the required three data sources.
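The inclusion rule itself is simple to state as a check. Here is a minimal sketch with invented scores and hypothetical source names, in which Samoa drops out for lacking three sources:

```python
import pandas as pd

# Invented scores; None marks a country a source does not cover.
scores = pd.DataFrame(
    {"src_a": [67, 55, 59], "src_b": [60, None, 58], "src_c": [64, None, 61]},
    index=["VCT", "WSM", "LCA"],  # St Vincent & the Grenadines, Samoa, St Lucia
)

# A country receives a CPI score only if at least three sources cover it.
n_sources = scores.notna().sum(axis=1)
included = scores[n_sources >= 3]
print(included)  # WSM (Samoa) drops out with only one source
```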

Given the extent of manual work involved in the CPI, we always conduct a “second pair of eyes” check of the CPI results. Unfortunately, this check did not pick up the 2014 and 2015 copy-and-paste errors. The external verification of the results also did not spot them, because it did not go back to the original data sets to check the data transfers; it only re-created the results from the meta tables.

We are aware of the inadequacy of this verification procedure and have, starting with the CPI 2016, strengthened the quality control and verification systems around the production of the CPI.

What we do now, and what we will do going forward

First, we have revised the internal process to minimise the occurrence of such errors. Two separate research staff members now compute the CPI independently. Moving forward, we will also increase the level of computerisation and automation in handling the raw CPI data and compiling the meta tables.

Second, we have also enhanced the external verification process. External reviewers no longer simply verify the CPI calculations made by the TI-S research team, as in previous years; they now also calculate the CPI independently. The results are then compared to ensure there is an exact match in scores and ranks.
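As a sketch of what such an exact-match comparison can look like, assuming two hypothetical result files with the same countries and columns (the file names are illustrative, not our actual workflow):

```python
import pandas as pd

# Two independently computed result sets (hypothetical file names).
a = pd.read_csv("cpi_team_a.csv", index_col="iso3").sort_index()
b = pd.read_csv("cpi_team_b.csv", index_col="iso3").sort_index()

# Require identical country coverage, then an exact match on every value.
assert a.index.equals(b.index), "country coverage differs"
mismatches = a.compare(b)  # empty when the two runs agree exactly
if mismatches.empty:
    print("Exact match: scores and ranks agree.")
else:
    print("Discrepancies found:")
    print(mismatches)
```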

As with all our tools and products, we at TI are committed to regularly reviewing and updating our research approaches and methodologies. When we reviewed the CPI, we also found that the CPI aggregation script could be clearer about how to interpret and apply it. Going forward, the script will include specific instructions on how to deal with decimal points and when the researcher must round to a certain decimal place.
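As an illustration of the kind of rule we mean (this is not TI’s actual rounding convention, just an invented example): keep full precision through the aggregation and round only once, at an explicitly defined step.

```python
from statistics import mean

# Invented rescaled source values for one country.
source_scores = [61.437, 62.915, 60.248]

raw_average = mean(source_scores)  # keep full precision while aggregating
final_score = round(raw_average)   # round once, at the defined step
print(raw_average, "->", final_score)
```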

We have invested significant time and effort over the years in refining and improving the CPI methodology in terms of its transparency, comparability and ease of use. Going forward, we will make every effort to safeguard the integrity of the CPI methodology and compilation, and to keep pace with the evolving state of the art in research methods.

Please contact Santhosh Srinivasan (ssrinivasan@transparency.org) or Coralie Pring (cpring@transparency.org) if you have any further questions.

Annex: revised CPI 2014 & 2015
