HNG STAGE ONE TASK: Outlier Detection in Election Data Using Geospatial Analysis

Chika Chukwu
Jul 3, 2024

HNG-INTERNSHIP, HNG-PREMIUM

INTRODUCTION

In the recently concluded election, the Independent National Electoral Commission (INEC) faced numerous legal challenges regarding the integrity and accuracy of the results. Allegations of vote manipulation and irregularities have been widespread, necessitating a comprehensive investigation.

OBJECTIVE

The objective of this analysis is to:

1. Prepare the dataset by adding geospatial data (latitude and longitude).

2. Identify neighboring polling units based on geographical proximity.

3. Calculate outlier scores for each party in each polling unit.

4. Sort the dataset by outlier scores and provide a detailed report highlighting significant outliers.

The mission is to uncover potential voting irregularities and support the transparency of the election results by identifying polling units whose results deviate significantly from those of neighboring units, which may indicate undue influence or rigging.

ANALYTICAL FOCUS

The analysis focuses on identifying outlier polling units based on each party’s votes. Geospatial techniques will be employed to find neighboring polling units and calculate an outlier score for each party in each unit.

GOAL

The primary goal is to pinpoint polling units where the voting results deviate significantly from their neighbors. This deviation may indicate potential irregularities or influences, helping to ensure the integrity and transparency of the election results.

ANALYSIS PROCESS

STEP 1: DATA GATHERING AND PREPROCESSING

HNG provided the dataset for Zamfara State, containing information on polling units and the votes received and cross-checked by each party. For this analysis, I used Python and Microsoft Excel. First, I imported the necessary libraries in Python and loaded the data into Pandas.

DATA QUALITY ASSESSMENT

I conducted a data quality assessment, which included:

  • Verifying the column names and data types.
  • Reviewing the descriptive statistics.
  • Checking for null values — none were found in the data.
  • Noting the data shape, which consists of “70 rows and 19 columns”.
  • Checking for inconsistent characters. Some were found (e.g., commas, dashes), but they are a necessary part of the values in columns such as LGA, Ward, PU-Code, PU-Name, and Result File, so they were kept.
  • Checking for data irregularities, odd entries, and duplicates — none were found in the data.
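The checks above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: it uses only the standard library rather than Pandas, the three-row sample is made up, and the party column names (APC, LP, PDP, NNPP) are assumed from the parties discussed later in the report.

```python
import csv
import io

# Made-up sample rows for illustration only; the real file has 70 rows and 19 columns.
SAMPLE_CSV = """State,LGA,Ward,PU-Code,PU-Name,APC,LP,PDP,NNPP
ZAMFARA,LGA ONE,WARD ONE,00-00-00-001,SAMPLE UNIT A,10,1,20,0
ZAMFARA,LGA ONE,WARD ONE,00-00-00-002,SAMPLE UNIT B,12,0,18,1
ZAMFARA,LGA TWO,WARD TWO,00-00-00-003,SAMPLE UNIT C,90,2,5,0
"""


def quality_report(rows, key="PU-Code"):
    """Return the data shape, null counts per column, and duplicated key values."""
    columns = list(rows[0].keys()) if rows else []
    # A cell counts as null when it is missing or contains only whitespace.
    null_counts = {
        col: sum(1 for r in rows if (r[col] or "").strip() == "")
        for col in columns
    }
    seen, duplicates = set(), []
    for r in rows:
        if r[key] in seen:
            duplicates.append(r[key])
        seen.add(r[key])
    return {
        "shape": (len(rows), len(columns)),
        "nulls": null_counts,
        "duplicates": duplicates,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = quality_report(rows)
print("shape:", report["shape"])
print("columns with nulls:", [c for c, n in report["nulls"].items() if n])
print("duplicate PU-Codes:", report["duplicates"])
```

In Pandas, the same checks correspond roughly to `df.shape`, `df.isnull().sum()`, and `df.duplicated()`.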

STEP 2: ADDING GEOSPATIAL DATA

The initial dataset lacked the longitude and latitude values for each polling unit. I used the ArcGIS geocoder available through the geopy library to obtain these coordinates and added them to the dataset as two new columns, Longitude and Latitude. I then verified the data again: no null values were found, the data size was now “1470”, and the data shape was “70 rows and 21 columns”. I saved the cleaned dataset to a new CSV file, “ZAMFARA_Cleaned.csv”.
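A minimal sketch of the geocoding step is shown here. The helper names (`build_query`, `add_coordinates`, `arcgis_geocode`) and the way the address string is assembled are assumptions made for illustration; the article does not show the original code. The geocoder is passed in as a function so the logic can be tried offline with a stand-in, while `arcgis_geocode` shows how geopy's `ArcGIS` geocoder would be plugged in.

```python
from types import SimpleNamespace


def build_query(row):
    """Join the address parts into one search string (assumed column names)."""
    parts = [row["PU-Name"], row["Ward"], row["LGA"], row["State"], "Nigeria"]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def add_coordinates(rows, geocode):
    """Return copies of rows with Latitude/Longitude added (None when no match)."""
    enriched_rows = []
    for row in rows:
        location = geocode(build_query(row))
        new_row = dict(row)
        new_row["Latitude"] = location.latitude if location else None
        new_row["Longitude"] = location.longitude if location else None
        enriched_rows.append(new_row)
    return enriched_rows


def arcgis_geocode():
    """Return geopy's ArcGIS geocode function (needs geopy and network access)."""
    from geopy.geocoders import ArcGIS  # imported lazily so the demo below runs offline
    return ArcGIS(timeout=10).geocode


def fake_geocode(query):
    """Offline stand-in for a geocoder: made-up coordinates, for the demo only."""
    if "SAMPLE UNIT A" in query:
        return SimpleNamespace(latitude=12.0, longitude=6.0)
    return None  # mimics a geocoder that found no match


sample = [
    {"PU-Name": "SAMPLE UNIT A", "Ward": "WARD ONE", "LGA": "LGA ONE", "State": "ZAMFARA"},
    {"PU-Name": "SAMPLE UNIT B", "Ward": "WARD ONE", "LGA": "LGA ONE", "State": "ZAMFARA"},
]
enriched = add_coordinates(sample, fake_geocode)
for r in enriched:
    print(r["PU-Name"], r["Latitude"], r["Longitude"])
```

With geopy installed, `add_coordinates(rows, arcgis_geocode())` would query the live service instead. Rows that come back with `None` coordinates are the ones to review by hand before moving on.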

STEP 3: NEIGHBOR IDENTIFICATION (METHODOLOGY)

I determined neighboring polling units by calculating geodesic distances using the geopy library. This involved measuring the distance from each polling unit to every other unit. Units within a 1 km radius of each other were classified as neighbors. After applying the necessary functions, I saved the dataset, with the neighbor information included, to a new CSV file, “ZAMFARA_csv_neighbours”.
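The pairwise neighbor search can be sketched as follows. Note the substitution: the analysis used geopy's geodesic distance, whereas this self-contained sketch uses the haversine (great-circle) formula, which differs only slightly from the geodesic at a 1 km scale. The function names and sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_neighbours(units, radius_km=1.0):
    """Map each PU-Code to the codes of all other units within radius_km."""
    neighbours = {u["PU-Code"]: [] for u in units}
    for i, a in enumerate(units):
        # Each pair is measured once; the relation is symmetric, so record both directions.
        for b in units[i + 1:]:
            d = haversine_km(a["Latitude"], a["Longitude"], b["Latitude"], b["Longitude"])
            if d <= radius_km:
                neighbours[a["PU-Code"]].append(b["PU-Code"])
                neighbours[b["PU-Code"]].append(a["PU-Code"])
    return neighbours


# Made-up coordinates: A and B are roughly 0.56 km apart, C is roughly 11 km away.
demo_units = [
    {"PU-Code": "PU-A", "Latitude": 12.000, "Longitude": 6.0},
    {"PU-Code": "PU-B", "Latitude": 12.005, "Longitude": 6.0},
    {"PU-Code": "PU-C", "Latitude": 12.100, "Longitude": 6.0},
]
demo_neighbours = find_neighbours(demo_units)
print(demo_neighbours)
```

Comparing every pair is quadratic in the number of units, which is fine for 70 rows. A much larger dataset would call for a spatial index instead.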

STEP 4: OUTLIER SCORE CALCULATION (METHODOLOGY)

For each polling unit, I analyzed the votes received by each party and compared them with those of neighboring units. The outlier score for each party was computed as the absolute deviation of its votes from those of neighboring units.
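One way to compute such a score is sketched below. The article does not spell out how the neighbors' votes are aggregated, so this sketch assumes the score is the absolute difference between a unit's votes and the mean of its neighbors' votes, and that a unit with no neighbors scores 0. Both choices, along with the function name and sample numbers, are assumptions for illustration.

```python
from statistics import mean

PARTIES = ("APC", "LP", "PDP", "NNPP")


def outlier_scores(votes, neighbours, parties=PARTIES):
    """Score each unit per party as |own votes - mean of neighbours' votes|.

    votes:      {pu_code: {party: vote_count}}
    neighbours: {pu_code: [neighbouring pu_codes]}
    """
    scores = {}
    for code, own in votes.items():
        nbrs = neighbours.get(code, [])
        scores[code] = {}
        for party in parties:
            if not nbrs:
                # Assumption: with nothing to compare against, the score is 0.
                scores[code][party] = 0.0
                continue
            neighbour_mean = mean(votes[n][party] for n in nbrs)
            scores[code][party] = abs(own[party] - neighbour_mean)
    return scores


# Made-up vote counts for three hypothetical units.
demo_votes = {
    "PU-A": {"APC": 100, "LP": 1, "PDP": 30, "NNPP": 0},
    "PU-B": {"APC": 20, "LP": 3, "PDP": 50, "NNPP": 2},
    "PU-C": {"APC": 40, "LP": 0, "PDP": 10, "NNPP": 0},
}
demo_links = {"PU-A": ["PU-B", "PU-C"], "PU-B": ["PU-A"], "PU-C": []}
demo_scores = outlier_scores(demo_votes, demo_links)
print(demo_scores["PU-A"]["APC"])
```

A raw absolute deviation favors parties with large vote counts, which is why scores are only comparable within a party and not across parties.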

REPORT SUMMARY

APPROACH

To identify the most notable outliers, I sorted the dataset by each party’s outlier score using Microsoft Excel. This section summarizes the methodology and highlights the top 3 outliers identified for each party.
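The sorting was done in Excel, but the same ranking could be reproduced in Python. The sketch below is an illustration with made-up scores and a hypothetical `top_outliers` helper, not the workflow used for this report.

```python
def top_outliers(scores, party, n=3):
    """Return the n (pu_code, score) pairs with the highest score for one party."""
    ranked = sorted(scores.items(), key=lambda item: item[1][party], reverse=True)
    return [(code, party_scores[party]) for code, party_scores in ranked[:n]]


# Made-up outlier scores for four hypothetical units.
sample_scores = {
    "PU-A": {"APC": 70.0, "PDP": 0.0},
    "PU-B": {"APC": 80.0, "PDP": 20.0},
    "PU-C": {"APC": 0.0, "PDP": 5.0},
    "PU-D": {"APC": 12.5, "PDP": 20.0},
}
for party in ("APC", "PDP"):
    print(party, top_outliers(sample_scores, party))
```

Python's sort is stable, so units with tied scores keep their original order. Ties in the ranked lists therefore carry no meaning beyond the order of the input.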

METHODOLOGY

This analysis utilized geospatial techniques to identify neighboring polling units within a 1 km radius. Outlier scores were calculated based on deviations in each party’s vote count compared to its neighbors. I identified notable outliers and compiled them for further examination, saving my results in an Excel file named “ZAMFARA_State_outliers_scores”.

FINDINGS

APC TOP 3 OUTLIERS:

1. BAGEGA II/MAKARANTA (PU-Code: 36–01–01–002)

  • Location: Bagega, Anka, Zamfara
  • Votes: APC received 235 votes.
  • APC Outlier Score: 116.5
  • Details: This polling unit had 141 accredited voters out of 750 registered voters. Its APC votes deviate significantly from those of its neighbors, which produces the high outlier score and suggests potential voting irregularities at this location.

2. SHIYAR TUDU/PRIMARY SCHOOL (PU-Code: 36–02–02–001)

  • Location: Birnin Tudu, Bakura, Zamfara
  • Votes: APC received 2 votes.
  • APC Outlier Score: 116.5
  • Details: This polling unit had 355 accredited voters out of 832 registered voters. The very low number of votes for APC, combined with the high outlier score, indicates a significant deviation from neighboring units, potentially highlighting irregularities.

3. S/AJIYA II/VILLAGE HEAD OFFICE (PU-Code: 36–02–02–012)

  • Location: Birnin Tudu, Bakura, Zamfara
  • Votes: APC received 179 votes.
  • APC Outlier Score: 99.5
  • Details: This polling unit had 261 accredited voters out of 803 registered voters. The considerable deviation in APC votes compared to its neighbors, resulting in a high outlier score, suggests potential voting irregularities that need further investigation.

LP TOP 3 OUTLIERS:

1. DAN MANAU / MODEL PRI SCH. (PU-Code: 36–02–05–005)

  • Location: Dan Manau, Bakura, Zamfara
  • Votes: LP received 3 votes.
  • LP Outlier Score: 1.5
  • Details: This polling unit had 220 accredited voters out of 989 registered voters. The outlier score for LP indicates a small deviation from neighboring units, but it is noteworthy due to the relatively low number of votes received by LP.

2. SHAMUSHALE I / SHIYAR SABON GARI (PU-Code: 36–03–02–011)

  • Location: Shamushele I, Birnin Magaji, Zamfara
  • Votes: LP received 0 votes.
  • LP Outlier Score: 1.5
  • Details: This polling unit had 100 accredited voters out of 750 registered voters. The absence of votes for LP, combined with the outlier score, suggests a slight deviation from the expected voting pattern that may need further examination.

3. BUKKUYUM I / PRIMARY SCHOOL (PU-Code: 36–04–02–004)

  • Location: Bukkuyum, Bukkuyum, Zamfara
  • Votes: LP received 2 votes.
  • LP Outlier Score: 1.5
  • Details: This polling unit had 233 accredited voters out of 750 registered voters. LP votes here deviate only slightly from those of neighboring units, so the outlier score is low in absolute terms, but the unit still ranks among LP’s top outliers and may warrant further investigation.

PDP TOP 3 OUTLIERS:

1. ZAUMA / PRIMARY SCHOOL (PU-Code: 36–04–10–002)

  • Location: Zauma, Bukkuyum, Zamfara
  • Votes: PDP received 36 votes.
  • PDP Outlier Score: 89.5
  • Details: This polling unit had 242 accredited voters out of 745 registered voters. The outlier score for PDP indicates a significant deviation from neighboring units, suggesting possible voting irregularities at this location.

2. GIDAN MASAKA KABA / DAN HILI (PU-Code: 36–05–06–015)

  • Location: Gidan Masaka Kaba, Bungudu, Zamfara
  • Votes: PDP received 215 votes.
  • PDP Outlier Score: 89.5
  • Details: This polling unit had 99 accredited voters out of 698 registered voters. The high number of votes for PDP, coupled with the outlier score, indicates a substantial deviation from the expected voting pattern, which may need further investigation.

3. SAKEEN KADE / KOFAR GIDAN A LABBO (PU-Code: 36–03–05–006)

  • Location: Sakeen Kade, Birnin Magaji, Zamfara
  • Votes: PDP received 40 votes.
  • PDP Outlier Score: 70.5
  • Details: This polling unit had 143 accredited voters out of 649 registered voters. The notable deviation in PDP votes compared to neighboring units resulted in a high outlier score, indicating potential irregularities that require further examination.

NNPP TOP 3 OUTLIERS:

1. SAKEEN KADE / KOFAR GIDAN A LABBO (PU-Code: 36–03–05–006)

  • Location: Sakeen Kade, Birnin Magaji, Zamfara
  • Votes: NNPP received 0 votes.
  • NNPP Outlier Score: 2
  • Details: This polling unit had 143 accredited voters out of 649 registered voters. The absence of votes for NNPP, paired with the outlier score, indicates a slight deviation from neighboring units, suggesting potential irregularities.

2. KATAFANA / DAN FAGE (PU-Code: 36–04–01–013)

  • Location: Katafana, Bukkuyum, Zamfara
  • Votes: NNPP received 4 votes.
  • NNPP Outlier Score: 2
  • Details: This polling unit had 41 accredited voters out of 641 registered voters. The relatively low number of votes for NNPP, combined with the outlier score, suggests a slight deviation from expected voting patterns that may need further investigation.

3. DAMRI I / MODEL PRIMARY SCHOOL (PU-Code: 36–02–03–001)

  • Location: Damri I, Bakura, Zamfara
  • Votes: NNPP received 0 votes.
  • NNPP Outlier Score: 1
  • Details: This polling unit had 551 accredited voters out of 1289 registered voters. The zero votes for NNPP, combined with the outlier score, indicate a slight deviation from neighboring units, warranting further examination to confirm any irregularities.

CONCLUSION

This analysis used geospatial techniques and outlier detection to flag potential irregularities in Zamfara State’s election data. By comparing each party’s votes across polling units within 1 km of one another, I identified units whose voting patterns differed significantly from those of their neighbors. These discrepancies suggest potential irregularities that require further investigation. The methodology covered data preprocessing, geocoding, neighbor identification, and outlier score calculation, and the results emphasize the need for careful verification to uphold electoral integrity. This approach not only offers insights into specific polling units but also underscores the role of data-driven methods in promoting transparency and accountability in elections.

Chika Chukwu

Skilled Data Analyst proficient in Microsoft Excel, SQL, and Power BI, adept at transforming data into actionable insights for strategic decision making.