(1/3) Deep dive into beneficiary de-duplication in the Nigerian context: Data management workflows

Published in Frontier Tech Hub · 8 min read · Nov 17, 2022

This three-part article is the second I’m writing about our findings and learnings from piloting a blockchain-based technology to address the beneficiary deduplication problem in Nigeria.

The purpose of the pilot is to test whether a blockchain-based technology can be used to detect duplicate beneficiaries in Nigeria. The technology was previously piloted in Syria, where it proved effective and efficient in detecting duplicate beneficiaries. The aim is to build on the success of the Syrian pilot and deploy the same technology in Nigeria, while taking into consideration the differences between the Syrian and Nigerian contexts. The first sprint of the pilot was all about raising awareness of the problem, announcing the pilot, gathering momentum, and encouraging humanitarian actors to participate.

In this sprint, we wanted to engage with humanitarian agencies and experiment with some of the most important assumptions that would encourage and motivate them to participate in the pilot, namely: their data collection and management workflows, the use of biometrics, as well as compliance with data protection laws and regulations. In this article, we’ll cover the results of the data workflows experiment.

One of the main added values of the proposed technology is that it does not require agencies to make radical changes to their existing data collection and management workflows. In the data workflows experiment, we wanted to examine these workflows and test whether this still holds in the Nigerian context.

Key takeaways

Data is collected, cleaned, stored, and managed over two stages: the assessment stage and the distribution stage.
Various teams in an agency collaborate on the collection and management of data; local community committees contribute to ensuring data accuracy and beneficiary validation.
Beneficiary data that is collected includes personal information as well as sector-specific information.
Techniques used to ensure data accuracy include data cross-checks and community committee reviews.
Tools used for data collection and processing include ODK-based Kobo Toolbox and the Microsoft Office and Power BI applications; cloud storage is used for backup and storage purposes.
Extra precautions are taken to ensure data is stored securely and safely, with restricted access, for compliance with data protection laws.

Data collection and management workflows experiment

The proposed technology is designed to work with the data collection and management workflows already in place. The goal is to make deployment as minimally disruptive to existing practices as possible, so that it is easy for agencies to install and start using the GeniusChain Platform for detecting duplicates. In this experiment, we wanted to talk to agencies and learn more about their data collection and management workflows in order to identify what organizations need to install and start using the system. We held several meetings with the agencies participating in the pilot to understand and map out their data workflows; below are our findings.

Overall workflow

There are primarily two stages in which data collection, handling, and management are performed: assessment and distribution. During the assessment stage, the teams gather beneficiary information for the purpose of assessing their humanitarian aid needs. The distribution stage is when the organization distributes the humanitarian aid to beneficiaries.

In the assessment stage, the organization enters the community and appoints several community committees. The committees serve as the interface between the organization and the beneficiaries and perform various tasks such as identifying community members, beneficiaries and their family members, as well as validating communities during distribution. Also during this stage, the organization collects beneficiary information according to a predefined data collection survey with questions about the beneficiaries’ personal information, as well as information specific to their needs (such as their food or healthcare needs). The collected data is cleaned and validated for accuracy, and aggregate findings are shared with regional sector hubs to ensure there are no overlaps with existing aid efforts and to obtain approval for the needed aid.

In the distribution stage, the organization delivers humanitarian aid to beneficiaries with the help of the Programs team, the M&E team, and the community committees. The M&E team ensures that the beneficiary data is accurate and verified and then shares the data with the Programs team, which in turn handles the distribution of aid to the beneficiaries. The community committees primarily assist the Programs team in identifying the beneficiaries and their family members. A mobile application is used for verifying beneficiary identities through their information and pictures. The M&E team regularly takes snapshots of data to report on progress against the baseline over time.

Actors and roles

As mentioned above, various teams and staff members are involved in collecting the data and ensuring its accuracy and reliability. The field team is primarily responsible for collecting beneficiary data in the field during the assessment stage. The communities and community leads are relied on by organizations to identify community members and to ensure that the data collected about them is accurate and reliable. The M&E information management team is responsible for validating and cleaning the data so that it is usable by the Programs team, which also double checks the data for accuracy and usability. Lastly, for increased precision and to ensure no overlaps between projects, the data is shared with the regional sector hubs for review and coordination of aid.

Data end points

The kinds of data collected about beneficiaries differ depending on the project and its needs. Overall, two types of data are usually collected: personal/demographic data and sector-specific data. The sector-specific data differs from one project to another. The personal data includes information such as:

Name, Surname, Nickname, Community name, Date of Birth, household size (i.e., number of family members), and a picture of the entire family together for verification purposes.
House GPS coordinates.
Age/Sex disaggregation of each household according to predefined categories.
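To make the shape of such a record concrete, here is a minimal sketch of how the personal data points listed above could be represented. The class name, field names, and types are assumptions made for illustration; they are not the agencies' actual survey schema, and the age/sex categories are left open because the predefined categories are not specified here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class HouseholdRecord:
    """Hypothetical record mirroring the personal data points listed above."""
    name: str
    surname: str
    community: str
    date_of_birth: date
    household_size: int  # number of family members
    nickname: Optional[str] = None
    gps: Optional[tuple[float, float]] = None  # (latitude, longitude) of the house
    family_photo: Optional[str] = None  # file reference; None if no picture was taken
    # Age/sex disaggregation: {(sex, age_category): count}; categories are project-defined.
    age_sex_counts: dict[tuple[str, str], int] = field(default_factory=dict)
    # Sector-specific answers differ per project, so they are kept as a free-form mapping.
    sector_data: dict[str, object] = field(default_factory=dict)


# Placeholder values, for illustration only.
example = HouseholdRecord(
    name="Example",
    surname="Person",
    community="Community A",
    date_of_birth=date(1990, 5, 1),
    household_size=4,
)
```

Keeping the sector-specific answers in a separate mapping reflects the point above that this part of the data changes from one project to another, while the personal fields stay largely the same.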

Data accuracy and reliability checks (and duplication checks)

When collecting beneficiary data, the organizations perform various checks in order to “clean” the data and make sure it is accurate, reliable, and usable by all the other teams involved in the assessment and distribution efforts. The following are some of the checks and techniques the organization uses to ensure data accuracy and reliability:

To avoid the risk of some beneficiaries inflating their family sizes, the team usually registers only the family members that they see during the field visit. The community committees also double check the numbers, confirming both the family size and that the registered families belong to the community or tribe they claim to be a part of.
Data is reviewed and double checked for typos and erroneous data entries; when reviewed, all data needs to make sense (such as the spelling of the names, the reported ages, and family sizes). The teams usually look for outliers and data anomalies, and validate those with the field teams, and the community committees.
Data is cross-checked against itself, i.e., information is verified against other information to ensure that the data is consistent and makes sense. For instance, the numbers of males and females must match the disaggregation/categorization by sex that was reported; also, the ages and dates of birth that were collected must match, and be consistent with, the reported disaggregation/categorization by age. If anomalies or inconsistencies are detected in those cross-checks, the data is checked again by all relevant teams and corrections are made. Data can also be cross-checked against sector-specific information for accuracy and consistency.
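The cross-checks described above are performed by the teams with their own tools, but the logic can be sketched in a few lines. The example below is an illustration under stated assumptions, not the agencies' actual procedure: the dictionary layout, the function names, and the age bands are all hypothetical. It recomputes each member's age from the date of birth, derives the age/sex counts, and compares them with the reported disaggregation.

```python
from collections import Counter
from datetime import date

# Hypothetical age bands (lower bound inclusive, upper bound exclusive).
AGE_BANDS = [(0, 5, "0-4"), (5, 18, "5-17"), (18, 60, "18-59"), (60, 200, "60+")]


def age_on(dob: date, ref: date) -> int:
    """Completed years between a date of birth and a reference date."""
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))


def age_band(age: int) -> str:
    for low, high, label in AGE_BANDS:
        if low <= age < high:
            return label
    raise ValueError(f"age out of range: {age}")


def cross_check(household: dict, ref: date) -> list[str]:
    """Return a list of inconsistencies; an empty list means the data agrees."""
    issues = []
    members = household["members"]
    if len(members) != household["household_size"]:
        issues.append(
            f"household_size {household['household_size']} != {len(members)} registered members"
        )
    derived = Counter()
    for i, m in enumerate(members):
        age = age_on(m["date_of_birth"], ref)
        if m["age"] != age:
            issues.append(f"member {i}: reported age {m['age']} but date of birth gives {age}")
        derived[(m["sex"], age_band(age))] += 1
    reported = household["disaggregation"]  # {(sex, band): count}
    for key in sorted(set(derived) | set(reported)):
        if derived[key] != reported.get(key, 0):
            issues.append(f"{key}: reported {reported.get(key, 0)}, derived {derived[key]}")
    return issues


# Placeholder household, for illustration only.
household = {
    "household_size": 2,
    "members": [
        {"sex": "F", "date_of_birth": date(1990, 5, 1), "age": 32},
        {"sex": "M", "date_of_birth": date(2019, 12, 1), "age": 2},
    ],
    "disaggregation": {("F", "18-59"): 1, ("M", "0-4"): 1},
}
issues = cross_check(household, ref=date(2022, 11, 17))
```

Returning a list of readable messages rather than a single pass/fail flag matches the workflow described above, where flagged entries go back to the field teams and community committees for review instead of being corrected automatically.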

One of the risks of data inaccuracy reported by the organization was that some beneficiaries may change their names in the hope of getting more aid by posing as different people. This is important to consider for the purposes of this pilot, and is also one of the reasons why we explored the use of biometrics with the system.

Data collection and management tools

Several technology tools are used to collect and manage data efficiently: Kobo Collect, Microsoft Excel, Power BI, and a local mobile application. Kobo Collect is used to design data collection surveys, collect beneficiary data, and store that data in the Kobo Collect account. Microsoft Excel is used for data cleaning, aggregation, and reporting (using Pivot Tables and similar Excel features). Power BI is used for monitoring data and trends over time, such as assessments, baselines, and changes in indicators. The mobile app is an offline tool used to verify the identity of the beneficiaries during distribution. The team downloads the beneficiary information and pictures to the application and then uses it to verify the identity of each beneficiary household before giving them their allocated aid. Local hard drives and Google Drive are used for securely and safely storing and backing up the beneficiary information.

Data protection law compliance requirements

The organizations are required to comply with the Nigerian Law on Data Protection (more on that later) as well as the GDPR. The organizations are not only committed to compliance with these regulations, but they also take extra measures to ensure the safety and security of their beneficiaries; among (but not limited to) the measures adopted for compliance are the following:

The beneficiaries’ consent is obtained by the field teams prior to collecting their information. Consent is also obtained specifically for taking family pictures. Beneficiaries who aren’t comfortable with having their pictures taken can choose to opt out.
No personal data is shared publicly or with external third parties; only aggregates are shared on a need-to-know basis, and these aggregates do not contain any personally identifiable information. The teams double check aggregates before sharing with external institutions.
All teams undergo training on how to handle and process beneficiary data, ensuring that this data is always stored securely and safely and that breaches are avoided.
Access to beneficiary information is restricted, and only certain individuals have access to it and for specific purposes and uses.
Data collection survey questions are reviewed and double checked before use to ensure that there are no unnecessary questions (for information that isn’t needed) being asked, specifically about sensitive information that may put beneficiaries in danger. Further, beneficiaries always have the right not to provide answers to questions.
Beneficiaries can, at any time, request access to the data that is collected about them, or request that this data be deleted.
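The practice of sharing only aggregates that contain no personally identifiable information can be illustrated with a short sketch. This is an assumption-laden example rather than the agencies' actual process (which, as noted above, relies on Excel and manual double checks): the field names, the set of fields treated as identifying, and the choice to group by community are all hypothetical.

```python
from collections import defaultdict

# Fields treated as personally identifiable in this sketch (an assumption).
PII_FIELDS = {"name", "surname", "nickname", "date_of_birth", "gps", "family_photo"}


def aggregate_by_community(records: list[dict]) -> list[dict]:
    """Reduce household records to per-community totals with no personal fields."""
    totals = defaultdict(lambda: {"households": 0, "people": 0})
    for record in records:
        entry = totals[record["community"]]
        entry["households"] += 1
        entry["people"] += record["household_size"]
    rows = [{"community": c, **t} for c, t in sorted(totals.items())]
    # Mirror the manual double check: refuse to return rows that carry a PII field.
    for row in rows:
        leaked = PII_FIELDS.intersection(row)
        if leaked:
            raise ValueError(f"aggregate contains personal fields: {sorted(leaked)}")
    return rows


# Placeholder records, for illustration only.
records = [
    {"name": "A", "surname": "X", "community": "Community A", "household_size": 5},
    {"name": "B", "surname": "Y", "community": "Community A", "household_size": 4},
    {"name": "C", "surname": "Z", "community": "Community B", "household_size": 3},
]
shareable = aggregate_by_community(records)
```

Building the output rows from the totals alone, instead of copying records and deleting fields, means a newly added personal field cannot slip into the shared aggregate by accident.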

What this means for the use of the system

These findings will be considered when designing and configuring the GeniusChain UID Platform for duplication detection. From a technical perspective, the findings demonstrate that the UID platform already accommodates the needs of the Nigerian context, which means that no technical changes are required at the moment.