Enrolment rejections are accelerating

15 crore enrollments were rejected up to 1 January 2017, of which 6.23 crore were marked as biometric duplicates with no explanation.

Anand Venkatanarayanan
Published in Kaarana
Nov 22, 2017

A basic argument that UIDAI has made in various representations to the Supreme Court is outlined in this Indian Express article. Quote:

Aadhaar has emerged as a powerful instrument which enables people to establish their identity, receive their entitlements and exercise their rights without fear of being excluded or having their rights taken away.

This is problematic because, even though the Aadhaar Act explicitly makes enrollment a right (Section 3(1), quoted below), enrollment failures are widespread and increasing.

Every resident shall be entitled to obtain an Aadhaar number by submitting his demographic information and biometric information by undergoing the process of enrollment

This section will focus on the evidence of such failures and the reasons, with links to various RTIs and papers.

How are biometric identifiers stored?

The enrollment process requires at least two documents, one as proof of identity and one as proof of residence. It also requires 10 fingerprints and 2 iris scans. It is a combination of existing documentary proof and biometrics. While the “introducer mode” allows people without documentary evidence to enroll, its usage is minuscule enough to be ignored.

Biometrics are collected and stored centrally to enable deduplication, a signature feature of the product. This requires high-quality images of both the fingerprints and the irises, which matters because of the technology used for deduplication.

There are two ways to do deduplication:

  • Scan the fingerprint or iris into a computer image and do image comparison.
  • Scan the fingerprint or iris, extract a “template” from it, and compare templates.

The UIDAI (and almost everyone else) only does template extraction and comparison. The process is described visually below for a fingerprint, and works the same way for the iris.

The dots in the right-hand picture are called minutiae, and the entire set of them together is called a “template”. What gets stored for comparison is not the image in the left-hand picture, but the “template”, in the following form. (Just pretend for a moment that these dots are minutiae.)

Since every dot (minutia) is stored as a point within the box, it can be represented as a point in geometry, such as (0.2, 0.3). We are now finally ready to understand what a fingerprint template is: a list of numbers, usually written as shown below:

The term “templatization” refers to converting every fingerprint and iris into such a numerical representation. The algorithm that converts a digital image into a template is standardized by UIDAI, and hence every device used for enrollment has to conform to that standard.

(Caveat: This description is a simplification. The common algorithms to convert an image to a template are a lot more involved, but the basic theory is the same.)
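To make the “list of numbers” idea concrete, here is a minimal sketch in Python. It is only an illustration of the simplified description above, not the UIDAI’s actual template format. The function name `to_template` and every point other than (0.2, 0.3) are made up for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_template(minutiae: List[Point]) -> List[float]:
    """Flatten (x, y) minutiae into one flat list of numbers.

    Each coordinate is assumed to be a fraction of the box's width or
    height, so it must lie between 0 and 1.
    """
    template: List[float] = []
    for x, y in sorted(minutiae):  # sort so the same points always give the same list
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"point ({x}, {y}) lies outside the box")
        template.extend([x, y])
    return template


# Hypothetical minutiae; only (0.2, 0.3) comes from the text above.
example_minutiae = [(0.2, 0.3), (0.7, 0.1), (0.5, 0.8)]
example_template = to_template(example_minutiae)
print(example_template)  # [0.2, 0.3, 0.5, 0.8, 0.7, 0.1]
```

Real templates carry more than positions (the caveat above applies here too), but the principle of comparing numbers rather than images is the same.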

How does deduplication work?

The fundamental assumption behind biometric identifiers is that there exists a part of the human body that is unique to an individual, and we can use a digital representation of that part to uniquely identify the individual from across the entire population.

The first problem with this assumption is ageing and time. Every single cell and part of the human body ages and changes, and events in life such as disease or constant manual work can accelerate it. Hence this assumption of uniqueness is true only within a time range. (Mostly stable after age 15, but wildly variable before that. Infant biometrics is still an emerging technology.)

There are known cases of people who were given duplicate IDs over time.

The second problem is that we conflate identifying a person by a human, with a machine or algorithm identifying a person. These two processes are entirely different. We evolved to identify others accurately because of the evolutionary pressure to reproduce. Consider the following questions:

  1. How often have you wrongly identified a stranger as a familiar person?
  2. How often have you wrongly identified a stranger as a familiar person, and also gotten the gender wrong?

The answer to (2) is likely never, irrespective of the answer to (1). However, an algorithm which tries to identify people using a biometric identifier has not evolved under reproductive pressure. Hence, it has to use different means to identify people through computation, and has to use different conceptual constructs. One such concept is that two points are the same if they overlap each other. Why is this important for deduplication?

Recall that every single biometric identifier is converted into a list of points. If we plot two scans of the same person’s fingerprint on top of each other, the points will largely overlap, as shown below:

A perfect overlap of points is impossible for any biometric identifier, even when the same person is scanned in successive attempts. Why is that? It is simply because of extraneous factors such as orientation, sweat, tears, physical position and other mechanical factors, as illustrated below.

Since perfect matching is impossible, a threshold is specified. For instance, let us say that every biometric identifier is converted into 100 points in the box. We can say that if more than 75 points match, then it is the same person and if not, it is a different person. There is an inherent conflict with the threshold though.
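The threshold rule can be sketched in a few lines of Python. This is a toy version of the idea, assuming points “match” when they lie within a small distance of each other. The `tolerance` value, the function names and the grid of 100 points are all hypothetical, and real matchers are far more sophisticated.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def count_matches(first: List[Point], second: List[Point], tolerance: float = 0.02) -> int:
    """Count points in `first` that lie within `tolerance` of a not-yet-used point in `second`."""
    unused = list(second)
    matched = 0
    for p in first:
        for q in unused:
            if math.dist(p, q) <= tolerance:
                matched += 1
                unused.remove(q)  # each point may be matched only once
                break
    return matched


def same_person(first: List[Point], second: List[Point], threshold: int = 75) -> bool:
    """Apply the rule from the text: more than `threshold` matching points means same person."""
    return count_matches(first, second) > threshold


# Hypothetical scan: 100 points on a regular grid inside the unit box.
first_scan = [(i / 10 + 0.05, j / 10 + 0.05) for i in range(10) for j in range(10)]
# Second scan of the same finger: 80 points move slightly, 20 move too far to match.
second_scan = [(x + (0.01 if k < 80 else 0.05), y) for k, (x, y) in enumerate(first_scan)]

print(count_matches(first_scan, second_scan))              # 80
print(same_person(first_scan, second_scan, threshold=75))  # True
print(same_person(first_scan, second_scan, threshold=90))  # False
```

The same pair of scans is judged “same person” at a threshold of 75 and “different person” at 90, which is exactly the conflict discussed next.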

If a very large number is specified (say 99), the biometric identifier of the same person would not be recognized as belonging to the same person. If a small number is specified (say 50), some other person’s biometrics would be matched as yours. Now consider the UIDAI’s problem here very carefully:

  • The signature promise of the program is that the same person cannot be given more than one Aadhaar number because of biometric deduplication.
  • However, it cannot set the threshold very high: ageing changes biometric identifiers, so a high threshold would let the same person enroll twice at different points in their life (say, 40 years apart) and get two Aadhaar numbers.
  • But if it keeps the threshold low, then quite a few enrollments will be marked as duplicates of some other previous enrollment, and would be rejected.

Hence, rejecting a genuine person who is trying to enroll and accepting a person who is enrolling again are both definite possibilities, and they pull in opposite directions: lowering one raises the other.

This problem is in the very nature of biometrics, and cannot be worked around or fixed.
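A small sketch shows how the two errors move in opposite directions as the threshold changes. The match counts below are invented purely for illustration (they are not measurements), and `error_rates` is a hypothetical helper name.

```python
from typing import List, Tuple

# Hypothetical numbers of matching points (out of 100) between two scans.
same_person_pairs: List[int] = [95, 90, 88, 82, 78, 74, 70, 65]       # one person, scanned twice
different_people_pairs: List[int] = [20, 35, 40, 48, 55, 60, 72, 80]  # two different people


def error_rates(threshold: int) -> Tuple[float, float]:
    """Return (wrongly_rejected, missed_duplicates) for a given threshold.

    wrongly_rejected: share of different-people pairs that score above the
    threshold, so a genuine new enrollee is flagged as a duplicate.
    missed_duplicates: share of same-person pairs that score at or below the
    threshold, so a repeat enrollment gets a second number.
    """
    wrongly_rejected = sum(s > threshold for s in different_people_pairs) / len(different_people_pairs)
    missed_duplicates = sum(s <= threshold for s in same_person_pairs) / len(same_person_pairs)
    return wrongly_rejected, missed_duplicates


for threshold in (50, 75, 99):
    print(threshold, error_rates(threshold))
# 50 (0.5, 0.0)
# 75 (0.125, 0.375)
# 99 (0.0, 1.0)
```

With these made-up numbers, no threshold brings both error rates to zero at once, because the two sets of scores overlap.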

What are the implications of these problems?

  • There is no 100% guarantee that a genuine person who enrolls for Aadhaar will be allotted an Aadhaar number, because their biometrics could be wrongly matched as belonging to someone else.
  • There is always a possibility that the same person who enrolls at different points of time in their life will get two different Aadhaar numbers, particularly if the time difference between the enrollments is high.

The UIDAI itself (page 4) has estimated this to be:

  • Zero failures to enroll. (All genuine persons will get an Aadhaar number as a matter of policy)
  • Biometric exceptions will constitute 0.14% of the population. (This means as per their own study, 0.14% of the population do not have good quality biometrics to enroll, and hence need special handling)
  • Genuine persons trying to enroll who will be incorrectly identified as someone else by the deduplication process will be 0.057%
  • Only 0.035% of people who try to enroll twice will be given two different Aadhaar numbers
  • 0.5% of the population attempted duplicate enrollment (at the time of the study in 2011, which UIDAI projects to hold true for 120 crore enrollments)
  • The above rates are applicable even for multi-modal comparison, meaning the use of both fingerprints and iris for deduplication. (Source)
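To get a feel for what these percentages mean in absolute numbers, here is a short worked calculation. It assumes, as a simplification, that each rate is applied to 120 crore enrollments (the projection mentioned above). The variable names are ours.

```python
CRORE = 10_000_000
LAKH = 100_000

enrollments = 120 * CRORE  # projection used in the text


def people(percent: float, base: int) -> int:
    """Number of people that `percent` of `base` corresponds to."""
    return round(base * percent / 100)


biometric_exceptions = people(0.14, enrollments)     # poor-quality biometrics
false_duplicates = people(0.057, enrollments)        # genuine people flagged as duplicates
duplicate_attempts = people(0.5, enrollments)        # people trying to enroll twice
missed_duplicates = people(0.035, duplicate_attempts)  # repeat enrollees given a second number

print(biometric_exceptions / LAKH)  # 16.8 (lakh)
print(false_duplicates / LAKH)      # 6.84 (lakh)
print(duplicate_attempts / LAKH)    # 60.0 (lakh)
print(missed_duplicates)            # 2100
```

Even on the UIDAI’s own estimates, then, several lakh genuine people would be wrongly flagged as duplicates at this scale.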

What is the proof that these assumptions are wrong?

The Centre for Internet and Society (CIS) published a paper in the Economic and Political Weekly which pointed out that enrollment rejection rates will accelerate as the size of the database increases. The paper is highly mathematical and makes a few interesting predictions, as shown in the table below:

The last column is important. It predicts that at the current enrollment level (120 crore), one in every 121 genuine enrollments will be marked as a duplicate. We have three different data points obtained through RTI requests and parliamentary questions to examine this prediction against.
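Why would rejections grow with the size of the database? One textbook way to see it (this is a simplified model, not necessarily the one used in the CIS paper) is that every new enrollment is compared against everyone already enrolled, so even a tiny chance of a false match per comparison adds up. The sketch below assumes the comparisons are independent and back-calculates a hypothetical per-comparison rate from the “one in 121 at 120 crore” figure.

```python
import math

CRORE = 10_000_000


def false_duplicate_probability(per_comparison_rate: float, database_size: int) -> float:
    """Chance of at least one false match against `database_size` records: 1 - (1 - p)^N."""
    return -math.expm1(database_size * math.log1p(-per_comparison_rate))


def implied_per_comparison_rate(target_probability: float, database_size: int) -> float:
    """Per-comparison rate p that yields `target_probability` at `database_size` records."""
    return -math.expm1(math.log1p(-target_probability) / database_size)


# Hypothetical calibration: one in 121 at 120 crore records.
p = implied_per_comparison_rate(1 / 121, 120 * CRORE)

for size_in_crore in (10, 60, 120):
    probability = false_duplicate_probability(p, size_in_crore * CRORE)
    print(f"{size_in_crore} crore records: about one in {1 / probability:.0f}")
```

Under these assumptions, the chance that a genuine enrollment is wrongly flagged rises steadily as the database fills up, even though the matching algorithm itself never changes.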

From the Planning Commission report, we know the following:

From LS UQ 7073 on 08.05.2015, we know the following:

And from the RTI request put up by databaazi, we know the following:

Out of the 15.08 crore rejected enrollments, 6.23 crore were rejected because they were flagged as biometric duplicates. We can now produce another table to compare the prediction by CIS to that revealed by the RTI.

A few things jump out when compared to what UIDAI had said in their 2011 proof of concept study:

  • Overall enrollment rejections are growing as more and more people enroll into the program (8.54%, 10.48%, 12.49% on an overall basis).
  • Enrollment rejections within sub-periods are accelerating at a phenomenal speed (2010–13: 8.54%, 2013–15: 11.50% and finally 2015–17: 19.07%). The last figure means that nearly one in five enrollments was rejected between 2015 and 2017.
  • Biometric duplicates alone are about 9× what the CIS paper predicted.

Conclusion

Either close to 6% of the population tried to enroll more than once, which is 12× more than the UIDAI’s original estimates, or biometric duplicates are 9× more than what the CIS paper predicts.

Either of the above could be true to some extent. What we know for sure is that the statistical evidence is overwhelmingly against the UIDAI’s 2011 estimates of 0.5% re-enrollment and a 0.057% chance of false biometric duplicates both being true.
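The “close to 6%” and “12×” figures can be roughly reproduced from the numbers quoted above. The sketch assumes that the 12.49% overall rejection rate is rejections divided by total enrollment attempts. That denominator is our assumption, so treat the outputs as approximate.

```python
rejected_total = 15.08          # crore, rejected enrollments (RTI figure)
biometric_duplicates = 6.23     # crore, rejections flagged as biometric duplicates
overall_rejection_rate = 12.49  # percent, overall rejection rate
uidai_reenrollment_rate = 0.5   # percent, UIDAI's 2011 estimate

# Assumption: overall rate = rejected / total attempts.
implied_attempts = rejected_total / (overall_rejection_rate / 100)
accepted = implied_attempts - rejected_total

share_of_rejections = 100 * biometric_duplicates / rejected_total
duplicates_vs_attempts = 100 * biometric_duplicates / implied_attempts
duplicates_vs_accepted = 100 * biometric_duplicates / accepted
ratio_to_uidai = duplicates_vs_accepted / uidai_reenrollment_rate

print(f"implied attempts: {implied_attempts:.1f} crore")                    # 120.7
print(f"duplicates as share of rejections: {share_of_rejections:.1f}%")     # 41.3%
print(f"duplicates vs all attempts: {duplicates_vs_attempts:.2f}%")         # 5.16%
print(f"duplicates vs accepted enrollments: {duplicates_vs_accepted:.2f}%") # 5.90%
print(f"ratio to UIDAI's 0.5% estimate: {ratio_to_uidai:.1f}x")             # 11.8x
```

Whichever denominator is used, biometric duplicates come to between 5% and 6% of enrollments, more than ten times the re-enrollment rate the UIDAI assumed.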

To understand this issue better, we need a breakup of the enrollment rejections by cause, such as demographic duplicates, process errors, and manual or semi-automated adjudication. Of late, however, UIDAI has made a habit of replying to all RTI queries on enrollment rejections with “Not Compiled” or “Not Available”, or by citing the national security clause.

If, as the Aadhaar Act says, no eligible person will be denied an Aadhaar, UIDAI has to publish a detailed data breakup to help the public understand the accelerating enrollment failures and the high number of biometric duplicates. Otherwise, the Act only offers an empty assurance with no relief for the excluded.
