Predicting masked ID with check-digit

syIsTyping
don’t code me on that
3 min read · Apr 30, 2022

Some identification numbers (IDs) come with check-digits, for example national IDs or membership IDs. In some cases, data protection regulation mandates masking parts of the ID during storage or when displayed. In most of these masked cases, the check-digit remains unmasked. Theoretically then, we could use the check-digit to narrow down the original ID. I was curious as to how effective that is, and just for fun I took a closer look.

Check-Digit

As a simple example, I used the ISBN-10 check-digit algorithm, which uses modulo-11 with positional weights. In short, given an ISBN 0–306–40615–2 (or 0306406152 without dashes), the last digit 2 is the check-digit. This is calculated by taking a sum-product of the prior digits with the positional weights like so:

[0, 3, 0, 6, 4, 0, 6, 1, 5] . [10, 9, 8, 7, 6, 5, 4, 3, 2] = 130

The sum is reduced modulo 11, and the remainder is subtracted from 11 to obtain the check-digit:

130 % 11 = 9
11 - 9 = 2

The Python script to do the above is:

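WEIGHTS = [10, 9, 8, 7, 6, 5, 4, 3, 2]
MODULO = 11

def generate_check_digit(digits):
    # Sum-product of the digits with their positional weights
    checksum = sum(x * y for x, y in zip(digits, WEIGHTS))
    remainder = checksum % MODULO
    # The outer modulo maps a remainder of 0 to check-digit 0 rather than 11;
    # a result of 10 is written as "X" in ISBN-10
    return (MODULO - remainder) % MODULO

# generate_check_digit([0, 3, 0, 6, 4, 0, 6, 1, 5])
# 2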
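
As a quick sanity check, the same calculation can be turned around to validate a full number. This is only a sketch: `is_valid_id` is a hypothetical helper that is not part of the original script, and it assumes a 10-character ISBN-10 style ID where a check value of 10 is written as "X".

```python
WEIGHTS = [10, 9, 8, 7, 6, 5, 4, 3, 2]
MODULO = 11


def generate_check_digit(digits):
    # Sum-product of the digits with their positional weights, modulo 11
    checksum = sum(x * y for x, y in zip(digits, WEIGHTS))
    return (MODULO - checksum % MODULO) % MODULO


def is_valid_id(full_id):
    """Hypothetical helper: True if the last character matches the check-digit."""
    *body, check = full_id
    expected = generate_check_digit([int(ch) for ch in body])
    # ISBN-10 writes a check value of 10 as "X"
    expected_char = "X" if expected == 10 else str(expected)
    return check == expected_char


print(is_valid_id("0306406152"))  # True
print(is_valid_id("0306406153"))  # False
```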

Predicting original ID using masked ID

Using the ISBN above (0306406152) as an ID, and assuming we need to mask the first 6 digits, we get ******6152. Without knowing anything else about the number, there are 1,000,000 possible IDs that match (6 masked digits, each of which could be 0–9).

Since we know that the last digit is a check-digit, and assuming we know the check-digit algorithm, we can calculate the check-digit for each of the 1,000,000 options and keep only those that produce a check-digit of “2”, i.e. 000000615–2 to 999999615–2. When we do that, only ~91,000 valid options remain, a 91% reduction.
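
The filtering step can be sketched as a plain brute-force loop. The `candidates` generator and its parameter names are made up for illustration, and the check-digit function is repeated so the snippet runs on its own.

```python
from itertools import product

WEIGHTS = [10, 9, 8, 7, 6, 5, 4, 3, 2]
MODULO = 11


def generate_check_digit(digits):
    checksum = sum(x * y for x, y in zip(digits, WEIGHTS))
    return (MODULO - checksum % MODULO) % MODULO


def candidates(masked_count, known_digits, check_digit):
    """Yield every ID body whose check-digit matches the unmasked one."""
    for guess in product(range(10), repeat=masked_count):
        body = list(guess) + known_digits
        if generate_check_digit(body) == check_digit:
            yield "".join(map(str, body))


# ******6152: six masked digits, visible "615", check-digit 2
matches = list(candidates(6, [6, 1, 5], 2))
print(len(matches))            # 90909
print("030640615" in matches)  # True
```

The original ID is still among the matches, but it is now hidden in a pool roughly one eleventh the size.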

ISBN is a 10-digit number (9-digit ID + 1 check-digit: ******dddc), and we chose to mask 6 of its digits. Other combinations behave the same:

  • 8-digit ID, mask 5 (*****dddc): from 100,000 to ~9.1k options
  • 7-digit ID, mask 4 (****dddc): from 10,000 to 909 options
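
These counts can be reproduced with a small variation of the brute-force approach. The sketch makes two assumptions that the text does not spell out: shorter IDs use weights that count down to 2 at the last ID digit (so a 7-digit ID uses 8 down to 2), and the visible part is always "615" with check-digit 2. `check_digit` and `count_matches` are illustrative names.

```python
from itertools import product

MODULO = 11


def check_digit(body):
    # Assumption: weights count down to 2 at the last body digit
    weights = range(len(body) + 1, 1, -1)
    checksum = sum(d * w for d, w in zip(body, weights))
    return (MODULO - checksum % MODULO) % MODULO


def count_matches(masked_count, known_digits, expected):
    """Count masked-digit combinations that reproduce the visible check-digit."""
    return sum(
        1
        for guess in product(range(10), repeat=masked_count)
        if check_digit(list(guess) + known_digits) == expected
    )


for masked in (6, 5, 4):
    total = 10 ** masked
    valid = count_matches(masked, [6, 1, 5], 2)
    print(f"mask {masked}: {total:,} -> {valid:,}")
# mask 6: 1,000,000 -> 90,909
# mask 5: 100,000 -> 9,091
# mask 4: 10,000 -> 909
```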

Knowing other parts of the ID

Most likely, the ID contains encoded information which can help reduce the options further. For example, most national IDs contain a birth year or location code. This encoded information is then appended to a “serial number” part to form the whole ID.

What’s more, in most cases the serial part is left unmasked, together with the check-digit (so that the masked ID still remains “unique” enough to serve as a distinguishing factor). This means the masked portion usually contains only such encoded information.

Using an original ID of 79416152 (7-digit ID with 4 masked, or ****6152), if the first 2 digits are the birth year and we know the birth year is 79, we further reduce the valid options to just 9 IDs:

['7908615', '7919615', '7930615', '7941615', '7952615', '7963615', '7974615', '7985615', '7996615']
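
The list above can be regenerated with a few lines. Since every listed candidate starts with 79, the sketch treats those two leading digits as the known part. It also assumes that the 7-digit ID uses weights 8 down to 2; that choice is an inference, but it does reproduce the list and the check-digit of the original ID.

```python
from itertools import product

MODULO = 11


def check_digit(body):
    # Assumption: weights count down to 2 at the last body digit (8..2 here)
    weights = range(len(body) + 1, 1, -1)
    checksum = sum(d * w for d, w in zip(body, weights))
    return (MODULO - checksum % MODULO) % MODULO


known_prefix = [7, 9]     # digits assumed known from outside information
known_suffix = [6, 1, 5]  # unmasked serial part
matches = []
for guess in product(range(10), repeat=2):
    body = known_prefix + list(guess) + known_suffix
    if check_digit(body) == 2:
        matches.append("".join(map(str, body)))

print(matches)
# ['7908615', '7919615', '7930615', '7941615', '7952615',
#  '7963615', '7974615', '7985615', '7996615']
```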

Notes and Observations

Initially I had assumed that the check-digit would reduce the valid options only slightly; I did not anticipate a 91% reduction. Looking back, this is a d’oh moment, as a mod-11 check-digit means that only 1/11 ≈ 9% of generated numbers are valid. The more granular the check-digit (i.e. the larger the modulus), the more options are eliminated. Knowing more about the original ID (e.g. birth year) simply increases the unmasked portion of the ID, thereby reducing the options accordingly.
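
The 1/11 intuition can be checked by tallying how often each check-digit value occurs across all candidates. A rough sketch, assuming the 7-digit scheme with weights 8 down to 2 and the visible part "615" (both assumptions made for this illustration):

```python
from collections import Counter
from itertools import product

MODULO = 11


def check_digit(body):
    # Assumption: weights count down to 2 at the last body digit
    weights = range(len(body) + 1, 1, -1)
    checksum = sum(d * w for d, w in zip(body, weights))
    return (MODULO - checksum % MODULO) % MODULO


# Tally the check-digit of every possible ****615 body
counts = Counter(
    check_digit(list(guess) + [6, 1, 5])
    for guess in product(range(10), repeat=4)
)
for value in sorted(counts):
    print(value, counts[value])
```

Each of the 11 possible values turns up 909 or 910 times out of 10,000, so seeing any particular check-digit rules out about 10 in every 11 candidates.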

The existence of the unmasked check-digit and a few pieces of OSINT-able information such as birth year makes it much easier to predict the rest of the ID. As shown above, in a 7-digit, 4-masked ID scheme, this leaves only 9 valid IDs, which is really easy to brute force (e.g. phone verification with customer support).

Perhaps masking the check-digit then is a good idea. Perhaps mask parts of the serial portion as well, since it is not OSINT-able.

syIsTyping
don’t code me on that

Security engineer and new dad in Japan. I've learnt a lot from the community, so I hope to contribute back. I write technical articles and how-to guides.