Implementing the UPC/EAN Check Digit Algorithm in C#

Michael Harges
15 min readSep 27, 2023

--

Photo by Antoine Dautry on Unsplash

Implementing and optimizing a widely used check digit algorithm

In my previous article, I took a deep dive into implementing the Luhn algorithm which is used in credit card numbers and a variety of other identifiers. In this article I’ll cover another commonly used check digit scheme, the algorithm originally employed by Universal Product Code (UPC) and from there, a wide range of international standards for product identifiers. This algorithm is applicable to values such as International Article Number/European Article Number (IAN/EAN), Global Trade Item Number (GTIN), International Standard Book Number (ISBN-13), International Standard Serial Number (ISSN), International Standard Music Number (ISMN), Serial Shipping Container Code (SSCC) and more. For brevity, I’ll simply refer to this algorithm as the EAN algorithm throughout this article. More details about this algorithm can be found in in Wikipedia.

Before going further, I want to announce that I’ve recently released CheckDigits.Net, a .Net library of optimized implementations of 20 different check digit algorithms. Benchmarks comparing CheckDigits.Net to popular NuGet packages have shown performance increases of 3X, 10X and in some cases up to 50X, depending on the algorithm. In addition, CheckDigits.Net completely eliminates the memory allocations that are common in other packages. You can find CheckDigits.Net on NuGet or by searching for CheckDigits.Net in your IDE’s NuGet package manager.

If you’re familiar with the Luhn algorithm, the EAN algorithm will be quite easy to follow. Ignoring the position occupied by the check digit (if present), separate the value into individual digits. Starting at the rightmost position and moving left, multiply odd position digits by 3 and even position digits by 1. Sum all the products and calculate sum mod 10. If sum mod 10 is zero, then the check digit is zero. Otherwise, the check digit is 10 — (sum mod 10). Here is a worked example for a 12 digit UPC-A barcode from Wikipedia:

calculate check digit 03600029145?

| Digit | Weight | Product | Sum |
| ----- | ------ | --------| --- |
| 5 | 3 | 15 | 15 |
| 4 | 1 | 4 | 19 |
| 1 | 3 | 3 | 22 |
| 9 | 1 | 9 | 31 |
| 2 | 3 | 6 | 37 |
| 0 | 1 | 0 | 37 |
| 0 | 3 | 0 | 37 |
| 0 | 1 | 0 | 37 |
| 6 | 3 | 18 | 55 |
| 3 | 1 | 3 | 58 |
| 1 | 3 | 0 | 58 |

58 mod 10 = 8
check digit = 10 - 8 = 2

final value = 036000291452

Like the Luhn algorithm, the EAN algorithm can detect all single digit transcription errors (mistakenly typing a different digit for any digit in the value). According to Wikipedia it can also detect 89% of all two digit transposition errors (90 -> 09 or vice versa). However, it cannot detect two digit transposition errors where the difference between the two digits is 5 (49 -> 94). Also, the algorithm cannot detect jump transpositions (138 -> 831). (Kamaku, 2017) This is because the alternating weights that are applied mean that the sum of products calculated by the algorithm will be the same if the transposed digits are one (or three or five, etc.) place(s) apart from each other.

Implementing the Algorithm

As with the Luhn algorithm, we’ll focus on validating an existing check digit. We will implement a method for the following requirements:

  • Input will be a string.
  • Will be able to validate the check digit for a range of different applications (UPC-A, UPC-E, EAN-13, EAN-8, ISBN-13, SSCC…).
  • Will assume that the check digit is in the last position.
  • Output will be a boolean where true = the input string contains a valid check digit and false = the input string does not contain a valid check digit.
  • The code should be resilient, meaning that invalid input should not throw an exception and instead should return false to indicate that there is not a valid check digit.

As before, we’ll take the approach of creating a basic implementation that successfully applies the algorithm, then making that implementation resilient and then optimizing it. Throughout the process we’ll use red/green development. So we start by creating an empty method.

public static class BarcodeAlgorithm
{
public static Boolean ValidateCheckDigit(String str)
{
throw new NotImplementedException();
}
}

// Example usage:
var str = "036000291452";

var isValid = BarcodeAlgorithm.ValidateCheckDigit(str);

Then, we need to create some tests. Let’s begin by focusing on some algorithm details. We know that the weight for odd position digits should be 3, so let’s create a test where only a single digit is non-zero. If we use 1 for the non-zero value, we can quickly calculate that the check digit is 7 [10 — ((1 * 3) mod 10)]. Then we can create several test cases where the single non-zero digit is in different odd position characters. Using xUnit and FluentAssertions, our first test looks like this:

public class BarcodeAlgorithmTests
{
[Theory]
[InlineData("000000000017")]
[InlineData("000000001007")]
[InlineData("000000100007")]
public void ValidateCheckDigit_ShouldCorrectlyApplyOddPositionWeight(String value)
=> BarcodeAlgorithm.ValidateCheckDigit(value).Should().BeTrue();
}

Note that it’s not necessary to test every odd position, we only need enough cases to establish that odd position values beyond the first one are being handled correctly. Then we create a similar test for even position weights. Using a single digit 1 in an even position would result in a check digit of 9 [10 — ((1 * 1) mod 10)].

   [Theory]
[InlineData("000000000109")]
[InlineData("000000010009")]
[InlineData("000001000009")]

public void ValidateCheckDigit_ShouldCorrectlyApplyEvenPositionWeight(String value)
=> EanAlgorithm.ValidateCheckDigit(value).Should().BeTrue();

Another algorithm detail is the use of the modulus 10 calculation. Per the algorithm, if the sum mod 10 is zero then the check digit is zero. Otherwise, the check digit is 10 — (sum mod 10). A lot of example Luhn algorithm implementations on the web miss that case so we’ll create explicit tests for those that too.

   [Theory]
[InlineData("000000000000")] // sum mod 10 = 0
[InlineData("000000090100")] // "
[InlineData("000000000901")] // sum mod 10 != 0
[InlineData("000000000802")] // "
[InlineData("000000000703")] // "
[InlineData("000000000604")] // "
[InlineData("000000000505")] // "
[InlineData("000000000406")] // "
[InlineData("000000000307")] // "
[InlineData("000000000208")] // "
[InlineData("000000000109")] // "
public void ValidateCheckDigit_ShouldCorrectlyHandleMod10Results(String value)
=> EanAlgorithm.ValidateCheckDigit(value).Should().BeTrue();

Next, let’s create a general success test using a variety of identifier types.

   [Theory]
[InlineData("036000291452")] // Worked UPC-A example from Wikipedia (https://en.wikipedia.org/wiki/Universal_Product_Code#Check_digit_calculation)
[InlineData("425261")] // UPC-E example
[InlineData("4006381333931")] // Worked EAN-13 example from Wikipedia (https://en.wikipedia.org/wiki/International_Article_Number)
[InlineData("73513537")] // Worked EAN-8 example from Wikipedia
[InlineData("9780500516959")] // ISBN-13, Islamic Geometric Design, Eric Broug
[InlineData("012345678000045678")] // Example SSCC number
public void ValidateCheckDigit_ShouldReturnTrue_WhenInputContainsValidCheckDigit(String value)
=> EanAlgorithm.ValidateCheckDigit(value).Should().BeTrue();

Then we’ll create a test for cases where we expect the validation to succeed because the input contains an error that the algorithm cannot detect.

   [Theory]
[InlineData("4006831333931")] // EAN-13 with two digit transposition error (38 -> 83) where difference between digits is 5
[InlineData("9785000516959")] // ISBN-13 with two digit transposition error (05 -> 50) where difference between digits is 5
[InlineData("73315537")] // EAN-8 with jump transposition error (515 -> 315)
[InlineData("012345876000045678")] // SSCC number with jump transposition error (678 -> 876)
public void ValidateCheckDigit_ShouldReturnTrue_WhenInputContainsUndetectableError(String value)
=> EanAlgorithm.ValidateCheckDigit(value).Should().BeTrue();

Finally, we should have tests where the validation is expected to fail because of a detectable error.

   [Theory]
[InlineData("036000391452")] // UPC-A with single digit transcription error (2 -> 3)
[InlineData("427261")] // UPC-E with single digit transcription error (5 -> 7)
[InlineData("4006383133931")] // EAN-13 with two digit transposition error (13 -> 31)
[InlineData("9870500516959")] // ISBN-13 with two digit transposition error (78 -> 87)
public void ValidateCheckDigit_ShouldReturnFalse_WhenInputContainsDetectableError(String value)
=> EanAlgorithm.ValidateCheckDigit(value).Should().BeFalse();

With that suite of tests in place, we can be reasonably sure that any implementation that passes those tests correctly applies the algorithm. So now we move on to turning all our failing tests (red) into passing tests (green).

Below is a strictly literal implementation of the barcode mod 10 algorithm. We’ll have several opportunities to improve on this implementation as we go forward.

   public static Boolean ValidateCheckDigit(String str)
{
var sum = 0;
var oddPosition = true;
for(var index = str.Length - 2; index >= 0; index--)
{
var currentDigit = str[index] - '0';
sum += oddPosition ? currentDigit * 3 : currentDigit;
oddPosition = !oddPosition;
}
var mod = sum % 10;
var checkDigit = mod == 0 ? 0 : 10 - mod;

return str[^1] - '0' == checkDigit;
}

(You can refer to my Luhn algorithm article for a discussion of why the expression str[index] — ‘0’ is a highly efficient way to convert a digit character to an integer.)

With that implementation in place, we can run our tests and see them all turn green. So now we can turn our focus on the resiliency of our method.

Improving Resiliency

We’ve proven that our implementation can handle well formed input. But we also want it to be able to handle as many of the variations of malformed input as possible in a graceful way and not resort to throwing exceptions. Per our requirements, invalid requirements should simply return false to indicate that the input does not contain a valid check digit.

The most likely version of malformed input would be a null string. Then followed by an empty string. So we add tests for those cases and modify our method so that they pass.

   [Fact]
public void ValidateCheckDigit_ShouldReturnFalse_WhenInputIsNull()
=> EanAlgorithm.ValidateCheckDigit(null!).Should().BeFalse();

[Fact]
public void ValidateCheckDigit_ShouldReturnFalse_WhenInputIsEmpty()
=> EanAlgorithm.ValidateCheckDigit(String.Empty).Should().BeFalse();

public static Boolean ValidateCheckDigit(String str)
{
if (String.IsNullOrEmpty(str))
{
return false;
}

.
.
.
}

Another form of malformed input is a string consisting of a single digit. Given that the check digit is in the rightmost position, that single digit would be the check digit and there would be no other data from which to calculate the check digit. As the code exists now, strings containing the digits 1–9 would fail the validation, but a string consisting of a single zero would pass validation. So, we’ll create a test that covers a single digit and modify our method to pass that test.

   [Theory]
[InlineData("0")]
[InlineData("1")]
public void ValidateCheckDigit_ShouldReturnFalse_WhenInputIsOneCharacterInLength(String value)
=> EanAlgorithm.ValidateCheckDigit(value).Should().BeFalse();

public static Boolean ValidateCheckDigit(String str)
{
if (String.IsNullOrEmpty(str) || str.Length < 2)
{
return false;
}

.
.
.
}

The last case of malformed input that we’ll consider is a string that contains a non-digit character. The expression str[index] — ‘0’ will evaluate to a negative number if the character is lower on the ASCII table than ‘0’ and a positive number > 9 if the character is higher on the ASCII table than ‘9’. It’s quite likely that that would be sufficient for the check digit calculation to not match the check digit in the input string, but the modulus 10 operation means that there’s a one in 10 chance that a random character just happens to calculate as a match to the check digit. For example, consider the UPC-E string “425261”. The sum of the products of the individual digits is 49 and 49 mod 10 = 9, resulting in a check digit of 1. If we replace the ‘5’ with an uppercase ‘I’ then the calculated check digit will still be 1. This is because ‘I’ is exactly 20 places from ‘5’ in the ASCII table and the sum of products of the individual digits would be 99 and 99 mod 10 would still result in 9 and a check digit of 1. Replacing the ‘5’ with a ‘+’ character would yield the same result because ‘+’ is exactly 10 places before ‘5’ in the ASCII table. With that long winded explanation in mind, we can create a test to guard against that edge case and modify our method to handle that case.

   [Theory]
[InlineData("42+261")] // UPC-E example with 5 replaced with character 10 positions before in ASCII table
[InlineData("42I261")] // UPC-E example with 5 replaced with character 20 positions later in ASCII table
[InlineData("0 36000 29145 2")]
public void ValidateCheckDigit_ShouldReturnFalse_WhenInputContainsNonDigitCharacter(String value)
=> EanAlgorithm.ValidateCheckDigit(value).Should().BeFalse();

public static Boolean ValidateCheckDigit(String str)
{
.
.
.
var currentDigit = str[index] - '0';
if (currentDigit < 0 || currentDigit > 9)
{
return false;
}
.
.
.
}

Here’s the full implementation of the method that passes the complete test suite.

   public static Boolean ValidateCheckDigit(String str)
{
if (String.IsNullOrEmpty(str) || str.Length < 2)
{
return false;
}

var sum = 0;
var oddPosition = true;
for (var index = str.Length - 2; index >= 0; index--)
{
var currentDigit = str[index] - '0';
if (currentDigit < 0 || currentDigit > 9)
{
return false;
}
sum += oddPosition ? currentDigit * 3 : currentDigit;
oddPosition = !oddPosition;
}
var mod = sum % 10;
var checkDigit = mod == 0 ? 0 : 10 - mod;

return str[^1] - '0' == checkDigit;
}

Optimization

The final step in our implement/add resiliency/optimize approach is to see if we can improve the performance of the method. We already have a full suite of tests, so we are able to quickly verify that any changes we make for performance don’t accidentally introduce algorithmic errors.

To effectively optimize a piece of code, you need to be able to quantify the impact of proposed changes and for that you need to use benchmarks.

To establish a baseline to measure proposed changes against, we add a new command line project to the solution and add dependency for Benchmark.Net. Here is the baseline benchmark. Note the Params attribute on the Value property. This lets us run the benchmark repeatedly for values of different lengths and see if there are performance improvements that are only realized for longer values.

[MemoryDiagnoser]
public class EanAlgorithmBenchmarks
{
[Params("425261", "73513537", "036000291452", "4006381333931", "012345678000045678")]
public String Value { get; set; } = String.Empty;

[Benchmark(Baseline = true)]
public void BaseLine()
{
_ = EanAlgorithm.ValidateCheckDigit(Value);
}
}

At first glance, the baseline implantation seems solid and doesn’t offer many opportunities for optimization. The one area that I would say is a possible target is the summing of odd position characters. The EAN algorithm multiplies every odd position digit by 3 and before adding the product to the sum. If we sum the odd position digits separately from the even position digits we can move the multiplication by 3 outside of the loop and perform a single multiplication on the sum of the odd position digits. On earlier processors, where multiplication was more costly than addition, that would be a worthwhile optimization. So, let’s make that change and see what the results are.

   public static Boolean ValidateCheckDigit_OddEven(String str)
{
if (String.IsNullOrEmpty(str) || str.Length < 2)
{
return false;
}

var sumOdd = 0;
var sumEven = 0;
var oddPosition = true;
for (var index = str.Length - 2; index >= 0; index--)
{
var currentDigit = str[index] - '0';
if (currentDigit < 0 || currentDigit > 9)
{
return false;
}
if (oddPosition)
{
sumOdd += currentDigit;
}
else
{
sumEven += currentDigit;
}
oddPosition = !oddPosition;
}
var mod = ((sumOdd * 3) + sumEven) % 10;
var checkDigit = mod == 0 ? 0 : 10 - mod;

return str[^1] - '0' == checkDigit;
}


[Benchmark]
public void OddEven()
{
_ = EanAlgorithm.ValidateCheckDigit_OddEven(Value);
}


| Method | Value | Mean | Error | StdDev | Median | Ratio | RatioSD | Allocated | Alloc Ratio |
|--------- |------------------- |----------:|----------:|----------:|----------:|------:|--------:|----------:|------------:|
| BaseLine | 012345678000045678 | 27.699 ns | 0.8237 ns | 2.4286 ns | 28.985 ns | 1.00 | 0.00 | - | NA |
| OddEven | 012345678000045678 | 21.900 ns | 0.1220 ns | 0.1141 ns | 21.894 ns | 0.79 | 0.06 | - | NA |
| | | | | | | | | | |
| BaseLine | 036000291452 | 15.414 ns | 0.0850 ns | 0.0754 ns | 15.396 ns | 1.00 | 0.00 | - | NA |
| OddEven | 036000291452 | 14.920 ns | 0.1165 ns | 0.0909 ns | 14.916 ns | 0.97 | 0.01 | - | NA |
| | | | | | | | | | |
| BaseLine | 4006381333931 | 16.261 ns | 0.0870 ns | 0.0814 ns | 16.251 ns | 1.00 | 0.00 | - | NA |
| OddEven | 4006381333931 | 16.420 ns | 0.0904 ns | 0.0802 ns | 16.394 ns | 1.01 | 0.01 | - | NA |
| | | | | | | | | | |
| BaseLine | 425261 | 8.325 ns | 0.0766 ns | 0.0679 ns | 8.329 ns | 1.00 | 0.00 | - | NA |
| OddEven | 425261 | 7.938 ns | 0.0612 ns | 0.0543 ns | 7.920 ns | 0.95 | 0.01 | - | NA |
| | | | | | | | | | |
| BaseLine | 73513537 | 10.658 ns | 0.0885 ns | 0.0785 ns | 10.649 ns | 1.00 | 0.00 | - | NA |
| OddEven | 73513537 | 10.260 ns | 0.0423 ns | 0.0354 ns | 10.262 ns | 0.96 | 0.01 | - | NA |

In all but the longest input (the 16 digit SSCC number), the performance improvement is between 3 and 5 percent. It’s only on the longest value that we begin to see an appreciable performance improvement of 21%. I’d be on the fence about making that change, especially if most of values checked are UPC (12 digits) or EAN (13 digits) values.

There is one other change that we could consider. In the optimized version of the Luhn algorithm we used a lookup table for the odd position values. Here is that change applied to the EAN algorithm and the results of the benchmark.

   private static readonly Int32[] _triples = new Int32[] { 0, 3, 6, 9, 12, 15, 18, 21, 24, 27 };

public static Boolean ValidateCheckDigit_Lookup(String str)
{
if (String.IsNullOrEmpty(str) || str.Length < 2)
{
return false;
}

var sum = 0;
var oddPosition = true;
for (var index = str.Length - 2; index >= 0; index--)
{
var currentDigit = str[index] - '0';
if (currentDigit < 0 || currentDigit > 9)
{
return false;
}
sum += oddPosition ? _triples[currentDigit] : currentDigit;
oddPosition = !oddPosition;
}
var mod = sum % 10;
var checkDigit = mod == 0 ? 0 : 10 - mod;

return str[^1] - '0' == checkDigit;
}


[Benchmark]
public void Lookup()
{
_ = EanAlgorithm.ValidateCheckDigit_Lookup(Value);
}

| Method | Value | Mean | Error | StdDev | Median | Ratio | RatioSD | Allocated | Alloc Ratio |
|--------- |------------------- |----------:|----------:|----------:|----------:|------:|--------:|----------:|------------:|
| BaseLine | 012345678000045678 | 26.686 ns | 0.9160 ns | 2.6865 ns | 25.843 ns | 1.00 | 0.00 | - | NA |
| OddEven | 012345678000045678 | 21.694 ns | 0.0811 ns | 0.0677 ns | 21.706 ns | 0.81 | 0.06 | - | NA |
| Lookup | 012345678000045678 | 20.457 ns | 0.1425 ns | 0.1263 ns | 20.411 ns | 0.77 | 0.06 | - | NA |
| | | | | | | | | | |
| BaseLine | 036000291452 | 15.327 ns | 0.0731 ns | 0.0648 ns | 15.315 ns | 1.00 | 0.00 | - | NA |
| OddEven | 036000291452 | 14.944 ns | 0.1247 ns | 0.1166 ns | 14.929 ns | 0.98 | 0.01 | - | NA |
| Lookup | 036000291452 | 14.253 ns | 0.0614 ns | 0.0544 ns | 14.255 ns | 0.93 | 0.01 | - | NA |
| | | | | | | | | | |
| BaseLine | 4006381333931 | 16.385 ns | 0.1457 ns | 0.1363 ns | 16.352 ns | 1.00 | 0.00 | - | NA |
| OddEven | 4006381333931 | 16.349 ns | 0.0501 ns | 0.0418 ns | 16.358 ns | 1.00 | 0.01 | - | NA |
| Lookup | 4006381333931 | 16.151 ns | 0.0963 ns | 0.0900 ns | 16.151 ns | 0.99 | 0.01 | - | NA |
| | | | | | | | | | |
| BaseLine | 425261 | 8.382 ns | 0.1192 ns | 0.0995 ns | 8.377 ns | 1.00 | 0.00 | - | NA |
| OddEven | 425261 | 7.964 ns | 0.0445 ns | 0.0371 ns | 7.977 ns | 0.95 | 0.01 | - | NA |
| Lookup | 425261 | 8.235 ns | 0.0773 ns | 0.0646 ns | 8.233 ns | 0.98 | 0.02 | - | NA |
| | | | | | | | | | |
| BaseLine | 73513537 | 10.701 ns | 0.1558 ns | 0.1217 ns | 10.663 ns | 1.00 | 0.00 | - | NA |
| OddEven | 73513537 | 10.344 ns | 0.1130 ns | 0.1002 ns | 10.311 ns | 0.96 | 0.01 | - | NA |
| Lookup | 73513537 | 10.073 ns | 0.0738 ns | 0.0654 ns | 10.060 ns | 0.94 | 0.01 | - | NA |

Again, the performance improvement is slight. And for the shortest value, the lookup approach performs slightly worse than the separate summation of odd and even values.

I’ll leave it up to you to decide if either of the optimizations are worth choosing over the baseline implementation. However, if you were to select the lookup approach I would suggest adding additional tests to confirm that the entries in the lookup table are correct. The argument for adding a test like this is that we’re moving part of the algorithm (the calculation of the weighted odd position values) outside of the method being tested and we need a way to confirm that the values in the lookup table are transcribed correctly in the code.

   [Theory]
[InlineData("00")]
[InlineData("17")]
[InlineData("24")]
[InlineData("31")]
[InlineData("48")]
[InlineData("55")]
[InlineData("62")]
[InlineData("79")]
[InlineData("86")]
[InlineData("93")]
public void ValidateCheckDigit_ShouldUseCorrectValuesInOddPositionLookupTable(String value)
=> EanAlgorithm.ValidateCheckDigit(value).Should().BeTrue();

Thanks for reading!

Both my previous Luhn algorithm article and this article are the result of work to add support for several check digit algorithms to a library of guard clauses. I will be continuing this series to cover other algorithms in the future.

About Me

I’m just Some Random Programmer Guy, at least according to the nameplate that was once waiting for me on my first day at a startup. I’ve been around a while and my career has spanned FORTRAN and PL/1 on IBM mainframes to .Net Core microservices with some interesting forays with C/C++, OS/2, VB, C#, SQL, NoSQL and more along the way.

Code for this article is available my public github repository.

References:

Kamaku, Waweru, 2017, On the Weaknesses in Error Detection and Correction in the ISBN-13, International Journal of Science and Research, Volume 6, Issue 6, June 2017, https://www.ijsr.net/archive/v6i6/ART20174839.pdf

--

--