Two-dimensional barcodes robustness analysis through the example of QR code and Aztec code

Published in Watermarked, Aug 28, 2020 (13 min read)

Introduction

The problem of representing information compactly arose in the 1950s and remains relevant today. The first barcodes were used to label products. Today, various types of barcodes are used in a wide range of fields.

One-dimensional barcodes

UPC (Universal Product Code) was the first barcode standard. It is an American standard, which was designed to track goods in stores. The first barcode was scanned on June 26, 1974.

The UPC barcode consists of 12 decimal digits. It later became the basis for EAN-13 code consisting of 13 digits with an additional digit identifying the country of origin. These types of barcodes are referred to as linear barcodes.

The information is encoded as follows: black bars mean 1, white spaces mean 0. Knowing the width of a single bar, the scanner deduces how many identical bits are encoded in a bar that is wider than a single one.
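As a minimal sketch of this width-based decoding, using the UPC convention that a dark module is 1 and a light module is 0, each run of one color can be divided by the width of a single bar to recover how many bits it carries. The function name and the list-of-runs input format are assumptions made for illustration, not part of any scanner API.

```python
def runs_to_bits(runs, module_width):
    """Expand (is_black, width) runs into a bit string.

    runs: list of (is_black, width_in_pixels) tuples, read left to right.
    module_width: width in pixels of the narrowest (single) bar.
    """
    bits = []
    for is_black, width in runs:
        # A bar that is n modules wide carries n identical bits.
        count = round(width / module_width)
        bits.append(("1" if is_black else "0") * count)
    return "".join(bits)


# UPC-A left-hand pattern for the digit 0 is 0001101:
# 3 white modules, 2 black, 1 white, 1 black (3 pixels per module here).
runs = [(False, 9), (True, 6), (False, 3), (True, 3)]
print(runs_to_bits(runs, module_width=3))  # 0001101
```

Rounding the ratio gives some tolerance to bars that are printed or scanned slightly too wide or too narrow.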

Linear barcodes have a significant shortcoming: they can encode only a small amount of information. To solve this problem, two-dimensional barcodes were created, which store data both horizontally and vertically. There are two types of two-dimensional barcodes: multilevel and matrix. Multilevel codes came first and consist of several ordinary linear barcodes stacked on top of each other. Matrix barcodes are more advanced: they also pack information densely in the vertical direction.

Two-dimensional barcodes

There is a myriad of two-dimensional barcodes:

- PDF417;

- Aztec Code;

- MaxiCode;

- Data Matrix;

- Microsoft Tag;

- QR Code.

PDF417

PDF417 is a two-dimensional barcode developed by Symbol Technologies in 1991, and made publicly available. The format supports the following data types: text, numbers, bytes. In addition to the payload, it contains service information and uses error correction codes.

Aztec Code

Aztec Code is a two-dimensional matrix barcode developed in 1995 by Andrew Longacre Jr. and Robert Hussey at Welch Allyn. It was patented and then published by AIM International in 1997. The code is currently in the public domain.

Features of the barcode:

The barcode has a specific target pattern called ‘Bullseye’, which ensures that information can be read even from a distorted (stretched or rotated) image;
Reed-Solomon error correction can be configured from 5% to 95% redundancy;
The layers are arranged radially, so to store more information you only need to expand the coding area without changing the size of the service information.

Figure 3 — Aztec Code with a 30% redundancy and an “Adoriasoft” line

DataMatrix

Data Matrix is a two-dimensional matrix barcode that supports encoding of text and other types of data. Its distinguishing feature is the ability to select the shape of the pattern: square or rectangular.

MaxiCode

MaxiCode is a fixed-size two-dimensional barcode measuring one inch by one inch. It can hold about 100 characters of information. It uses Reed-Solomon codes for error correction. The barcode is intended for product labeling.

QR Code

QR Code is a matrix barcode that supports various types of data and has 4 encoding modes for storing them efficiently (numeric, alphanumeric, binary and kanji). QR code is the most popular two-dimensional barcode. Its popularity stems from its fast readability, large storage capacity and versatility. It is used in trade, tourism, manufacturing and many other areas. There are various versions of QR code, which determine its size (capacity) and the number of service marks. Each version can use several levels of Reed-Solomon error correction.

Micro QR Codes

Micro QR codes are a smaller type of QR code: the largest can hold up to 35 digits. They use only one orientation mark, which increases the efficiency of data storage. Thanks to this structure, the size of a Micro QR code grows much less as the amount of encoded information increases. QR codes support 4 levels of error correction; Micro QR codes support up to 3, depending on the version.

QR codes are of the greatest interest for research because they are widespread and offer a high degree of error correction. Aztec codes are also of interest: with redundancy configurable up to 95%, they are better suited for situations where barcode damage is very likely.

Figure 7 — Micro QR code with an “Adoriasoft” line

How Does a QR Code Work

Let’s take a closer look at the QR code’s structure.

Three squares in the corners of QR code are used to determine the spatial orientation of the barcode. They make QR code readable even if it is rotated or mirrored. Black and white dots of the image are converted into binary code.

Structure of QR code:

1. Position detection patterns
2. Timing patterns (alternating black and white pixels)
3. Dark module (a single pixel that is always black)
4. Alignment pattern; it appears in large-capacity QR codes, and there can be several of them
5. QR code version details
6. Type of information presented in the QR code (numeric, alphanumeric, binary or kanji)
7–8. Two identical copies of the system information (error correction level code and mask)

Masks are used to generate a QR code with the fewest identical adjacent blocks. A QR code is a bit matrix containing information and service areas. First, a QR code with the original information (encoded with a Reed-Solomon code) is generated. Then 8 different bit masks of the same size as the QR code are generated. Each mask is combined with the original QR code using the ‘exclusive OR’ operation (service areas are not modified). Next, a penalty score is calculated for each resulting matrix, taking into account large groups of identical pixels, patterns that resemble the position detection patterns, and how evenly black and white pixels are distributed. The mask that produces the matrix with the lowest penalty score is used. This procedure makes the matrix more reliable to scan. Mathematically, each mask is defined by a formula over the row and column coordinates of a pixel.
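The mask selection can be sketched as follows. The eight formulas are the standard QR mask patterns, with i as the row and j as the column. The penalty function is deliberately simplified and only scores runs of identical pixels in rows and columns; a real encoder also scores 2 x 2 blocks, finder-like patterns and the black/white balance. The function names and the `is_service` matrix are assumptions made for this illustration.

```python
# A pixel at (row i, column j) is flipped when the mask formula is True.
MASKS = [
    lambda i, j: (i + j) % 2 == 0,
    lambda i, j: i % 2 == 0,
    lambda i, j: j % 3 == 0,
    lambda i, j: (i + j) % 3 == 0,
    lambda i, j: (i // 2 + j // 3) % 2 == 0,
    lambda i, j: (i * j) % 2 + (i * j) % 3 == 0,
    lambda i, j: ((i * j) % 2 + (i * j) % 3) % 2 == 0,
    lambda i, j: ((i + j) % 2 + (i * j) % 3) % 2 == 0,
]


def apply_mask(matrix, is_service, mask):
    """XOR every data pixel with the mask; service pixels stay untouched."""
    return [
        [bit ^ int(mask(i, j) and not is_service[i][j])
         for j, bit in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


def run_penalty(matrix):
    """Simplified penalty: runs of 5 or more identical pixels in rows and columns."""
    penalty = 0
    columns = [list(col) for col in zip(*matrix)]
    for line in list(matrix) + columns:
        run = 1
        for prev, cur in zip(line, line[1:]):
            run = run + 1 if cur == prev else 1
            if run == 5:
                penalty += 3      # a run of exactly 5 costs 3 points
            elif run > 5:
                penalty += 1      # every extra pixel costs 1 more
    return penalty


def choose_mask(matrix, is_service):
    """Return (mask index, masked matrix) with the lowest simplified penalty."""
    candidates = [apply_mask(matrix, is_service, m) for m in MASKS]
    best = min(range(len(MASKS)), key=lambda k: run_penalty(candidates[k]))
    return best, candidates[best]


blank = [[0] * 6 for _ in range(6)]
no_service = [[False] * 6 for _ in range(6)]
index, masked = choose_mask(blank, no_service)
print(index, run_penalty(masked))  # 0 0
```

For the blank 6 x 6 matrix, mask 0 turns the uniform area into a checkerboard, which contains no long runs. Because XOR is its own inverse, a reader recovers the original data by applying the same mask again.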

How Does an Aztec Code Work

The Aztec code is similar to a QR code in its structure and information encoding approach, but it has its own features and differences. Key differences of Aztec Code:

The scanner needs a target and additional elements to accurately determine the center of the code, the orientation of the mark, the starting point of the data line, and the size of the character.
Information is recorded from the center outward. Important (service) information is recorded first, and then the most easily recoverable one (encoded message).
Unlike a QR code, an Aztec code does not need white space around it. This is because the target is located in the center (a QR code has its orientation marks in the corners), and the size of the symbol (bit length) is recorded in the service information around the target. This minimizes the number of reading errors, since blocks near the target are the most likely to be read successfully.
The barcode uses Reed-Solomon error correction. The number of error correction words can be changed if necessary, which makes it easy to change the degree of error correction by adding symbol layers.

Elements of the barcode:

A target pattern that defines the center (size: 13 x 13)
4 orientation patterns. Used to determine code orientation and reading start point
Symbol description contains 40 bits. It contains information about the number of symbol layers (5 bits), and information blocks (11 bits). The remaining space is occupied by Reed-Solomon codes (24 bits).

All these elements make up a structure called the ‘core’, which measures 15 x 15 cells.

In addition, Aztec code may also have a special grid consisting of alternating elements. It is used to increase the reading speed and to reduce errors that are caused by positioning and the correction of layers that are far from the core.

There is also a smaller version of Aztec Code that does not have the grid. Its target size is 9 x 9 and it can have at most 4 layers. The amount of system information is also reduced: 2 bits for the number of layers, 6 bits for the amount of data and 20 bits for error correction codes.

Figure 11 — Structure of a small Aztec code

Information is recorded in 5 x 2 blocks.

Data blocks are placed clockwise from orientation pattern 1. Two 5 x 2 bit blocks are located next to each other and connected on their short sides. A block cannot cross a line of the reference grid: if it does not fit on one side of a grid line, it is divided into 2 parts that are placed on either side of the line. The 1 x 2 bit sub-blocks themselves are indivisible.

The information is recorded as follows: first, service information is recorded in the core, then the last message block (the last block of error correction codes) is written around the core. And the last block (on the outside of the code) will be the first message block (first information block). Information is encoded this way because the probability of errors increases with distance from the core.

The reading starts in the core and proceeds to the edges. First, the service part is read, which consists of 2 parts:

The first part determines the number of layers. In regular Aztec codes this information is recorded in the first 5 bits (1 to 32 layers); in small Aztec codes, in the first 2 bits (1 to 4 layers). The number of layers is used to determine the symbol size, code capacity and number of bits in one layer.
The second part determines the size of the encoded message. The number of information words in the code is encoded with either 6 bits (0 to 63, small Aztec code) or 11 bits (0 to 2047, regular Aztec code).
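To illustrate how the number of layers determines the symbol size and capacity, here is a minimal sketch. The formulas and the word-size thresholds mirror those found in open-source Aztec encoders and should be treated as assumptions rather than a quotation of the standard; the function name is made up for this example.

```python
def aztec_geometry(layers, compact=False):
    """Return (side length in cells, data bits in all layers, word size in bits)."""
    if compact and not 1 <= layers <= 4:
        raise ValueError("small Aztec codes have 1 to 4 layers")
    if not compact and not 1 <= layers <= 32:
        raise ValueError("regular Aztec codes have 1 to 32 layers")

    # Each layer is 2 cells thick, so it adds 4 cells to the side length.
    base = (11 if compact else 14) + 4 * layers
    if compact:
        side = base
    else:
        # Regular symbols also contain the reference grid: one central line
        # plus further pairs of lines as the symbol grows.
        side = base + 1 + 2 * ((base // 2 - 1) // 15)

    total_bits = ((88 if compact else 112) + 16 * layers) * layers

    # Larger symbols use longer words.
    if layers <= 2:
        word_size = 6
    elif layers <= 8:
        word_size = 8
    elif layers <= 22:
        word_size = 10
    else:
        word_size = 12
    return side, total_bits, word_size


print(aztec_geometry(1, compact=True))  # (15, 104, 6)
print(aztec_geometry(32))               # (151, 19968, 12)
```

Dividing the bit count by the word size gives the total number of words in the symbol, for example 19968 / 12 = 1664 for the largest regular code.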

Error correction. The size of the service information word is 4 bits, and the correction is performed using codes in the Galois field GF(16) with the polynomial x^4 + x + 1.
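Arithmetic in GF(16) with this polynomial can be sketched in a few lines. Elements are 4-bit integers, addition is XOR, and a product is reduced modulo x^4 + x + 1. The function names are chosen for this example; the block shows only the field arithmetic that Reed-Solomon codes build on, not a full encoder.

```python
PRIMITIVE_POLY = 0b10011  # x^4 + x + 1


def gf16_mul(a, b):
    """Multiply two GF(16) elements given as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a            # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                    # multiply a by x
        if a & 0b10000:            # degree reached 4: reduce modulo the polynomial
            a ^= PRIMITIVE_POLY
    return result


def gf16_pow(a, n):
    """Raise a GF(16) element to a non-negative integer power."""
    result = 1
    for _ in range(n):
        result = gf16_mul(result, a)
    return result


print(gf16_mul(0b0010, 0b1000))  # 3, because x * x^3 = x^4 = x + 1
```

The element x (binary 0010) generates all 15 non-zero field elements before its powers repeat, which is what makes the polynomial suitable for building the code.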

The system information starts in the core 1, then proceeds to the higher bits and then the lower bits.

Figure 12 — Orientation patterns in Aztec code

Data recording

The characters in the message are sequentially converted into binary values according to the ASCII table.
The resulting binary string is split into words of length B, observing the rule that a word must not consist entirely of identical bits; this yields D information words.
K error correction words are added to the result, where K = Cw - D and Cw is the total number of words in the symbol.
The resulting sequence of D + K words is placed in the L layers of the pattern, starting from orientation pattern 1.
If unused space is left on the outer layer after all the data has been encoded, it is filled with zeros.
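The first two steps above can be sketched as follows, assuming 8-bit ASCII for the conversion, as the text describes. The stuffing rule is an assumption taken from how common open-source encoders handle the ban on uniform words: when the first B - 1 bits of a word are identical, the complementary bit is inserted as the last bit and the displaced data bit moves to the next word. Padding the final partial word with ones is likewise an assumption, and both function names are made up for this example.

```python
def text_to_bits(text):
    """Convert ASCII text to a string of bits, 8 bits per character."""
    return "".join(format(ord(ch), "08b") for ch in text)


def stuff_bits(bits, word_size):
    """Split a bit string into words, never emitting an all-0 or all-1 word."""
    words = []
    i = 0
    while i < len(bits):
        # Pad the last partial word with ones (an assumption, see above).
        chunk = bits[i:i + word_size].ljust(word_size, "1")
        head = chunk[:-1]
        if head == "1" * (word_size - 1):
            words.append(head + "0")   # insert a 0 to break the run of ones
            i += word_size - 1         # the displaced bit starts the next word
        elif head == "0" * (word_size - 1):
            words.append(head + "1")   # insert a 1 to break the run of zeros
            i += word_size - 1
        else:
            words.append(chunk)
            i += word_size
    return words


print(stuff_bits(text_to_bits("A"), 6))  # ['010000', '011111']
print(stuff_bits("000000", 6))           # ['000001', '011111']
```

In the second call the six zero bits do not fit into one word: five of them are emitted with an inserted 1, and the sixth starts the next word.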

QR Code and Aztec Code Deformation Resistance

Comparison

For the experiment, a program was developed in Python. The source code and the results of the experiments can be found in [1]. QR codes and Aztec codes are generated with the treepoem library [2] and read with the zxing library [3]. Each encoded string was generated randomly and consisted of letters, numbers and spaces. A new random string was generated for each QR code and Aztec code pair.

Matrix barcodes were subjected to the following attacks:

adding a random value to a pixel (ability to set the percentage of changed pixels);
multiplying the pixel value by a random floating point number (ability to set the percentage of changed pixels);
pixel color inversion (ability to set the percentage of changed pixels);
imitation of defective pixels of white, black or random color (ability to set the percentage of changed pixels, as well as the mode of imitation of defective pixels of white, black or mixed colors);
Gaussian noise (ability to set the percentage of changed pixels);
adding a random value to the pixel value using XOR operation (ability to set the percentage of changed pixels);
cutting out the white or black lines vertically or horizontally (ability to set the width of the bar, white or black, vertical or horizontal);
blur (ability to set the type of blur — median, Gaussian, BoxBlur);
motion blur (ability to set the direction of blur — vertical or horizontal).
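The pixel-level attacks from this list can be sketched with the standard library alone. This is an illustration, not the code from the experiment: the image is assumed to be a grayscale list of rows with values from 0 to 255, and the value ranges of the random changes (for example, the standard deviation of the Gaussian noise) are assumptions, since the article does not state them.

```python
import random

# Per-pixel changes; each takes the old value and a random generator.
CHANGES = {
    "add": lambda v, rng: v + rng.randint(-255, 255),
    "multiply": lambda v, rng: v * rng.uniform(0.0, 2.0),
    "invert": lambda v, rng: 255 - v,
    "dead_black": lambda v, rng: 0,
    "dead_white": lambda v, rng: 255,
    "dead_mixed": lambda v, rng: rng.choice((0, 255)),
    "gaussian": lambda v, rng: v + rng.gauss(0, 50),
    "xor": lambda v, rng: v ^ rng.randint(0, 255),
}


def pick_pixels(image, fraction, rng):
    """Choose distinct random coordinates covering `fraction` of the image."""
    coords = [(y, x) for y in range(len(image)) for x in range(len(image[0]))]
    return rng.sample(coords, round(fraction * len(coords)))


def attack(image, fraction, change, seed=None):
    """Return a copy of `image` with `change` applied to a fraction of its pixels."""
    rng = random.Random(seed)
    result = [row[:] for row in image]
    for y, x in pick_pixels(image, fraction, rng):
        # Clamp so that every result stays a valid 8-bit gray level.
        result[y][x] = max(0, min(255, int(change(result[y][x], rng))))
    return result


white = [[255] * 10 for _ in range(10)]
damaged = attack(white, 0.05, CHANGES["invert"], seed=1)
print(sum(v != 255 for row in damaged for v in row))  # 5
```

Keeping the pixel selection separate from the per-pixel change means every attack shares the same "percentage of changed pixels" parameter.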

To collect statistics, each attack was performed on 1,000 pairs of QR codes and Aztec codes. For the purpose of the experiment, we used matrix codes with 30% redundancy. The charts below reflect the percentage of matrix codes from which the correct data was successfully extracted.
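The statistics collection can be sketched as a small harness. The `encode`, `distort` and `decode` parameters are hypothetical callables standing in for the barcode generator, the attack and the reader; the payload length of 20 characters is also an assumption, as the article does not state it.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits + " "


def random_payload(length, rng):
    """Random string of letters, digits and spaces."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def success_rate(encode, distort, decode, trials=1000, length=20, seed=0):
    """Percentage of trials in which the distorted code still decodes correctly."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        payload = random_payload(length, rng)
        try:
            decoded = decode(distort(encode(payload)))
        except Exception:
            decoded = None  # a reader error counts as a failed read
        successes += decoded == payload
    return 100.0 * successes / trials


# Stand-ins that make the sketch runnable without a barcode library.
print(success_rate(lambda s: s, lambda img: img, lambda img: img))  # 100.0
```

Passing the same seed for the QR code and the Aztec code run would make both code types see the same sequence of random strings.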

The attack ‘adding a random value to a pixel’ was carried out with different percentages of changed pixels: 0.5%, 1%, 1.5%.

Figure 13 — Results of the attack ‘adding a random value to a pixel’ on QR code and Aztec code

The results show that the Aztec code handled this attack better.

Multiplication of pixel value by a random floating point number was carried out with different percentages of the changed pixels — 1%, 2%, 3%.

Figure 14 — Results of the attack ‘multiplying pixel value by a random floating point number’ on QR code and Aztec code

The color inversion attack was carried out with different percentages of changed pixels — 1.5%, 2%, 2.5%.

Figure 15 — Results of the attack ‘color inversion’ on QR code and Aztec code

The imitation of defective pixels was carried out with different percentages of the changed pixels — 1%, 2%, 3%.

Figure 16 — Results of the attack ‘imitation of black defective pixels’ on QR code and Aztec code

Figure 17 — Results of the attack ‘imitation of white defective pixels’ on QR code and Aztec code

Figure 18 — Results of the attack ‘imitation of defective pixels of mixed color’ on QR code and Aztec code

The attack using Gaussian noise was carried out with different percentages of changed pixels — 1%, 1.5%, 2%.

Figure 19 — Results of the attack ‘Gaussian noise’ on QR code and Aztec code

A random variable was added to the pixel value using XOR operation. The experiment was carried out with different percentages of the changed pixels — 1%, 1.5%, 2%.

Figure 20 — Results of the attack ‘XOR’ on QR code and Aztec code

The results of the research show that Aztec code better handles distortions aimed at modifying individual pixels. It performs better even when using the same level of redundancy.

The attack ‘cutting out a vertical line’ was carried out with line widths of 10, 15 and 20 pixels.

Figure 21 — Results of the attack ‘cutting out a vertical line’ on QR code and Aztec code

The attack ‘cutting out a horizontal line’ was carried out with line widths of 10, 15 and 20 pixels.

Figure 22 — Results of the attack ‘cutting out a horizontal line’ on QR code and Aztec code

Vertical and horizontal motion blur produced the results shown in Figure 23 and Figure 24. The results for median blur and BoxBlur are shown in Figure 25 and Figure 26.

Figure 23 — Results of the attack ‘vertical motion blur’ on QR code and Aztec code

Figure 24 — Results of the attack ‘horizontal motion blur’ on QR code and Aztec code

Figure 25 — Results of the attack ‘median blur’ on QR code and Aztec code

Figure 26 — Results of the attack ‘BoxBlur’ on QR code and Aztec code

The results of the experiment show that Aztec code handles distortions aimed at modifying groups of pixels much worse. QR code shows better results for line cutting and blur-related distortions.
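For reference, two of the group-level distortions can be sketched on a grayscale image stored as a list of rows. This is a simplified illustration rather than the code from the experiment: the motion blur here is a plain one-sided average over neighboring pixels, and all function names are assumptions.

```python
def transpose(image):
    """Swap rows and columns."""
    return [list(col) for col in zip(*image)]


def cut_vertical_line(image, start, width, color=255):
    """Overwrite `width` columns, starting at column `start`, with one color."""
    return [
        [color if start <= x < start + width else value
         for x, value in enumerate(row)]
        for row in image
    ]


def cut_horizontal_line(image, start, height, color=255):
    """Same as the vertical cut, applied to rows."""
    return transpose(cut_vertical_line(transpose(image), start, height, color))


def horizontal_motion_blur(image, length):
    """Average each pixel with the next `length - 1` pixels to its right."""
    blurred = []
    for row in image:
        last = len(row) - 1
        blurred.append([
            # Indexes past the edge are clamped to the last pixel of the row.
            sum(row[min(x + k, last)] for k in range(length)) // length
            for x in range(len(row))
        ])
    return blurred


def vertical_motion_blur(image, length):
    """Same as the horizontal blur, applied along columns."""
    return transpose(horizontal_motion_blur(transpose(image), length))


print(horizontal_motion_blur([[0, 0, 255, 255]], 2))  # [[0, 127, 255, 255]]
```

Unlike the single-pixel attacks, both operations alter contiguous areas, so they can damage many neighboring cells of a barcode at once.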

Bottom line

The research shows that Aztec codes handle distortions targeting specific pixels better. Aztec codes are more successful in resisting the following attacks:

adding a random value to a pixel;
multiplying pixel value by a random floating point number;
pixel color inversion;
imitation of defective pixels of white, black or random color;
Gaussian noise.

However, QR codes showed better resistance to the following distortions:

cutting out white or black lines vertically or horizontally;
blur (median, Gaussian, BoxBlur);
motion blur.

It can be concluded that the type of matrix barcode should be chosen based on the features of the system in which it will be used and of the data transmission channel, taking into account the attacks and random data distortions typical for that channel. This can significantly increase the efficiency of the system being developed.