Decoding a Broken QR Code

.r00
Sep 25, 2017

While working on season 3 of the Mr. Robot ARG, we discovered a partially obfuscated QR code. I’m the type to dig too far into everything, so I set out to learn how to manually read a Quick Response code. In the process, I picked up a ton of great information and would like to share it. I should warn that the following can be dense and dull, but if you’re interested in the how and why behind QR codes, it’s worth the read.

This is the original QR code which can be found here. Easy enough, right? You grab a QR code reader app and snap away! I thought the same and had to learn the hard way that it isn’t the case. This code is broken. As you can see, the e-coin logo in the center is blocking a valuable section of the QR code, making it unreadable. Some QR codes can handle this but they have to be built to do so. In our case, the ARG creators have intentionally hidden parts of the code to offer up a challenge. One that I happily accepted.

The first step in decoding the QR code was to bring it into Adobe Photoshop and recreate it digitally, although you could just as easily do this on paper if you don’t have imaging software. Even with great care, I wasn’t able to get the entire code replicated due to the logo obstructing the center. This section has been left empty as it is impossible to know what values it holds. If your QR code hasn’t been sabotaged by the likes of KorAdana or 1o57, then carry on.

Colors, oh the colors. QR codes are comprised of several different types of data and they all play an integral part in the function. I will dig into each section to discuss what they hold.

The orange section is used for alignment. A QR code’s data is laid out in a very particular manner, which means that when decoding one, you need it oriented so that the smaller alignment square is in the lower right corner. Technically speaking, the three larger squares are called finder patterns and only the smallest is considered an alignment pattern, but I won’t tell if you don’t tell.

The blue and red sections combine to make up the two identical 15 bit sets known as the format info. There are two sets in case one gets damaged, and only one is needed to decode. Of the 15, bits 09 through 00 are error correction for the format info itself and can be discarded when reading by hand. Bits 14 to 10 make up the actual format marker. This is read in binary with black and white representing 1 and 0 respectively. In binary, our format marker reads 11010. However, this can’t be read as is. It has been XOR’d with 10101, which needs to be reversed before reading. At its most basic, that means we need to invert the first, third, and fifth bits. If we do this, our format marker becomes 01111.

The first two bits (01) give the error correction level (01 is level L, the lowest) and can be set aside here. The other three (111) are what we want. If we compare our mask of 111 to the table here, we can see that we have the mask type in the bottom right. More on masking in a bit.
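For anyone who would rather let a computer do the bit flipping, the unmasking of the format marker can be sketched in a few lines of Python. The function name and the lookup table are my own illustration and not part of any QR library. The two-bit level codes (01 = L, 00 = M, 11 = Q, 10 = H) are the ones the QR specification defines.

```python
# Hypothetical helper for illustration; not taken from any QR library.
FORMAT_XOR = 0b10101  # the value the 5-bit format marker was XOR'd with

# Two-bit codes for the error correction level, per the QR specification.
EC_LEVELS = {0b01: "L", 0b00: "M", 0b11: "Q", 0b10: "H"}

def decode_format_marker(raw_bits: str):
    """Undo the XOR on the 5-bit format marker and split it into its parts."""
    value = int(raw_bits, 2) ^ FORMAT_XOR
    ec_level = EC_LEVELS[value >> 3]   # first two bits
    mask_pattern = value & 0b111       # last three bits
    return format(value, "05b"), ec_level, format(mask_pattern, "03b")

print(decode_format_marker("11010"))  # ('01111', 'L', '111')
```

Feeding in the 11010 we read off the code gives back 01111, with 111 as the mask type.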

The yellow here shows where the actual data is stored. This could be an email address, Snapchat username, phone number, or whatever else you want so long as it is short enough. This data block also contains more information about the code itself, with things like the encoding type, length field, end block, and error correction blocks. It’s critical to know that, much like our format markers, this data has been XOR’d with the mask we identified earlier.

Now comes the masking. This is by far the most tedious part of reading a QR code, but so long as you’re patient, you’ll come out the other side. In essence, we take the mask we identified earlier and paint it over the entire QR code starting from the top left. This mask indicates which bits need to be inverted to reveal the actual data of our QR code.

Why the hell do we even need to mask QR codes? Computers aren’t smart. They do what we tell them to do. If we tell a QR reader to identify a single module, how does it know that 4 black modules in a square aren’t just 1 really big module? Set aside the fact that we have the alignment and timing patterns for a minute. Enter masking. Sometimes, the data that a QR code displays isn’t pretty. It could have massive blotches of solid white or black, and that can confuse computers. Masking allows us to step around that problem by ensuring we don’t have those ugly spots.

We don’t need to apply the mask to the entire code, just the data bits. Here I’ve removed the portions of the mask that we won’t be using. To be very clear, I started with the pattern covering the entire QR code and then simply erased the parts that were covering non-data bits. Nothing was moved, just deleted. Then, we step one bit at a time over the entire data section and invert any bits covered by a black module on the mask. For example, look at the bottom right bit of the original QR code. It’s black, so that represents a 1. Now look at the corresponding module of our mask, which is also black. That means that we need to invert this bit, so it becomes a 0.
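The invert-where-the-mask-is-black step can be written down as a tiny Python sketch. The formula is the one the QR specification lists for mask pattern 111, with rows and columns counted from 0 at the top left. The mask table itself isn’t reproduced in this text, so treat the match to the picture as an assumption and check it against your own table. The helper names are made up for illustration, and the 25x25 size in the example is assumed as well.

```python
def mask_111(row: int, col: int) -> bool:
    """True where mask pattern 111 is 'black', meaning the bit gets inverted."""
    return ((row + col) % 2 + (row * col) % 3) % 2 == 0

def unmask_bit(bit: int, row: int, col: int) -> int:
    """Flip a data bit if the mask covers it, otherwise leave it alone."""
    return bit ^ 1 if mask_111(row, col) else bit

# Bottom right module of a 25x25 code (size assumed for this example):
# a black module (1) sitting under a black mask module becomes 0.
print(unmask_bit(1, 24, 24))  # 0
```

This should only be run over the data modules. The alignment squares and format info are never masked this way.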

If we apply this process to all of the data bits, we get what you see here. In the beginning of this article, we had a gaping hole in our QR code. Notice how applying the mask has changed that. Because we started with no data in those spots, we still have no data in those spots. But I’m trying to illustrate how the mask adjusted things to prevent massive areas of like bits. Now, the QR code is ready for us to read. We still don’t quite know HOW to read it, so let’s do that. First, we need to take a look at two more pieces of important information.

The green and purple are covering the encoding type and length field respectively. The encoding type tells us what type of QR code we are dealing with. There are three main types: numeric (0001), alphanumeric (0010), and 8bit byte (0100). Our encoding type bits, read in a zig-zag pattern from bottom right to top left, come out as 0100 in binary. Remember, black = 1 and white = 0. This code of 0100 tells us that we are dealing with an 8bit byte type QR code.

In order to read our data, we first need to know how long the length field is, and the previously discovered encoding type tells us. The length fields are numeric (10 bits), alphanumeric (9 bits), and 8bit byte (8 bits). Since we know we have an 8bit byte style code, we know that our length field is 8 bits long. We use that when reading the purple section. This section contains the length field and is read in the same direction as the encoding type. We start in the bottom right, step left, and then up to the next row, making sure to zig-zag. That means that our length field reads in binary as 00100000. This is then converted to decimal as 32. It means that the message stored in our QR code is 32 bytes long.
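As a rough sketch, here is how the encoding type and length field could be pulled off the front of the unmasked bit stream in Python. The table and function name are hypothetical, and the field sizes are the ones listed above, which hold for the smaller QR versions (1 through 9). The last eight bits in the example are just a stand-in for the start of the data.

```python
# Encoding type -> (name, size of the length field in bits), for small codes.
MODES = {
    "0001": ("numeric", 10),
    "0010": ("alphanumeric", 9),
    "0100": ("8bit byte", 8),
}

def read_header(bits: str):
    """Split the encoding type and length field off the unmasked bit string."""
    mode_name, length_bits = MODES[bits[:4]]
    length = int(bits[4:4 + length_bits], 2)
    payload = bits[4 + length_bits:]
    return mode_name, length, payload

mode, length, rest = read_header("0100" + "00100000" + "01101000")
print(mode, length, rest)  # 8bit byte 32 01101000
```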

In the event that you haven’t had a blast up to this point, now comes the fun stuff. We get to actually start reading the data of our QR code. This data gets read in a rather unusual way, but the image here should help clarify. Starting just above the length field, we read moving vertically in 8 bit chunks. Once we bump into the format markers, we turn left and then head down the column just beside what we previously read. Once we reach the bottom, we turn left again and head upward. This repeats until we reach the end of our data. This process of reading can be really tricky. I recommend converting each byte you read as you read it to ensure that you haven’t accidentally missed something. If you start getting junk output, you might have messed up.
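Since the image isn’t much help in text form, here is a hedged Python sketch of that snaking path. It assumes a 25x25 (version 2) code, which is my guess for a code with a single small alignment square, and it marks the reserved areas with a hand-rolled `is_reserved` check. Both function names are made up for illustration. Within each two-module-wide column, the right module is read before the left one.

```python
SIZE = 25  # assumed: a version 2 code is 25x25 modules

def is_reserved(row: int, col: int) -> bool:
    """True for modules that hold patterns or format info instead of data."""
    if row < 9 and col < 9:                   # top left square + format info
        return True
    if row < 9 and col >= SIZE - 8:           # top right square + format info
        return True
    if row >= SIZE - 8 and col < 9:           # bottom left square + format info
        return True
    if row == 6 or col == 6:                  # timing patterns
        return True
    if 16 <= row <= 20 and 16 <= col <= 20:   # small alignment square
        return True
    return False

def reading_order():
    """Yield (row, col) for every data module in the order it is read."""
    col = SIZE - 1
    upward = True
    while col > 0:
        if col == 6:      # the vertical timing column is skipped entirely
            col -= 1
        rows = range(SIZE - 1, -1, -1) if upward else range(SIZE)
        for row in rows:
            for c in (col, col - 1):   # right module first, then left
                if not is_reserved(row, c):
                    yield row, c
        upward = not upward
        col -= 2

order = list(reading_order())
print(order[:4])   # [(24, 24), (24, 23), (23, 24), (23, 23)]
print(order[12])   # (18, 24), the first data bit after type and length
print(len(order))  # 359
```

The first four positions are the encoding type, the next eight are the length field, and everything after that is data read eight bits at a time.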

That being said, let’s take a look at our first byte of data. In binary, it reads 01101000 which, when converted to ASCII, becomes the letter “h”. See, wasn’t that fun? Doesn’t it make the last thirty minutes of reading all the more worth it? Now that we have our first byte of information, we can follow the path in the previous image upward and read the next block of 8 bits. I’m not going to force you to read through me converting each byte. But know that when all is said and done, this QR code (including the bogus data in the middle) reads in binary as…

“01101000 01110100 01110100 01110000 00111010 00101111 00101111 01110111 01110111 01110111 00101110 01100101 00101101 01100011 01100111 01110010 01110000 00101101 01110101 01101111 11000000 10101110 01100011 01101111 01101101 00100011 10110001 00111011 01101111 01101001 01101110 00101000 11011100 10000110 10101011 11100111 10011111”

And finally, when this binary gets converted to ASCII we get…

“http://www.e-cgrp-uoÀ®com#±;oin(܆«çŸ”
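If converting by hand gets old, a small Python sketch like this one can do the binary to text step. The function name is made up for illustration. It decodes with latin-1, which is my choice rather than anything the QR format dictates, simply because it maps every byte value to some character, so the junk bytes from the damaged area still print instead of raising an error.

```python
def bytes_to_text(binary: str) -> str:
    """Turn space-separated 8-bit groups into characters."""
    values = [int(group, 2) for group in binary.split()]
    # latin-1 maps every value from 0 to 255 to a character,
    # so bytes from the damaged area still come out as something.
    return bytes(values).decode("latin-1")

first_bytes = "01101000 01110100 01110100 01110000 00111010 00101111 00101111"
print(bytes_to_text(first_bytes))  # http://
```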

Hardly a proper link, I know. But we can use some intelligent guessing to get what we want. In the world of the Mr. Robot ARG, we know of the site http://www.e-corp-usa.com/ so it stands to reason that the first part of our code, when accounting for the missing data, could be exactly that. The last part is a bit tricky but we do have the “oin”. The Mr. Robot showrunners have been doing a ton of marketing lately for E-Coin and that seems preeeeetty close. As it turns out, our QR code takes us to http://www.e-corp-usa.com/ecoin/! We had to make a few leaps but we found what we needed even though a large portion of the data was missing. Obviously, this would have been much more difficult had we not had some clues about the context of the URL, but it’s still cool in my book.

Thankfully for you, that’s all I have about manually reading QR codes. I hope you learned something. I hope you are inspired to go try reading your own QR codes. Most importantly, I hope you have even the tiniest desire to go learn more about something that puzzles you.

.r00__
