Understanding Base64 encoding

Partha Pratim Nayak
5 min read · Jan 20, 2022


Photo by Hope House Press — Leather Diary Studio on Unsplash

In this short writeup, we will discuss encoding ASCII data as bytes and base64 encoding these bytes. We will also cover base64 encoding for binary data and decoding to get back to the original input.

ASCII data

In ASCII, each character turns into one byte:

  • A is 65 in base 10, which is 0b01000001 in binary. The most significant bit is 0 because there’s no 128; the next bit is 1 for 64, and the last bit is 1, so you have 64 + 1 = 65. The 0b prefix marks a binary literal; for example, 0b0 is 0 in both binary and base 10.
  • Next come B, which is 66 in base 10, and C, which is 67. The binary for B is 0b01000010, and for C, it is 0b01000011.
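As a quick check of the values above, here is a minimal Python 3 sketch that prints the code point and 8-bit pattern of each character. The helper name `to_bits` is made up for illustration.

```python
def to_bits(ch):
    """Return the 8-bit binary string for a single ASCII character."""
    return format(ord(ch), "08b")

for ch in "ABC":
    # ord() gives the base-10 code point; to_bits() gives its binary form.
    print(ch, ord(ch), to_bits(ch))

# A binary literal with the 0b prefix is just another way to write the number.
print(0b01000001 == 65)
```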

The three-letter string ABC can be interpreted as a 24-bit string that looks like this:

01000001 01000010 01000011

The spaces show where the bytes are broken out. To interpret that as base64, we need to break it into groups of 6 bits. Six bits have a total of 64 combinations, so we need 64 characters to encode them.
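The regrouping from 8-bit bytes to 6-bit groups can be sketched in Python 3 as follows. The function name `six_bit_groups` is hypothetical, and the sketch assumes the input length is a multiple of three so that the bits divide evenly.

```python
def six_bit_groups(text):
    """Join the 8-bit patterns of an ASCII string and cut them into 6-bit groups."""
    bits = "".join(format(b, "08b") for b in text.encode("ascii"))
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]

groups = six_bit_groups("ABC")
values = [int(g, 2) for g in groups]

print(groups)   # ['010000', '010100', '001001', '000011']
print(values)   # [16, 20, 9, 3]
```

Each of the four values is between 0 and 63, so each can be mapped to one of 64 characters.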

The characters used are as follows:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

We use the capital letters for the first 26, lowercase letters for the next 26, and the digits for another 10, which gets you up to 62 characters. In the most common form of base64, you use + and / for the last two characters.
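One way to build this 64-character alphabet in Python 3 is from the constants in the standard `string` module. This is a minimal sketch; the name `ALPHABET` is chosen here for illustration.

```python
import string

# Capital letters, lowercase letters, digits, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(ALPHABET))                # 64
print(ALPHABET[0], ALPHABET[25])    # A Z
print(ALPHABET[26], ALPHABET[51])   # a z
print(ALPHABET[52], ALPHABET[61])   # 0 9
print(ALPHABET[62], ALPHABET[63])   # + /
```

The position of a character in this string is the 6-bit value it stands for.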

ASCII to Base64 encoding

If we have an ASCII string of three characters, it turns into 24 bits, interpreted as 3 groups of 8. If we break them up into 4 groups of 6 instead, we have 4 numbers between 0 and 63, and in this case, they turn into Q, U, J, and D. In Python 2, you just follow the string with .encode("base64").

This does the encoding. The output has an extra newline (\n) at the end, which does not affect decoding.
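The `.encode("base64")` string codec exists only in Python 2. As a sketch of the equivalent in Python 3, the standard `base64` module works on bytes: `b64encode` returns the bare encoding, while `encodebytes` appends a trailing newline, much like the Python 2 codec.

```python
import base64

encoded = base64.b64encode(b"ABC")          # b'QUJD'
with_newline = base64.encodebytes(b"ABC")   # b'QUJD\n'
print(encoded, with_newline)

# The trailing newline is skipped during decoding.
decoded = base64.b64decode(with_newline)
print(decoded)                              # b'ABC'
```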

Note

The = sign is used to indicate padding if the input string length is not a multiple of 3 bytes.

If we have four bytes of input, the base64 encoding ends with two equals signs, indicating that two bytes of padding had to be added. If we have five bytes, we have one equals sign, and if we have six bytes, there are no equals signs, indicating that the input fits neatly into base64 with no need for padding. The padding bytes are null (zero) bytes.

Suppose we encode ABCD, and then encode ABCD followed by an explicit byte of zero. \x00 means a single byte with eight bits of zeros. We get the same result, except with an extra A and only one equals sign. If we fill it out all the way with two bytes of zero, the equals signs disappear and the output ends in capital A’s. Remember: capital A is the very first character in base64. It stands for six bits of zero.
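To make the padding rule concrete, here is a from-scratch encoder sketched in Python 3. It is an illustration of the idea rather than how any library implements it; the names `ALPHABET` and `b64encode_manual` are made up for this example, and the result is checked against the standard `base64` module.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode_manual(data):
    """Encode bytes as base64 text, three input bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)              # 0, 1 or 2 bytes short
        padded = chunk + b"\x00" * missing    # fill up with zero bytes
        n = int.from_bytes(padded, "big")     # the chunk as a 24-bit integer
        # Pull out four 6-bit values, most significant first.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Characters made only of padding bits are written as "=".
            chars[4 - missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)

for sample in (b"ABC", b"ABCD", b"ABCDE", b"ABCDEF"):
    print(sample, b64encode_manual(sample))
    assert b64encode_manual(sample) == base64.b64encode(sample).decode("ascii")
```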

Let’s take a look at base64 encoding in Python:

>>> "ABCD".encode("base64")

'QUJDRA==\n'

This has two equals signs because we started with four bytes, and two more had to be added to make it a multiple of three.

With a five-byte input, we have one equals sign; with six bytes of input, there are no equals signs, and instead we have a total of eight base64 characters.
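The effect of input length on padding can be checked with a short Python 3 sketch using the standard `base64` module. The dictionary name `results` is just for this example.

```python
import base64

results = {}
for data in (b"ABCD", b"ABCDE", b"ABCDEF"):
    encoded = base64.b64encode(data).decode("ascii")
    results[data] = encoded
    # Show the input length, the encoding, and how many "=" it ends with.
    print(len(data), encoded, encoded.count("="))
```

Four bytes give two equals signs, five bytes give one, and six bytes give none, with eight output characters in every case.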

Let’s go back to ABCD with the two equals signs. We can see how the padding was done by putting it in explicitly here:

>>> "ABCD\x00".encode("base64")

'QUJDRAA=\n'

There’s the first byte of zero, and now we get only a single equals sign.

Let’s put in a second byte of zero:

>>> "ABCD\x00\x00".encode("base64")

'QUJDRAAA\n'

We have no padding here, and we see that the last characters are all A, indicating that the input was filled out with binary zeros.
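The same comparison can be reproduced in Python 3 with the standard `base64` module. This is a sketch of the equivalent calls, not the Python 2 session shown above; the variable names are chosen for illustration.

```python
import base64

plain = base64.b64encode(b"ABCD")               # b'QUJDRA=='
one_zero = base64.b64encode(b"ABCD\x00")        # b'QUJDRAA='
two_zeros = base64.b64encode(b"ABCD\x00\x00")   # b'QUJDRAAA'
print(plain, one_zero, two_zeros)

# With the "=" signs removed, each result is a prefix of the next one:
# the implicit padding behaves like explicit zero bytes.
print(one_zero.startswith(plain.rstrip(b"=")))
print(two_zeros.startswith(one_zero.rstrip(b"=")))
```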

Binary data

The next issue is handling binary data. Executable files are binary, not ASCII. Images, movies, and many other files also contain binary data. Every byte of ASCII data has zero as its first bit, whereas binary data can use all eight bits, but base64 works fine with binary data. A Windows executable file, for example, starts with MZê and has unprintable characters.

In a hex viewer, we see the raw data in hexadecimal, and on the right, the viewer attempts to print it as ASCII. Windows programs have the string “This program cannot be run in DOS mode” near the start, but they also have a lot of unprintable bytes, such as FF and 00, which do not matter to Python at all.

An easy way to encode data like that is to read it directly from the file. We can use the with statement: it opens the file with the given filename in read-binary mode, binds it to a file handle, and lets us read from it. The with statement is there to make sure the handle is closed when the block ends, even if an error occurs. We then encode the data exactly the same way as before. To decode data you’ve encoded in this fashion, you just take the output string and use .decode instead of .encode.

Now let’s take a look at how to handle binary data and encode an image file.

Here we enter the filename first and then the mode, which is read binary. We give it the file handle f. We take all the data and put it in a single variable, data. We can then just encode the data in base64, and the interpreter automatically prints the result. After an indented block in the Python interpreter, we have to press Enter twice so it knows the block is done, and then it base64 encodes the data.

We get a long block of base64 that is not very readable, but this is a handy way to handle data like that.
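The file-reading steps described above might look like this in Python 3. This is a sketch under stated assumptions: instead of a real image, it writes a few made-up bytes to a temporary file so that the example runs on its own, and the names `sample`, `path`, `data`, and `encoded` are illustrative.

```python
import base64
import os
import tempfile

# Stand-in for a real image or executable: arbitrary bytes made up for
# this example, including values above 127 and zero bytes.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0xFF, 0xFF, 0x00, 0x00])

with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(sample)
    path = tmp.name

# "rb" opens the file in read-binary mode; the with block closes the
# file handle when it ends, even if reading raises an error.
with open(path, "rb") as f:
    data = f.read()

encoded = base64.b64encode(data)
print(encoded)

os.remove(path)  # clean up the temporary file
```

For a real file, only the path passed to `open` would change.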

Decoding from Base64
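Decoding reverses the steps: each character maps back to 6 bits, the bits are regrouped into bytes, and the padding is dropped. A minimal Python 3 sketch with the standard `base64` module (rather than the Python 2 `.decode("base64")` codec) shows the round trip for both ASCII and binary input. The variable names are illustrative.

```python
import base64

original = b"ABCD"
encoded = base64.b64encode(original)   # b'QUJDRA=='
decoded = base64.b64decode(encoded)
print(encoded, decoded)

# The round trip also works for arbitrary binary data: here, every
# possible byte value from 0 to 255.
blob = bytes(range(256))
restored = base64.b64decode(base64.b64encode(blob))
print(restored == blob)
```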
