Decoding Base 64
There are a lot of binary-to-text encoding schemes. Each of them defines a way to represent your binary data in a textual format. See the list of binary-to-text encoding schemes on Wikipedia.
Base 64 is one of those binary-to-text encoding schemes. It basically changes the base of your binary data from 2 to 64.
But why Base 64, and not Base 32 or Base 16 (also called hexadecimal)? Well, it turns out that Base 64 is more space-efficient than Base 16 or Base 32: each output character carries 6 bits of data, compared with 4 bits for Base 16 and 5 bits for Base 32. Also, you can safely assume that if you save your Base 64 encoded data to a file today, you can decode it tomorrow in whichever programming language you prefer, since the 64 characters that form the Base 64 character set are plain ASCII characters that every programming language supports.
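To put rough numbers on that efficiency claim, here is a small sketch. It assumes every output character is stored as one 8-bit byte, and the class and method names (EncodingEfficiency, efficiency) are just made up for this illustration.

```java
class EncodingEfficiency {
    // A base-N alphabet (N being a power of two) carries log2(N) bits per character,
    // while each character occupies 8 bits when stored as plain ASCII.
    static double efficiency(int base) {
        int bitsPerChar = Integer.numberOfTrailingZeros(base); // log2 for powers of two
        return bitsPerChar / 8.0;
    }

    public static void main(String[] args) {
        for (int base : new int[]{16, 32, 64}) {
            System.out.println("Base " + base + ": " + (efficiency(base) * 100) + "% efficient");
        }
    }
}
```

Under that assumption, Base 16 comes out at 50%, Base 32 at 62.5% and Base 64 at 75%.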
So, you might ask, what’s the problem with binary data? Let me answer this with an example. Suppose you have a lot of binary data that you need to read in C/C++. You load that data as a string in your program. Well, it turns out the data has a few NULL characters in the middle. A NULL character has all bits set to 0, as in 0x00 in hexadecimal.
When C++ code that treats the data as a C-style string encounters that NULL character, it’ll assume that the string has finished. Well, you might be able to solve this problem using some C++ way of reading binary, but you get the point. You cannot simply read that binary data as is without having to handle the NULL situation.
Similarly, when you are transferring the same binary data over the wire, you never know whether the underlying protocol will treat part of it as a control instruction and act on it on its own. The protocol might change your data as well if it sees some special character combination in the binary.
So, if you encode your binary data into characters, it’ll solve the problem for you. Base 64 does just that.
So, how does it work then? Let’s check the Base 64 table first. It maps every 6-bit value (0 to 63) to one character: A to Z for 0 to 25, a to z for 26 to 51, 0 to 9 for 52 to 61, and finally + and / for 62 and 63.
Base 64 has its own encoding; for example, 001111 in binary is P in Base 64 encoding. Let’s go through an example and see how we can encode some binary data into Base64 format. Let’s say the data we need to convert into Base 64 format is “=\0ab”, where \0 is our NULL character.
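If you want to play with the table lookup itself, here is a minimal sketch. The alphabet string is the standard Base 64 alphabet; the class name Base64Table and the lookup helper are my own names for illustration, not part of any library.

```java
class Base64Table {
    // Standard Base 64 alphabet: the index in this string is the 6-bit value.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
        "abcdefghijklmnopqrstuvwxyz" +
        "0123456789+/";

    // Hypothetical helper: turn a 6-bit binary string such as "001111" into its character.
    static char lookup(String sixBits) {
        int index = Integer.parseInt(sixBits, 2); // "001111" -> 15
        return ALPHABET.charAt(index);
    }

    public static void main(String[] args) {
        System.out.println(lookup("001111")); // index 15 -> P
    }
}
```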
Step 1: Convert to Decimal
Convert each character to its ASCII equivalent: = is 61, \0 is 0, a is 97 and b is 98. You’ll get 61, 0, 97, 98.
Step 2: Change to binary
Change each number into its 8-bit binary equivalent.
00111101 - 00000000 - 01100001 - 01100010
Step 3: Separate into 6 bit groups
Instead of 8 bit groups, separate it into 6 bit groups now.
001111 - 010000 - 000001 - 100001 - 011000 - 10
Step 4: Add any padding if necessary
Since the last 2 bits are lonely, add the required number of 0’s at the end to complete the group: 4 in this case.
001111 - 010000 - 000001 - 100001 - 011000 - 100000
Step 5: Convert binary to Base 64
Looking at the Base 64 table, I can now convert each group into its equivalent Base64 character. For example, 001111 is P in Base 64.
P Q B h Y g ==
Now, this is my Base64 “string”. Note that I added two “=” characters at the end of this string to signify that when converting to 6-bit groups, I was short by 4 bits at the end and hence added 4 zeros. Each “=” in Base 64 stands for two 0 bits that were added, and these are called padding bits. The “=” characters also bring the length of the output up to a multiple of 4 characters. The padding is not strictly needed for decoding, since a decoder can infer the missing bits from the length, but it can be helpful when you are concatenating multiple Base64 encoded strings together.
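The five steps above can be sketched in a few lines of Java. This is a deliberately naive version that works on strings of 0s and 1s so that it mirrors the walkthrough; the class name ManualBase64 is made up for this example, and a real encoder would use bit shifts instead.

```java
class ManualBase64 {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String encode(byte[] data) {
        // Steps 1 and 2: write every byte as 8 binary digits.
        StringBuilder bits = new StringBuilder();
        for (byte b : data) {
            String bin = Integer.toBinaryString(b & 0xFF);
            bits.append(String.format("%8s", bin).replace(' ', '0'));
        }
        // Step 4 (done up front here): add 0 bits until the length is a multiple of 6.
        while (bits.length() % 6 != 0) {
            bits.append('0');
        }
        // Steps 3 and 5: cut into 6-bit groups and look each one up in the table.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bits.length(); i += 6) {
            int index = Integer.parseInt(bits.substring(i, i + 6), 2);
            out.append(ALPHABET.charAt(index));
        }
        // Add "=" until the output length is a multiple of 4 characters.
        while (out.length() % 4 != 0) {
            out.append('=');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(new byte[]{61, 0, 97, 98})); // PQBhYg==
    }
}
```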
All this encoding can be done in Java in one single line:
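import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Encode the bytes of the string; the result is "PQBhYg=="
String encoded = Base64.getEncoder().encodeToString("=\0ab".getBytes(StandardCharsets.UTF_8));

// Another way to encode, using the byte array directly
String sameEncoded = Base64.getEncoder().encodeToString(new byte[]{61, 0, 97, 98});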
I can transfer this string over the network, and the other person can Base 64 decode it to get the byte array {61, 0, 97, 98} by following the exact same procedure in reverse order. It’s also a one-liner in Java:
Base64.getDecoder().decode("PQBhYg==")
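Here is a small round-trip sketch around that one-liner. It relies on the basic decoder from java.util.Base64, which accepts the “=” padding but does not require it; the class name DecodeRoundTrip and its decode helper are just for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

class DecodeRoundTrip {
    // Thin wrapper around the JDK's basic Base 64 decoder.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] padded = decode("PQBhYg==");
        byte[] unpadded = decode("PQBhYg"); // same data, trailing "=" left out

        System.out.println(Arrays.toString(padded));        // [61, 0, 97, 98]
        System.out.println(Arrays.equals(padded, unpadded)); // true
    }
}
```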
There’s a space cost when you convert your binary data to Base 64. You’ve essentially taken three 8-bit words and converted them into four 6-bit words. But your computer only deals in 8-bit words, so each 6-bit group ends up occupying a full byte: the first group, 001111, is stored as the character P, which takes 8 bits. Every 3 bytes of input therefore become 4 bytes of output, so you’ve increased the size by roughly 1/3 of the original size. That explains the 75% space efficiency of Base 64.
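To see the overhead in numbers, here is a rough sketch that predicts the padded output length and compares it with what the JDK encoder actually produces. The class name Base64Size and the encodedLength helper are assumptions made for this example.

```java
import java.util.Base64;

class Base64Size {
    // Padded Base 64 emits 4 characters for every started group of 3 input bytes.
    static int encodedLength(int inputBytes) {
        return 4 * ((inputBytes + 2) / 3);
    }

    public static void main(String[] args) {
        for (int n : new int[]{3, 4, 300}) {
            int actual = Base64.getEncoder().encodeToString(new byte[n]).length();
            System.out.println(n + " bytes -> " + encodedLength(n)
                + " characters (encoder produced " + actual + ")");
        }
    }
}
```

For 300 bytes of input this gives 400 characters, which is the 3-to-4 ratio described above.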
So, you see, there’s nothing special about Base 64 encoding format apart from what it’s used for. Thanks for reading.