Quickly: Encoding, Decoding, Encryption, Decryption & Hashing

A quick introduction to encoding, decoding, encryption, decryption and hashing.

Piyush Kochhar

Published in

The Startup

7 min read · Aug 2, 2020

Encoding

Encoding is the process of converting data from one format to another. It uses a code to change the original information into a form that an external application can use.

One of the best-known codes for converting characters is the American Standard Code for Information Interchange (ASCII), a very commonly used encoding for files that contain text.
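As a small illustration, Python's built-in `ord` and `str.encode` expose the ASCII code behind each character. The sample string here is just an illustrative choice.

```python
# Look at the ASCII codes behind a short piece of text.
text = "And"

codes = [ord(ch) for ch in text]             # decimal ASCII codes
octets = [format(c, "08b") for c in codes]   # the same codes as 8-bit binary

print(codes)                 # [65, 110, 100]
print(octets)                # ['01000001', '01101110', '01100100']
print(text.encode("ascii"))  # b'And'
```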

Media files such as audio and video use encoding to reduce their file size. Each audio and video file format has a corresponding coder-decoder (codec) program that codes it into the appropriate format and then decodes it for playback. For example, converting .mp4 to .avi or .flv to .mp3 requires encoding.

Let’s look at one type of encoding: base64.

Base64 Encoding:

Base64 is an encoding technique used to convert binary data to an ASCII format. We need it when sending binary data over media that are designed to handle textual data, for example, when embedding images in an XML document or sending them as an email attachment.

The following figure is the base64 character index table, which is used to encode characters into base64.
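In case the figure is not at hand, the standard base64 index table can be rebuilt in a couple of lines. This is only a sketch; the name `B64_TABLE` is my own, not something from a library.

```python
import string

# Standard base64 alphabet:
# index 0-25 = A-Z, 26-51 = a-z, 52-61 = 0-9, 62 = '+', 63 = '/'
B64_TABLE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Print a few index -> character pairs from the table.
for index in (0, 16, 22, 26, 57, 63):
    print(index, B64_TABLE[index])
```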

Let’s see a few examples:

1) And

Algorithm for base64 encoding:

Convert ASCII code into octets
Convert octets into sextets
Convert sextets into decimals
Convert decimals to their equivalent characters in base64 character index table

Note: The total number of characters in base64-encoded text is always a multiple of 4.

$ echo -n "And" | base64
QW5k

2) An

Note: The cells shown in red in the figure are 0 bits padded onto the end to complete the last sextet.
And since the total number of characters in the output should be a multiple of 4, we pad the output with an extra ‘=’.

$ echo -n "An" | base64
QW4=

3) A

$ echo -n "A" | base64
QQ==
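The encoding steps and both kinds of padding used in the three examples can be sketched by hand in Python. This is a minimal illustration for ASCII input only; `b64_encode` and `B64_TABLE` are hypothetical names, and in real code the standard library's `base64` module is the better choice.

```python
import string

B64_TABLE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_encode(text: str) -> str:
    """Hand-rolled base64 for ASCII text, following the steps above."""
    # ASCII codes -> one long string of 8-bit octets
    bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))
    # Pad with 0 bits so that the last sextet is complete
    bits += "0" * (-len(bits) % 6)
    # Split into sextets, read each as a decimal index, look up its character
    chars = [B64_TABLE[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    encoded = "".join(chars)
    # Pad with '=' so that the output length is a multiple of 4
    return encoded + "=" * (-len(encoded) % 4)

for sample in ("And", "An", "A"):
    print(sample, "->", b64_encode(sample))
```

The three printed results match the command-line outputs above: QW5k, QW4= and QQ==.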

Decoding

Decoding is the opposite of encoding. It is the process of converting encoded data back to its original format.

As with the encoding section above, base64-encoded output can be converted back to its original text by reversing the steps.

Algorithm for base64 decoding:

Convert base64 characters to their equivalent decimal indexes from the base64 index table
Convert decimals into sextets
Convert sextets into octets
Convert octets into ASCII code
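The decoding steps can be sketched the same way. Again this is a minimal, ASCII-only illustration with hypothetical names (`b64_decode`, `B64_TABLE`), not a replacement for the standard `base64` module.

```python
import string

B64_TABLE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_decode(encoded: str) -> str:
    """Hand-rolled base64 decoding for ASCII text, following the steps above."""
    # The '=' characters are only padding, so drop them first
    stripped = encoded.rstrip("=")
    # Characters -> decimal indexes -> 6-bit sextets
    bits = "".join(format(B64_TABLE.index(ch), "06b") for ch in stripped)
    # Drop the padded 0 bits that do not fill a whole octet
    bits = bits[: len(bits) - len(bits) % 8]
    # Regroup into octets and map each one back to its ASCII character
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

for sample in ("QW5k", "QW4=", "QQ=="):
    print(sample, "->", b64_decode(sample))
```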

Encryption

Encryption is the process of encoding information so that only the users that have authorization can access the data. Encryption does not itself prevent interference and eavesdropping attacks, but denies intelligible content to the interceptor.

Encryption Algorithms are of two types:

Symmetric-key encryption:

In this encryption technique, the encryption and decryption keys are the same. The key is sometimes referred to as a shared secret because the sender or computer system doing the encryption must share the secret key with anyone authorized to decrypt the message. E.g. Advanced Encryption Standard (AES).
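To make the "same key both ways" idea concrete, here is a deliberately simple toy cipher that XORs the data with a repeating key. It is not AES and it is not secure; it only shows that one shared secret both encrypts and decrypts. The function name, key and message are all made up for the example.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR each byte with the repeating key (NOT secure, NOT AES)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

shared_key = b"secret"  # hypothetical shared secret known to both sides

ciphertext = xor_cipher(b"hello world", shared_key)  # sender encrypts
plaintext = xor_cipher(ciphertext, shared_key)       # receiver applies the same key

print(ciphertext)
print(plaintext)
```

Applying the function with a different key does not give back the original message, which is why the key has to stay secret between the two parties.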

Public-key encryption:

In this encryption technique, the encryption and decryption keys are different. The encryption key is published for anyone to use to encrypt data, while only the receiving user has access to the decryption key that decrypts the encrypted data. E.g. Rivest-Shamir-Adleman (RSA).
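The split between a published key and a private key can be illustrated with textbook RSA on tiny numbers. This is only a sketch: the primes, exponent and message below are illustrative values, far too small to be secure, and real RSA also uses padding schemes that are left out here.

```python
# Textbook RSA with tiny primes: illustrative only, not secure.
p, q = 61, 53
n = p * q                  # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: (e, n) is the published key
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

message = 65
ciphertext = pow(message, e, n)    # anyone can encrypt with the public key
recovered = pow(ciphertext, d, n)  # only the holder of d can decrypt

print(ciphertext, recovered)
```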

Other examples (all symmetric-key): Triple DES, Blowfish, Twofish, etc.

Decryption

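Decryption is the process of converting encoded/encrypted data back to its original form. It is performed either by decrypting the text manually or by using the appropriate key: the same key that encrypted the data in symmetric-key encryption, or the matching private key in public-key encryption.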

Hashing

Hashing is the process of passing data through a function that produces an output of a fixed length. Using a fixed-length output increases security, since anyone trying to reverse the hash won’t be able to tell how long or short the input is simply by looking at the length of the output.

Example:

MD5 (Message Digest 5) Hashing Algorithm

It is an algorithm that takes in an input message (plain text) of arbitrary length and produces an output of length 128 bits called message digest (MD).
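A quick way to see the fixed 128-bit output is to hash inputs of very different lengths with Python's standard `hashlib` module. The sample inputs are arbitrary.

```python
import hashlib

# Inputs of very different lengths all give a 128-bit (32 hex character) digest.
messages = [b"", b"a", b"a" * 1000]
digests = [hashlib.md5(m).hexdigest() for m in messages]

for message, digest in zip(messages, digests):
    print(len(message), len(digest), digest)
```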

This algorithm has commonly been used for authentication and integrity checks.

Checking integrity with an MD

A 128-bit MD is generated by the source.
The MD is appended to the plain text.
The plain text and the MD are sent to the receiver.
The receiver generates its own MD from the received text and compares it with the source’s MD. If they match, we can say our data was not changed during the process.
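The check described above can be sketched with `hashlib`. The helper names `make_packet` and `verify`, and the sample messages, are hypothetical and only there to show the flow between source and receiver.

```python
import hashlib

def make_packet(text: bytes):
    """Source side: return the text together with its MD5 digest."""
    return text, hashlib.md5(text).hexdigest()

def verify(text: bytes, received_md: str) -> bool:
    """Receiver side: recompute the digest and compare it with the one received."""
    return hashlib.md5(text).hexdigest() == received_md

text, md = make_packet(b"transfer 100 to Bob")

print(verify(text, md))                    # unchanged text
print(verify(b"transfer 900 to Bob", md))  # text altered on the way
```

Note that this only detects changes to the text; a digest on its own does not hide the content or prove who sent it.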

MD5 Algorithm:

1) Append Padding bits: The input message is padded so that its length in bits is congruent to 448 modulo 512. Padding is always performed, even if the length of the message is already congruent to 448 modulo 512.
Padding is performed as follows: a single “1” bit is appended to the message, and then “0” bits are appended so that the length in bits of the padded message becomes congruent to 448 modulo 512. At least one bit and at most 512 bits are appended.

2) Append Length: A 64-bit representation of the length of the message in bits (before padding) is appended to the result of step 1. If the length of the message is greater than 2⁶⁴, only the low-order 64 bits of the length are used.
The resulting message (after padding with bits and with the length) has a length that is an exact multiple of 512 bits. Equivalently, its length is an exact multiple of 16 (32-bit) words.
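Steps 1 and 2 can be sketched together. This version assumes the message is a whole number of bytes, so the single “1” bit plus seven “0” bits becomes the byte 0x80; `md5_pad` is a hypothetical helper name.

```python
def md5_pad(message: bytes) -> bytes:
    """Apply MD5 padding (steps 1 and 2) to a byte-aligned message."""
    bit_length = (8 * len(message)) % 2**64          # low-order 64 bits of the length
    padded = message + b"\x80"                       # a single 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)    # 0 bits until 448 mod 512 bits
    return padded + bit_length.to_bytes(8, "little") # 64-bit length, low-order byte first

for size in (0, 3, 55, 56, 64):
    print(size, "bytes ->", len(md5_pad(b"a" * size)), "bytes")
```

A 56-byte message is already 448 bits long, yet it still grows to 128 bytes, which shows that padding is always performed.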

3) Initialize MD buffer: A four-word buffer (A, B, C, D) is used to compute the message digest. Each of A, B, C, D is a 32-bit register. These registers are initialized to the following values (in hexadecimal, low-order bytes first):

 word A: 01 23 45 67
 word B: 89 ab cd ef
 word C: fe dc ba 98
 word D: 76 54 32 10
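“Low-order bytes first” means the bytes above are in little-endian order. Reading them as 32-bit integers with Python's `struct` module shows the values the registers actually hold:

```python
import struct

# The 16 bytes listed above, in the order they are written
raw = bytes.fromhex("0123456789abcdef" "fedcba9876543210")

# "<4I" = four little-endian unsigned 32-bit integers
A, B, C, D = struct.unpack("<4I", raw)

print([hex(word) for word in (A, B, C, D)])
# ['0x67452301', '0xefcdab89', '0x98badcfe', '0x10325476']
```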

4) Process message in 16-word blocks: Four functions are defined, each of which takes three 32-bit words as input and produces one 32-bit word as output.
F(X,Y,Z) = (X and Y) or (not(X) and Z) [Round1]
G(X,Y,Z) = (X and Z) or (Y and not(Z)) [Round2]
H(X,Y,Z) = X xor Y xor Z [Round3]
I(X,Y,Z) = Y xor (X or not(Z)) [Round4]
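The four functions translate almost directly into Python's bitwise operators. The only extra detail is the `MASK`, a name introduced here, which keeps `not` within 32 bits because Python integers are unbounded.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

def F(x, y, z):
    return (x & y) | (~x & MASK & z)   # round 1: if x then y else z, bit by bit

def G(x, y, z):
    return (x & z) | (y & ~z & MASK)   # round 2: if z then x else y, bit by bit

def H(x, y, z):
    return x ^ y ^ z                   # round 3: bitwise parity

def I(x, y, z):
    return y ^ (x | (~z & MASK))       # round 4

print(hex(F(MASK, 0x1234, 0x5678)), hex(F(0, 0x1234, 0x5678)))
```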

5) Each round has 16 steps of the form:

A ← B + ((A + Function (B, C, D) + x[ ] + T[i]) <<< s)

where A, B, C, D refer to the four words of the buffer, used in a different permutation at each step, and Function(B, C, D) is a different nonlinear function in each round (F, G, H, I for rounds 1, 2, 3, 4). T[i] is a constant, with 16 different constants used in every round: T[1–16] for round 1, T[17–32] for round 2, T[33–48] for round 3 and finally T[49–64] for round 4. x[ ] is a 32-bit word of the current message block, and <<< s denotes a circular left shift by s bits.
Because the roles of the four words rotate from one step to the next (ABCD, DABC, CDAB, BCDA), a step in round 2 can take the form:

B ← C + ((B + G(C, D, A) + x[ ] + T[i]) <<< s), with i in 17–32
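A single step of this form can be sketched as follows. The names `rotl`, `T` and `step` are my own, and the formula for the constants (the integer part of 2³² × |sin(i)|, with i in radians) comes from the MD5 specification, RFC 1321, rather than from this article.

```python
import math

MASK = 0xFFFFFFFF

def rotl(value, s):
    """Circular left shift of a 32-bit value by s bits (the <<< operator)."""
    value &= MASK
    return ((value << s) | (value >> (32 - s))) & MASK

def T(i):
    """Constant T[i] for i = 1..64, as defined in RFC 1321."""
    return int(abs(math.sin(i)) * 2**32) & MASK

def step(a, b, c, d, func, x_k, i, s):
    """Return the new value of a:  b + ((a + func(b, c, d) + x_k + T[i]) <<< s)."""
    return (b + rotl(a + func(b, c, d) + x_k + T(i), s)) & MASK

print(hex(T(1)), hex(T(64)))
print(hex(rotl(0x80000001, 1)))
```

All additions are taken modulo 2³², which is what the `& MASK` at the end of each function does.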

The following figure sums up the MD5 algorithm:

The figure above gives an overview of the MD5 algorithm. The message is split into words x[ ], which are processed together with the buffer words (A, B, C, D), the functions F, G, H, I and the constants T[1–64] (16 constants per round). After the four rounds, the resulting values of A, B, C, D are added, modulo 2³², to the values the buffer held before the block was processed. If the message is longer than one 512-bit block, this result becomes the starting buffer for the next block, and so on until the last block has been processed. The final contents of the buffer are the message digest.
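Putting the pieces together, the whole algorithm fits in a short Python sketch. This is a learning aid, not the author's code: the per-round shift amounts and message-word order follow RFC 1321, the names `md5_hex`, `S`, `T` and `rotl` are my own, and the last lines cross-check the result against `hashlib`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Shift amounts s for the 64 steps (four per round, repeated), per RFC 1321
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Constants T[1..64], stored 0-based
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]

def rotl(value, s):
    value &= MASK
    return ((value << s) | (value >> (32 - s))) & MASK

def md5_hex(message: bytes) -> str:
    # Steps 1-2: pad to 448 mod 512 bits, then append the 64-bit length
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += ((8 * len(message)) % 2**64).to_bytes(8, "little")

    # Step 3: initial buffer values
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    # Steps 4-5: process each 512-bit block as sixteen 32-bit words
    for offset in range(0, len(padded), 64):
        x = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:                                   # round 1, function F
                f, k = (b & c) | (~b & d), i
            elif i < 32:                                 # round 2, function G
                f, k = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:                                 # round 3, function H
                f, k = b ^ c ^ d, (3 * i + 5) % 16
            else:                                        # round 4, function I
                f, k = c ^ (b | (~d & MASK)), (7 * i) % 16
            new_b = (b + rotl(a + f + x[k] + T[i], S[i])) & MASK
            a, b, c, d = d, new_b, b, c                  # rotate the roles of the words
        # Add this block's result to the buffer, modulo 2^32
        a0, b0, c0, d0 = [(v + w) & MASK for v, w in zip((a0, b0, c0, d0), (a, b, c, d))]

    return struct.pack("<4I", a0, b0, c0, d0).hex()

sample = b"abc"
print(md5_hex(sample))
print(md5_hex(sample) == hashlib.md5(sample).hexdigest())
```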

Other examples: SHA-1, SHA-2 (which includes SHA-256), SHA-3, etc.

$ echo -n Piyush | md5
10f95896a13c59b88e4c0e837642fb3c

Encoding vs Encryption vs Hashing

Encoding: A technique for maintaining data usability. It can be reversed by employing the same algorithm that encoded the content, i.e. no key is used.
Encryption: A technique for maintaining data confidentiality. It requires the use of a key (kept secret) in order to return to plaintext.
Hashing: A technique for validating the integrity of content, since modifications to the content show up as obvious changes to the hash output.