Encryption vs Encoding vs Hashing
They might appear in the same context, but they are vastly different
Encryption is about keeping a secret and being able to restore it. Hashing is about fingerprinting: you don’t need to restore the original, but you need to be able to check that two inputs are identical. Encoding is about data representation to enable information exchange. Encoding does not involve keeping secrets.
This was my Twitter-length explanation. Let’s dive into details!
Encoding
Encoding is about data representation. For example, for small icons on the web, it is often preferable to embed the image directly in the web page instead of serving it as a separate file. This saves the client from making many HTTP requests for very little data.
But then the binary data of the image has to be converted to text data. A common way to do that is base64 encoding.
The “translation” is trivial: We split the binary data into 6-bit blocks and look up the character for each block in the base64 alphabet (A-Z, a-z, 0-9, + and /). It’s called base64 because there are 2⁶ = 64 digits. Hence it can be interpreted as a number base conversion.
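The 6-bit lookup described above can be sketched in a few lines of Python. This is only an illustration: the function name base64_encode is my own, and in practice you would simply call the standard library's base64 module, which the sketch uses for comparison.

```python
import base64
import string

# The standard base64 alphabet: A-Z, a-z, 0-9, "+", "/" (64 symbols)
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def base64_encode(data: bytes) -> str:
    """Illustrative re-implementation; use base64.b64encode in real code."""
    # Write the input as one long bit string, 8 bits per byte
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the length is a multiple of 6
    bits += "0" * (-len(bits) % 6)
    # Look up one character per 6-bit block
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # "=" padding brings the output length to a multiple of 4
    return "".join(chars) + "=" * (-len(chars) % 4)


encoded = base64_encode(b"Hi!")
print(encoded)                                              # SGkh
print(encoded == base64.b64encode(b"Hi!").decode("ascii"))  # True
print(base64.b64decode(encoded))                            # b'Hi!'
```

Decoding takes nothing but the encoded text itself, so anyone can reverse it.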
Please note that this does not keep the content secret. Base64 does not use a secret key. Hence it is not encryption.
Character encodings are also extremely common. They define how characters map to integers and, ultimately, to bytes. The three which I stumble over most often are UTF-8, ASCII, and Latin-1.
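As a small illustration of how the same character ends up as different bytes, here is a sketch using Python's built-in str.encode and bytes.decode. The example character “é” and the variable names are my own choice.

```python
text = "é"

code_point = ord(text)                # the integer assigned to the character
utf8_bytes = text.encode("utf-8")     # two bytes in UTF-8
latin1_bytes = text.encode("latin-1") # a single byte in Latin-1

print(code_point)    # 233
print(utf8_bytes)    # b'\xc3\xa9'
print(latin1_bytes)  # b'\xe9'

# ASCII only covers 128 characters, so "é" cannot be represented
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)      # False

# Decoding with the wrong encoding produces the familiar garbled text
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)      # Ã©
```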
Encryption
Encryption is about keeping secrets. The method you use to encrypt and decrypt should not need to be secret. Instead, security should rest on a secret key that is necessary to decrypt. This is called Kerckhoffs’s principle.
Mathematically speaking, you have two functions:
encrypt(plain text, key) -> cipher text
decrypt(cipher text, key) -> plain text
This is called a symmetric-key algorithm, as you use the same key for encrypting and for decrypting. There are asymmetric-key algorithms as well, but covering them would go beyond the scope of this article.
A very early encryption scheme used a natural sentence as the key for substituting single characters. If a character occurs more than once in the sentence, only its first occurrence is kept. The letters that do not occur in the sentence are then appended in alphabetical order:
Sentence : The quick brown fox jumps
Derived key: the quickbrownfxjmpsadglvyz
Then you translate character by character:
Original : the quickbrownfxjmpsadglvyz
Translates to: abcdefghijklmnopqrstuvwxyz
Note that the target alphabet ends with a space character, so “space” translates to “d” and “z” translates to “space”.
Putting it all together, the message “secret” translates to “tchkca”.
In Python, it looks like this:
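A minimal sketch of the scheme described above. The names derive_key, encrypt, and decrypt are chosen here for illustration, and the sketch assumes lowercase messages that use only the letters a-z and the space character.

```python
import string

# Target alphabet: a-z followed by the space character (27 symbols)
ALPHABET = string.ascii_lowercase + " "


def derive_key(sentence: str) -> str:
    """Keep the first occurrence of each symbol, then append the unused ones."""
    seen = []
    for char in sentence.lower() + ALPHABET:
        if char in ALPHABET and char not in seen:
            seen.append(char)
    return "".join(seen)


def encrypt(plain_text: str, key: str) -> str:
    # Replace each key symbol by the symbol at the same position in ALPHABET
    return plain_text.translate(str.maketrans(key, ALPHABET))


def decrypt(cipher_text: str, key: str) -> str:
    # The reverse mapping: from ALPHABET back to the key symbols
    return cipher_text.translate(str.maketrans(ALPHABET, key))


key = derive_key("The quick brown fox jumps")
print(key)                     # the quickbrownfxjmpsadglvyz
print(encrypt("secret", key))  # tchkca
print(decrypt("tchkca", key))  # secret
```

Characters outside the 27-symbol alphabet, such as digits or uppercase letters, pass through unchanged in this sketch.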
Modern encryption algorithms are considerably more complicated. The state of the art is AES, the Advanced Encryption Standard. Other notable block ciphers are Twofish, Serpent, SM4, and SEED.
Hashing
Hashing is about fingerprinting. You want to be able to identify a list of bytes (e.g. a string or a file) by a short value that is unique for all practical purposes, but you don’t want to store the original. You either don’t need to be able to go back to the original or you don’t even want it. Just like with a fingerprint: You can take two fingerprints and conclude that they belong to the same person. But given only one fingerprint, you cannot reconstruct that person.
I wrote an article that explains in detail why this loss of information is desired:
But there are several other applications of hash functions as well:
State-of-the-art hash functions are SHA-256 and SHA-512, both members of the SHA-2 family.
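To make the fingerprint idea concrete, here is a small sketch with Python's standard hashlib module. The helper name fingerprint and the sample inputs are my own.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


a = fingerprint(b"hello world")
b = fingerprint(b"hello world")
c = fingerprint(b"hello world!")

print(a == b)  # True: identical input, identical fingerprint
print(a == c)  # False: one extra character changes the fingerprint

# The digest length is fixed, no matter how long the input is
print(len(hashlib.sha256(b"x" * 10_000).digest()))  # 32 bytes = 256 bits
print(len(hashlib.sha512(b"x" * 10_000).digest()))  # 64 bytes = 512 bits
```

Note that there is no counterpart to a decrypt function here: the digest alone does not let you recover the input.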