HASH vs. ENCODING

In this article, we will discuss two kinds of functions that are often mixed up: hash functions and encoding functions.

It is easy to confuse these functions, yet in terms of what they do they have almost nothing in common.

Put simply, a hash is a one-way function that mixes and mashes your data (as bytes) like mashed potatoes. It has no inverse; the only general way to recover an input is a brute-force attack, guessing candidates until one produces the same hash. An example of MD5 hashing:

fmt.Print(md5hash("i")) // 865c0c0b4ab0e063e5caa3387c1a8741
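The md5hash helper in the line above is not defined in the article. A minimal sketch of what it could look like, assuming it returns the digest as a lowercase hexadecimal string (the function body is an assumption; only the name comes from the snippet):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// md5hash returns the MD5 digest of s as a lowercase hexadecimal string.
// Hypothetical implementation of the helper named in the article.
func md5hash(s string) string {
	sum := md5.Sum([]byte(s)) // raw 16-byte digest
	return hex.EncodeToString(sum[:])
}

func main() {
	// The article quotes 865c0c0b4ab0e063e5caa3387c1a8741 for this input.
	fmt.Println(md5hash("i"))
}
```

Note that the hashing step (md5.Sum) and the printable representation (hex.EncodeToString) are two separate calls, which is exactly the hash-then-encode pairing discussed in this article.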

Encoding, on the other hand, represents a string of bits (of length n) as human-readable characters, usually through a fixed table that maps bit patterns to characters. For example:

ASCII Table
bits     => human-defined character
01000001 => 'A'
01000010 => 'B'
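To see the table lookup in action, here is a small sketch that parses a bit string into a byte and prints the character it maps to. The helper name bitsToByte is made up for this illustration. In standard ASCII, 01000001 is 'A' and 01100001 is 'a'.

```go
package main

import (
	"fmt"
	"strconv"
)

// bitsToByte parses a string of up to eight '0'/'1' digits into one byte.
// Hypothetical helper, not part of the article.
func bitsToByte(bits string) (byte, error) {
	v, err := strconv.ParseUint(bits, 2, 8) // base 2, must fit in 8 bits
	if err != nil {
		return 0, err
	}
	return byte(v), nil
}

func main() {
	for _, bits := range []string{"01000001", "01000010", "01100001"} {
		b, err := bitsToByte(bits)
		if err != nil {
			fmt.Println("invalid bit string:", err)
			continue
		}
		// %q prints the byte as a quoted character literal.
		fmt.Printf("%s => %q\n", bits, b)
	}
}
```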

Hash Function

Below are some commonly known hash functions (often built into programming languages or their standard libraries):

  • md5
  • sha-1
  • sha-256

Trivia

  • Going down the list, each function produces a longer fixed-length output and is generally considered more secure than the one before it.
  • The output is raw bytes and can contain non-printable characters. Hence it is common to see a hash function followed by an encoding function that targets a printable character set (plain ASCII does not qualify, since it includes control characters). For instance, a 16-byte MD5 hash is often hexadecimal-encoded into 32 printable characters (2 characters per byte).
  • It destroys the original identity: the input cannot be reconstructed from the output. An analogy: face surgery.
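The second point can be checked directly. The sketch below hashes a string with MD5, then tests whether the raw digest and its hexadecimal form consist only of printable ASCII bytes. The helper names isPrintableASCII and hexDigest are made up for this example.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// isPrintableASCII reports whether every byte lies in the printable
// ASCII range 0x20..0x7E.
func isPrintableASCII(b []byte) bool {
	for _, c := range b {
		if c < 0x20 || c > 0x7e {
			return false
		}
	}
	return true
}

// hexDigest hashes data with MD5 and returns both the raw digest
// and its hexadecimal encoding.
func hexDigest(data []byte) (raw []byte, encoded string) {
	sum := md5.Sum(data)
	return sum[:], hex.EncodeToString(sum[:])
}

func main() {
	raw, encoded := hexDigest([]byte("hello"))
	fmt.Printf("raw: %d bytes, printable: %v\n", len(raw), isPrintableASCII(raw))
	fmt.Printf("hex: %d chars, printable: %v\n", len(encoded), isPrintableASCII([]byte(encoded)))
	fmt.Println(encoded)
}
```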

1. MD5

  • fixed-length output: 128 bits = 16 bytes = 32 hexadecimal-chars (if you want to print).
  • vulnerable to brute-force attack.
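  • application-1: file checksum (vulnerable to content tampering: an attacker can craft different content that produces the same checksum).
  • collision resistance: a generic birthday attack needs about 2⁶⁴ hash evaluations, and practical attacks on MD5 find collisions far faster than that.
  • In short, MD5 is fine for non-security purposes but broken for many security applications.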
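As an illustration of the file-checksum use case, the following sketch streams content through MD5 and compares the result with a published checksum. The function names md5Checksum and verifyBytes are hypothetical, and an in-memory reader stands in for an opened file. A match only shows that the content was not corrupted by accident; because of the collision weakness noted above, it does not prove the content is authentic.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
)

// md5Checksum streams r through MD5 and returns the hex digest.
// Passing the *os.File returned by os.Open would work the same way.
func md5Checksum(r io.Reader) (string, error) {
	h := md5.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// verifyBytes reports whether content hashes to expectedHex.
func verifyBytes(content []byte, expectedHex string) (bool, error) {
	got, err := md5Checksum(bytes.NewReader(content))
	if err != nil {
		return false, err
	}
	return got == expectedHex, nil
}

func main() {
	published := "5d41402abc4b2a76b9719d911017c592" // MD5 of "hello"
	ok, err := verifyBytes([]byte("hello"), published)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println("checksum matches:", ok)
}
```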

2a. SHA-1

  • fixed-length output: 160 bits = 40 hexadecimal-chars.

2b. SHA-256

  • fixed-length output: 256 bits = 64 hexadecimal-chars.
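The fixed output lengths are easy to confirm with Go's standard library. In this sketch (helper names sha1Hex and sha256Hex are made up), inputs of different sizes always yield 40 and 64 hexadecimal characters respectively:

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha1Hex returns the SHA-1 digest of data as hex (160 bits = 40 chars).
func sha1Hex(data []byte) string {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:])
}

// sha256Hex returns the SHA-256 digest of data as hex (256 bits = 64 chars).
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	for _, in := range []string{"", "i", "a much longer input than the others"} {
		a, b := sha1Hex([]byte(in)), sha256Hex([]byte(in))
		fmt.Printf("input %2d bytes -> sha1 %d chars, sha256 %d chars\n", len(in), len(a), len(b))
	}
}
```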

Encoding Function


Below are some commonly used encoding functions, which are also built into most programming languages:

  • ASCII.
  • BASE64.
  • HEXADECIMAL.
  • utf-8.

Trivia

  • In a nutshell, an encoding translates a string of bits (010101) into human-defined symbols (letters, digits).
  • Each encoding function relies on a character set.
  • A character set is a mapping from n-bit binary values to human-defined symbols.
  • Often used for readability/representation, data transmission, etc.
  • The original identity remains; the data is merely converted into another form and can be decoded back. An analogy: WATER->ICE->GAS
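The last point is the key difference from hashing: an encoding can be undone. A minimal round-trip sketch with Go's standard base64 and hex packages (the roundTrip helper is made up for this illustration):

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// roundTrip encodes data as base64 and as hex, decodes both again,
// and reports whether the original bytes were recovered each time.
func roundTrip(data []byte) (bool, error) {
	b64 := base64.StdEncoding.EncodeToString(data)
	fromB64, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return false, err
	}
	hx := hex.EncodeToString(data)
	fromHex, err := hex.DecodeString(hx)
	if err != nil {
		return false, err
	}
	return bytes.Equal(data, fromB64) && bytes.Equal(data, fromHex), nil
}

func main() {
	data := []byte("hello")
	fmt.Println(base64.StdEncoding.EncodeToString(data)) // aGVsbG8=
	fmt.Println(hex.EncodeToString(data))                // 68656c6c6f
	ok, err := roundTrip(data)
	fmt.Println("recovered original:", ok, err)
}
```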

1. ASCII

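  • 8 bits = 1 character (1 char per byte). Strictly speaking, ASCII is a 7-bit code; each character is simply stored in one 8-bit byte.
  • can contain non-printable characters (control codes).
  • The standard ASCII character set has 128 entries, binaries range from 0000000–1111111; 8-bit extensions use the remaining values up to 11111111.
  • The output character length = input bit length / 8. [128 bits / 8 = 16 characters]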

2. BASE64

  • 6 bits = 1 character (1 char per 6 bits).
  • only contains printable characters.
  • A BASE64 character set has 64 possible entries, binaries range from 000000–111111.
  • The output character length = input bit length / 6, rounded up. [128 bits / 6 ≈ 22 characters, or 24 once '=' padding is added]
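The length arithmetic can be checked against Go's encoding/base64 package, which offers both a padded (StdEncoding) and an unpadded (RawStdEncoding) variant. The helper base64Lens is made up for this sketch:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// base64Lens returns the encoded length of n input bytes
// with and without '=' padding.
func base64Lens(n int) (padded, unpadded int) {
	return base64.StdEncoding.EncodedLen(n), base64.RawStdEncoding.EncodedLen(n)
}

func main() {
	digest := make([]byte, 16) // 128 bits, e.g. the size of an MD5 digest
	fmt.Println(base64.StdEncoding.EncodeToString(digest))
	fmt.Println(base64.RawStdEncoding.EncodeToString(digest))

	p, u := base64Lens(len(digest))
	fmt.Println(p, u) // 24 22
}
```

Padding rounds the output up to a multiple of 4 characters, which is why a 16-byte input takes 24 characters in the padded form.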

3. HEXADECIMAL

  • 8 bits = 2 characters (2 chars per byte).
  • only contains printable characters.
  • A HEXADECIMAL character set has 16 possible entries (0–9, a–f), binaries range from 0000–1111.
  • The output character length = input bit length / 4. [128 bits / 4 = 32 characters]
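A short sketch of the 2-characters-per-byte rule, using Go's encoding/hex package (hexChars and hexLen are names invented for this example):

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hexChars returns the two hexadecimal characters that represent b:
// one for the high 4 bits and one for the low 4 bits.
func hexChars(b byte) string {
	return hex.EncodeToString([]byte{b})
}

// hexLen returns the number of characters needed to hex-encode n bytes.
func hexLen(n int) int {
	return hex.EncodedLen(n)
}

func main() {
	for _, b := range []byte{0x00, 0x41, 0xff} {
		fmt.Printf("%08b => %s\n", b, hexChars(b))
	}
	fmt.Println(hexLen(16)) // 32
}
```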

Appendix

Generate Random Hash

  • [Step 1] Generate n random bytes.
s := randomBytes(10) 

We obtain n random bytes. The larger n is, the less likely two generated values are to collide, but it also means more data to process in the later steps.

  • [Step 2] Encode to base64. (If you want to represent/print)
e := base64(s) 

Note: base64 encoding is NOT a way of encrypting, nor a way of compacting data. In fact, a base64-encoded piece of data is about 1.333… times bigger than the original. It is only a way to make sure the bytes are not lost or altered when they pass through channels that expect text.

  • [Step 3] Hash it.
h := hash(s)
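Putting the three steps together, here is a runnable sketch. The article leaves randomBytes, base64 and hash undefined, so everything below is an assumption: randomBytes is implemented with crypto/rand (and returns an error as well), and SHA-256 is picked arbitrarily as the hash since the article does not name one.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// randomBytes returns n bytes from the operating system's
// cryptographically secure random source.
func randomBytes(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

// hashHex returns the SHA-256 digest of b as hex.
// SHA-256 is an assumption; any hash from this article would fit here.
func hashHex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	s, err := randomBytes(10) // Step 1: random bytes
	if err != nil {
		fmt.Println("could not read random bytes:", err)
		return
	}
	e := base64.StdEncoding.EncodeToString(s) // Step 2: printable form
	h := hashHex(s)                           // Step 3: hash of the raw bytes

	fmt.Println("base64:", e)
	fmt.Println("hash:  ", h)
}
```

As in the steps above, the hash is taken over the raw bytes s, not over the base64 string e; hashing e instead would give a different digest.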

References

  1. http://stackoverflow.com/questions/10315757/what-is-the-real-purpose-of-base64-encoding
  2. https://codahale.com/how-to-safely-store-a-password/