Hashing-Part 1

Sunilkathuria
In Computing World
Published in
6 min readMay 4, 2024

Introduction

When you transfer a file over a network, how do you ensure that the file is not corrupted during the transfer?
How do you ensure that a message you have received over the internet has been sent by an authentic user and not an impersonator?
How do you ensure that the passwords you save in a database are protected from prying eyes?
How would you store data in neatly designated buckets and make adding and retrieving information easy?

Hashing is used to address these varied questions.

What is hashing

Hashing is a concept in Cryptography and Computer Science. In this, a mathematical algorithm (a hash function) accepts text input and gives out a fixed-size output (a hash value). Other names of this output are digests, hash codes, or hashes.

Following is a high-level process of hashing

a. Input data: This data can be of any normal text or binary. It can be of any length.
b. Hash function: This mathematical algorithm accepts this input and does several computations on this data.
c. Hash value: This is the output of a hash function. A fixed-size value. Generally, a hexadecimal value.

Characteristics of a hash function

  • Hashing is one-way. This means we cannot use the hash value to convert it back to data.
  • A hash function must be deterministic. This means its output MUST generate the same value for a given input.
  • Exhibit the avalanche effect. A hash function must generate different or significantly different hash values when there is only a 1-bit difference between two input values
  • Minimize or avoid collision. A collision occurs when two separate inputs generate an identical hash value. A good hash function offers a very low risk of collision.
  • A hash function should be fast at calculation.

Application of hashing

Broadly, we can divide hash functions into two categories based on their usage. The first is for cryptography, and the second is for computer science.
In the case of cryptographic functions, these must show the characteristics mentioned above and add some additional security features that make it difficult to break the algorithm. These are generally used for password verification, verifying messages, detecting data corruption, generating digital signatures, etc. Some of the examples are SHA (Secure Hash Algorithm), MD5 (Message Digest 5)

To play around with the hash functions, refer to this informative site

In the case of computer science, these hash functions are generally used to generate hash values for hash tables to manage data in a dictionary-like structure. Or for error detection during data transfer. These functions do not demand stringent security features. These are fast and less resource-hungry. Some of the examples are CRC (Cyclic redundant check), MurMurHash.

Implementing a Hash function

Consider a situation in which we store a lot of data in an array. As the size of an array grows, data retrieval takes more time. To speed up this process, we decide to implement a “Hash table”. This hash table has an array of five buckets/slots.

We will write a hash function to calculate the hash value of given data. Then, we will use this hash value to generate the index of one of the buckets and distribute this data in buckets.

The following is the code of a hash function. This function receives an argument and calculates its hash value. In calculating the hash value, the function converts the argument to a “utf-8” byte stream and then iterates through each byte to calculate the hash value.
The hash value is calculated by XORing each byte to its previous bytes.

A simple hash function

Characteristics of a hash function

In this section, we will evaluate the characteristics of the above-mentioned hash function. A better understanding of these will help us choose an appropriate hash function.

“One-way” test

The hash function does not perform very well on this front. A one-way test suggests that generating the data value from the given hash value should not be possible. The following code snippet tests the hash function for “one-way.”

Output of the function “test_one_way()”

If you observe the output, the function generates the hash code in sequence, equivalent to its ASCII code, making it easy to guess the input value from a given hash value.

Avalanche effect test

If we observe the output of the function test_avalanche_effect() here, we see no significant difference in the hash code of two strings. The avalanche effect suggests that, with two similar strings, the hash function should generate a significantly different hash value. So, this function also fails the avalanche effect test.

Output of the function “test_avalanche_effect()”

Collision test

With this hash function, it was very easy to find a collision. A collision happens when two different input values generate the same hash value. The function hash_v1 generated an identical hash value of 167 for the input strings “NCN” and “IttuELv” (these two strings were generated at random). So, from the collision point of view, this function is not a very good choice. Refer to the following code snippet used to test the collision.

The above function generates a random string ranging from 3 to 15 characters. This random string is then passed to the function hash_v1 to generate a hash value.

Speed test

The answer to “Is this function fast enough?” is “It depends”. When I executed the following code snippet, it took around 29 seconds to complete. During this time, the function generated fifteen million random strings ranging from 3 to 15 characters and calculated the hash value of each random string.

For a large amount of data, the program took around 27 seconds to generate the hash value for 32 MB of data. (These numbers may differ on your computer). Refer to the code snippet below.

Following is the summary of the tests of the hash function hash_v1()

Summary

  • We understood the meaning and characteristics of hashing.
  • Wrote a simple hash function and understood how it fares against the characteristics of a typical hash function.

References

Videos of the article…

Source code in the articlehttps://github.com/incomputingworld/hahsing

Coming up…

In the next article, we will use this hash function to implement a hash table and see how this fares.

Did you enjoy this article? Please clap it and share it with your friends.
If you have found this article helpful, please subscribe to read more articles.
Feel free to comment!!

--

--