What is Hashing and how it is useful in recent technologies?

Devanshu Dalal
9 min readJun 1, 2022

--

Authors:- Devanshu Dalal, Yash Gaherwar, Tanishq Deshpande, Avinash Dhakne

What is Hashing?

Hashing

Hashing is a technique by which we convert a given set of strings into another value. This value is either a shorter fixed length or something that represents a key and makes it easier to find or verify the original string.

The basic example of this that we all have used is the Hash table in our Data Structure subject. As we all know, even hash table store values as key-value pairs. However, the most important property of a hash table is that each key should be unique.

For converting a string of characters, we use a Hash Function that works on a hashing algorithm and generates new values that we call hash values, hash or also called as digest.

One important property of the hash function is that if we hash a unique string of characters, we always get the same hash.

There is always a need for a good hash algorithm that does not allow converting the hash back into its original value, so the need for a one-way hash algorithm arises.

Collisions in Hashing:

Collision in Hashing

Hashing is a very important thing. As we discussed, it is a one-way algorithm. But the problem could arise if two different keys generate the same hash. This is called a collision. Thus, a good hash function is the one that comes with an extremely low risk of producing collisions.

Hashing Algorithms:

Hashing Algorithm

Some popular hashing algorithms are:

MD5 stands for Message-Digest algorithm 5. It was a very popular and widely used algorithm founded in 1991 but was later found insecure due to collisions being generated even on ordinary computers.

SHA-1 stands for Secure Hash Algorithm 1, and was designed in the year 1995. It produced a 160-bit (20-byte) hash. This algorithm was later discovered to be insecure against well-funded attackers who have got access to Cloud-Computing. In 2017, security researchers proved a practical collision against SHA-1 by producing two different PDF files that have the same SHA-1 signature.

SHA-2 stands for Secure Hash Algorithm 2. It produced a 224-bit hash. This algorithm is complicated, but it is still considered safe.

SHA-3 stands for Secure Hash Algorithm 2. It is the successor of SHA-2.

Hashing vs Encryption

Hashing is a one-way function while Encryption works both ways

Hashing vs Encryption

An encryption algorithm takes an input and a secret key and generates a random output, which is called ciphertext. But the important point is that this operation is reversible. That means if any hacker gets the secret key, he can decrypt the ciphertext and get the original input.

Hashing is always preferable because even in the case of compromise, attackers/hackers won’t get access to plain text or password, but they will get a hash which is of no use to them.

Now let’s discuss the application of Hashing:

1. ) Password verification:

Let’s understand this with an example, We are all using G-mail.

So, we would know whenever we forgot our password. G-mail gives us an option to reset our password by sending a reset password link to our recovery email, but it never tells us our password. Have you wondered why?

The reason is that G-mail uses hashing. So basically, G-mail doesn’t store our password but instead stores the hash of our password, and as we just discussed, hash are never converted back to their original value, so it sends us the link to reset the password but doesn’t let us know the forgotten password.

Now, we would be wondering if it doesn’t store our password, how does it validate us during our login?

So, whenever we try to login, our password that we enter is passed to the hash function, which generates a hash and then this hash is verified against the hash stored in the database. If both the hash match, your login is successful, else you get a message saying “Entered Password is invalid”.

2.) HashMap:

HashMap

We all know HashMap, but let’s see how it works internally. The most important property of HashMap is that every key in HashMap should be unique.

HashMap is a part of the Java Collection Framework and stores the data in the form of key-value pairs. HashMap internally contains an array of nodes, and the node is here represented as class. So, node data members are:

Node Class Data-members

Put() method in HashMap:

So, here, suppose I have three values as seen in the image.

Putting values in HashMap

So, if I call the put () method for the string “Tanishq,” it will first calculate a hash value for the string, assuming the hash is 2657680. Now, as we saw, HashMap is an array of nodes, so to store key and value pairs in HashMap we need to calculate the index. So, the formula to calculate the index is

Index=hashCode(Key) & (n-1)

Here n represents the size of the array. Generally, n is 16.

So, here the index is 2657680 & 15 = 4.

Now we store this node at index 4.

Fig.1

Assume we compute the hash for the string “Devanshu” and it is 63281940, and we compute the index as:

Index = 63281940 & 15=4

So, we have computed the same index as even String Tanishq is stored at index 4, so here we would be using the equals() method of HashMap and checking whether the key already stored at that index is equal to the key for which the index is calculated. If it turns out to be equal, then we replace the value with the current value, else we connect this node to the previous node using a linked list.

So, here is the image after putting the second string into HashMap.

Fig.2

Similarly, when we put String “Yash” in HashMap, a hash would be computed for e.g., assume it to be 2349873. Then we will calculate the index as

Index = 2349873 & 15 = 1

So, we will put the string “Yash” at index 1 as seen in the image.

Fig.3

Get() method in HashMap:

When we call the get() method in HashMap and pass the key, it will first calculate the hash. we discussed earlier in the blog, a unique string always produces the same hash. So here, after calculating the hash, we compute the index by the formula we have seen above.

So, here, if we write

Get() method in HashMap

So, it will calculate hash. It would come out to be 63281940. Then we will calculate index by index formula. The index would be:

Index = 63281940 & 14 = 4

After calculating the index, we will go to index 4 and check whether there is a node at index 4. Here it is. So, we’ll check the key of the first node at index 4, which is “Tanishq.” HashMap will use the equals() method to see if both keys are equal. In this case, the strings “Devanshu” and “Tanishq” are not equal, so it will check whether the next element exists, and if not, it will return null. But, since the next element exists, it will check whether the key to the next element is equal to the key passed in the method call, and now both strings match, so it returns the value of that element.

So, this is the internal working of HashMap, and as we saw here, Hashing plays a very important role.

3.) Database Management System:

When a database grows large, retrieving data becomes a time-consuming task, making it inefficient to search all of the indexes, which increases latency. We use hashing to solve this problem. Without using indexes, the hashing technique is used to directly calculate the address of a required record.

The hash function can employ any of the column values in this technique, but the primary key is usually used to generate the address of that particular block of data.
It consists of three key components:

  1. Data Bucket: This are the memory location where data is stored.
  2. Hash Function: Hash Function is a mapping function that helps us to map all the set of search keys to its actual address.
  3. Hash Index: The prefix of generated hash value is taken as Hash Index.

Types of Hashing in DBMS:

  1. Static Hashing: In Static hashing when a search key is provided, here it always the computes address.
  2. Dynamic Hashing: Dynamic hashing has an advantage over Static hashing, static hashing can expand or shrink dynamically. It is also called Extended Hashing

As a result, hashing is vital for dealing with large amounts of data and retrieving it quickly.

4.) Blockchain:

Blockchain

The foundation of cryptocurrency began with the emergence of blockchain technology. We all know that “Blockchain” is a buzzword, but how many of us know how it works?

So, let’s see this. In blockchain, we use SHA-256 (Secure Hashing Algorithm 256). It is one flavor of SHA2, founded in the year 2001 by the National Security Agency. For a given input, it gives an output hash of 256 bits (64 hex characters). The generated hash’s first four characters play a vital role in determining whether it is a valid block or not.

Mining in Blockchain

So, whenever a miner tries to make a new block, it is added to the bitcoin blockchain. A block consists of a block number, data, a generated hash value, the previous block’s hash value, and Nonce(Number used once). So, whenever we want to make a new block, we enter data and Nonce and then a hash is generated. If the leading four digits of the hash are 0, then the block is valid, else, it is invalid, and we need to make changes in Nonce. A nonce is basically a random number that is used to make the block valid. A miner usually starts with the value of Nonce and increments it by one up till he finds his block valid. So, as we are all thinking, it takes several iterations to make the block valid. Once the block is successfully mined, it is added to the last block in the chain.

Thus, these blocks are always chained together, and this makes them immutable. This blockchain is available globally.

Chaining in Blockchain

In a blockchain, every block carries the reference of the previous block in the form of its hash. So, suppose we have four blocks a, b, c and d. Here, as a is the first block, it won’t have a previous hash value. The first block of any blockchain, according to Satoshi Nakamoto, is a genesis block with a previous hash value of “0.” based on the usual components. To give an overview of the Genesis Block, it is the first block of any blockchain and it is also the “Block 0” which acts as the foundation for the further blockchain. But block b would have the signature of block a in the previous hash, block c would have the signature of block b in the previous hash, and so on.

So, if anybody tries to alter data in a particular block, mining will give us a different signature. For example, if we alter data in block b, we will get a new signature, and now this new signature is not chained with other blocks. So here it breaks block number 2 and invalidates every block that comes after it, all the way up to the last block. The consensus algorithm is used to check whether the blockchains displayed for each and every peer node (user) are the same or not. Hence, a small change in any one of the blockchains is reverted back to the original blockchain due to the robust functioning of the consensus algorithm. Thus, hashing plays a major role.

--

--