Why We Hash

Sean Coates
4 min read · Nov 12, 2019

My earliest exposure to hashes was my grandmother’s address card box. This was a small plastic bin that held 3x5 cards. On each card was information about a friend, relative, or business. It contained phone numbers, addresses, birthdays, anniversaries, and other useful notes. Today, you would store this information in your Contacts app on your phone; back then, phones were operated with a rotary dial and had no computer inside.

Anyway, there were a lot of cards in this bin. It was easy to find a card because there were also index cards, with tabs, separating the bin into sections based on the first letter of the person’s last name. Apparently, many names began with “Mc” (McMillan, McAllister, etc.) as there was a section just for those.

Near the end of the year, she’d go through the bin and pull out the cards for everyone to whom she would be sending a Christmas card. (Back then, you mailed physical cards to people’s homes, as no one had email.) That done, she’d give the cards to me and tell me to put them back in the bin. Because of the index cards with tabs, this was very easy to do.

Unbeknownst to me at the time, I was using a hashing function. I would take the last name on the card, reduce it to its first letter (treating “Mc” as a letter), and put the card in the matching section of the bin. It wasn’t the most balanced of hashes: the “XYZ” section had very few cards, while “G”, “S”, and “T” had lots.
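The card-filing rule can be sketched as a tiny function. This is only an illustration: the name `card_section` is made up, the tab layout (single letters, a separate “Mc” tab, and a combined “XYZ” tab) is inferred from the description above, and every name other than McMillan and McAllister is a hypothetical example.

```python
from collections import Counter

def card_section(last_name: str) -> str:
    """Return the tab a card is filed under, based only on the last name."""
    name = last_name.strip()
    if not name:
        raise ValueError("a card needs a last name")
    # "Mc" is treated as its own letter, as in the address box.
    if name.lower().startswith("mc"):
        return "Mc"
    first = name[0].upper()
    # Assumption: X, Y, and Z share a single tab.
    if first in "XYZ":
        return "XYZ"
    return first

# Hypothetical cards, apart from the two "Mc" names mentioned above.
names = ["McMillan", "McAllister", "Garcia", "Green", "Smith", "Stone", "Taylor", "Young"]
sections = Counter(card_section(n) for n in names)
print(sections)
```

Counting the cards per section, as `sections` does, is one simple way to see how unevenly a hash spreads its inputs.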

That’s how hashing works. A hash function takes an input of arbitrary length and returns a much shorter value of fixed, limited size that is easier to work with.
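To see the “any length in, fixed size out” property with a modern hash, here is a small sketch using SHA-256 from Python’s standard `hashlib` module. SHA-256 is my choice for illustration; the article does not name a specific algorithm.

```python
import hashlib

# Inputs of very different lengths: empty, a short name, and a million bytes.
inputs = [b"", b"Smith", b"x" * 1_000_000]

digest_sizes = []
for data in inputs:
    digest = hashlib.sha256(data).digest()
    digest_sizes.append(len(digest))
    # Print the input length, the digest length in bytes, and a short prefix.
    print(len(data), len(digest), digest.hex()[:16])
```

Whatever the input length, the digest is always 32 bytes (256 bits).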

Fast lookup is not the only reason we hash. Hashes are also used for data integrity. Back when I was using floppy disk drives, I learned about cyclic redundancy checks (CRCs). The simplest form of these was the parity bit, a CRC with one bit of data. For each byte of data stored, a parity bit was added, making it 9 bits stored on disk for every 8 bits of data. The bit was 1 if the byte contained an odd number of 1s and 0 if it contained an even number. If you read the disk later and the parity bit didn’t match the data, you knew that one of the bits had been corrupted. If two bits in the same byte were corrupted, however, the parity would still match, so this method was far from perfect at detecting errors. Still, if bits were getting corrupted in one byte, it was likely happening in other bytes too, and you would soon learn that your floppy was turning into garbage.

In this case, the input to the hash function was one byte, 8 bits long. The output was one bit. For example, h(10101010) = 0 (four 1s, an even count) and h(00000111) = 1 (three 1s, an odd count).
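The parity function h can be written in a few lines. The sketch below uses a hypothetical helper named `parity_bit`. It reproduces the two examples above and shows why a single flipped bit is caught while two flipped bits slip through.

```python
def parity_bit(byte: int) -> int:
    """Return 1 if the byte has an odd number of 1 bits, otherwise 0."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in 8 bits")
    return bin(byte).count("1") % 2

print(parity_bit(0b10101010))  # four 1 bits  -> 0
print(parity_bit(0b00000111))  # three 1 bits -> 1

original = 0b10101010
one_flip = original ^ 0b00000001   # one corrupted bit
two_flips = original ^ 0b00000011  # two corrupted bits

# A single flipped bit changes the parity, so the error is detected.
print(parity_bit(one_flip) != parity_bit(original))   # True
# Two flipped bits cancel each other out, so the error goes unnoticed.
print(parity_bit(two_flips) != parity_bit(original))  # False
```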

Over time, computer scientists and engineers created more complex hashes. Soon, CRCs were 32 bits long, and later they were replaced by much longer hashes. These hashes were designed so that the chance of flipping two bits in the input and still getting a matching hash is vanishingly small.
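As a rough illustration of the improvement, the sketch below applies the 32-bit CRC from Python’s standard `zlib` module to a short message, then flips two bits in the same byte, which is the case a parity bit misses. The helper `flip_bit` and the sample message are made up for this example.

```python
import zlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (bit 0 is the lowest bit of byte 0)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)

message = b"Christmas card list"
checksum = zlib.crc32(message)

# Flip two bits in the first byte; per-byte parity would not notice this.
damaged = flip_bit(flip_bit(message, 0), 1)
damaged_checksum = zlib.crc32(damaged)

print(f"{checksum:08x}")
print(f"{damaged_checksum:08x}")
print(damaged_checksum != checksum)  # the 32-bit check notices the change
```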

As hashes improved, they were used not only to confirm that data had not been corrupted, but also to show that some malicious third party had not altered it. (Of course, some hackers were trying to alter messages, and this led to a need for these improved security measures.)

Hashes also provide some security when it comes to hiding the content of the data being hashed. In theory, when the output of the hash function is smaller than the input string, it should be very hard or impossible to determine what the input string was if you only have the hash value. In this respect, hashing acts as a one-way function, which is why it is used to store passwords. The problem with passwords is that people tend to use short, popular strings. A hacker can hash all of the popular and short strings, compare the results to a victim’s hashed password, and have an excellent chance of guessing what the victim’s actual password is.
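The guessing attack described above can be sketched in a few lines. This assumes passwords are stored as plain, unsalted SHA-256 hashes, purely for illustration. The password list, the “stolen” hash, and the helper name `sha256_hex` are all hypothetical.

```python
import hashlib

def sha256_hex(password: str) -> str:
    """Hash a password with SHA-256 and return the hex digest."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# The attacker hashes a list of popular passwords ahead of time.
common_passwords = ["123456", "password", "qwerty", "letmein"]
lookup = {sha256_hex(p): p for p in common_passwords}

# Stands in for a hash found in a leaked database.
stolen_hash = sha256_hex("qwerty")
guess = lookup.get(stolen_hash)
print(guess)  # qwerty

# A long, unusual password is not in the table, so the lookup fails.
strong_hash = sha256_hex("a long passphrase that is not on the list")
strong_guess = lookup.get(strong_hash)
print(strong_guess)  # None
```

The hash function itself is never reversed here. The attack works only because the victim’s password was short or popular enough to be on the list.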

Early hashing functions were weak, which led to security vulnerabilities. Today, hashing functions are much more sophisticated. Cryptocurrency transactions use hashing functions as a critical part of their security.
