Cryptography 102: Hash me Outside how Bout Dat

Published in FullStackHacks, Apr 24, 2017

Last week I wrote about PGP keys and how they can be used to sign, encrypt, and decrypt data. This week I’m going to go over more uses of cryptography. If you don’t have a good grasp of the basics of public/private key encryption, or you haven’t read last week’s article, I’d recommend going back and checking that out here.

One major topic in cryptography is secure hashing algorithms. For those who don’t know, “a hash function is any function that can be used to map data of arbitrary size to data of fixed size” (Wikipedia). That means converting some piece of data into a fixed-size number that I can use to identify it. An example would be using a serial code to map a product to a number. If I’m selling a product, I can create a code that can be used to identify the product. That serial code in this situation is a “hash code.”

A common use of hash codes in computer science is in hash tables. These tables work by storing data in a lookup data structure, such as an array, using the data’s hash code as the key. For example, say I need to count how many times each character appears in a file. I could hash each character using its numerical ASCII representation (more generally, its byte value). Then I would initialize an array with 256 entries, one per possible byte value. Now I can read or update each character’s count in O(1) time. This is great for quick lookups, but some hash functions also have huge cryptographic benefits.
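
The character-counting idea above can be sketched in a few lines of Python. This is only an illustration: the function name count_bytes and the sample input are made up for the example, and the file contents are treated as raw bytes so that every value fits in the 256-entry array.

```python
def count_bytes(data: bytes) -> list:
    """Count how often each byte value (0-255) occurs in data."""
    counts = [0] * 256        # one slot per possible byte value
    for byte in data:         # iterating over bytes yields ints in 0..255
        counts[byte] += 1     # the byte value itself acts as the hash code
    return counts


sample = b"hello world"       # stand-in for the contents of a file
counts = count_bytes(sample)
print(counts[ord("l")])       # how many times 'l' appears
```

Each lookup or update is a single array index, which is where the O(1) cost comes from.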

One of the most commonly used cryptographic hashing functions is the SHA-256 algorithm. It’s a member of the SHA-2 family of algorithms. There’s also the older SHA-1 algorithm and the newer SHA-3 family. SHA stands for Secure Hash Algorithm. In February 2017, researchers at Google and CWI Amsterdam published the first practical SHA-1 collision, and SHA-1 is no longer recommended for cryptographic use. SHA-2 was designed by the NSA and was first published in 2001. SHA-3 was developed as an alternative to SHA-2: its underlying algorithm, Keccak, was selected by NIST in 2012 and the standard was published in 2015.

The “256” in SHA-256 means that the hash code is 256 bits long. That equates to 2²⁵⁶, or about 1.1579209*10⁷⁷, possible values. To put the size of that number in perspective, there are approximately 1.33*10⁵⁰ atoms on earth and only 10⁷⁸ to 10⁸² atoms in the known universe. So it is safe to say that 1.1579209*10⁷⁷ is a large number. The best way I can describe the algorithm is that it goes through all the data in the message or file being hashed and scrambles it into a single 256-bit value. Under the hood, it passes the data through many rounds of bit shifts and rotations, modular additions, and boolean bit operators such as XOR and AND. The result looks random, yet the same input always produces the same value, and there is no practical way to produce a chosen value without knowing the original input. If you want to read more about the specifics here’s a good write up on it. The algorithm is considered a one-way function. This means it is easy to hash, but very difficult to “de-hash.” Therefore, exposing the hash code does not practically reveal the underlying data, as long as that data is hard to guess.
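
To make those properties concrete, here is a small sketch using Python’s standard hashlib module. The inputs are arbitrary examples chosen for illustration; nothing here comes from the write up linked above.

```python
import hashlib

# The digest is always 32 bytes (256 bits), whatever the input size.
short_digest = hashlib.sha256(b"a").digest()
long_digest = hashlib.sha256(b"a" * 1_000_000).digest()
print(len(short_digest) * 8, len(long_digest) * 8)

# The number of possible 256-bit values, in scientific notation.
print(f"{2 ** 256:.7e}")

# Changing one character of the input gives an unrelated-looking digest.
first = hashlib.sha256(b"hello world").hexdigest()
second = hashlib.sha256(b"hello worle").hexdigest()
print(first)
print(second)
```

Hashing the same input twice always gives the same digest, which is what makes the value usable as an identifier.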

With these secure hash algorithms, users can pass hash codes between each other to verify data. Secure hash codes allow users to check whether two things are equal: if two files produce the same hash code, then you can be practically certain they are identical. One clever use of this is with passwords. Passwords in a database will often be stored as their hash. This way a user can send their password to be hashed, verified, and then forgotten. If a hacker gains access to the database, all they’ll see are hash codes, which do not directly reveal the original passwords. This is also one of the reasons that users are required to reset their password when they forget it. If a service is doing its job right, it won’t be able to tell you what your password is even if it wanted to.
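
Here is a minimal sketch of the hash, verify, and forget flow for passwords. Note one deliberate swap: instead of a bare SHA-256 hash, it uses PBKDF2-HMAC-SHA256 with a random salt from Python’s standard library, since a salted and deliberately slow function is the usual practice for passwords. The function names, the iteration count, and the sample password are assumptions made up for this example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password: str) -> tuple:
    """Return (salt, digest) to store instead of the password itself."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the submitted password and compare it with the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)  # constant-time comparison


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

The service only ever stores the salt and the digest, so it has nothing to show a user who asks for their old password.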

Another big use of secure hashing is for data integrity. When downloading a file, there’s no built-in way to verify that it wasn’t altered somewhere along the way. If a server was compromised, a hacker could secretly share their own version of the software. That software could then act as a Trojan horse and infect other computers with its own malware. To combat this, developers will often publish the SHA-256 hash code of their software and sign it with their private key. Users can then verify the signature with the developer’s public key and check the signed hash code against the one generated from the downloaded software. This gives users a way to make sure the software they’re about to run matches the software they intended to run. SHA-256 hash codes are often used in digital signatures more generally as well: instead of signing the whole message, users can sign just the hash code of the message. So, as you can see, secure hashing has a lot of possible use cases.
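
The checking half of that process can be sketched as follows. This covers only comparing a downloaded file against a published hash code; verifying the developer’s signature on that hash code is not shown. The function names and the temporary file standing in for a download are hypothetical.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the local file's digest with the one the developer published."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.lower())


# Demo with a temporary file standing in for a downloaded installer.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is an installer")
    demo_path = tmp.name

published = hashlib.sha256(b"pretend this is an installer").hexdigest()
print(matches_published_hash(demo_path, published))
os.remove(demo_path)
```

If even one byte of the download had been altered, the two hash codes would no longer match.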

One of the biggest uses of encryption on the web is TLS/SSL. TLS (Transport Layer Security) is the successor to SSL (Secure Sockets Layer). Whenever you’re browsing the web and you go to a site with “https” in your address bar, that means you’re using TLS/SSL encryption. In order for a TLS connection to be established, there is a series of 9 steps that make up what’s called the TLS Handshake Protocol. These steps are:

The client will send a “hello message” to the server they’re attempting to securely communicate with.
The server politely responds with a “hello message” of its own that contains a randomly generated number.
Next a certificate is sent from the server to the client and a request can be made at this time for a certificate from the client.
If the request was made in the previous step, the client will send the requested certificate.
At this point the client creates a secret value and then encrypts it using the public key in the server’s certificate.
The server and the client now generate the master secret and session keys from the secret value in the previous step.
The client now sends a “change cipher spec message” letting the server know to start using the session keys for hashing and encrypting messages.
The server switches over to symmetric encryption with the session keys and sends the “server finished message.”
The client and server can now securely exchange data using the session key.
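
The key derivation in the middle of that list can be sketched in Python. This is a simplified illustration modeled on the HMAC-SHA256 expansion that TLS 1.2 uses, not a real handshake: the public key encryption of the secret value is left out, the function names are made up, and the client random is an assumption on my part (the list above only mentions the server’s random number, but the client’s hello message carries one too).

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """Expand secret and seed into `length` bytes using chained HMAC-SHA256."""
    output = b""
    a = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def derive_master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Both sides run this with the same three inputs to get the same 48 bytes."""
    return p_sha256(pre_master, b"master secret" + client_random + server_random, 48)


# Hello messages: each side contributes a random value.
client_random = os.urandom(32)
server_random = os.urandom(32)

# The client picks a secret value. In a real handshake it would travel
# encrypted under the server's public key; that encryption is omitted here.
pre_master = os.urandom(48)

# Client and server derive the master secret independently.
client_master = derive_master_secret(pre_master, client_random, server_random)
server_master = derive_master_secret(pre_master, client_random, server_random)
print(client_master == server_master)
```

Because the random values differ on every connection, the derived keys differ too, even if the same secret value were somehow reused.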

Despite that being a lot to take in, it’s not as complicated as it seems at first glance. What is happening is that the server shares its public key (and, if asked, the client shares one too). This allows the two parties to securely exchange information, but public/private key encryption is what’s called asymmetric encryption. Asymmetric encryption is amazing because it allows information to be exchanged without first agreeing on a single secret key that could be compromised. However, it’s much slower than symmetric encryption algorithms, which use a single key for all communication. When you’re a web server handling thousands of connections, taking on the cost of asymmetric encryption for all traffic could cripple your service. The solution is to use asymmetric encryption only for the purpose of establishing a shared secret key (the session key). If two parties can agree on a secret session key, then more efficient algorithms can be used and the web server won’t be as bogged down with computation (here’s the list of accepted TLS algorithms).
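
As a toy illustration of that hybrid approach, the sketch below uses a tiny RSA key pair to transport a secret and then switches to a cheap symmetric cipher. Everything in it is an assumption made for teaching purposes: the primes are far too small to be secure, there is no padding, and the XOR keystream cipher is a stand-in for a real symmetric algorithm, not something TLS uses.

```python
import hashlib

# Toy RSA key pair built from tiny primes. Illustration only: real keys are
# thousands of bits long and use padding, which this sketch omits.
p, q = 61, 53
n = p * q                           # public modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (needs Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 based keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


# Client: pick a small secret and encrypt it with the server's public key.
secret = 1234
wrapped = pow(secret, e, n)

# Server: recover the secret with its private key.
unwrapped = pow(wrapped, d, n)

# Both sides turn the shared secret into a session key.
client_key = hashlib.sha256(str(secret).encode()).digest()
server_key = hashlib.sha256(str(unwrapped).encode()).digest()

# From here on, only the cheap symmetric cipher is used.
ciphertext = keystream_xor(client_key, b"GET /account HTTP/1.1")
print(keystream_xor(server_key, ciphertext))
```

The expensive public key operation happens once per connection, while every later message only costs a symmetric operation.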

Another important aspect of the server’s certificate is that it’s signed by a certificate authority. A certificate authority is a trusted central source that signs certificates to verify that they belong to the web service they claim to belong to. Remember the key signing parties I mentioned last week? Just think of the certificate authority as a giant key signing party of one. What the certificate authority does is prevent a man-in-the-middle attack. Without it, a malicious party could send their own public key in place of the server’s and then intercept all your data. If they were smart, they would then re-encrypt your data and send it on to the web service, with no way for anyone to be aware of what’s going on.
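
On the client side, this check is usually handled by the TLS library. As a small sketch, Python’s standard ssl module builds a client context that trusts the system’s certificate authorities and insists on a valid certificate by default. No connection is made here; the snippet only inspects the default settings.

```python
import ssl

# The default client context loads the system's trusted certificate
# authorities and refuses servers whose certificate does not check out.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # the certificate must validate
print(context.check_hostname)                    # and must match the hostname
```

With these settings, a handshake with a server presenting a certificate that no trusted authority signed is expected to fail rather than silently continue, which is what blocks the key swap described above.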

And… that’s cryptography. Well, obviously there’s a lot more to it, but those are the basics. If I missed anything or said something incorrect, let me know in the comment section below.

Written by Adam Collins