Cryptography with Python — Hashing

Ashiq KS
8 min read, Jan 19, 2019


A Pythonic implementation of hash functions, message authentication codes and key derivation functions.

This article shows Pythonic implementations of cryptographic hash functions, message authentication codes (MACs) and key derivation functions (KDFs). It is not an introduction to cryptography itself or to how these algorithms work internally. It is an introduction to using these cryptographic algorithms and methods from Python.

In this article, we will implement the following in Python:
- hash functions such as SHA-2 (SHA-256), SHA-3 and BLAKE2;
- message authentication codes such as HMAC and Poly1305;
- key derivation functions, which generate secret keys from passwords, such as Scrypt and Argon2.

Hash Functions

The Python module ‘hashlib’ provides a simple-to-use interface to cryptographic hash functions. We will look at some of them here, starting with the ‘sha3_512’ hash function from the SHA-3 family.

import hashlib
from binascii import hexlify
data = 'Sending encrypted'
data = data.encode('utf-8')
sha3_512 = hashlib.sha3_512(data)
sha3_512_digest = sha3_512.digest()
sha3_512_hex_digest = sha3_512.hexdigest()
print('Printing digest output')
print(sha3_512_digest)
print('Printing hexadecimal output')
print(sha3_512_hex_digest)
print('Printing binary hexadecimal output')
print(hexlify(sha3_512_digest))
Output:
Printing digest output
b"\xe3\xfcs\x1b\xc3x\xbe\x81w\xdf9 \xea>@\r\r\xd6\xd0\xb0\xc1=9\xbc`a\xafz\xa2\n\xc1{d\x03\x85\x05\xaa':S&1\xe0\xd7\x91\xbc*\x9f;)\xff?\xa2C\xa3A,\x99\\\xbfM\x8a\xa7Q"
Printing hexadecimal output
e3fc731bc378be8177df3920ea3e400d0dd6d0b0c13d39bc6061af7aa20ac17b64038505aa273a532631e0d791bc2a9f3b29ff3fa243a3412c995cbf4d8aa751
Printing binary hexadecimal output
b'e3fc731bc378be8177df3920ea3e400d0dd6d0b0c13d39bc6061af7aa20ac17b64038505aa273a532631e0d791bc2a9f3b29ff3fa243a3412c995cbf4d8aa751'

First of all, we import the necessary modules, ‘hashlib’ and ‘binascii’. ‘hashlib’ contains the hash functions, and ‘binascii’ is a module for binary-to-ASCII and ASCII-to-binary conversions.

Here ‘data’ is the message that we want to convert into a ‘digest’, the output of a hash function. The hash functions in Python take their input as bytes, so we encode the string with the ‘encode()’ method of the ‘str’ class. Its default encoding is ‘utf-8’ (UTF-8, a Unicode encoding built from 8-bit code units). Alternatively, we can create the data as bytes directly by writing a bytes literal:

data = b'Sending encrypted'
print(type(data))
#Prints '<class 'bytes'>'
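The short sketch below makes the str/bytes distinction concrete using only the standard library. It checks that encoding the string and writing a bytes literal give identical input. It also shows that passing a ‘str’ straight to a hash constructor is rejected with a TypeError. The variable names are illustrative only.

```python
import hashlib

text = 'Sending encrypted'

# str.encode() defaults to UTF-8, so both forms produce the same bytes
encoded = text.encode('utf-8')
literal = b'Sending encrypted'
print(encoded == literal)  # True

# Identical input bytes give identical digests
print(hashlib.sha3_512(encoded).digest() == hashlib.sha3_512(literal).digest())  # True

# Non-ASCII characters can take more than one byte in UTF-8
print(len('\u00e9'), len('\u00e9'.encode('utf-8')))  # 1 2

# hashlib refuses text that has not been encoded to bytes
try:
    hashlib.sha3_512(text)
except TypeError as exc:
    print('TypeError:', exc)
```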

Next, we create a hash object with the ‘sha3_512’ constructor from the ‘hashlib’ module. It optionally takes one argument: the data to be hashed, as bytes.

sha3_512 = hashlib.sha3_512(data)

Here, we assigned the hash object to a variable named ‘sha3_512’.

Another way of inputting data into the hash object is using the ‘update’ method.

data = b'Sending encrypted'
sha3_512 = hashlib.sha3_512()
sha3_512.update(data)

Another option is to split the bytes and feed the pieces to separate update calls. The result is the same as hashing the concatenated bytes in one go.

sha3_512 = hashlib.sha3_512()
sha3_512.update(b'Sending')
sha3_512.update(b' encrypted')
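Because ‘update()’ can be called repeatedly, a hash object can consume data piece by piece. This is how large files are usually hashed without loading them fully into memory.

The sketch below is an illustration rather than part of the original walkthrough, and a few details are assumptions:
- The helper name ‘hash_stream’ and the 64 KiB chunk size are arbitrary choices.
- An in-memory ‘io.BytesIO’ stands in for a real file opened with open(path, 'rb').

```python
import hashlib
import io


def hash_stream(stream, chunk_size=65536):
    """Hypothetical helper: SHA3-512 hex digest of a binary stream, read chunk by chunk."""
    h = hashlib.sha3_512()
    # iter() with a sentinel keeps calling read() until it returns b'' (end of stream)
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        h.update(chunk)
    return h.hexdigest()


payload = b'Sending encrypted' * 10000

one_shot = hashlib.sha3_512(payload).hexdigest()
chunked = hash_stream(io.BytesIO(payload), chunk_size=1000)
print(one_shot == chunked)  # True
```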

We get the output of the hash function, called the ‘digest’, by calling the ‘digest()’ method on the hash object.

print(sha3_512.digest())
# b"\xe3\xfcs\x1b\xc3x\xbe\x81w\xdf9 \xea>@\r\r\xd6\xd0\xb0\xc1=9\xbc`a\xafz\xa2\n\xc1{d\x03\x85\x05\xaa':S&1\xe0\xd7\x91\xbc*\x9f;)\xff?\xa2C\xa3A,\x99\\\xbfM\x8a\xa7Q"

The result is in bytes. If we want the digest in hexadecimal we need to use the ‘hexdigest()’ method.

print(sha3_512.hexdigest())
#e3fc731bc378be8177df3920ea3e400d0dd6d0b0c13d39bc6061af7aa20ac17b64038505aa273a532631e0d791bc2a9f3b29ff3fa243a3412c995cbf4d8aa751

Note that the output of ‘hexdigest()’ is not bytes but a string (‘str’). If we want the hexadecimal digest as bytes, we can pass the raw digest to the ‘hexlify()’ function of the ‘binascii’ module.

print(hexlify(sha3_512.digest()))
#b'e3fc731bc378be8177df3920ea3e400d0dd6d0b0c13d39bc6061af7aa20ac17b64038505aa273a532631e0d791bc2a9f3b29ff3fa243a3412c995cbf4d8aa751'
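Besides ‘hexlify()’, the built-in ‘bytes.hex()’ and ‘bytes.fromhex()’ methods also convert between raw digests and hexadecimal text. The check below, which is not part of the original article, confirms that all of these are different views of the same 64 bytes.

```python
import hashlib
from binascii import hexlify, unhexlify

h = hashlib.sha3_512(b'Sending encrypted')
raw = h.digest()       # 64 raw bytes
text = h.hexdigest()   # 128-character str

# Three equivalent hexadecimal views of the same digest
print(text == raw.hex())                     # True
print(hexlify(raw) == text.encode('ascii'))  # True
print(hexlify(raw).decode('ascii') == text)  # True

# Going back from hex text to the raw bytes
print(bytes.fromhex(text) == raw)  # True
print(unhexlify(text) == raw)      # True
```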

The digest size of the sha3_512 hash function is 512 bits, or 64 bytes. We can get this information from the ‘digest_size’ attribute as follows.

print(sha3_512.digest_size)
#64
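The same attribute exists on every hashlib object. As a quick check, the loop below prints ‘digest_size’ for a few algorithms, obtained here through the generic ‘hashlib.new()’ constructor. It also prints the related ‘block_size’ attribute, which is the algorithm's internal block size in bytes.

```python
import hashlib

for name in ('sha256', 'sha3_256', 'sha3_512', 'blake2b'):
    h = hashlib.new(name)
    # digest_size is given in bytes; multiply by 8 for the size in bits
    print(f'{name:10} digest_size={h.digest_size:3} bytes '
          f'({h.digest_size * 8} bits), block_size={h.block_size}')
```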

To compare two digests or two hexadecimal digests, Python's ‘secrets’ module provides a function named ‘compare_digest()’. It uses a constant-time comparison to reduce the risk of timing attacks, which a plain ‘==’ does not.

We will see an example of ‘compare_digest()’ with SHA-256, a hash function from the SHA-2 family.

from hashlib import sha256
from secrets import compare_digest
sha256_digest_1 = sha256(b'sha256 hashed message')
digest_1 = sha256_digest_1.digest()
hexdigest_1 = sha256_digest_1.hexdigest()
sha256_digest_2 = sha256()
sha256_digest_2.update(b'sha256')
sha256_digest_2.update(b' hashed')
sha256_digest_2.update(b' message')
digest_2 = sha256_digest_2.digest()
hexdigest_2 = sha256_digest_2.hexdigest()
print(compare_digest(digest_1, digest_2))  # Prints True
print(compare_digest(hexdigest_1, hexdigest_2)) # Prints True

First, we created two sha256 objects and fed them the same data, the second one in three pieces. Then we computed both the digest and the hexdigest of each. Using the ‘compare_digest()’ function of the ‘secrets’ module, we can check whether the two digests, or the two hexdigests, are equal. In both cases they are equal, so the output is ‘True’.

Note that SHA-256 belongs to the SHA-2 family, not SHA-1. SHA-1 should no longer be used in practice, as practical collision attacks against it are known.
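A typical use of ‘compare_digest()’ is checking received data against a published hex digest. The sketch below assumes the expected value arrives as a hex string, possibly in upper case. The function name ‘matches_checksum’ is hypothetical. The timing-safe comparison matters most when the compared value is secret, such as a MAC (see the next section). For a public checksum it simply does no harm.

```python
import hashlib
from secrets import compare_digest


def matches_checksum(data: bytes, expected_hex: str) -> bool:
    """Hypothetical helper: does SHA-256(data) equal the expected hex digest?"""
    actual_hex = hashlib.sha256(data).hexdigest()
    # hexdigest() is lower case, so normalise the expected value first
    return compare_digest(actual_hex, expected_hex.lower())


published = hashlib.sha256(b'sha256 hashed message').hexdigest()
print(matches_checksum(b'sha256 hashed message', published))  # True
print(matches_checksum(b'sha256 hashed massage', published))  # False
```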

Another popular family of hash functions is BLAKE2. We will see an example of ‘blake2b’, which is optimized for 64-bit platforms. It produces digests of any length from 1 up to 64 bytes.

from hashlib import blake2b
data = b'Message for transmission'
blake = blake2b(data, digest_size=32)
print('Printing blake digest')
print(blake.digest())
print('Printing blake hexadecimal digest')
print(blake.hexdigest())
Output:
Printing blake digest
b'u\x9a7&\xb6b\x01\xea\xeds\xdf[\xa9\xbb\xedY\xaa\xcc\xa2\xb57\xd73b\xe4\xbeF\x82\x11\xe8\xb7&'
Printing blake hexadecimal digest
759a3726b66201eaed73df5ba9bbed59aacca2b537d73362e4be468211e8b726

The ‘digest_size’ argument of the constructor determines the size of the resulting digest; the default is 64 bytes. The constructor accepts many other arguments, and we will discuss one of them, ‘key’, in the next section.
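In BLAKE2, the digest length is part of the parameters mixed into the hash. As a result, a 32-byte BLAKE2b digest is not simply the first 32 bytes of the default 64-byte digest. The check below illustrates this with the article's message. It also shows that sizes outside 1 to 64 bytes are rejected.

```python
from hashlib import blake2b

data = b'Message for transmission'

short = blake2b(data, digest_size=32).digest()
full = blake2b(data).digest()  # default digest_size=64

print(len(short), len(full))   # 32 64
print(short == full[:32])      # False: the size changes the whole output

# Any size from 1 to 64 bytes is accepted; anything else raises ValueError
try:
    blake2b(data, digest_size=65)
except ValueError as exc:
    print('ValueError:', exc)
```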

Message Authentication Codes

A Message Authentication Code (MAC) behaves like a hash function with a key, which is why MACs are also known as keyed hash functions. Examples of MACs are HMAC (Hash-based MAC), CMAC (Cipher-based MAC) and Poly1305. Here we will discuss HMAC and Poly1305, as well as BLAKE2 as a substitute for HMAC. Let’s start with an example of HMAC.

Hash-based Message Authentication Code (HMAC)

import hmac, hashlib
data = b'Message for HMAC'
key = b'keyed-version'
hmac_code = hmac.new(key=key, msg=data, digestmod=hashlib.sha3_256)
hmac_digest = hmac_code.digest()
hmac_hexdigest = hmac_code.hexdigest()
print('HMAC digest: ', hmac_digest)
print('HMAC hexdigest: ', hmac_hexdigest)
Output:
HMAC digest: b'\x98\x82}\xb5\xb7U\xe8Nj;&\xcf\xa9\t\xa4\xb61{\xd7\xb5\xf2\xe6\x89\xc7\xfdA\x15Q\x89\x11\xb3\x94'
HMAC hexdigest: 98827db5b755e84e6a3b26cfa909a4b6317bd7b5f2e689c7fd4115518911b394

We imported the ‘hmac’ and ‘hashlib’ modules and declared our data and the key that we intend to use. The ‘new()’ constructor takes three arguments:
- ‘key’, the secret key;
- ‘msg’, the data;
- ‘digestmod’, the hash function to use (here, sha3_256).

Both ‘key’ and ‘msg’ should be bytes or bytearray objects. Just as hashlib objects have ‘digest()’ and ‘hexdigest()’ methods, ‘hmac’ objects have the same methods for the same purpose. Moreover, the ‘hmac’ module has its own ‘compare_digest()’ function, which is the same function that the ‘secrets’ module provides.
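The sketch below shows how the receiving side would check an HMAC in a sender/receiver round trip. It makes two assumptions:
- Both parties already share ‘key’ through some secure channel.
- The function names ‘sign’ and ‘verify’ are hypothetical.

It uses ‘hmac.digest()’, a one-shot shortcut available since Python 3.7. The comparison uses ‘hmac.compare_digest()’.

```python
import hashlib
import hmac

key = b'keyed-version'  # shared secret, assumed to be known to both sides


def sign(message: bytes) -> bytes:
    """Sender: compute the HMAC-SHA3-256 tag for a message."""
    return hmac.digest(key, message, hashlib.sha3_256)


def verify(message: bytes, tag: bytes) -> bool:
    """Receiver: recompute the tag and compare it with a timing-safe check."""
    expected = hmac.new(key, message, hashlib.sha3_256).digest()
    return hmac.compare_digest(expected, tag)


tag = sign(b'Message for HMAC')
print(verify(b'Message for HMAC', tag))  # True
print(verify(b'Message for HMAK', tag))  # False: the message was altered
```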

Poly1305

Poly1305 is a fast MAC algorithm. In the library used here, it requires three things:
- a 32-byte secret key;
- a nonce (a random value);
- a symmetric cipher (AES or ChaCha20; more on these in another article).

We need the ‘PyCryptodome’ package, which can be installed with pip3 install pycryptodome. The library is then imported as ‘Crypto’.

from Crypto.Hash import Poly1305
from Crypto.Cipher import AES
key = b'The key size has to be 32 bytes!'
mac = Poly1305.new(key=key, cipher=AES)
mac.update(b'message to be delivered')
mac_nonce = mac.nonce
mac_hex_digest = mac.hexdigest()
print('Poly1305 nonce: ', mac_nonce)
print('Poly1305 hex_digest: ', mac_hex_digest)
mac_verify = Poly1305.new(key=key, nonce=mac_nonce, cipher=AES,
                          data=b'message to be delivered')
try:
    mac_verify.hexverify(mac_hex_digest)
    print('The message is authentic')
except ValueError:
    # hexverify() raises ValueError when the MAC does not match
    print('The message cannot be authenticated')
Output:
Poly1305 nonce: b'2\xbf\xe8<\x94\xbe\x8a\x8eb3\x9d2\xb6\xe8\x13\xd6'
Poly1305 hex_digest: ce9224d3edc6445d7d8a251447b2a1c0
The message is authentic

Poly1305 resides in the ‘Crypto.Hash’ module, and we take the AES cipher from ‘Crypto.Cipher’ to use with it. We could also use the ChaCha20 cipher. After importing the necessary modules, we initialize the key, which has to be 32 bytes long.

Now we create a Poly1305 object with the ‘new()’ function, which takes three arguments: the ‘key’, the ‘cipher’ to use and the ‘nonce’. Here we omit the nonce, so one is generated for us.

mac = Poly1305.new(key=key, cipher=AES)

We can later insert the data or message using the ‘update’ method as we have seen in the case of hash functions.

The nonce is a random initialization value: 16 bytes in the case of AES, and 8 or 12 bytes in the case of ChaCha20. If we don’t specify a nonce, a random one is generated. However, we must keep that nonce, because the receiver needs it to create another Poly1305 object and verify the MAC. We can read it from the ‘nonce’ attribute of the Poly1305 object (‘mac.nonce’ here).

To get the hexdigest of the MAC, we can use the ‘hexdigest()’ method on the Poly1305 object.

mac_hex_digest = mac.hexdigest()

print('Poly1305 hex_digest: ', mac_hex_digest)
#prints 'Poly1305 hex_digest: ce9224d3edc6445d7d8a251447b2a1c0'

Let’s say a sender sends the message along with the MAC and the nonce. Since this is symmetric cryptography, the receiver must already have the same key. The receiver generates a new MAC from the received message and compares it with the received MAC. If they match, the message has not been altered or tampered with.

To verify the received MAC, we create a new Poly1305 object with two inputs:
- the shared key;
- the nonce received along with the MAC (the same nonce the sender used to generate it).

mac_verify = Poly1305.new(key=key, nonce=mac_nonce, cipher=AES,
                          data=b'message to be delivered')
try:
    mac_verify.hexverify(mac_hex_digest)
    print('The message is authentic')
except ValueError:
    print('The message cannot be authenticated')

If the two MACs match, we know the message has not been altered or tampered with during transmission. Otherwise, ‘hexverify()’ raises a ‘ValueError’.

BLAKE2

Let’s see how BLAKE2 can be used for keyed hashing.

from hashlib import blake2b
blake = blake2b(key=b'keys', digest_size=60)
blake.update(b'Blake message')
blake_digest = blake.digest()
blake_hexdigest = blake.hexdigest()
print('Blake digest: ', blake_digest)
print('Blake hexdigest: ', blake_hexdigest)
Output:
Blake digest: b'\xd6\xb5\xf6t\x94\xbcd\x92\xd4g\xccQ\xb0\xd8p\xe1\x80\xcb\xff\xa7\xd4\r@\xea\xcf\xe4\xc3\x1d\xc6\xb4TT\x19\x0e\x14\x1d\xd8\x80\xc6:\x11n\x8c\xf6l\x19\x96_\xa3\x8ae?^;\x17\x94f=2\xaf'
Blake hexdigest: d6b5f67494bc6492d467cc51b0d870e180cbffa7d40d40eacfe4c31dc6b45454190e141dd880c63a116e8cf66c19965fa38a653f5e3b1794663d32af

What makes this ‘blake2b’ different from the ‘blake2b’ in the last section is that we used the ‘key’ argument and the digest size is 60 bytes.
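A keyed BLAKE2 digest acts as a MAC, so it is verified the same way as an HMAC. The receiver recomputes it with the shared key and compares the result using ‘compare_digest()’.

The following is a minimal sketch under two assumptions:
- The key is randomly generated here for illustration. BLAKE2b accepts keys of up to 64 bytes.
- The helper name ‘blake_mac’ is hypothetical.

```python
import secrets
from hashlib import blake2b
from hmac import compare_digest

key = secrets.token_bytes(32)  # shared secret; BLAKE2b allows at most 64 bytes


def blake_mac(message: bytes) -> bytes:
    """Hypothetical helper: keyed BLAKE2b used as a MAC with a 32-byte tag."""
    return blake2b(message, key=key, digest_size=32).digest()


tag = blake_mac(b'Blake message')
print(compare_digest(tag, blake_mac(b'Blake message')))   # True
print(compare_digest(tag, blake_mac(b'Blake messages')))  # False

# A different key gives an unrelated tag for the same message
other = blake2b(b'Blake message', key=secrets.token_bytes(32), digest_size=32).digest()
print(compare_digest(tag, other))  # False
```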

Key Derivation Functions (KDFs)

KDFs are functions that securely derive keys from passwords. Keys are needed by many cryptographic algorithms, such as MACs, keyed BLAKE2, symmetric and asymmetric encryption and decryption, digital signatures and more. These functions turn our passwords into keys that are hard to trace back to the password. Some well-known KDFs are Scrypt, Bcrypt and Argon2; here we will demonstrate Scrypt and Argon2.

KDFs are highly resilient to brute-force, rainbow-table and dictionary attacks. They achieve this with a ‘salt’ (random bytes mixed into each derivation) and with cost parameters, such as the number of iterations needed to produce the final key, among other arguments.

Scrypt

First of all, install the ‘scrypt’ library using ‘pip’ (pip3 install scrypt). Scrypt takes the following parameters:

  1. N: the CPU/memory cost parameter (often called the iteration count); it must be a power of 2, usually 16384 or 2048.
  2. r: the block size, e.g. 8.
  3. p: the parallelization factor (threads to run in parallel), usually 1.
  4. password: the input password.
  5. salt: securely generated random bytes.
  6. buflen: the length of the output key in bytes.
import scrypt
import secrets
password = b'not a number'
salt = secrets.token_bytes(32)
scrypt_key = scrypt.hash(password, salt, N=16384, r=8, p=1, buflen=32)
print('Salt: ', salt)
print('Key: ', scrypt_key)
Output:
Salt: b'\xdbS\x1e\xa2\x81e\xd3\x948p\xc3lmk\xd6\x8b\xb94\x1c\xd5A/\xa5gZ\xb1\xc2\x15\x99\x9d\xc8\xb8'
Key: b'J\xc1\xc1"\xfd\x05\xfb\x14J\x96\xea\xe3\x1d\xa6\xbb\x01\xf7sj\x87\xf9\x18%\x00YK\x1f\xe8\xc8\x8d\xff%'

Here, the ‘scrypt.hash()’ function returns the key as bytes. To generate the salt, we used the ‘token_bytes()’ function from the ‘secrets’ module, which is highly recommended for cryptographic applications. It takes the number of bytes to generate as its argument.
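Python's standard library also exposes Scrypt as ‘hashlib.scrypt()’, which can stand in for the third-party package. It is available in Python 3.6+ when Python is built against OpenSSL 1.1 or newer. Its keyword names differ from the package's: n, r, p and dklen instead of N, r, p and buflen.

The sketch below uses the same N=16384, r=8, p=1 parameters. It also shows the usual pattern of storing the salt next to the derived key and re-deriving it to check a password. The helper names are hypothetical.

As a rough guide, Scrypt needs about 128 × N × r bytes of memory, which is 16 MiB here. The ‘maxmem’ limit is therefore raised explicitly to leave some headroom.

```python
import hashlib
import secrets
from hmac import compare_digest
from typing import Tuple

N, R, P = 16384, 8, 1
KEY_LEN = 32
# Scrypt uses roughly 128 * N * r bytes; allow twice that as headroom
MAXMEM = 2 * 128 * N * R


def derive_key(password: bytes, salt: bytes) -> bytes:
    return hashlib.scrypt(password, salt=salt, n=N, r=R, p=P,
                          maxmem=MAXMEM, dklen=KEY_LEN)


def new_record(password: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: return (salt, key) to store for a new password."""
    salt = secrets.token_bytes(32)
    return salt, derive_key(password, salt)


def check_password(password: bytes, salt: bytes, stored_key: bytes) -> bool:
    """Hypothetical helper: re-derive with the stored salt, compare timing-safely."""
    return compare_digest(derive_key(password, salt), stored_key)


salt, key = new_record(b'not a number')
print(len(key))                                      # 32
print(check_password(b'not a number', salt, key))    # True
print(check_password(b'not a password', salt, key))  # False
```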

Argon2

In this section, we use the ‘argon2-cffi’ package, which needs to be installed with pip (pip3 install argon2-cffi) and is imported as ‘argon2’. There are three variants of Argon2:

  1. Argon2d: provides strong resistance to GPU cracking attacks, but is potentially vulnerable to side-channel attacks.
  2. Argon2i: provides less resistance to GPU attacks, but is resistant to side-channel attacks.
  3. Argon2id: a combination of Argon2i and Argon2d; it is highly recommended and is the default.

The ‘PasswordHasher’ class derives hashes (keys) from passwords. It returns them in an encoded form that is suited to storing passwords and verifying them later.

Parameters of Argon2 ‘PasswordHasher’ class

  1. time_cost: the number of iterations.
  2. memory_cost: the memory usage, given in kibibytes.
  3. parallelism: the number of parallel threads.
  4. hash_len: the length of the hash in bytes.
  5. salt_len: the length of the random salt to be generated for each password.
  6. encoding: the encoding used for string arguments passed to the methods; the default is ‘utf-8’.
  7. type: the Argon2 variant to use, Argon2id by default. It is given as ‘Type.<x>’, where ‘x’ can be ‘I’ for Argon2i, ‘D’ for Argon2d or ‘ID’ for Argon2id. ‘Type’ is an enum class of the argon2 module.

Let’s see it in action.

import argon2
password = b'not a number'
# Parameters chosen to match the m=100,t=20,p=1 section of the output below
argon = argon2.PasswordHasher(time_cost=20, memory_cost=100,
                              parallelism=1, hash_len=16, encoding='utf-8',
                              type=argon2.Type.D)
key = argon.hash(password)
print(key)
Output:
$argon2d$v=19$m=100,t=20,p=1$Ts3/mXf72z4HQ2/4fpyhZw$eM1reOAaSwMbyy1Y20SLJA

After instantiating a ‘PasswordHasher’ object, we get the key by calling its ‘hash()’ method with the ‘password’ argument, which can be either a bytes or a Unicode string. The output format differs from the other functions in that it stores the configuration parameters along with the key. The base64-encoded salt is the segment before the last ‘$’ sign, and the key is the base64-encoded string after it.
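The encoded string follows the PHC string format. Its fields, separated by ‘$’, are:
- the variant;
- the version;
- the cost parameters (m is memory in KiB, t is time cost, p is parallelism);
- the salt and the hash, each base64-encoded without padding.

The standard-library sketch below splits the example output above into these parts. The helper name ‘parse_argon2_hash’ is hypothetical and not part of argon2-cffi. The decoded salt is 16 bytes, the argon2-cffi default salt length. The decoded hash is 16 bytes, matching hash_len=16.

```python
import base64


def parse_argon2_hash(encoded: str) -> dict:
    """Hypothetical helper: split an encoded Argon2 hash into its fields."""
    # '$argon2d$v=19$m=100,t=20,p=1$<salt>$<hash>' -> ['', 'argon2d', 'v=19', ...]
    _, variant, version, params, salt_b64, hash_b64 = encoded.split('$')

    def b64decode_nopad(s: str) -> bytes:
        # The encoded form strips '=' padding, so restore it before decoding
        return base64.b64decode(s + '=' * (-len(s) % 4))

    return {
        'variant': variant,
        'version': int(version.split('=')[1]),
        # 'm=100,t=20,p=1' -> {'m': 100, 't': 20, 'p': 1}
        'params': {k: int(v) for k, v in (item.split('=') for item in params.split(','))},
        'salt': b64decode_nopad(salt_b64),
        'hash': b64decode_nopad(hash_b64),
    }


fields = parse_argon2_hash(
    '$argon2d$v=19$m=100,t=20,p=1$Ts3/mXf72z4HQ2/4fpyhZw$eM1reOAaSwMbyy1Y20SLJA')
print(fields['variant'], fields['version'], fields['params'])
# argon2d 19 {'m': 100, 't': 20, 'p': 1}
print(len(fields['salt']), len(fields['hash']))  # 16 16
```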

The ‘PasswordHasher’ class has a useful ‘verify()’ method that checks whether an entered password matches the one that was hashed. It takes two arguments: the hash and the password to be checked. We have to pass in the whole output of the ‘hash()’ method, including the configuration parameters, because ‘verify()’ reads the salt and the parameters from it. It works as follows.

argon.verify('$argon2d$v=19$m=100,t=20,p=1$Ts3/mXf72z4HQ2/4fpyhZw$eM1reOAaSwMbyy1Y20SLJA', 'not a number')
# Returns True
argon.verify('$argon2d$v=19$m=100,t=20,p=1$Ts3/mXf72z4HQ2/4fpyhZw$eM1reOAaSwMbyy1Y20SLJA', 'not a password')
# Raises argon2.exceptions.VerifyMismatchError

That is all for this article. Please share your suggestions and thoughts in the comments below.
