Reusable Hash Collisions

--

The basis of trust in data is typically focused on a cryptographic hash. With this, we can tell if two data elements are the same or not. But we only have a finite number of bits to represent the data, and so we may generate the same hash for different data. Luckily, it is quite difficult to find a collision for most hashing methods, and if we do, the actual data element structures are often different in their format. For example, we may have an image and a random data element and which produce the same hash. But as the random data element is not a valid image, we are not producing a collision for two images. For MD5, it is now not too difficult to produce a collision from two or even three images which are different — this is a full context collision. This is known as the Birthday Attack.

But, sometimes in a hashing method, there is a natural flaw, and where we can flip a few bits of a given input pattern and which can then produce the same hash output for different inputs. This is known as a natural collision.

If we can find a natural collision for a hash, such as H(a)=H(b), we can also create a hash for H(a || c) = H(b || c) and where “||” is a concatenation. An example of a collision in MD5 is where we take two input values of:

and the result is the same MD5 hash of 79054025255FB1A26E4BC422AEF54EB4.

A natural collision

A collision occurs when there are two different values that produce the same hash signature. In the following example, we use a hex string to define the data element (as the characters would be non-printing). For the following we have different values which create the same hash signature (the red characters identify the changes in the input data):

d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89 55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0 e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70

Try!, which should give a MD5 hash of: 79054025255FB1A26E4BC422AEF54EB4 Check

d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89 55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0 e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70

Try!, which should give a MD5 hash of: 79054025255FB1A26E4BC422AEF54EB4 Check

Now we can use:

0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef

and:

0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef

This gives [here]:

and [here]:

Notice how the SHA-1 has changes for the different data inputs, but the MD5 hash produces a collision.

If we now add “hello” to this data we get [here]:

b'0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef' 
Hex: cee9a457e790cf20d4bdaa6d69f01e41

b'0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef'
Hex: cee9a457e790cf20d4bdaa6d69f01e41

Adding: hello
b'0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef68656c6c6f'
Hex: 4d0c8baa8a036cff537f00d6e26bbef5
b'0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef68656c6c6f'
Hex: 4d0c8baa8a036cff537f00d6e26bbef5

We see that the original data gives the same MD5 hash value (cee9a457e790cf20d4bdaa6d69f01e41), and when we add the string of “hello”, we also get a collision of “4d0c8baa8a036cff537f00d6e26bbef5”.

Code

An outline of the code is here:

import hashlib
from binascii import unhexlify,hexlify
import sys



m = hashlib.md5()
m1 = '0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef'
m2 = '0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef'



# m1 = unhexlify

('4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2' )

# m2 = '4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2'

#

m1='d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e31563

48f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70'

#

m2='d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e31563

48f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70'

# d131dd02c5e6eec4693d9a0698aff95c2fcab58-12467eab4004583eb8fb7f8955ad340609f4b30283e4888325-

1415a085125e8f7cdc99fd91dbdX280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2-

487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f965-b6ff72a70

diffs = [i for i in range(len(m1)) if m1[i] != m2[i]]
print ("Changes in hex at positions:",diffs)
m_diff = m1
for x in diffs:
m_diff=m_diff[:x] + '[-X-]' + m_diff[x + 1:]
print (m_diff)


m1=unhexlify(m1)
m2=unhexlify(m2)

word = "hello"
if (len(sys.argv)>1):
word=str(sys.argv[1])

a =word.encode()

mm1 = hashlib.md5()
mm2 = hashlib.md5()
print (hexlify(m1),"\n Hex:",mm1.digest().hex())
print ("\n",hexlify(m2),"\n Hex:",mm2.digest().hex())
print ("\nNow adding: ",word)
mm1.update(m1+a)
mm2.update(m2+a)
print (hexlify(m1+a),"\n Hex:",mm1.digest().hex())
print ("\n",hexlify(m2+a),"\n Hex:",mm2.digest().hex())

A sample run [here]

Changes in hex at positions: [43, 86]
0e306561559aa787d00bc6f70bbdfe3404cf03659e7[-X-]4f8534c00ffb659c4c8740cc942feb2da115a3[-X-]4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef
b'0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef'
Hex: d41d8cd98f00b204e9800998ecf8427e
b'0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef'
Hex: d41d8cd98f00b204e9800998ecf8427e
Now adding: hello
b'0e306561559aa787d00bc6f70bbdfe3404cf03659e704f8534c00ffb659c4c8740cc942feb2da115a3f4155cbb8607497386656d7d1f34a42059d78f5a8dd1ef68656c6c6f'
Hex: 4d0c8baa8a036cff537f00d6e26bbef5
b'0e306561559aa787d00bc6f70bbdfe3404cf03659e744f8534c00ffb659c4c8740cc942feb2da115a3f415dcbb8607497386656d7d1f34a42059d78f5a8dd1ef68656c6c6f'
Hex: 4d0c8baa8a036cff537f00d6e26bbef5

--

--

Prof Bill Buchanan OBE FRSE
ASecuritySite: When Bob Met Alice

Professor of Cryptography. Serial innovator. Believer in fairness, justice & freedom. Based in Edinburgh. Old World Breaker. New World Creator. Building trust.