BIRTHDAY ATTACK

Hemang Ahuja
CYSCOM VITCC
Published in
6 min readAug 16, 2021

A Birthday Attack is a cryptographic attack which exploits the possibility of hash collisions to hijack secure connections.

To understand Birthday Attack and why it is called so, let us understand what a hash (function) and the associated hash collision is.

HASH FUNCTION

A hash function is any function that takes variable sized data as its argument and returns fixed sized data. They are usually used to provide a unique index to an object.

For example :-

int hashFunction(int a) {return a%1069;}

This function returns modulo of a passed integer, which is of a fixed range (0–1068) that may uniquely identify any integer equal or smaller or larger than 1069.

hashFunction(420)=420;

hashFunction(1069)=0;

hashFunction(4200)=993;

So far so good, but something that may arise in the mind of a reader is that, a%1069 will return 0 for all multiples of 1069. This is true. So, our hash function does not map unique indexes to identify our passed parameter as we thought. This anomaly of a hash function is known as a hash collision.

Hashes are critical in cryptography where it is crucial to hide actual information while still pertaining uniqueness to each block of information.

Note that our hash function returns y for all a=1069k+y where k is any integer.

Therefore, our hash function is bad as it does not have the uniqueness attribute.

Well, then how do we compress our data to a fixed size data while still maintaining uniqueness?

It’s a short answer — We Can’t!

Why? The Pigeonhole principle.

The pigeonhole principle states that if n items are put into m containers, with n>m, then at least one container must contain more than one item. Thus, we can’t compress and maintain uniqueness at the same time.

Although we can make it extremely extremely difficult to find collisions in our hash function, it is virtually impossible to avoid them.

Cryptographic hash functions such as SHA(Secure Hash Algorithm) and previously prominent MD(Message-Digest algorithm) are good examples of collision-resistant hash functions. The reason that collisions happen commonly in seemingly good hash algorithms is because of the birthday paradox.

THE BIRTHDAY PARADOX

Let us assume the probability of a person being born on a given date is a constant.

Suppose, you are in a class of 23 students. What do you think is the probability of you sharing a birthday with someone in the same class?

The number is actually higher than one would assume. It exceeds 50%!

It reaches 99.9% at number of students = 70.

Just like how you only need 70 people with a sample space of 365 unique birthdays(!) for a clash to exist, you only need a small number of hashes even with 1069 possible hashes for a clash to occur.

where H is the sample space of the hash function and N is the sample size of the inputs (in bits).

This roughly estimates to 50%. Thus, you will have a hash collision at every N/2 hashes.

For example:- MD5 will suffer from hash collision after 2⁶⁴ hashes (because its sample space is 2¹²⁸).

This can be exploited and such an attack is called Birthday Attack and is easier than a brute force attack.

BIRTHDAY ATTACK

We can exploit the surprisingly high probability of a hash collision in the following way.

Suppose you want to deceive your friend Abhiram by making him sign a different contract than one he is expecting.

Digital signatures work in a way that party A generates a hash of the document and encrypts it with their private key to create a digital signature and give it to party B along with the original data and information about the hashing algorithm.

Party B then creates another hash of the document and also uses A’s public key to decrypt the digital signature and compares the two hashes. If they are the same, then the document has not been tampered with since it was last signed. This proves the integrity of the document.

So, to deceive Abhiram, you will create variations of the document that he is expecting to sign and variations of the fraudulent document that you want him to sign.

The point is to find a pair of the good document and a bad document such that hash(good_document)=hash(bad_document).

Now you can give the good document to Abhiram and he will hash and encrypt the hash with his private key and give you the generated digital signature. But, since the hash of this good document is the same as the hash of the bad document, you can claim that the digital signature is indeed valid for the bad document and that the bad document is legal.

This all is possible due to the fatal anomaly of a hashing algorithm known as hash collision.

And hash collision has been possible due to the birthday paradox. Thus this attack is known as the Birthday Attack.

TRYING OUR OWN HASH COLLISION EXPLOIT

Download evelizie library. Click to download

  • Unpack the archive and build the library and tools:
tar zxf evilize-0.2.tar.gzcd evilize-0.2make

This creates the programs “evilize”, “md5coll”, and the object file “goodevil.o”.

  • Create a C program with multiple behaviors. Instead of the usual top-level function main(), write two separate top-level functions main_good() and main_evil().

See the file hello-erase.c for a simple example.

  • Compile your program and link against goodevil.o. For example:
gcc hello-erase.c goodevil.o -o hello-erase
  • Run the following command
./evilize hello-erase -g good -e evil

Check the MD5 checksums of the files “good” and “evil”; they should be the same.

Run the programs “good” and “evil” — they should exhibit the two different behaviors that you programmed in step 2.

Now you can deceive anyone who checks integrity using MD5 digital signature.

CONCLUSION — THE DEATH OF MD5 AND SHA1

MD5 and SHA-1 are two of the most popular and widely used hash functions. They however are subject to differential cryptanalysis-based collision attacks. MD5 is utterly broken, as collisions can now be identified on current machines in a matter of minutes. SHA-1 is showing signs of weakness, albeit it is not fully broken. In other words, SHA-1 attacks have a lower time complexity than brute force attacks.

The size of the hash value in MD5(128 bits) is small enough to contemplate a birthday attack. Collisions for MD5 were announced by Xiaoyun Wang and others in 2005.Their analytical attack was reported to take only one hour.

In February 2017, CWI Amsterdam and Google announced they had performed a collision attack against SHA-1, publishing two dissimilar PDF files which produced the same SHA-1 hash.

Thanks for reading

Connect with me on Linkedin

--

--