Using hashcat to recover hashed emails

Matthew Bajorek
8 min read · Jun 13, 2020

Foreword

This article is intended purely for the legitimate recovery of emails that you collected yourself with consent. Please be aware of your local laws: recovering hashed data in order to obtain information for nefarious purposes would be illegal.

Background

You are excited that you are about to release a new app. Registration has been opened to a select few, and everyone else is adding their email to a waitlist to be notified. Being a good developer, you ensure the storage of these emails is secure by encrypting them.

After collecting emails for some time, you realize that you did not encrypt them: you hashed them by accident. They are definitely secure, but they are secure even from you. Now you are unable to notify any of your potential clients. Is there any way you can recover the emails?

Recovery

There are two main ways the average developer can recover the hashed emails:

  1. Dictionary of emails
  2. Brute force with masking

The first way is great if you have a large collection of emails and you believe that some of your potential clients' emails may be in that list. However, this is rare, and collecting that many emails is difficult and probably illegal.

The second way is better for the average developer, and the only thing preventing you from uncovering all of the emails is time.

Technical Details On Emails

Email addresses, called addr-spec in RFC 5322, contain two parts separated by an @ sign:

addr-spec = local-part@domain

The local-part can contain a maximum of 64 characters of:

  1. Uppercase and lowercase Latin letters: A-Z and a-z
  2. Digits: 0-9
  3. Special characters: !#$%&'*+-/=?^_`{|}~
  4. A dot . as long as it is not the first or last character and does not appear consecutively, unless quoted.
  5. Space and special characters: "(),:;<>@[\] are allowed inside a quoted string and a backslash or double-quote must be preceded by a backslash.
  6. There are proposed additions to add international characters, however if you are building an app for English speakers this would be less of a concern.
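
The unquoted rules above (items 1 to 4) can be sketched as a small check. This is a simplified illustration, not a full address validator: the function name `is_plain_local_part` is made up for this example, and quoted strings and international characters are deliberately not handled.

```python
import string

# Characters allowed in an unquoted local-part, per items 1-4 above.
ALLOWED = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~.")


def is_plain_local_part(local: str) -> bool:
    """Return True if `local` is a valid *unquoted* local-part (simplified)."""
    # Maximum of 64 characters, and it cannot be empty.
    if not 1 <= len(local) <= 64:
        return False
    # A dot may not be first, last, or appear twice in a row.
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    # Every character must come from the allowed set.
    return all(ch in ALLOWED for ch in local)


if __name__ == "__main__":
    for candidate in ["john.smith", ".john", "a..b", "first last"]:
        print(candidate, is_plain_local_part(candidate))
```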

The domain can contain:

  1. Case-insensitive letters: A-Z and a-z
  2. Digits: 0-9
  3. Special character: -

Analysis

These requirements allow the local-part to have approximately 94 possibilities for each of the inner characters and fewer for the first and last characters. Let’s take an upper bound of 94 possibilities for each character.

Because characters can repeat, the number of possible strings of length r over n symbols is:

n^r

With n being 94, running through every possible 64-character local-part alone would give us:

94^64 = 1.9 x 10^126 possible combinations

Brute forcing only the maximum-length 64-character local-part (assuming you already knew the exact domain for all of the hashed emails, which we will cover later) at an average of:

400 MH/s (million hashes per second; this depends on your hardware and the algorithm used)

would take approximately:

1.5 x 10^110 years
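
As a rough sanity check on the order of magnitude, here is a minimal sketch that counts 64-character strings over 94 symbols, allowing characters to repeat (as they can in a real address), and divides by the hash rate. The function name `years_to_exhaust` is illustrative, and the 400 million hashes per second figure is just the rate quoted above; real rates vary widely with hardware and algorithm.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(alphabet_size: int, length: int, hashes_per_second: float) -> float:
    """Years needed to try every string of exactly `length` characters."""
    keyspace = alphabet_size ** length  # characters may repeat, so n^r
    return keyspace / hashes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"keyspace: {94 ** 64:.2e} candidates")
    print(f"time:     {years_to_exhaust(94, 64, 400e6):.2e} years")
```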

This is not achievable within anyone’s lifetime, and it does not even include the domain, which would multiply the time even further. So how will we get this to work? The answer lies in human tendencies.

Human Tendencies

From leaked emails I was able to collect about 3 million unique emails for research purposes. Analyzing the data gives some interesting results about real-world email address characteristics.

The first is the tendency of characters used in the local part.

The graph shows that mostly lowercase letters and numbers are used, along with the symbols . _ and -. The other symbols and uppercase letters are rarely used. Also, email addresses are case insensitive in practice, so if you had cleansed the emails to make them all lowercase, no uppercase letters are needed at all.

This now reduces the possibilities for each character to 39 (26 lowercase letters, 10 digits, and 3 symbols).

The second is the tendency of local-part length.

Like many qualities in nature, local-part lengths roughly follow a normal distribution. The mean is around 9 characters.

Assuming you have the email domains worked out:

1. A brute force of a local-part of up to 12 characters will cover about 75% of all local-parts.

2. A brute force of a local-part of up to 20 characters will cover about 99% of all local parts. This is a lot less than the 64 character limit!

The third is the tendency of similar domains.

There are many domains and many email providers, but in my analysis the top three domains were:

1. gmail.com
2. yahoo.com
3. hotmail.com

Together these three make up about 43% of the email addresses in my data set.

This is exactly the same order as reported at https://email-verify.my-addr.com/list-of-most-popular-email-domains.php (which estimates that these three make up about 53% of email addresses).

Analysis Conclusion

Multiplying the frequency of each local-part length by the share of emails covered by the top three domains, and taking a running sum over the lengths, produces the following graph:

We can easily recover 25% of the hashed emails by brute forcing local-parts of up to 10 characters for gmail.com, yahoo.com, and hotmail.com.

Only 25%? That is not all of the emails, but I did not say it was easy to recover all of them. A quarter of the emails can be recovered quite readily, depending on your hardware.
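
To get a feel for how the search space grows with the maximum local-part length, here is a minimal sketch that counts candidates for the reduced alphabet described earlier (lowercase letters, digits, and the symbols . _ -) across a number of fixed domains. The function names and the example hash rate are assumptions for illustration; plug in the rate your own hardware reports.

```python
import string

# Reduced alphabet: lowercase letters, digits, and the symbols . _ -
ALPHABET = string.ascii_lowercase + string.digits + "._-"


def candidates_up_to(max_len: int, alphabet_size: int = len(ALPHABET), domains: int = 3) -> int:
    """Candidates for every local-part length from 1 to max_len, per fixed domain."""
    return domains * sum(alphabet_size ** n for n in range(1, max_len + 1))


def days_to_exhaust(candidates: int, hashes_per_second: float) -> float:
    """Days needed to hash every candidate at a constant rate."""
    return candidates / hashes_per_second / 86_400


if __name__ == "__main__":
    rate = 400e6  # assumed rate; depends entirely on hardware and algorithm
    for max_len in (6, 8, 10):
        total = candidates_up_to(max_len)
        print(f"up to {max_len:2d} chars: {total:.2e} candidates, "
              f"{days_to_exhaust(total, rate):.2f} days")
```

Each extra character multiplies the work by the alphabet size, while each extra domain only adds one more pass over the same local-parts.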

How can we improve the recovery?

  1. Increase the number of characters brute forced. This is the more straightforward way, but also the most time consuming.
  2. Use more of the email domains that are popular with your application's target demographic. This will recover more in a shorter period of time, assuming you guess the popular domains correctly, as around 50% of addresses still fall outside of the top three domains.

A good place to start is with the internet provider domains in the locality of your application’s target demographic.

hashcat

hashcat is an open source, cross-platform, advanced password recovery utility that uses GPUs. It can also be used to recover other kinds of hashed values, including hashed emails.

It is a command line tool that allows for powerful features like automatic performance tuning, integrated thermal watchdog, sessions with pause/resume/restore, time estimates, and 200+ hash types.

Installation For Windows and Linux

  1. Head to https://hashcat.net/hashcat/ and download the binaries.
  2. Extract the binaries
  3. Open up a command prompt or terminal inside of the extracted folder
  4. Test hashcat by running with:

Windows: .\hashcat64.exe (or .\hashcat32.exe), depending on your operating system
Linux: ./hashcat64.bin (or ./hashcat32.bin), depending on your operating system
(Releases from 6.0.0 onward ship a single hashcat.exe or hashcat.bin instead.)

Installation For Mac

  1. Install homebrew: https://brew.sh/
  2. brew install hashcat
  3. Test hashcat by running with: hashcat

Creating the mask file

For hashcat to understand our desired format of the email (to narrow our range of possibilities) we need to create an email.hcmask file.

The hashcat documentation on masking is not easy to understand, so I will lay out the important parts here.

Each line of the file is a mask, and hashcat will exhaust exactly that mask shape and nothing else.

This means if the mask is 4 characters long, it will run all 4 character possibilities (not 1, 2, or 3 character lengths) within the domain of possibilities you specify. We will have to create a new line for other character length possibilities.

Each of our lines will have the format:

?l?d,?l?d._-,?1?2?1@domain.com

The commas break the line up into sections. Let’s explore each of the three sections:

  1. ?l?d: this as a whole defines the variable ?1 used in the third section. ?l is a built-in variable for the lowercase characters a through z, and ?d is a built-in variable for the digits 0 to 9. Overall this means that the variable ?1 contains all of the lowercase characters a through z and all of the digits 0-9.
  2. ?l?d._-: this as a whole defines the variable ?2 used in the third section. It contains all of the characters from the first section, plus the three symbol characters . _ and -
  3. ?1?2?1@domain.com: this is the mask itself. Our local-part is three characters long and the domain is exactly domain.com. Because the domain is fixed, this mask will exhaust all 3-character possibilities of the local-part, effectively reducing the search from 14 unknown characters without the mask to only 3.

With these steps we can create the full file for all of the brute forced mask attacks up to 10 characters for the 3 most popular email domains:
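
One way to generate such a file is sketched below. It follows the line format described above (?1 for the first and last character, ?2 for the characters in between); the function names are made up for this example, and the choice to emit shorter lengths first is just one reasonable ordering.

```python
CHARSET_1 = "?l?d"     # custom charset ?1: lowercase letters and digits
CHARSET_2 = "?l?d._-"  # custom charset ?2: charset ?1 plus the symbols . _ -


def local_part_mask(length: int) -> str:
    """Mask for a local-part: ?1 at both ends, ?2 for the inner characters."""
    if length == 1:
        return "?1"
    return "?1" + "?2" * (length - 2) + "?1"


def build_mask_lines(domains, max_length):
    """One .hcmask line per (length, domain) pair, shortest lengths first."""
    lines = []
    for length in range(1, max_length + 1):
        for domain in domains:
            lines.append(f"{CHARSET_1},{CHARSET_2},{local_part_mask(length)}@{domain}")
    return lines


if __name__ == "__main__":
    mask_lines = build_mask_lines(["gmail.com", "yahoo.com", "hotmail.com"], 10)
    with open("email.hcmask", "w") as mask_file:
        mask_file.write("\n".join(mask_lines) + "\n")
```

Running the script writes 30 lines (10 lengths for each of the 3 domains) to email.hcmask in the current directory.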

Selecting the proper hash algorithm number

Check the hash modes in the options section of https://hashcat.net/wiki/doku.php?id=hashcat#options. For this post we will use plain SHA256 with no salt or pass, which is mode 1400. This number is passed to the -m flag.

Creating the hashed emails file

This section is included for completeness, but the details will depend on the system where the hashed emails are stored. hashcat will accept a .txt file with one hash on each line, like the example below with SHA256 hashed emails:
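
For testing, a file in that shape can be produced with a short script. This is a sketch assuming unsalted SHA256 over the UTF-8 bytes of each address, written as lowercase hex; the three addresses are the article's test addresses, and the helper name `sha256_hex` is illustrative. Your real system may normalize or encode addresses differently before hashing, in which case the hashes will not match.

```python
import hashlib


def sha256_hex(email: str) -> str:
    """Unsalted SHA256 of the address, as lowercase hex (hashcat mode 1400)."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    test_emails = ["test1@gmail.com", "test2@yahoo.com", "test3@hotmail.com"]
    with open("hashed_emails.txt", "w") as hash_file:
        for email in test_emails:
            hash_file.write(sha256_hex(email) + "\n")  # one hash per line
```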

Putting it all together

Now we can start hashcat from the command line with the hashcat command from before, but supplying it the proper arguments:

hashcat -a 3 -m 1400 -o ./recovered_emails.txt ./hashed_emails.txt ./email.hcmask

The -a 3 flag selects the brute-force (mask) attack mode, and -m 1400 selects the SHA256 hash mode chosen earlier. The -o flag sets the output file; here we are using a new file named recovered_emails.txt. This is convenient, but the recovered emails will also be stored in hashcat.potfile, which can be read with any text editor.

Make sure you use the right hashcat command for your platform and the correct paths for each of the files.

The output will look something like this using our example hashes from above:

With results:

Results are listed in the format:

hash:result

In our case the result is an email and it found the correct emails of:

test1@gmail.com
test2@yahoo.com
test3@hotmail.com
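
Since each output line has the form hash:result, the recovered addresses can be read back with a few lines of code. A minimal sketch, assuming the output file name used in the command above; the function names are made up for this example. Splitting on the first colon is safe here because a hex hash never contains one.

```python
def parse_result_line(line: str):
    """Split one 'hash:result' line into (hash, recovered email)."""
    hash_hex, _, email = line.rstrip("\n").partition(":")
    return hash_hex, email


def load_recovered(path: str) -> dict:
    """Map each cracked hash to its recovered email address."""
    recovered = {}
    with open(path) as result_file:
        for line in result_file:
            if line.strip():  # skip blank lines
                hash_hex, email = parse_result_line(line)
                recovered[hash_hex] = email
    return recovered
```

For example, `load_recovered("recovered_emails.txt")` returns a dictionary that can be joined back against the stored hashes.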

All of the emails in the example used 5-character local-parts across the three most popular email domains. It took my MacBook Pro just over a minute to recover these three emails.

The full recovery process up to 10 characters could take over a week. If you have enough processing power, you could keep going upward towards 20 characters.

Be sure to subscribe for more programming intricacies.
