Cracking My First Password
My daughter’s employer emailed her a tax form as an encrypted PDF file. The email read, “The password to open this document is your date of birth in this format MMDDYYYY and the last 4 digits of your SS number.” I opened the document and entered the password as indicated.
But it didn’t work.
I double-checked my daughter’s social security number and her birth date (sometimes we get those things mixed up). I tried things like putting the last four of her social first followed by the formatted date. I tried formatting the date in a dozen different ways. I tried transposing a few numbers in her SSN. Nothing worked.
I emailed her employer:
We cannot open <daughter>’s form using MMDDYYYY followed immediately by the last 4 digits of her SS number. Have you had any reports of others who cannot open theirs?
No, I have not heard that anyone else has had this issue. I can print and mail or she can come pick it up, whichever is more convenient.
For the past six years, I’ve gone completely paperless with our taxes. I store all our tax forms digitally, organized by year. I greatly prefer managing digital documents over paper ones, which accumulate over the years and need to be (physically) stored, organized, and moved.
I could, of course, wait for the document to arrive in the mail, and then scan it, but what’s the fun in that? So I took to Google:
To “brute force” a password means to try all possible combinations of characters until you finally guess the correct password. In my case, I was confident that the password to my daughter’s tax form consisted of 12 digits. In this age of computing, that can’t be too hard to crack, can it?
The first step was to extract the password hash from the PDF document. If “hash” makes you think of “hash browns,” you’re not too far from the truth. Password hashes are even “salted” to make them more difficult to crack.
To make hash browned potatoes, you need to grate the potatoes into little shreds. Let’s call this grating tool a “hashing algorithm.” To continue the analogy, in order to crack our password, we need to feed potato after potato through the hashing algorithm until we find one that comes out exactly like the original hash. As you can imagine, that takes a lot of potatoes.
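To make the potato analogy concrete, here is a minimal Python sketch of a brute-force search. It is only an illustration: the salted SHA-256 "toy hash", the salt value, and the 4-digit secret are all assumptions made up for the example, and an encrypted PDF uses a different, more involved scheme.

```python
import hashlib
from itertools import product

def toy_hash(password: str, salt: bytes) -> str:
    """Toy stand-in for a real password hash: salted SHA-256 (an assumption)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def brute_force_digits(target_hash: str, salt: bytes, length: int):
    """Hash every digit string of the given length until one matches the target."""
    for combo in product("0123456789", repeat=length):
        candidate = "".join(combo)
        if toy_hash(candidate, salt) == target_hash:
            return candidate
    return None  # no digit string of this length matched

# Hypothetical example: a 4-digit secret, so at most 10,000 potatoes to grate.
salt = b"pinch-of-salt"
target = toy_hash("0427", salt)
found = brute_force_digits(target, salt, 4)
print(found)
```

Every extra digit multiplies the number of candidates by ten, which is why the length of the password matters so much.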
Are you hungry yet?
Me too. Here’s my recipe:
First, download and install Perl. Perl is a super-geeky programming language that I wish I knew. I’m on Microsoft Windows (hey, be nice) so I tried the ActiveState and Strawberry flavors of Perl. They both work; the critical part is making sure that Windows associates the .pl file extension with Perl.
Next, I downloaded the GitHub repo for John the Ripper. John the Ripper (henceforth “JtR”) is another geek tool with a really long history. Its main purpose is to grate our potatoes into hashes as fast as possible until we get a match. I downloaded the Windows build and unzipped it. It requires no installation.
I located the “run” folder inside of the JtR directory, and copied the PDF file (“TaxForm.pdf”) I was trying to crack into it. I was preparing my workspace.
Then I opened Windows Command Prompt and navigated to the JtR “run” directory, where there are myriad, no — a plethora — of Perl and Python scripts we can use to extract password hashes from all different types of files: 7z2john.pl, cisco2john.pl, itunes_backup2john.pl, etc. The one I needed was called pdf2john.pl.
pdf2john.pl TaxForm.pdf output the hash from the PDF file on the screen. But JtR needs that hash in a text file. So I redirected the output to a text file by adding > hashfile.txt to the end of the command.
I was almost ready to start cracking. I just needed to figure out how to tell JtR to only try combinations of numbers 0–9 that were 12 characters in length. The option to tell JtR to only use numbers is --incremental=digits, but specifying a length of 12 characters required editing the john.conf file (conveniently located in the “run” directory).
Plain Windows Notepad won’t detect the line breaks in the john.conf file, so I opened it with Notepad++. It’s a large file, but I found the parameters I was looking for around line 1210. I set MinLen = 12 and MaxLen = 12.
I fired up JtR with
john --incremental=digits hashfile.txt
On my machine, JtR started running over 50,000 combinations per second through its hashing algorithm, trying to find a hash that matches the one we extracted from TaxForm.pdf. JtR doesn’t display much output on the screen, but it will keep running until Ctrl-C is pressed. For more details about what it’s doing, I needed to repeatedly check the john.log file.
I let it run all night long and into the next day. Then it occurred to me: there are a trillion (10^12) different combinations of 12 digits. At 50,000 attempts per second, it would take over seven months to try them all. Of course, I might get lucky and find the right hash after only a few weeks, but that’s still a long time. Not really worth it if my daughter’s employer can just snail mail us the form in a few days.
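A quick back-of-the-envelope check of that estimate, as a Python sketch. It counts every 12-digit string, leading zeros included (so 10^12 candidates), and uses the 50,000 guesses per second figure from above. The "average case" line assumes the password is equally likely to turn up anywhere in the search order, which is a simplification.

```python
KEYSPACE = 10 ** 12   # every 12-digit string, leading zeros included
RATE = 50_000         # guesses per second, the rate quoted above

seconds = KEYSPACE / RATE
days = seconds / 86_400   # 86,400 seconds in a day

print(f"worst case: {days:.0f} days")        # prints "worst case: 231 days"
print(f"average case: {days / 2:.0f} days")  # prints "average case: 116 days"
```

Either way, that is months of waiting for a form the mail carrier can deliver in days.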
[I should note that my machine has a Core i5 processor. I’ve heard of people who are able to daisy-chain a bunch of PlayStation processors together. They can probably run this process more than 100x faster than my little CPU can.]
I started to wonder if there was a better way.
Another way JtR can crack passwords is by dictionary attack. Instead of JtR trying all possible random combinations of characters, we supply it with a list of pre-generated passwords that it can try.
In the pattern of [date of birth in MMDDYYYY + last four digits of SSN], I figured there were only about six unique digits used, and I could cut it down to five if I assumed that the first three (the MMD part) were correct. So I pieced together the following PowerShell code to generate a list of all possible permutations of the final nine digits, prefixed by the first three that I hoped were correct.
This script took about 15 minutes to complete, but when it was finished I had a list of over 600,000 permutations I could use to try to crack the hash of TaxForm.pdf’s password.
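As a rough illustration of the idea (a Python sketch, not the PowerShell script itself), a wordlist like this can be built by fixing a prefix and enumerating every arrangement of a small set of digits for the remaining positions. The prefix and digit set below are hypothetical placeholders, and the function names are made up for the example.

```python
from itertools import product

def build_wordlist(prefix: str, digits: str, tail_length: int):
    """Yield prefix + every tail_length-long string drawn from the given digits."""
    for combo in product(digits, repeat=tail_length):
        yield prefix + "".join(combo)

def write_wordlist(path: str, prefix: str, digits: str, tail_length: int) -> int:
    """Write one candidate per line as UTF-8 and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word in build_wordlist(prefix, digits, tail_length):
            f.write(word + "\n")
            count += 1
    return count

# Tiny demo with hypothetical values: prefix "012", digits 1 and 9, 3 more positions.
sample = list(build_wordlist("012", "19", 3))
print(len(sample), sample[:3])
```

For a real run, something like write_wordlist("list.txt", prefix, digits, 9) would produce len(digits) ** 9 candidates. This simple model allows every digit in every position, so its count will not necessarily match the list described above.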
I copied my list.txt file to JtR’s run directory, and tried
john --wordlist=list.txt hashfile.txt
JtR easily found the password in just a few seconds. It was displayed on my screen as shown above, but that may have been because I told it to in some further tweaking I did to the john.conf file. In any case, JtR stores passwords it cracks in a file called john.pot.
I discovered that my daughter’s employer had her social security number wrong. It was even incorrect on the tax form that I now was finally able to open. I was fortunate that they had it wrong in such a way that it was caught by one of the limited permutations I had in my wordlist.
A final note: for dictionary attacks, JtR was quite particular about the encoding of the wordlist file I gave it. It works best when the file is encoded as UTF-8.
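If a wordlist ends up in some other encoding (Windows PowerShell 5.1's > redirection, for example, writes UTF-16), it can be rewritten as UTF-8 before handing it to JtR. A minimal Python sketch follows; the function name and file names are hypothetical, and the source encoding is an assumption you would adjust to match your file.

```python
def reencode_to_utf8(src: str, dst: str, src_encoding: str = "utf-16") -> int:
    """Rewrite a wordlist as UTF-8 (no BOM) with LF line endings; return the line count."""
    # Decode using the assumed source encoding; a UTF-16 BOM is consumed automatically.
    with open(src, "r", encoding=src_encoding) as f:
        lines = f.read().splitlines()
    # newline="\n" stops Python from translating line endings to CRLF on Windows.
    with open(dst, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)
```

Calling reencode_to_utf8("list-utf16.txt", "list.txt") would read the first file as UTF-16 and write a UTF-8 copy next to it.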