How to determine if your mayor ever opened up an Ashley Madison account

Why would one do this?

Before we get into the details of grepping the files, it’s worth noting the following: A recent (July 2019) study showed the use of Ashley Madison correlates highly with professional misconduct.

A quick quote from the above paper:

We have discussed the use of the data with many people, including attorneys, who confirm that the data are permissible to use for research purposes because the data are now in the public domain and available for research use in the same way that they are available to and used by the press. We believe it is also ethical to use the data, and the use of hacked data has become common both by the press and in academia.

I’ll leave the philosophy alone, apart from stating the obvious: in a democracy, it’s best to minimize the likelihood that political leaders are prone to dishonest, corrupt or self-serving behaviors. So an extra data point can be helpful.

But doesn’t everybody already know? Why would I dig through myself?

A lot of accounts were opened anonymously and don’t have a name associated with them. Many of the names that got out were associated with either the credit card transactions (which required a name) or the users who were imprudent enough to use their regular work email address to set up the account. Many users were smart enough to create a single-purpose yahoo, hotmail, or gmail account. So there isn’t a quick and easy map from the users to real people.

However, if you are looking for an individual person, you can determine with reasonable confidence whether they created an account. Many accounts have birthday, zipcode, city, state, geo-coordinates, height, weight, and a security question/answer. If you believe the person you are interested in answers these questions truthfully (most of us do, since remembering fake birthdays or high school names is hard) you can make a judgment as to the likelihood they opened an account. If a zipcode has only a few thousand residents, the birthday/zipcode combination could be unique. Further, if the security question is a match, or the height/weight is unusual and a match, the probability leaps quite a bit. On the other hand, if the person of interest does not have a birthday/zipcode match, you can guess they probably never opened an account.

What this post will do is show you how to search this data with more of a fine-tooth comb using some python commands.

Step 1: Get the files

To do this you have to use BitTorrent. If you’ve been doing this to avoid paying HBO the $15/month to watch Game of Thrones, you know what to do: the torrent you want is “the complete Ashley Madison dump from the impact team”. There are some later dumps, CEO emails and other goodies, but for these purposes we just need the first dump. The files will have names like “am_am.dump”, “aminno_member_email.dump”, etc. For today’s exercises we will only need these two files.

Now if you haven’t torrented before, here’s a quick introduction. Two different tools are involved, and despite the similar names they are unrelated. BitTorrent is a peer-to-peer file-sharing protocol: you download pieces of a file from other people who already have it. ‘Tor’ is an abbreviation of The Onion Router. Basically, you connect to a global network of computers that either wish to be anonymous or simply want to help others be anonymous, and the web traffic bounces around until it gets to the right destination, in the process masking your IP address. The TV name for this is “The Dark Web” but it’s not much to be frightened of, as long as you stay on task and don’t have a weakness for ads promising illicit pornographic materials or a secret penchant for white nationalism. You can download and install the Tor Browser from a reputable source, and you can download and install a BitTorrent client such as Transmission from a reputable source, and then just make sure not to go down any unnecessary rabbit holes when on the browser. You shouldn’t need to open an account anywhere, and you shouldn’t need to give any personal information.

So once you install Tor Browser and the BitTorrent client, you can search for the torrent a number of ways. I went to DuckDuckGo, searched for a torrent search engine, and then tried a few of them to search for the Ashley Madison dump. There was a lot of garbage to sort through, but eventually I found it. You can download the torrent file (which is a relatively small file) and then open it in your client. It took me about an hour and a half to download; the download speeds up significantly as more peers join in.

Once the files are on your computer, unzip them into a directory.

On a side note, the files all come with accompanying hash files, and the PGP key for the hackers (Impact Team) is in the ReadMe and is also well-known. This is to prevent someone from modifying the data, inserting their own, and trying to recirculate it.
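Checking a file against a published hash can be sketched roughly as follows. This is only an illustration, not the dump's actual verification procedure: the hash algorithm (SHA-256 here), the helper names `file_sha256` and `file_digest_matches`, and the throwaway demo file are all assumptions, so check which algorithm the accompanying hash files actually use.

```python
import hashlib
import os
import tempfile


def file_sha256(path, block_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it block by block."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        while True:
            block = fh.read(block_size)
            if not block:
                break
            h.update(block)
    return h.hexdigest()


def file_digest_matches(path, expected_hex):
    """Compare a file's digest with a published hex digest (hypothetical helper)."""
    return file_sha256(path) == expected_hex.strip().lower()


# Demo on a throwaway file so the sketch runs without any real data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world\n")
    demo_path = tmp.name

expected = hashlib.sha256(b"hello world\n").hexdigest()
demo_ok = file_digest_matches(demo_path, expected)        # untouched file
demo_tampered = file_digest_matches(demo_path, "0" * 64)  # wrong digest
os.remove(demo_path)
print(demo_ok, demo_tampered)
```

Reading block by block keeps memory use flat, which matters for files of this size.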

Step 2: Python

Now the files are huge, and are some sort of raw SQL data. It’s not so easy to start running SQL on your computer and start doing queries. What I go through below allows you to access everything even if you don’t have 64 GB of RAM. Basically, we read the file in smaller blocks, one at a time, and then do some quick tricks to find the types of data we’re looking for. If you do have plenty of RAM there might be better ways to do this. But you should be able to do it on most computers.

I’ll assume the reader has limited knowledge of python and explain some of the steps.

We will use the pandas package and work inside a python terminal. If you have python installed, just type “python” at a command line in the directory where you saved the files.

Note that Medium can mangle the formatting (in particular the indentation) of the code below, so I created a github repo containing the code.

First

import pandas as pd

This loads the pandas package. Now open the file:

f = open("am_am.dump", "r")

You’ve just told python you intend to read (“r”) this file, and have given it a handle.

Next, we need to pick a number that is small enough to not exhaust your memory. I went with 1 billion. This is how many characters python will read at a time. If you start getting “memory error” at any point, start over and choose a smaller ‘N’.

N = 1000000000

Next, we want to read N characters into a string, which we call ‘read’.

read = f.read(N)

This gives us a string, and we can access any of the characters in the string by position. For example, to see the first 2000 characters, simply write:

read[0:2000]
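Reading in fixed-size blocks has one catch: a value that straddles the boundary between two blocks will not show up in either of them. A minimal Python 3 sketch of one way around that is to carry a small overlap from one block into the next. The function name `read_chunks`, the overlap size, and the in-memory demo text are assumptions made for illustration and are not part of the script in this post.

```python
import io


def read_chunks(fh, chunk_size, overlap):
    """Yield blocks of text from fh; each block is prefixed with the last
    `overlap` characters of the text yielded before it."""
    tail = ""
    while True:
        block = fh.read(chunk_size)
        if not block:
            break
        yield tail + block
        tail = (tail + block)[-overlap:] if overlap else ""


# Made-up text in which the search string crosses a block boundary.
demo = io.StringIO("aaaa1962-08-31bbbbbbbbbb")
needle = "1962-08-31"

plain_hits = sum(c.count(needle) for c in read_chunks(demo, 8, 0))
demo.seek(0)
overlap_hits = sum(c.count(needle) for c in read_chunks(demo, 8, len(needle) - 1))
print(plain_hits, overlap_hits)
```

An overlap of `len(needle) - 1` characters is too short to contain a whole match, so a hit that was already counted in one block is not counted again in the next.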

Since I’ve already taken a look, I can tell you where the table header data is. The code below reads the header data, and opens up a pandas data frame with the correct header information.

headers = read[1003:3140]
headerslist = headers.split('\n')[1:-4]
headerslist = [h.split("`")[1] for h in headerslist]
df = pd.DataFrame(columns=headerslist)  # 48-column data frame

The super-useful function here is ‘split’. You can take any string and run the split function on it, and it will split the string into a list of strings. If you’re new to python and want to play around and get your hands dirty with this sort of data, I recommend spending a couple minutes digesting online resources about how to use this.
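For a feel of how `split` behaves on this kind of text, here is a small sketch on a made-up fragment shaped like a SQL INSERT statement. The table name and values are invented, and the real dump may be laid out differently.

```python
# A made-up fragment shaped like a SQL INSERT statement; not real data.
raw = "INSERT INTO `demo` VALUES (1,'a',10),(2,'b',20),(3,'c',30);"

# Keep only what sits between the first "(" and the last ")".
body = raw[raw.index("(") + 1 : raw.rindex(")")]

# "),(" separates one record from the next...
records = body.split("),(")
# ...and "," separates the fields inside a record.
rows = [record.split(",") for record in records]

print(records)
print(rows[1])
```

Note that a plain split on "," also cuts through any field that itself contains a comma, which is why some rows come out with too many pieces.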

Next, we need to define a quick and dirty function that crams data that doesn’t split very well into 48 columns. It’s not production quality code, just something quick.

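def comma_cram(ls):
    # Quick and dirty: drop entries at position 13 until 22 remain,
    # then flag the entry that now sits at position 13.
    k = len(ls)
    ts = ls  # same list object, so the caller's list is modified too
    for i in range(k - 22):
        ts.pop(13)
    ts[13] = ts[13] + 'TRUNCATED'
    return ts

(Medium doesn’t let me indent reliably, so if the spacing looks off, make sure each line inside the function starts with 4 spaces. I’m also uploading this to github.)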
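To see what the function does to an over-long list, here is a small run on made-up field names. The function is repeated so the snippet runs on its own; the 25-entry list is an invented example.

```python
def comma_cram(ls):
    # Same logic as the function in the post, repeated here for a standalone demo.
    k = len(ls)
    ts = ls
    for i in range(k - 22):
        ts.pop(13)
    ts[13] = ts[13] + 'TRUNCATED'
    return ts


# 25 invented fields: three more than the 22 the function aims for.
original = ["f%d" % i for i in range(25)]
crammed = comma_cram(original)

print(len(crammed))          # number of fields left
print(crammed[12:15])        # the neighbourhood of the flagged entry
print(crammed is original)   # the input list itself was modified
```

Because the function changes the list it is given, pass in a copy (for example `comma_cram(list(fields))`) if the original list is still needed.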

birthday = '1962-08-31'

Probably the best information to search for is birthday. So here is where you put in the birthday you are interested in. I put in Ted Wheeler’s birthday, just for fun.

fail_string = ['\n']

while len(read) > 100:
    # Every piece boundary in s marks one occurrence of the birthday.
    s = read.split(birthday)
    for i in range(len(s) - 1):
        try:
            # Text of the same record just before and just after the birthday.
            back = s[i][-1000:].split('),(')[-1]
            forward = s[i+1][:3000].split('),(')[0]
            backcomma = back.split(",")
            forwardcomma = forward.split(",")
            if len(backcomma) > 27: backcomma = comma_cram(backcomma)
            if len(forwardcomma) > 22: forwardcomma = comma_cram(forwardcomma)
            fullline = backcomma[:-1] + [birthday] + forwardcomma[1:]
            df.loc[len(df)] = fullline
            print(fullline)
        except Exception:
            # Keep the raw text around the match so it can be checked by hand.
            back = s[i][-500:]
            forward = s[i+1][:500]
            fail_string += [back] + [' '] + [birthday] + [' '] + [forward] + ['\n']
            print("failed ...", back, forward)
    read = f.read(N)

Again, if the indentation looks wrong here, see the github file.

Next, we save the data to a file.

df.to_csv("look.csv")
fails = open("fail.txt", "w")
fails.write(' '.join(fail_string))
fails.close()

Since some of the data didn’t go smoothly, I piped this into a raw file “fail.txt” that you can examine by hand if you didn’t find what you expected.

In the example with Ted Wheeler’s birthday, this produced about 7–8 misreads, and 838 good entries in the csv file. Since this is about 1% of the data I’m going to just ignore it. You can open the csv file “look.csv” and take a look. You can sort by any of the headers.

However, we can always do things quickly with pandas if we know how. For example, since we want to know if Ted Wheeler opened an Ashley Madison account, our next step is to look for Oregon residents. Having looked at this data, I know the state code for Oregon is ‘38’. (Note that this is a string, not an integer)

So we run the basic command

oregondf = df[df['state'] == '38'].copy()  # copy so a column can be added later

We can see how many entries we got

len(oregondf)

The result was 2, so there are two AM accounts registered for Oregon residents with Ted Wheeler’s birthday.
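The point about the state code being a string rather than an integer is easy to trip over, so here is a minimal sketch on a synthetic data frame. The rows are invented, and the `city` column name is an assumption; only `id` and `state` are names used in this post.

```python
import pandas as pd

# Synthetic rows for illustration only; none of this comes from the dump.
demo_df = pd.DataFrame(
    {
        "id": ["101", "102", "103"],
        "state": ["38", "5", "38"],
        "city": ["Portland", "Fresno", "Salem"],
    }
)

as_string = demo_df[demo_df["state"] == "38"]   # matches: the values are strings
as_integer = demo_df[demo_df["state"] == 38]    # no matches: "38" is not 38

print(len(as_string), len(as_integer))
```

If a filter unexpectedly returns nothing, checking the column's type is a sensible first step.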

Next, we get emails (this takes about 20 secs per email on my machine, so not worth doing for the whole 838, but OK for the 2.) This requires the email file to be in the directory.

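(Again, Medium may mangle the indentation of the following code; see the github file.)

def get_emails(pnum_list):
    # Read the whole email dump into one string.
    with open("aminno_member_email.dump", "r") as ef:
        eread = ef.read()
    e_list = []
    for pnum in pnum_list:
        print(pnum)
        # The trailing comma stops a short id from matching the start of a longer one.
        tt = eread.split(",(" + pnum + ",")
        email = tt[1].split(',')[0]
        e_list.append(email)
        print(email)
    return e_list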

Now we get the emails and write this to a csv file.

email_list = get_emails(list(oregondf['id']))
oregondf['email'] = email_list
oregondf.to_csv("oregon.csv")

Step 3: Make the determination

Now inspecting the file “oregon.csv”, there’s two entries. One’s a yahoo address that could be a real name, but that name isn’t Ted Wheeler. The other is a hotmail address. Both have Portland listed as the city.

However, the heights listed are 173 and 168 cm, respectively, which is 5'8" and 5'6". I’ve never met Ted Wheeler, but from pictures I’ve seen, he looks only an inch or so shorter than Obama, so I’m going to say these profiles don’t belong to him. Also, roughly 650,000 people live in Portland. Using data from https://www.indexmundi.com/united_states/demographics_profile.html we see about 13% of the population is age 55–64. Divide that by the 10 years in that bracket to get about 1.3% per birth year, divide by 2 to get the men, and we guess that about 4250 Portland men were born in 1962. Divide that by 365, and we would expect roughly a dozen men with Ted Wheeler’s exact birthday living in Portland. So even if the height was a match, this wouldn’t indicate anything especially suggestive about Ted Wheeler.
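The back-of-the-envelope estimate above can be written out as a few lines of arithmetic. The inputs are the figures quoted in the text; the even split between men and women and the assumption that birthdays are spread uniformly over the year are simplifications.

```python
population = 650_000      # population figure quoted in the text
share_age_55_64 = 0.13    # share of the population aged 55-64, from the text
years_in_bracket = 10     # birth years covered by that age bracket
male_share = 0.5          # assumption: half of the cohort is male
days_per_year = 365       # assumption: birthdays spread evenly over the year

men_born_1962 = population * share_age_55_64 / years_in_bracket * male_share
men_same_birthday = men_born_1962 / days_per_year

print(round(men_born_1962), round(men_same_birthday, 1))
```

This gives about 4,225 men born in that year (close to the rounded figure in the text) and between 11 and 12 sharing any one birthday, so a birthday-and-city match on its own says very little.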

In fact, we can run a quick comparison — look at the data for a day earlier, August 30, and see we get 3 Portland hits. So 2 is not an unusually large number for a birthday in 1962.

Now the real smoking gun could be the security question. There are 4 options: 1) mother’s maiden name, 2) high school, 3) favorite high school and 4) last 4 of SSN.

One of the individuals used the last 4 of his social security number. It’s 7066. Now if Ted Wheeler’s last 4 of social security number is 7066, I would say he has some explaining to do. I don’t know his social and don’t know if it’s available. Unless someone tells me that 7066 is his last 4, my determination, is “highly unlikely.”

Bovine Data Consulting
