Ten Million Passwords FAQ

Mark Burnett

Published in XATO, Feb 10, 2015

In response to my recent release of 10 million passwords, I thought I would address some of the questions I am getting.

Where are the passwords from?

These are old passwords that have already been released to the public; none of these passwords are new leaks. They all are or were at one time completely available to anyone in an uncracked format. I have not included passwords that required cracking, payment, exclusive forum access, or anything else not available to the general public. You should still be able to find a large number of these passwords via a Google search.

The passwords were compiled by taking samples from thousands of password dumps, mostly from the last five years, although the set also includes much older data. I wanted to mix data from multiple sources to normalize inconsistencies and skewed data due to the type of web site, its users, and its security policies. (See this article for problems with password data.)

The size of the sample from each site was determined by the data itself. Since the top 100 passwords have been very consistent over the last 20 years, I was able to use that to determine the quality of the source data. Some dumps contained so much bad data that I had to limit how much of it I included.
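To make the idea concrete, here is a minimal sketch of scoring a dump by how closely its most common passwords match a known top list. It is not the actual process used for this release; the function names, the tiny reference list, and the sample dumps are all hypothetical and exist only for illustration.

```python
from collections import Counter


def top_passwords(passwords, n=100):
    """Return the n most frequent passwords in a dump, most common first."""
    return [pw for pw, _ in Counter(passwords).most_common(n)]


def overlap_score(dump_passwords, reference_top, n=100):
    """Fraction of the dump's top-n passwords that also appear in the reference list."""
    dump_top = top_passwords(dump_passwords, n)
    if not dump_top:
        return 0.0
    reference = set(reference_top)
    return sum(1 for pw in dump_top if pw in reference) / len(dump_top)


# Hypothetical reference list and dumps (assumptions, not real data).
reference_top = ["123456", "password", "qwerty", "abc123"]
clean_dump = ["123456", "password", "123456", "qwerty", "letmein"]
noisy_dump = ["x9!kQ2", "zz81Lp", "0000aaaa", "password"]

clean_score = overlap_score(clean_dump, reference_top, n=4)
noisy_score = overlap_score(noisy_dump, reference_top, n=4)
print(clean_score, noisy_score)
```

A dump whose score is low would be a candidate for a smaller sample, since its most common entries look less like real user-chosen passwords. Where to set that cutoff is a judgment call this sketch does not try to make.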

What is bad data?

Here are some examples of bad password data: http://pastebin.com/v6HCVDHN

How was the data collected?

I have been collecting passwords for about 15 years. In the past I used a number of scripts to scrape the web, forums, IRC, Usenet, and P2P sources just to get 1,000 new passwords per day. In fact, it took me almost 10 years to collect just 6 million unique username/password combos (and at the time I thought that was huge).

However, in the last 5 years things have changed tremendously. I am now able to manually collect 10–20 million unique passwords per year simply from paste sites and forums. There have also been a number of very large password dumps with tens of millions of passwords in a single dump. Anyone could easily gather several hundred million passwords without much effort.

Why did you release this data?

The primary purpose is to get good, clean, and consistent data out in the world so others can find new ways to explore and gain knowledge from it. The data isn’t perfect and there are a few anomalies, but it should provide good insight into user password selection.

Really, why did you release this data?

I’m a bit obsessed with passwords.

Won’t this help hackers?

If a hacker needs this list to hack someone, they probably aren’t much of a threat.

What should I do if my password is on the list?

If your password is on this list, that means it has already been publicly available for some time. You should change your password and enable two-factor authentication if available. Several of my own passwords are on the list as well; I left them there because they are already in many places on the web.

What if my password is not on the list?

It doesn’t mean you are safe. This is a tiny sample of the hundreds of millions of accounts that have been publicly dumped and doesn’t even include the hundreds of millions more that have never been made public.

Is it unethical to release these passwords?

Although I have justified the release of these passwords, I have to admit it is at least close to the line. I have considered releasing this data for a number of years and have put much thought into the ethics involved; it is not something I take lightly. I could have replaced all the usernames with random numbers or hashes, but I felt like the usernames just had to be included. I did make sure to remove domain names from email addresses and other identifiers so that they couldn’t be directly linked to specific accounts. I also aggregated data from many sources so that this data could not be used to target any particular site. The thing to remember here, though, is that I am not releasing this data; I have just aggregated and cleaned up already public data.
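As a rough illustration of the domain-removal step, the sketch below keeps only the part of an identifier before the last "@" and leaves anything else untouched. The `strip_domain` helper and the sample values are hypothetical; the actual cleanup may have handled more identifier formats than this.

```python
def strip_domain(identifier):
    """Drop everything from the last '@' onward; leave other identifiers unchanged."""
    local, sep, _domain = identifier.rpartition("@")
    # Only strip when there is an '@' with something in front of it,
    # so plain usernames and handles like '@name' pass through as-is.
    return local if sep and local else identifier


# Hypothetical sample identifiers (assumptions, not real accounts).
samples = ["alice@example.com", "bob", "@handle"]
cleaned = [strip_domain(s) for s in samples]
print(cleaned)
```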

How can I monitor my accounts to know if they have been leaked?

I would suggest the following:

Create a Google alert for your email address, username, and domain if you have one.
Create a Pastebin account and set alerts for your email address, username, and domain if you have one.
Sign up for account monitoring at haveibeenpwned.com, pwnedlist.com, breachalarm.com, canary.pw, or a similar site (feel free to add similar sites in the comments if you know of others).

Can I have your raw data?

No. Actually, I have shared portions of this data with companies who notify users of account leaks. Over the years I have gotten pretty good at finding passwords that others miss. But if you just want the raw data for any purpose other than protecting users, the answer is no.