The Password Defense League
Password theft plagues the internet. A new technique called Blind Hashing is set to change that.
How are passwords protected today?
To date, industry best practice is to secure passwords using a tunable hashing algorithm; pick the right hashing algorithm, tune its cost factors so it runs slowly and makes optimal use of your hardware, and it’s possible to protect very strong passwords from being cracked. Hashing imposes a runtime cost for each guess at a password. Slower, expensive hashing translates into higher latency for each login attempt, and a higher cost for an attacker trying to guess the password.
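As a minimal sketch of what a tunable hash looks like in practice, the example below uses PBKDF2 from Python's standard library. The choice of PBKDF2, the iteration count, and the function names are assumptions made for illustration; the article does not recommend a specific algorithm or setting.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte hash; `iterations` is the tunable cost factor."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute the hash and compare it in constant time."""
    candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, expected)

# Enrolling a password: a fresh random salt is stored next to the hash.
salt = os.urandom(16)
stored = hash_password("correct horse battery staple", salt)
```

Raising `iterations` makes every login slower, and makes every guess by an attacker slower by the same factor.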
But hashing can’t protect weak passwords
When passwords have high entropy—meaning they are truly hard to guess—then the cost of running the hashing function so many times can become prohibitive. But when the average password is trivially easy to guess, then the cost of hashing also becomes trivial.
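A quick back-of-the-envelope calculation shows how sharply the cost depends on entropy. The figures here are assumptions chosen for illustration (0.1 seconds per guess, a single attacking machine, passwords drawn uniformly at random), not measurements.

```python
def expected_crack_seconds(entropy_bits: float, seconds_per_guess: float) -> float:
    """Expected time to find a password drawn uniformly from 2**entropy_bits candidates."""
    # On average the attacker searches half of the space before hitting the password.
    return (2.0 ** entropy_bits) / 2.0 * seconds_per_guess

SECONDS_PER_YEAR = 365 * 24 * 3600

for bits in (20, 40, 80):  # illustrative entropy values, not measurements
    years = expected_crack_seconds(bits, 0.1) / SECONDS_PER_YEAR
    print(f"{bits} bits at 0.1 s per guess: {years:.3g} years")
```

Each extra bit of entropy doubles the attacker's expected work, while slowing the hash only multiplies it by a constant. That is why slow hashing helps strong passwords far more than weak ones.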
Users tend to choose memorable passwords which on average are much too weak to withstand these attacks. One possible defense is to impose a complex password policy, but this has a huge usability impact on the user and may have unexpected side effects. Users will also often find a way to defeat the policy, so it doesn't actually make the passwords much more difficult to crack.
And sometimes features can be turned into weaknesses
Hashes are designed to be small, often just 32–64 bytes each, so it’s very hard to detect if they are being moved over the network, and large volumes of hashes can be moved very quickly. This is technically a feature, but it makes hashes far too easy to steal.
Easy to steal and easy to crack make a bad combination. We now count the number of stolen passwords in the billions; soon more passwords will have been stolen than there are people to imagine them.
Breaches are a personal attack against users
When a site is compromised, it’s end users who bear the brunt of the attack. The damage can run far beyond the inconvenience of resetting passwords. Passwords grant access to an ever-increasing trove of emails, documents, photos, and sensitive financial data. Since users tend to reuse the same password across sites, the fallout from a breach can be widespread and carry deeply personal consequences.
Companies have a duty of care to keep their users’ passwords secure, but the tools at hand to defend passwords have simply proven insufficient.
We need more security
One way to improve password security is to make the hashing algorithms better. That means creating algorithms which are easier to use and easier to tune, which take the best advantage of the defender’s hardware, and which make it harder for an attacker to gain an unfair advantage. This is a well-known problem, as evidenced by the Password Hashing Competition (PHC) that is going on now. There have been some exciting submissions, and really interesting discussion on the mailing list. But the problem remains. Improved hashing functions are crucial, but they cannot change the core problem: weak passwords are easy to crack, and almost all passwords are weak.
We need to protect weak passwords as well as strong ones. We need to be able to stop attackers before they can crack a single password. We need to prevent even targeted attacks, where attackers focus all their resources on cracking just a single password.
The Solution: Blind Hashing
Since 2012, TapLink has been developing Blind Hashing, a solution that entangles password hashes with a massively large block of data.
With Blind Hashing, you start by applying the hashing method of your choice, and you gain all the protection that hashing method has to offer. But then instead of storing the salt and hash together in the user database where it could be stolen and attacked offline, Blind Hashing does something a bit different.
The Data Pool
We start by creating a large pool of securely generated random data. This data is stored in racks of solid-state drives, in secure data centers, replicated across several geographic locations, and backed up offline on encrypted tape. The data pool is large enough that simply trying to transfer the entire pool over the network would take years at full line rate.
Next, we publish a single API call, which allows any site to entangle their password hashes with this massive data pool. Now, even if an attacker could steal a site’s hashes, they would have to also steal nearly the entire data pool, potentially petabytes of data, before they could even start trying to crack a single password.
The Request Flow
A password hash can be thought of as a pseudo-random positive integer value. We can use the password hash as an index into the data pool, so that each guess at the password requires access to a different location in the pool. We make a small read (e.g. 64 bytes) from that location, and return the result. The result is used as a key, or salt, to perform further hashing on the password, before saving it to the database (when enrolling a new password), or checking if it matches the saved value (when verifying an existing password).
To step through the figure above: first, the user submits a username and password. On your site’s server, the username is used to retrieve Salt1 and Hash2 from your database. You then hash the password with Salt1 to calculate H1, just as usual, using the hash function of your choice.
But instead of storing H1 in your database directly, you perform BlindHash on H1, which returns Salt2. The Blind Hashing API really is simply a single function which takes 64 bytes in and gives 64 bytes back.
You then use the returned Salt2 instead of your own Salt1 to perform additional hashing to get H2, which is finally compared to the stored Hash2 from your database. A match means the password was correct and the user should be logged in.
In this manner, your hashes are blinded by the data pool: every attempt to store or verify a given password must read the same block from the data pool in order to complete. An attacker would need to query the data pool for each guess at a single user’s password.
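The flow above can be sketched end to end as follows. This is a toy model under stated assumptions, not TapLink's implementation: `blind_hash` is a local stand-in for the remote API call, the 1 MiB in-memory pool replaces the real data pool, and PBKDF2-HMAC-SHA512 with 100,000 iterations is an arbitrary choice for "the hash function of your choice". All function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000          # assumed cost factor, for illustration only
_POOL = os.urandom(1 << 20)   # toy 1 MiB pool standing in for the real data pool

def blind_hash(h1: bytes) -> bytes:
    """Stand-in for the remote call: 64 bytes in, 64 bytes out."""
    if len(h1) != 64:
        raise ValueError("blind_hash expects exactly 64 bytes")
    slots = len(_POOL) // 64
    # Treat H1 as a large integer and use it as an index into the pool.
    offset = (int.from_bytes(h1, "big") % slots) * 64
    return _POOL[offset:offset + 64]

def _stretch(password: str, salt: bytes) -> bytes:
    # SHA-512 gives a 64-byte output, matching the 64-byte API input.
    return hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, ITERATIONS)

def enroll(password: str):
    """Return (Salt1, Hash2), the only values stored in the site's database."""
    salt1 = os.urandom(16)
    h1 = _stretch(password, salt1)
    salt2 = blind_hash(h1)            # Salt2 is never stored
    hash2 = _stretch(password, salt2)
    return salt1, hash2

def verify(password: str, salt1: bytes, hash2: bytes) -> bool:
    h1 = _stretch(password, salt1)
    salt2 = blind_hash(h1)
    h2 = _stretch(password, salt2)
    return hmac.compare_digest(h2, hash2)
```

Note that neither H1 nor Salt2 is ever written to the database, so a stolen (Salt1, Hash2) pair cannot be tested against a guess without also reading from the pool.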
Security by Obesity
With typical password hashing, each user has his or her own salted hash that can be individually stolen and attacked. The amount of data that must be stolen is minuscule, and high-value users can be targeted by stealing just a few bytes.
But with Blind Hashing, because such a massive amount of data must be stolen before any passwords can be cracked, it becomes much easier to defend against such attacks. A single pseudo-random read into the data pool means that, if an attacker can steal 10% of the overall data pool, then only 10% of guesses can even be tested offline, while 90% of guesses would fail to complete calculating the hash because the required data would not be found.
Blind Hashing uses a hash function to expand each individual read request into multiple independent pseudo-random reads, performs all the reads concurrently, and then combines the individual results using another hash, so that the entire set of reads must succeed in order to get the result.
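One way to picture the expansion and combination steps is the sketch below. The details are assumptions made for illustration: SHA-512 with a counter is used to derive the read locations, SHA-512 again to combine the results, and the reads are done sequentially against a toy in-memory pool rather than concurrently against real storage. The names `expand_offsets` and `multi_read` are hypothetical.

```python
import hashlib
import os
from typing import List

READ_SIZE = 64
NUM_READS = 64
POOL = os.urandom(1 << 20)        # toy 1 MiB pool
SLOTS = len(POOL) // READ_SIZE

def expand_offsets(request: bytes, n: int = NUM_READS) -> List[int]:
    """Derive n pseudo-random pool offsets from a single request."""
    offsets = []
    for i in range(n):
        digest = hashlib.sha512(request + i.to_bytes(4, "big")).digest()
        offsets.append((int.from_bytes(digest, "big") % SLOTS) * READ_SIZE)
    return offsets

def multi_read(request: bytes) -> bytes:
    """Read every derived location and hash all the results together."""
    combiner = hashlib.sha512()
    for offset in expand_offsets(request):
        # If any one of these blocks were missing, the final digest could not be computed.
        combiner.update(POOL[offset:offset + READ_SIZE])
    return combiner.digest()
```

Because the final digest depends on every block, an attacker holding only part of the pool cannot produce it unless all 64 locations happen to fall inside the stolen portion.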
The effect is that any missing data is exponentially more likely to stop an offline attack. For example, imagine an attacker was able to steal “only” 8TB out of a 16TB data pool (50% of the total pool). By expanding each blind hashing request into 64 independent data pool reads, now the attacker would only be able to complete the calculation with a probability of (0.5)^64, or 5.42 x 10^-20. Trying to calculate a hash with half of the data pool is like flipping a coin 64 times and needing to get heads every time. Even if an attacker could steal 80% of the data pool, with 64 lookups they would still only be able to check fewer than 1 in a million guesses.
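The arithmetic is easy to reproduce. The sketch below assumes each read lands independently and uniformly at random in the pool, so the chance that all reads fall inside the stolen portion is the stolen fraction raised to the number of reads. The function name is hypothetical.

```python
def offline_success_probability(fraction_stolen: float, reads: int) -> float:
    """Chance that every one of `reads` independent lookups hits stolen data."""
    return fraction_stolen ** reads

half_pool = offline_success_probability(0.5, 64)    # the 8TB-of-16TB example
most_pool = offline_success_probability(0.8, 64)    # attacker holds 80% of the pool
single_read = offline_success_probability(0.1, 1)   # one read, 10% of the pool stolen

print(f"50% stolen, 64 reads: {half_pool:.3g}")
print(f"80% stolen, 64 reads: {most_pool:.3g}")
print(f"10% stolen, 1 read:   {single_read:.3g}")
```

With a single read the attacker's success rate simply equals the fraction stolen. With 64 reads it collapses toward zero unless nearly the whole pool has been taken.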
Shared Data Pools
Maintaining an exceptionally large data pool would be too costly for most sites, but TapLink’s solution makes it possible to share a single massive data pool across any number of sites, and each site gains the full security benefit of Blind Hashing.
If two sites directly shared the same data pool, one site could attack another by simply using the service to run their attack (effectively an online attack). So we assign each site a private key, and we use the key to transform data as it’s read from the data pool into a virtual private data pool for each site.
This way, the master data on disk never actually leaves the data pool servers, and each site is operating on their own private data. Each site is given a 64-byte random number, called an AppID, which identifies the virtual private data pool to use for their Blind Hashing requests.
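The article does not say how the per-site transformation is done, so the following is only one plausible sketch. It assumes the service keeps a private key for each AppID and applies HMAC-SHA512 to each block as it is read; the names `register_site`, `private_read`, and `SITE_KEYS` are hypothetical, and the pool is a toy in-memory buffer.

```python
import hashlib
import hmac
import os

POOL = os.urandom(1 << 20)   # toy shared pool; the raw bytes never leave the service
SITE_KEYS = {}               # AppID -> private key, held only by the service

def register_site() -> bytes:
    """Issue a 64-byte random AppID and create the site's private key."""
    app_id = os.urandom(64)
    SITE_KEYS[app_id] = os.urandom(32)
    return app_id

def private_read(app_id: bytes, h1: bytes) -> bytes:
    """Read from the shared pool, then transform the block with the site's key."""
    key = SITE_KEYS[app_id]
    offset = (int.from_bytes(h1, "big") % (len(POOL) // 64)) * 64
    raw = POOL[offset:offset + 64]
    # Assumed transformation: keyed HMAC, so each site sees different bytes.
    return hmac.new(key, raw, hashlib.sha512).digest()

site_a = register_site()
site_b = register_site()
same_input = b"\x42" * 64
```

Under this sketch, `private_read(site_a, same_input)` and `private_read(site_b, same_input)` return unrelated values, so one site cannot use its own access to the service to test guesses against another site's hashes.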
TapLink Blind Hashing can stop password theft on the internet
Imagine you could join a Password Defense League where your users’ passwords are immediately secured by the total combined resources of sites all across the internet. Blind Hashing allows sites to effectively pool their resources into a common defense fund for securing passwords. Together we can keep even the weakest passwords incredibly secure against offline attacks.
The design is cryptographically trustless, which means Blind Hashing doesn't take over your authentication process, and you stay in complete control over your user data. Blind Hashing respects your autonomy, and your users’ privacy. Blind Hashing doesn’t disclose any information about your users, not even their usernames. We simply keep a count of how many API requests are coming in, and a latency histogram so we know when to add capacity.
We’ve been in private beta for over a year, and we’re currently running a 16TB data pool on our own metal, with complete redundancy across two data centers.
We are actively working on making Blind Hashing easy to add to your existing process, easy to verify, and easy to roll out into production. When it’s ready for public release, Blind Hashing will be a click-to-install plugin for every major platform, and an open source package for every major language. We want Blind Hashing to be as simple to use as checking a box or adding a line to a config file.
We are looking for security professionals, developers, researchers, and early adopters to learn about Blind Hashing, evaluate our security claims, and gain access to the API to actually start using Blind Hashing.