Exploiting the Profanity Flaw
On 20 Sep 2022, a tweet indicated that Wintermute had been hacked for ~$160M . One day after, in a follow-up tweet, the victim stated that the attack was linked to Profanity-type exploit. After some preliminary research, we were able to see that the victim’s EOA account 0x0000000fe6a514a32abdcdfcc076c85243de899b was compromised (i.e., the hacker controlled the private key of it). Moreover, the vanity address with seven leading zeros was generated by an open-sourced vanity address generator and the hacker cracked that generator based on the ideas described in 1inch’s blog .
We have done our own investigation of this hack. This blog post will elaborate on the method we used to exploit the Profanity bug. We will shed light on the hardware and time requirements to crack an address generated by Profanity. According to estimates by Anton from 1inch on an issue raised on Profanity’s GitHub, a set of 1,000 GPUs could theoretically brute force the private keys of every 7-character vanity address generated using Profanity within 50 days. We used a Macbook M1 with 16GB RAM to precompute a dataset in less than 10 hours — this dataset only needs to be computed once for exploiting different addresses. The actual process, not counting the precomputation, took about 40 minutes for one address with seven leading zeros. We finished the implementation and were able to crack the private key of 0x0000000fe6a514a32abdcdfcc076c85243de899b in less than 48 hours.
Profanity is an Ethereum vanity address generation tool. It generates addresses in parallel by exploiting GPU power with OpenCL through a simple algorithm. First, it creates a private key d randomly, calculates the public key Q, and then derives the Ethereum address. If the address matches the rules specified by the user (e.g., having N leading zeros), Profanity prints the private key and the Ethereum address. If not, Profanity adds 1 to the private key and derives a new Ethereum address until it finds the one that matches the rules. In Profanity, each iteration of generating a new Ethereum address with a new private key (i.e., adding 1 to the old private key) is called a “round” and the total iterations to reach the target is denoted as R. We will use “iteration” and “round” interchangeably in the following context.
Profanity leverages a mathematical property in elliptic curve cryptography. If a private key is added by one, the corresponding public key will be added by G, a publicly known constant point on the elliptic curve. Based on that, in each iteration, Profanity doesn’t have to do the math on private keys. Instead, it adds G to the public key of the previous iteration. If a target address is identified, Profanity can simply add the iteration count to the initial private key d0 to get the private key of it.
As mentioned above, profanity uses GPU to speed up the address generation process. In each OpenCL work item (you can think of it as one computing thread inside GPU), Profanity adds the index of the work item (retrieved by get_global_id OpenCL API) to the 4th quadword of the 256 bits private key. Specifically, it uses the highest 64 bits of the private key to track the thread index such that it won’t conflict with the round ID in the lowest 64 bits. For example, the initial private key (d1,0) of the second work item would be (d_0,0 + 2^192). The third work item would start from (d_0,0 + 2*2^192), and so on. Note that the total work item count defaults to 255*16384 in profanity. We denote this number as N in the following context.
To generate keys in parallel, Profanity first initiates the public keys(Q_0,0, Q1,0, Q2,0, …, QN-1,0) on all these 255*16384 computing work items. Then, it adds G on these public keys to derive N public keys in each round, in parallel.
The very first step of the whole process is where the vulnerability lies — the random key generation. To generate a random private key, Profanity first uses the random device (std::random_device in c++) to generate a seed. But sadly the seed is 32-bit, which cannot be used as a private key directly. For generating the 256-bit private key, profanity feeds the 32-bit seed into a pseudo-random number generator mt19937_64 which is a deterministic function. In brief, you get the seed, you get the private key. Since there are only 2^32 possible initial key pairs (d_0,0, Q_0,0) and the iteration on each round is reversible, it is possible to crack the private key from any public key generated by Profanity.
The cracking process
This is the process we followed.
- Get the public key Qi,j of the victim’s address;
- Precompute 2^32 public keys Q_0,0 with seeds from 1 to (2^32–1) as described above. Let’s denote this data set as Q_0,0[2^32];
- Compute the public keys (Qi,j, Qi-1,j, Qi-2,j, … , Q0,j, Q-1,j, …, Qi-N,j) in the current round (denoted as j) by subtracting (1<<192)G for N times.
4. Subtract all the public keys (Qi,j, Qi-1,j, Qi-2,j, … , Q0,j, Q-1,j, …, Qi-N,j) with G to get the public keys in the previous round (Qi,j-1, Qi-1,j-1, Qi-2,j-1, … , Q0,j-1, Q-1,j-1, …, Qi-N,j-1);
5. Check if any of the public keys generated in step 4 (Qi,j-1, Qi-1,j-1, Qi-2,j-1, … , Q0,j-1, Q-1,j-1, …, Qi-N,j-1) are in the precomputed data set Q_0,0[2^32].
6. If not found, go to step 4 to generate another round of N public keys;
7. If there’s a public key Qu,v matches Q_0,0[k]. The index in the data set k would be the seed. We can compute the target private key di,j from du,v which could be derived from k.
Below we describe the details of our implementation.
1) Get the public key of the victim’s address
The Ethereum address is a part of the keccak256 hash of the public key. There is no way to directly derive the public key from the address. But if the account owner happens to sign a message, we could recover the public key from the message and the signature. In Ethereum, every transaction sent from an account is signed by the account owner with its private key such that we can get an on-chain transaction of the victim’s account and recover the public key from there.
2) Precompute the public keys
To precompute the data set Q_0,0[2^32] with GPU, we implemented mt19937_64 in OpenCL and fed it with numbers from 0 to (2^32–1) to generate the 256-bit private key in each work item k. Then, we compute the corresponding public key Q_0,0[k]. Since one public key is 32 bytes in its compressed form with the one-byte prefix stripped, all 2^32 public keys require 128GB memory or disk storage. In our implementation, we only use the lower 8 bytes of the public key as the probability of collision of 8 bytes is low enough. This optimization not only lowers memory and disk usage to 32GB but also saves comparison time in later steps.
3) Compute the public keys of the same round
After we get the public key Qi,j of the victim’s address. We need to subtract (1<<192)G from it in each work item. This is done by reusing the OpenCL code in profanity. In Profanity, the generation process is to add (1<<192) to the private key and derive the public key using the equation Q=dG.
To compute (1<<192)G, we can input (1<<192) as the private key. Therefore, (1<<192)G is equivalent to the process of deriving the public key from d = (1<<192).
4) Compute the public keys of the previous round
We can also reuse the OpenCL code of profanity in this step. In profanity, it adds G to each public key to get to the next round. We can simply substitute all G in the related code to -G to implement the G-subtraction function. Specifically, when G is (Gx, Gy) in an elliptic curve, -G is (Gx, -Gy). So, we could get it done by replacing all Gy with -Gy in related Profanity code.
5) Compare the public keys with precomputed data
Our precomputed data is 32GB public keys and our task is to check if any public key generated in this round is one of the 2^32 pre-computed keys. At first, we tried to do memory comparisons inside the GPU. However, we didn’t have a graphics card with >32GB memory. In addition, we didn’t want to deal with multiple GPUs and make the problem complicated. We started using one GPU but split the data into eight 4GB parts and loaded them one by one for memory comparisons in each round. The memory exchanges between GPU and CPU soon slowed us down.
Later, our attention turned to the 16GB Macbook M1 which is the default equipment for Web3 developers/researchers at Amber Group (and probably many other companies/projects). Apple’s M1 has a unified memory design which makes the GPU and the CPU share the same memory space. This means no memory copy load between the GPU and CPU during the cracking process. We then started some tests using the GPU to do the memory comparison. While looping the 2^32 data to do the comparison, we noticed that the system didn’t allow us to consume that much GPU time. Our OpenCL program was interrupted and killed before completing the memory comparison task.
We realized that the problem we were dealing with was actually a search problem. So…no need to bother with the GPU. We could do a binary search with the CPU. To implement the binary search, we sorted the pre-computed data on disk using quick sort, which took us about 2 hours. Then, in each round, we binary-searched the newly generated public keys one by one in the sorted dataset. Still unable to put the entire sorted dataset into 16GB memory, we mmap()’ed one-eighth of the data file eight times to complete each round. To further reduce the memory copy overhead, we sorted the candidate public keys as well. Eventually, we had only eight 4GB memory loads during each round.
This can be done more effectively. One could batch-process the memory comparison part for different rounds. That would reduce the data load count from the disk dramatically. For example, when the batch size is 50 rounds, only 2*8 disk loads are needed for 100 rounds. We only used one CPU core to do the binary search. Such a “reading memory” workload could be simply optimized by applying multicores on it.
6) Reconstructing the private key
If any of the public keys computed in the current round matches the precomputed public keys, we’d be able to reconstruct the target private key. As mentioned earlier, we could use the index of the precomputed public key as the seed to derive the private key d_o,o with the mt19937_64 random generator. With the number of rounds of reverse processing (R) and the ID of the work item (w) which generated the matched public key, we could reconstruct the target private key by d_o,o + (1<<192)*w + R.
In this blog, we have demonstrated how a Profanity-generated account could be exploited with mid-end consumer-level hardware. As well documented by this point — your funds are not safe if your address was generated by Profanity.
Cryptography constitutes the foundation of blockchains. By reproducing hacks and exploits, we can build a better understanding of the attack surface spectrum across Web3. Better collective awareness of various patterns of hacks, flaws, and vulnerabilities hopefully contributes to a stronger and more attack-resistant future. Always manage your private keys with caution. Don’t trust, verify.
The information contained in this post (the “Information”) has been prepared solely for informational purposes, is in summary form, and does not purport to be complete. The Information is not, and is not intended to be, an offer to sell, or a solicitation of an offer to purchase, any securities. The Information does not provide and should not be treated as giving investment advice. The Information does not take into account specific investment objectives, financial situation or the particular needs of any prospective investor. No representation or warranty is made, expressed or implied, with respect to the fairness, correctness, accuracy, reasonableness or completeness of the Information. We do not undertake to update the Information. It should not be regarded by prospective investors as a substitute for the exercise of their own judgment or research. Prospective investors should consult with their own legal, regulatory, tax, business, investment, financial and accounting advisers to the extent that they deem it necessary, and make any investment decisions based upon their own judgment and advice from such advisers as they deem necessary and not upon any view expressed herein.