Methods of anonymous blockchain analysis: an overview

Nov 15, 2019

In one of the previous articles, we discussed the key principles underlying anonymous cryptocurrencies. Today we look at anonymization from the other side and give an overview of the most frequently used methods for analyzing anonymous blockchains. We leave aside the possibility of linking wallet addresses to IP addresses at the p2p protocol level, since the techniques usable there are much the same for every network, and focus instead on analyzing the blockchain transactions themselves. To follow this article, it’s enough to have a general idea of how popular cryptocurrencies work and to understand what the inputs and outputs of a transaction are.

In 2009, when Bitcoin had just come into existence, it was considered a means of transferring funds anonymously, since there were no known ways to associate a wallet’s public address with its owner. Those days have passed, and today various methods are known for analyzing payment graphs and identifying the address pools of almost any trading facility, be it an exchange, a mining pool, or a darknet market.

Almost all of these methods use the same approach: first, addresses are clustered by using simple heuristics, and then each cluster is empirically correlated with a particular transaction participant. You can read more about this in A Fistful of Bitcoins: Characterizing Payments Among Men with No Names.
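To make the clustering step more concrete, here is a minimal sketch of one widely used heuristic of this kind: addresses that appear together as inputs of the same transaction are assumed to belong to one owner. The transaction format (a dict with an `inputs` list) and the sample data are hypothetical and chosen only for illustration; real tools combine several heuristics.

```python
from collections import defaultdict


def cluster_addresses(transactions):
    """Group addresses that are ever co-spent as inputs of one transaction."""
    parent = {}

    def find(addr):
        # Union-find lookup with path halving.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)  # register every input address
        for other in inputs[1:]:
            union(inputs[0], other)  # co-spent inputs share an owner (assumption)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical sample: A and B are co-spent, then B and C, while D stands alone.
sample_txs = [
    {"inputs": ["A", "B"]},
    {"inputs": ["B", "C"]},
    {"inputs": ["D"]},
]
clusters = cluster_addresses(sample_txs)
print(clusters)
```

Each resulting cluster would then have to be matched to a real-world participant by other means, for example by transacting with a known service and observing which cluster its addresses fall into.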

Since transaction confidentiality is still not a priority for Bitcoin, demand arose for ways to keep payment graphs secret. The first response to this demand was the so-called Bitcoin tumblers, services that make transactions on behalf of several users in order to mix potentially identifiable funds with others. While such mixing helps protect privacy, using it requires relying on third-party services, which cannot always be trusted. To get around this risk, the well-known CoinJoin protocol was developed, which allows multiple Bitcoin payments from multiple spenders to be combined without relying on a centralized service.

Shortly after, new cryptocurrencies based on the idea began to appear, proclaiming anonymity their top priority. Currently, there are three of them listed on Coinmarketcap Top 100, namely Monero, Dash, and ZCash. The total capitalization of the three at the time of writing is about $1.8 billion. Let’s take a closer look at ways to analyze these blockchains.

CoinJoin

The CoinJoin protocol was proposed by Greg Maxwell in 2013 as an alternative to existing mixers that would allow users to achieve the same effect without having to trust their bitcoins to a third party. Basically, it lets users arrange mixing by themselves by combining their payments into a single transaction.

In the example above, Arnold Schwarzenegger and Barack Obama cooperate to make two payments, one of which is sent to Charlie Sheen and the other to Donald Trump. As the payments come as parts of a single transaction, it becomes harder to identify which of the parties, Arnold or Barack, finances Trump’s election campaign.

Unfortunately, while it is harder, it is still not impossible. One of the first tools for de-anonymizing CoinJoin transactions was CoinJoin Sudoku, which is still considered effective. Its author suggested analyzing various combinations of inputs and outputs in order to identify groups within a transaction whose amounts add up to the same total, assuming that each such group corresponds to one payment. In practice, it may look like this:

Source: https://www.coinjoinsudoku.com/advisory/
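The idea of matching groups of inputs and outputs by amount can be sketched as a brute-force subset search. This is only an illustration of the principle, not the actual CoinJoin Sudoku code: the function name, the `fee_tolerance` parameter, and the amounts are all hypothetical, and the exhaustive search is practical only for small transactions.

```python
from itertools import combinations


def matching_subsets(inputs, outputs, fee_tolerance=0):
    """Find proper subsets of inputs and outputs whose sums (nearly) match.

    Amounts are integers in the smallest currency unit. A match suggests that
    the chosen inputs funded the chosen outputs, up to `fee_tolerance`.
    """
    matches = []
    for in_size in range(1, len(inputs)):
        for in_idx in combinations(range(len(inputs)), in_size):
            in_sum = sum(inputs[i] for i in in_idx)
            for out_size in range(1, len(outputs)):
                for out_idx in combinations(range(len(outputs)), out_size):
                    out_sum = sum(outputs[i] for i in out_idx)
                    if 0 <= in_sum - out_sum <= fee_tolerance:
                        matches.append((in_idx, out_idx))
    return matches


# Hypothetical joint transaction: one party owns inputs 10 and 4 and receives
# outputs 8 and 6; the other party owns input 7 and receives output 7.
tx_inputs = [10, 4, 7]
tx_outputs = [8, 6, 7]
matches = matching_subsets(tx_inputs, tx_outputs)
print(matches)
```

In this toy transaction the only consistent split assigns inputs 0 and 1 to outputs 0 and 1, and input 2 to output 2, so the joint transaction separates cleanly into its two underlying payments.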

The authors of an article entitled “Privacy-Enhancing Overlays in Bitcoin” went even further, suggesting that an attacker who is actively involved in such transactions could just exclude the inputs and outputs related to himself, making it much easier to de-anonymize other participants.

Still, the main reason why many users don’t consider CoinJoin secure is the solution’s centralized nature. Participants have to use third-party services to negotiate transactions, and while such a service cannot steal the money, it may still compromise participants’ privacy by keeping logs of the transactions it negotiates.

Besides, users also have to take extra measures to reduce the risk of being de-anonymized. It is extremely important, for instance, not to use the same address more than once, because this will greatly facilitate the analysis of the transaction chain for attackers.

Despite all shortcomings, over the past year, the number of CoinJoin transactions has tripled and reached 4.09% of the network’s total volume (according to longhash.com), which indicates a growing demand for privacy in the Bitcoin network.

Dash

Dash, formerly known as DarkCoin, started as a fork of Bitcoin. Dash transactions are not anonymous by default, but the network offers a special feature named PrivateSend which allows users to protect their privacy. In short, PrivateSend is an improved version of CoinJoin — and, unfortunately, it still inherits some of the latter’s shortcomings.

There are two types of outputs in Dash, regular and private. The first is used in regular transactions, while the second is mixed when using the PrivateSend feature. The process of mixing transaction outputs on the network is known as a mixing session, or round. First, the outputs are broken down into discrete standard denominations such as 1.00001 DASH, 0.100001 DASH, etc., which are then relayed to one-time addresses controlled by the user. Then the sender’s wallet sends a request to a masternode to let it know that the user would like to mix a certain denomination of Dash coins. When enough other participants who wish to mix the same denomination connect to the masternode, it mixes up the outputs and instructs the participants’ wallets to pay the now-mixed outputs back to themselves.
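The denomination step can be illustrated with a simple greedy split. This is a sketch under stated assumptions rather than the wallet's real algorithm: amounts are expressed as integers in units of 10^-8 DASH, only the 1.00001 and 0.100001 denominations come from the text above, and the other two entries in the list, like the function name, are assumptions made for the example.

```python
# Denominations in units of 1e-8 DASH. The 1.00001 and 0.100001 values are
# mentioned in the article; the largest and smallest entries are assumptions.
DENOMINATIONS = [1_000_010_000, 100_001_000, 10_000_100, 1_000_010]


def split_into_denominations(amount, denominations=DENOMINATIONS):
    """Greedily break an amount into standard denominations.

    Returns the list of denominated pieces and the leftover that is too small
    to be denominated.
    """
    pieces = []
    remaining = amount
    for denom in sorted(denominations, reverse=True):
        count, remaining = divmod(remaining, denom)
        pieces.extend([denom] * count)
    return pieces, remaining


# Hypothetical balance of 2.5 DASH.
pieces, leftover = split_into_denominations(250_000_000)
print(len(pieces), leftover)
```

Because every participant in a round contributes outputs of exactly the same size, the amounts themselves stop being a distinguishing feature, which is what makes amount-matching attacks harder here than in plain CoinJoin.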

Below is an example of a PrivateSend transaction that uses two groups of three addresses for payers and payees, respectively.

Unlike with CoinJoin, the user does not have to make sure that his public key does not appear twice in such mixing rounds. However, the method is weakly protected against source tracking: one can simply trace the denominations back to their initial outputs and then employ any of the standard methods suitable for analyzing Bitcoin transaction sources. It has been found empirically that if multiple outputs have one common “ancestor”, it most likely belongs to the sender himself.

Breaking down the transaction into denominations makes Dash transactions relatively resistant to analysis with such tools as CoinJoin Sudoku, but does not grant them full protection. Employing some additional heuristics can nullify this advantage completely.
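The common-ancestor observation can be sketched as a walk over the spending graph. The graph encoding below (a dict mapping each output to the outputs spent by the transaction that created it) and all identifiers are hypothetical; a real analysis would build this graph from blockchain data and would have to cope with far more noise.

```python
def ancestors(output, parents):
    """Collect every output reachable by walking the spending graph backwards."""
    seen = set()
    stack = [output]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def common_ancestors(outputs, parents):
    """Return the ancestors shared by all of the given outputs."""
    ancestor_sets = [ancestors(out, parents) for out in outputs]
    return set.intersection(*ancestor_sets) if ancestor_sets else set()


# Hypothetical toy graph: two denominated outputs stem from the same wallet
# output, while a third one comes from an unrelated source.
parents = {
    "mixed_1": ["denom_1"],
    "mixed_2": ["denom_2"],
    "mixed_3": ["denom_3"],
    "denom_1": ["wallet_utxo"],
    "denom_2": ["wallet_utxo"],
    "denom_3": ["other_utxo"],
}
shared = common_ancestors(["mixed_1", "mixed_2"], parents)
unrelated = common_ancestors(["mixed_1", "mixed_3"], parents)
print(shared, unrelated)
```

When several outputs spent together share an ancestor in this way, that ancestor is a strong candidate for the sender's original funds.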

The network’s ability to natively mix transactions allows Dash users not to rely on third-party services, but they still have to trust the masternodes they select for mixing. No one can guarantee that the owner of any given masternode does not keep logs that could be used to de-anonymize them.

As is the case with CoinJoin, a lack of liquidity could be an obstacle for users willing to mix their transactions. To prevent this, users are often advised to select several masternodes for the mixing round, but this approach only increases the risk of running into an untrustworthy masternode.

The Dash privacy model is currently heavily criticized on the Web. There are posts on Reddit describing how someone managed to de-anonymize a share of PrivateSend transactions. In one such post, for instance, the author describes how he was able to trace about 13% of the transactions made over 15 days back to their source.

ZCash

ZCash came into existence in 2016 as an implementation of the ZeroCash protocol and quickly gained attention thanks to its use of zero-knowledge proofs, or zkSNARKs. Confidential transactions are not the only kind ZCash supports: most of the transactions on the network are transparent and similar to Bitcoin transactions.

To make a transaction confidential, a user has to use a special pool of protected addresses called the “shielded pool”. That makes moving funds from one public address to another a three-stage process:

Shielding transaction. The funds are moved from a transparent address into the shielded pool. The blockchain registers the sender’s address and the balance sent. The shielded addresses involved and whether it was sent to one or two shielded addresses remain confidential.
Private transaction. When a transaction is made within a shielded pool, it still appears on the public blockchain, but the addresses and the transaction amount are encrypted and not publicly visible. It is only known that some funds “that have ever entered the pool” are being transferred.
Deshielding transaction. The funds are sent from the shielded pool to a transparent address. The sender is confidential, but the amount and the recipient’s address are publicly visible.

*Different types of Zcash transactions. Left to right: transparent transaction, shielding transaction, private transaction, deshielding transaction. (Source: An Empirical Analysis of Anonymity in Zcash.)*

It is important to note that the network requires that all generated coins (block rewards) appear within the shielded pool, thereby effectively increasing the anonymity set for mixing transactions.

The empirical analysis of public transactions in ZCash (which at the beginning of 2018 amounted to 73% of the total number of transactions) is not particularly difficult, and the same methods as for Bitcoin can be used. However, getting enough data on deshielding transactions turns out to be much more difficult. One of the first attempts to analyze shielded transactions was the paper An Empirical Analysis of Anonymity in Zcash. The authors tried to apply several heuristics that are based on user behavior:

If a transaction spends two or more transparent outputs (regardless of whether the transaction is transparent, shielded or mixed), then most likely they are controlled by the same entity.
If a transparent transaction has only one recipient, then most likely all addresses where the funds are being sent from belong to the recipient himself.
Any deshielding transaction carrying 250.0001 ZEC (roughly the founders’ share of 100 block rewards) is most likely made by the network’s founders.
If the deshielding transaction has over 100 output addresses, one of which belongs to a known mining pool, then the transaction is likely a mining withdrawal and all non-pool output addresses belong to miners.
If there is a pair of a shielding and a deshielding transaction carrying the same amount, and the latter happened within some small number of blocks after the former, then these transactions are linked, as shown in the paper titled “On the linkability of Zcash transactions”.
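The last heuristic in the list above, linking a deposit into the pool with a withdrawal of the same amount shortly afterwards, can be sketched as follows. The record format, the `max_block_gap` threshold, and the sample values are hypothetical; the text only says "some small number of blocks", so the default of 10 is an arbitrary assumption.

```python
def link_round_trips(shielding, deshielding, max_block_gap=10):
    """Pair shielding and deshielding transactions by value and timing.

    Each record is a dict with `txid`, `value` (integer, smallest unit) and
    `height`. Only unambiguous matches are reported.
    """
    links = []
    for deposit in shielding:
        candidates = [
            withdrawal
            for withdrawal in deshielding
            if withdrawal["value"] == deposit["value"]
            and 0 < withdrawal["height"] - deposit["height"] <= max_block_gap
        ]
        if len(candidates) == 1:
            links.append((deposit["txid"], candidates[0]["txid"]))
    return links


# Hypothetical sample data.
deposits = [
    {"txid": "s1", "value": 1_234_567, "height": 100},
    {"txid": "s2", "value": 5_000_000, "height": 100},
]
withdrawals = [
    {"txid": "d1", "value": 1_234_567, "height": 104},  # same value, 4 blocks later
    {"txid": "d2", "value": 5_000_000, "height": 500},  # same value, but far too late
]
links = link_round_trips(deposits, withdrawals)
print(links)
```

Unusual, non-round amounts make such matches far more convincing than common ones, which is why the sketch reports a link only when exactly one candidate fits.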

The illustrations below show some results that the authors were able to obtain using these heuristics:

*Shielded pool deposits and withdrawals identified using the first heuristic.*

*The number of coins transferred to the shielded pool by various groups of users.*

The addresses that have put more than 10,000 ZEC into the shielded pool over time, where the size of each circle is proportional to the value put into the pool. Green circles stand for miners, orange for founders, and purple for other participants.

While the authors successfully identified 65.6% of deshielding transactions, tracking the transactions within the shielded pool proved to be much more complicated.

In a recent paper entitled “Privacy and Linkability of Mining in Zcash”, researchers from the University of Luxembourg analyzed the performance of mining pools and identified two patterns related to paying rewards:

The mining pool sends the block reward to its public address and then distributes it to the miners. The authors called it Pattern T.
The mining pool keeps the mined coins within the shielded pool and at some point pays rewards to the miners using deshielding transactions. This pattern was called Pattern Z.

Linking transactions to specific pools was a relatively easy task: the researchers simply compared the public addresses used for paying rewards with the top miners’ addresses that are publicly available on the mining pools’ websites.

By doing this, the researchers were able to increase the share of identified payments from 65.6% to 84.1% using no additional data.

However, the approach has its drawbacks:

It works only within a fairly narrow time interval of about 2000 blocks (roughly four days of transactions), since miners tend to switch between mining pools;
It is barely suitable for identifying transactions made by smaller pools, which rarely get rewarded for more than a few blocks: their payout transactions involve a narrow range of outputs and therefore do not fit the general pattern typical of large pools.

The results of the analysis are shown below:

De-anonymized transactions linked to certain Zcash mining pools. Left to right: The pool’s name, the number of transactions tracked using both heuristics, mined value, and the share of successfully tracked transactions for each pool.

Based on these data, the authors conclude that 95.6% of all ZCash transactions are potentially “linkable”, which means that the actual level of privacy in ZCash is not much higher than that of Bitcoin.

It’s also notable that, despite the strong cryptographic protection, a significant part of the data that made the analysis possible was publicly available.

Monero

Monero, launched in April 2014, is the most popular implementation of the CryptoNote protocol. To conceal the receiver’s public address, Monero uses one-time addresses, and there is no way for an outside observer to cryptographically match a one-time address with any public address. To protect the sender as well, the network employs mixing, but in a way that is different from CoinJoin. In Monero, the sender does not need to look for other participants to complete a joint transaction. Instead, the wallet itself picks a random set of outputs available on the blockchain, hides the real one among them, and signs the set with a ring signature. The ring signature proves to the transaction validator that the sender actually owns one of the outputs being mixed and that no double-spending will happen. This makes tracing a payment back to its source rather complicated. See the example below:

As a result, the privacy level of a transaction correlates with the size of its ring signature, as the more random outputs are used for mixing, the more difficult it is to track it down to the source.
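The relation between ring size and privacy can be illustrated with a toy model of ring construction. This sketch contains no cryptography at all and is not how a Monero wallet is implemented: it only shows the real output being hidden among randomly chosen decoys, and the function name, output identifiers, and uniform sampling are assumptions made for the example.

```python
import random


def build_ring(real_output, candidate_outputs, ring_size, rng=random):
    """Hide the real output among `ring_size - 1` randomly chosen decoys."""
    pool = [out for out in candidate_outputs if out != real_output]
    decoys = rng.sample(pool, ring_size - 1)
    ring = decoys + [real_output]
    rng.shuffle(ring)  # the position must not reveal which member is real
    return ring


# Hypothetical blockchain with 100 outputs; the sender really spends out_42.
all_outputs = [f"out_{i}" for i in range(100)]
ring = build_ring("out_42", all_outputs, ring_size=11, rng=random.Random(0))

# With no further information, an observer can only guess at random.
baseline_guess_probability = 1 / len(ring)
print(len(ring), baseline_guess_probability)
```

The 1/n baseline holds only as long as the observer cannot rule out any ring member, and the attacks described in the rest of this section work precisely by ruling members out.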

Initially, mixing was optional for confidential transactions in the Monero network, but in 2016 the minimum size of the ring signature was established. At the time it was set at 3, but the network increased it every year. The current minimum size of the ring signature is 11.

In 2017, in a paper entitled “An Empirical Analysis of Traceability in the Monero Blockchain”, a group of researchers pointed out two vulnerabilities in the cryptocurrency protocol (admittedly they were not the first to notice them, but their study has proved the point by providing practical results of transaction analysis).

The first vulnerability was related to the downsides of using stealth addresses alone, without mixing, which is now impossible. The authors have shown in practice that relying on one-time addresses alone not only provides no protection to the senders but also presents a privacy hazard for other users, as Monero wallets occasionally pick such outputs for mixing in other transactions, ignoring the fact that they may already have been spent.

As a result, when a transaction uses a small number of false outputs, the method of exclusion allows determining the real ones with a high degree of probability. By employing the method known as “chain reaction” analysis, the researchers were able to track about 62% of transactions made before February 2017.
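The elimination process behind "chain reaction" analysis can be sketched in a few lines. The data layout (a dict mapping each transaction input to the outputs in its ring) and the sample rings are hypothetical, and the sketch assumes consistent data in which every ring contains its real output.

```python
def chain_reaction(rings):
    """Iteratively deduce real spends by eliminating outputs known to be spent.

    A ring with a single remaining member reveals its real output, which can
    then be removed from every other ring, possibly revealing further spends.
    """
    remaining = {tx: set(members) for tx, members in rings.items()}
    spent_by = {}
    changed = True
    while changed:
        changed = False
        for tx, members in remaining.items():
            if tx not in spent_by and len(members) == 1:
                (real,) = members
                spent_by[tx] = real
                for other_tx, other_members in remaining.items():
                    if other_tx != tx:
                        other_members.discard(real)  # already spent elsewhere
                changed = True
    return spent_by


# Hypothetical rings: tx1 used no mixing at all, which unravels tx2 and tx3.
sample_rings = {
    "tx1": ["a"],
    "tx2": ["a", "b"],
    "tx3": ["a", "b", "c"],
    "tx4": ["c", "d", "e"],
}
deduced = chain_reaction(sample_rings)
print(deduced)
```

Here a single unmixed transaction exposes two others, while tx4 merely loses one decoy and remains ambiguous between two outputs.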

The second vulnerability was related to the way wallets selected outputs for mixing at the time. They were simply picked uniformly at random from the blockchain, which does not match the behavioral patterns of cryptocurrency users. The authors discovered that a user most often spends the amount received in a transaction within two to three months. That means that in most cases the real output is the most recent one in the ring, while the decoys tend to be much older. The charts below show the difference in age distribution between real and fake outputs:

*Age distribution among the outputs ruled out as mixins*

*Estimated age distribution of real outputs*

Using this obvious mismatch as a heuristic, the authors managed to de-anonymize about 80% of senders in Monero transactions.
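A heuristic built on this mismatch can be as simple as guessing that the newest ring member is the real one. The sketch below assumes a hypothetical record format (dicts with `id` and `height`) and made-up labelled data; the accuracy it prints says nothing about the real blockchain.

```python
def guess_newest(ring):
    """Guess that the most recently created output in the ring is the real one."""
    return max(ring, key=lambda out: out["height"])["id"]


def accuracy(labelled_rings):
    """Share of rings for which the guess matches the known real output."""
    hits = sum(1 for ring, real_id in labelled_rings if guess_newest(ring) == real_id)
    return hits / len(labelled_rings)


# Hypothetical labelled sample: (ring members, id of the real output).
labelled = [
    ([{"id": "a", "height": 10}, {"id": "b", "height": 900}], "b"),
    ([{"id": "c", "height": 50}, {"id": "d", "height": 700}], "d"),
    ([{"id": "e", "height": 800}, {"id": "f", "height": 20}], "f"),
]
score = accuracy(labelled)
print(score)
```

Sampling decoys from a distribution that resembles real spending behavior is what removes the signal this guess relies on.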

However, both heuristics are currently of academic interest mostly, since the developers have already fixed the underlying vulnerabilities. The minimum number of outputs in mixing has been raised to 11, the rules for selecting them were adjusted to fit the real distribution of the unspent outputs, and a new protocol of confidential transactions was implemented.

In another paper, entitled “New Empirical Traceability Analysis of CryptoNote-Style Blockchains”, researchers try to analyze Monero transactions using so-called “closed sets”.

Let’s explain the method with an example. Suppose we have four outputs: pk1, pk2, pk3, and pk4. Now we need to find four transactions that use these four outputs exclusively:

tx1.in = {pk1, pk2, pk3};

tx2.in = {pk2, pk3};

tx3.in = {pk1, pk3};

tx4.in = {pk1, pk2, pk3, pk4};

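If we succeed, we can conclude that these four outputs are all spent in these four transactions: each transaction spends exactly one output from its ring, and no output can be spent twice, so four transactions drawing on only four outputs must use up all of them. The next time we meet a transaction where at least one of these outputs is being mixed, we can exclude it from the anonymity set as knowingly false.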
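The search for closed sets in the example above can be sketched as a brute-force check: a group of k transactions forms a closed set when the union of their rings contains exactly k outputs. This is an illustration only, not the algorithm from the paper; the function name and the `max_size` limit are assumptions, and the exhaustive search would not scale to a real blockchain.

```python
from itertools import combinations


def find_closed_sets(rings, max_size=4):
    """Find groups of k transactions whose rings jointly cover exactly k outputs."""
    closed = []
    txids = sorted(rings)
    for k in range(1, max_size + 1):
        for group in combinations(txids, k):
            covered = set().union(*(rings[tx] for tx in group))
            if len(covered) == k:
                closed.append((group, covered))
    return closed


# The four transactions from the example above.
example_rings = {
    "tx1": {"pk1", "pk2", "pk3"},
    "tx2": {"pk2", "pk3"},
    "tx3": {"pk1", "pk3"},
    "tx4": {"pk1", "pk2", "pk3", "pk4"},
}
closed_sets = find_closed_sets(example_rings)
for group, covered in closed_sets:
    print(group, sorted(covered))
```

In this example the search also finds a smaller closed set: tx1, tx2, and tx3 together use only pk1, pk2, and pk3, so those three outputs are spent there and tx4 must be spending pk4.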

The authors experimented on the version of the Monero blockchain that already supported confidential transactions and had a minimum size of the ring signature set at 5, which means the analyzed transactions were less protected than today. However, due to the rarity of such sets as in the example above, they were able to track only 0.084% of the outputs. Using closed sets is thus barely applicable to Monero, although it can still be considered an addition to other methods of analysis. This result allows us to conclude that modern Monero provides fairly strong confidentiality protections.

To sum it up

We have surveyed a fairly broad range of methods for analyzing anonymous blockchains. While their cryptography itself is admittedly strong and no one even tries to crack it, many of them have weaknesses that are not rooted in cryptography at all. Some fall victim to the centralized nature of their anonymization services; for others, the weak point is the unwillingness to abandon the option of transparent transactions. Among the three blockchains analyzed today, we believe Monero protects users’ confidentiality best, as the previously found vulnerabilities are now fixed, while recent studies have yet to provide any significant results on the blockchain’s traceability.