Testing the claims of combinatorial anonymity
Privacy is a key consideration among cryptocurrency users, and while some cryptocurrencies (like Monero) have prioritised transaction privacy into the coin’s design, other coins that aren’t inherently designed to be private can make use of transaction frameworks to anonymise the sender and recipient of a transaction.
One framework proposed in 2013 by Greg Maxwell called “CoinJoin” attempts to anonymise Bitcoin transactions by ‘pooling’ many transactions together. Since all an observer can see is a list of many inputs to the transaction (on the left) and another list of many outputs of the transaction (on the right), it is mathematically very difficult to determine which inputs balance with which outputs. CoinJoin is explained well in this article.
CoinJoin in its most basic form has been shown to be ‘crackable’ in that in some cases it is possible to ‘puzzle’ together which inputs are associated with which outputs. To circumvent this, some CoinJoin protocols require that all participants in the transaction send exactly the same amount of coin. By doing so, it is impossible to decipher which inputs are associated with which outputs by the transaction values alone since the recipient of the transaction could be any one of the recipients with the same payment value.
This protocol is very anonymous (with sufficiently many participants) however a disadvantage of it is that it requires users to send the exact same amount of money. In the case of cryptocurrency where payments can be made to 8 decimal places of precision, this limits the usage of the protocol to significantly.
A newer CoinJoin protocol called CashFusion doesn’t enforce the requirement of equal payment amounts and instead purports anonymity due to the high number of participants in a transaction. With sufficiently many participants in a transaction, the CashFusion developers say that:
- Due to the high number of inputs and outputs, it is prohibitively difficult to try to ‘puzzle together’ inputs that sum to specific combinations of output values.
- Even if someone could puzzle together some inputs and outputs to exactly balance, due to the immense number of combinations of inputs and outputs that feasibly could balance, the original combination is incredibly unlikely to be unique.
But some in the cryptocurrency community are sceptical of these claims. I set out to try to crack CashFusion by linking inputs and outputs together and then proving that they had to be the true transaction.
Let’s start with the CashFusion developers’ first claim, can we find some subset of inputs to a CashFusion transaction that exactly add to some other subset of outputs?
When it was demonstrated to be possible to perform this for CoinJoin transactions in this blog, the author relied on his computer hardware to iterate through thousands of combinations of inputs and outputs, and manually checked if each combination of inputs and outputs balanced. While this may have been possible with relatively few inputs and outputs of a CoinJoin, it certainly isn’t possible for a typical CashFusion transaction which typically have at least 50 inputs and 50 outputs. So we need a way to find balanced subsets in a far more efficient way than blindly iterating through combinations of inputs and outputs and checking whether they balance. The good news is, the problem of finding balanced subsets of inputs and outputs can be formulated as an integer program and then passed to an integer programming solver, and doing so is far more efficient since solvers use more efficient algorithms to search for solutions than just iterating through potential combinations.
Once the problem of finding balanced inputs and outputs is formulated as an integer program, the solver finds balanced subsets easily (to within some matching tolerance). For the example CashFusion transaction given here, here is an example of balanced inputs and outputs when the algorithm is set to find exactly 3 inputs that balance with 2 outputs:
I.e. both inputs and outputs sum to approximately 3.56170. We will call this a candidate match.
Recall that this matched subset is among a larger pool of many more inputs and outputs, so while we have demonstrated that it is possible that these 3 inputs were payment relating to those 2 outputs, we haven’t yet proven that these 3 inputs were linked to those 2 outputs (hence why this is referred to as a candidate match).
So how can we go about proving a link between these inputs and outputs? Well, we know that there must exist some list of matches such that all of the inputs to the CashFusion transaction and outputs are used exactly once (as the real payers and payees for each individual transaction) and the inputs and outputs for each match balance exactly (to within a tolerance).
So, given that we have a candidate match above, if this match isn’t the true match, then for each value that is in this candidate match, there must exist some other match that is not a superset of this match that makes use of the value. For example, if the 62nd input to the transaction with value of 1.15659169 isn’t used in this match, it must be used in another match, and if we can prove that it can’t exist in any other match, then this match must be the true payment made between these addresses.
To check this, we can re-run the integer programming solver to try to find a match that is distinct from the match we have already found, that must include this value (this is implemented as a constraint in the integer program). Doing this we find another feasible match using this exact transaction input:
See above that the 62nd input could also exist in this different match found above. After repeating this process for all the inputs & outputs in the original candidate match above, we find that there is no way for us to know if the candidate match we found is the true transaction i.e. it’s not possible to establish a concrete link between these inputs and outputs since it’s more likely that these inputs & outputs are linked by pure coincidence than it is that they are verifiably linked by their payment amounts.
But could we find other matched subsets that can be verified as the true payments? Fortunately for those who are excited by CashFusion, no! I have repeated this process across thousands of matched subsets, none can be mathematically proven to be the true transactions.
So to sum up (excuse the pun), we have somewhat verified the CashFusion’s developers’ claims that even if someone could link inputs and outputs such that they balance, due to the high number of inputs and outputs it’s impossible to determine the true way that inputs and outputs truly relate (since there are multiple possible combinations of ways of getting the inputs and outputs to balance).
This is encouraging for CashFusion users since CashFusion is far more practical than other CoinJoin protocols that require users to send equal transaction amounts.
To donate to this research please use the addresses below:
Bitcoin Cash: qp7ep200zyejqfpxtglvndjnsjc7ex2say2tzqyrd6