Knapsack Mixing

nopara73
nopara73
Jan 7 · 8 min read

I’d like to briefly introduce a 2017 paper, the idea of Knapsack mixing, as I believe this concept was kinda lost and was not properly explored.

Normal Transactions

I am Alice and a long time ago Satoshi gave me 1 BTC, so I have a coin with the value of 1 BTC.
Yesterday I bought some alpaca socks from Bob for 0.1 BTC. So I created a Bitcoin transaction that looks like this:

Where did the 0.9BTC go? Turns out I’m stupid and I accidentally paid it as a miner fee, because the difference between the 1BTC input coin and the 0.1BTC output coin is the miner fee. But you aren’t as stupid as me, you would add the 0.8999BTC as an output coin to the transaction. An output that you control, so you can use it later. And if I look at the Bitcoin blockchain, I quickly realize that nobody is as stupid as me, thus, this is how a normal Bitcoin transaction looks like:

However I want to simplify this model and not care about the miner fee. That will make it easier to discuss Knapsack mixing later:

Naive CoinJoin Transactions

It turns out we can merge transactions together. Take this transaction between Dr. Ken Hurt and Mike Oxmaul:

And coinjoin them together with Alice and Bob’s transaction:

The problem is, if we solve the subset sum problem on this coinjoin then we can find the all the possible sub-transactions:

Sub mappings:
5,1 -> 2,3,0.1,0.9
5 -> 2,3 | 1 -> 0.1,0.9

Without any further heuristics and assumptions, the computational complexity makes it hard to deanonymize large naive coinjoins, since the complexity of the best known algorithms for solving the subset sum problem is exponential. The paper even provided a lower bound estimation:

GCD Mixing

We could mix with the greatest common divisor of the outputs, which is 0.1BTC. In this case the outputs would look like this:

0.1 (Bob), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike)

While it would be interesting to see the analysis on this transaction, unfortunately my software cannot produce it due to the computational complexity involved in solving the subset sum problem on this transaction. However we know that it will produce a lot of valid subset sums. Does that mean GCD Mixing is so much better than naive coinjoins?

No, because what really matters is not the number of subset sum, but how many input-input, input-output and output-output links were broken. The paper provides a model to calculate these values. For us, let’s move on to a more easily analyzable coinjoin instead.

Ad-hoc Mixing

What happens if Mike takes his 3BTC output and makes it 0.9BTC and 2.1BTC instead?

Sub mappings:
5,1 -> 2,0.1,0.9,0.9,2.1
5 -> 2,0.9,2.1 | 1 -> 0.1,0.9
5 -> 2,0.9,2.1 | 1 -> 0.1,0.9

We got 3 sub mappings. That’s getting better.
Notice that while it seems like 5 -> 2,0.9,2.1 | 1 -> 0.1,0.9 line is duplicated, it is not the case, because there are 2 distinct 0.9BTC coins, it’s just I don’t want to uglify my software output with introducing coin identifiers.

Knapsack Mixing

The idea of Knapsack mixing is taking the difference between the inputs: 5-1=4 and realizing subsum on the larger output set, which is the 2btc and 3btc outputs.

Sub mappings:
5,1 -> 2,2,1,0.1,0.9
5 -> 2,2,0.1,0.9 | 1 -> 1
5 -> 2,2,1 | 1 -> 0.1,0.9

Knapsack mixing created 1 more sub-mapping in expense of one extra output, just like our ad-hoc mixing. However, as we figured during GCD mixing, the number of sub mapping is not the ultimate score for a coinjoin, rather input-input, input-output and output-output links are what matter.

NAIVE COINJOINSub mappings:
5,1 -> 2,3,0.1,0.9
5 -> 2,3 | 1 -> 0.1,0.9
Input match probabilities:
5 - inputs: 1(0.5) | outputs: 2(1) 3(1) 0.1(0.5) 0.9(0.5)
1 - inputs: 5(0.5) | outputs: 2(0.5) 3(0.5) 0.1(1) 0.9(1)
Output match probabilities:
2 - inputs: 3(1) 0.1(0.5) 0.9(0.5) | outputs: 5(1) 1(0.5)
3 - inputs: 2(1) 0.1(0.5) 0.9(0.5) | outputs: 5(1) 1(0.5)
0.1 - inputs: 2(0.5) 3(0.5) 0.9(1) | outputs: 5(0.5) 1(1)
0.9 - inputs: 2(0.5) 3(0.5) 0.1(1) | outputs: 5(0.5) 1(1)

In the naive coinjoin, the 5BTC input has a 1(=100%) probability of matching the outputs 2BTC and 3BTC , because both sub-mappings map the 5BTC input to the 2BTC and 3BTC outputs.
It has 0.5(=50%) probability to match the 0.1BTC output, because 1 out of 2 sub mappings maps the 5BTC input to the 0.1BTC output.

While the Knapsack paper mostly stops in its evaluation here, I want to introduce a CoinJoin Ambiguity metric: CJA instead to be able to compare the mixes with each other. CJA for this transaction would be (5BTC*1BTC)/0.5 + (5BTC*2BTC)/1 + (5BTC*3BTC)/1 + ... = 102

Let’s compare this to our ad-hoc and knapsack algorithms:

CJA(naive)=102
CJA(ad-hoc)=116
CJA(knapsack)=115

To my surprise I got a better CoinJoin Ambiguity score for the ad-hoc mixing than what I got for Knapsack. Let’s not let this discourage us from progressing forward though. For completeness here’re the full analysis’s of the 3 discussed mixing:

NAIVESub mappings:
5,1 -> 2,3,0.1,0.9
5 -> 2,3 | 1 -> 0.1,0.9
Input match probabilities:
5 - inputs: 1(0.5) | outputs: 2(1) 3(1) 0.1(0.5) 0.9(0.5)
1 - inputs: 5(0.5) | outputs: 2(0.5) 3(0.5) 0.1(1) 0.9(1)
Output match probabilities:
2 - inputs: 3(1) 0.1(0.5) 0.9(0.5) | outputs: 5(1) 1(0.5)
3 - inputs: 2(1) 0.1(0.5) 0.9(0.5) | outputs: 5(1) 1(0.5)
0.1 - inputs: 2(0.5) 3(0.5) 0.9(1) | outputs: 5(0.5) 1(1)
0.9 - inputs: 2(0.5) 3(0.5) 0.1(1) | outputs: 5(0.5) 1(1)
CoinJoin Ambiguity Score: 102AD-HOCSub mappings:
5,1 -> 2,0.1,0.9,0.9,2.1
5 -> 2,0.9,2.1 | 1 -> 0.1,0.9
5 -> 2,0.9,2.1 | 1 -> 0.1,0.9
Input match probabilities:
5 - inputs: 1(0.33) | outputs: 2(1) 0.1(0.33) 0.9(0.67) 0.9(0.67) 2.1(1)
1 - inputs: 5(0.33) | outputs: 2(0.33) 0.1(1) 0.9(0.67) 0.9(0.67) 2.1(0.33)
Output match probabilities:
2 - inputs: 0.1(0.33) 0.9(0.67) 0.9(0.67) 2.1(1) | outputs: 5(1) 1(0.33)
0.1 - inputs: 2(0.33) 0.9(0.67) 0.9(0.67) 2.1(0.33) | outputs: 5(0.33) 1(1)
0.9 - inputs: 2(0.67) 0.1(0.67) 0.9(0.33) 2.1(0.67) | outputs: 5(0.67) 1(0.67)
0.9 - inputs: 2(0.67) 0.1(0.67) 0.9(0.33) 2.1(0.67) | outputs: 5(0.67) 1(0.67)
2.1 - inputs: 2(1) 0.1(0.33) 0.9(0.67) 0.9(0.67) | outputs: 5(1) 1(0.33)
CoinJoin Ambiguity Score: 116KNAPSACKSub mappings:
5,1 -> 2,2,1,0.1,0.9
5 -> 2,2,0.1,0.9 | 1 -> 1
5 -> 2,2,1 | 1 -> 0.1,0.9
Input match probabilities:
5 - inputs: 1(0.33) | outputs: 2(1) 2(1) 1(0.67) 0.1(0.67) 0.9(0.67)
1 - inputs: 5(0.33) | outputs: 2(0.33) 2(0.33) 1(0.67) 0.1(0.67) 0.9(0.67)
Output match probabilities:
2 - inputs: 2(1) 1(0.67) 0.1(0.67) 0.9(0.67) | outputs: 5(1) 1(0.33)
2 - inputs: 2(1) 1(0.67) 0.1(0.67) 0.9(0.67) | outputs: 5(1) 1(0.33)
1 - inputs: 2(0.67) 2(0.67) 0.1(0.33) 0.9(0.33) | outputs: 5(0.67) 1(0.67)
0.1 - inputs: 2(0.67) 2(0.67) 1(0.33) 0.9(1) | outputs: 5(0.67) 1(0.67)
0.9 - inputs: 2(0.67) 2(0.67) 1(0.33) 0.1(1) | outputs: 5(0.67) 1(0.67)
CoinJoin Ambiguity Score: 115

Notice that the CoinJoin Ambiguity score does not factor in the blockspace used. In order to really compare coinjoins with each other we need to factor in that. However between the comparision of my ad-hoc mixing and the knapsack mixing we don’t care, since they created the same number of outputs.

Notice that sometimes we want to split active outputs, which in practice is difficult. I can generate multiple change addresses for myself, but I cannot generate multiple addresses for the receiver. For someone who I want to send to. There are some techniques out there, like Stealth Addresses and Payment Codes, but they did not gain adoption, because of their tradeoffs.

Notice that, in every analysis there’s a mapping that does not have sub-transactions. This is the transaction that we see on the blockchain. This is the coinjoin. In order to create more accurate comparision between the naive coinjoin and other mixes this mapping should be ruled out as invalid and should not weigh into the input-input, input-output, output-output link probability matrixes, and of course neither to derived metrics, like the CoinJoin Ambiguity score.

One issue with constructing coinjoins is the question how to coordinate them in a trustless manner. CoinShuffle/Chaumian CoinJoin/TumbleBit/Xim/etc… work because there are equal outputs.
I don’t think this is an impossible job though, but we cannot go ahead an do this research just yet, because first we have to figure out what is the most blockspace efficient algorithm for mixing, and then implement a trustless scheme based on that.

The Knapsack paper provided another improved Knapsack mixing technique, which would give a 126 CJA score, which is the best so far:

Sub mappings:
5,1 -> 1,1,3,0.1,0.9
5 -> 1,3,0.1,0.9 | 1 -> 1
5 -> 1,1,3 | 1 -> 0.1,0.9
5 -> 1,3,0.1,0.9 | 1 -> 1
Input match probabilities:
5 - inputs: 1(0.25) | outputs: 1(0.75) 1(0.75) 3(1) 0.1(0.75) 0.9(0.75)
1 - inputs: 5(0.25) | outputs: 1(0.5) 1(0.5) 3(0.25) 0.1(0.5) 0.9(0.5)
Output match probabilities:
1 - inputs: 1(0.5) 3(0.75) 0.1(0.5) 0.9(0.5) | outputs: 5(0.75) 1(0.5)
1 - inputs: 1(0.5) 3(0.75) 0.1(0.5) 0.9(0.5) | outputs: 5(0.75) 1(0.5)
3 - inputs: 1(0.75) 1(0.75) 0.1(0.75) 0.9(0.75) | outputs: 5(1) 1(0.25)
0.1 - inputs: 1(0.5) 1(0.5) 3(0.75) 0.9(1) | outputs: 5(0.75) 1(0.5)
0.9 - inputs: 1(0.5) 1(0.5) 3(0.75) 0.1(1) | outputs: 5(0.75) 1(0.5)
CoinJoin Ambiguity Score: 126

Felix, one of the author of the paper shared another improvement on the mixing technique with us, which I have yet to look into. For now I just link the repository here: https://gitlab.com/maufl/cja/

CoinJoin Analysis Tool

The software I used to produce the above analyses is a small CoinJoin analysis tool I wrote:

Update

The CoinJoin Ambiguity score was incorrectly calculated, because when we are examining the distance between two coins, we want the sum of the two coins and not the multiplication of them, because the relevant metric here is the total amount of coins being disassociated from each other. With this in mind, this is how the scores change:

NAIVE
CoinJoin Ambiguity Score: 120
AD-HOC
CoinJoin Ambiguity Score: 162
KNAPSACK
CoinJoin Ambiguity Score: 148
KNAPSACK 2
CoinJoin Ambiguity Score: 161

As you can see the CJA now became the highest in the ad-hoc algorithm.

Coinmonks

Coinmonks is a non-profit Crypto educational publication. Follow us on Twitter @coinmonks Our other project — https://coincodecap.com

 by the author.

nopara73

Written by

nopara73

https://www.youtube.com/watch?v=QiySI4-MWww

Coinmonks

Coinmonks

Coinmonks is a non-profit Crypto educational publication. Follow us on Twitter @coinmonks Our other project — https://coincodecap.com

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade