Knapsack Mixing

nopara73
nopara73
Jan 7 · 8 min read

I’d like to briefly introduce a 2017 paper, the idea of Knapsack mixing, as I believe this concept was kinda lost and was not properly explored.

Normal Transactions

I am Alice and a long time ago Satoshi gave me 1 BTC, so I have a coin with the value of 1 BTC.
Yesterday I bought some alpaca socks from Bob for 0.1 BTC. So I created a Bitcoin transaction that looks like this:

Image for post
Image for post

Where did the 0.9BTC go? Turns out I’m stupid and I accidentally paid it as a miner fee, because the difference between the 1BTC input coin and the 0.1BTC output coin is the miner fee. But you aren’t as stupid as me, you would add the 0.8999BTC as an output coin to the transaction. An output that you control, so you can use it later. And if I look at the Bitcoin blockchain, I quickly realize that nobody is as stupid as me, thus, this is how a normal Bitcoin transaction looks like:

Image for post
Image for post

However I want to simplify this model and not care about the miner fee. That will make it easier to discuss Knapsack mixing later:

Naive CoinJoin Transactions

It turns out we can merge transactions together. Take this transaction between Dr. Ken Hurt and Mike Oxmaul:

Image for post
Image for post

And coinjoin them together with Alice and Bob’s transaction:

Image for post
Image for post

The problem is, if we solve the subset sum problem on this coinjoin then we can find the all the possible sub-transactions:

Sub mappings:
5,1 -> 2,3,0.1,0.9
5 -> 2,3 | 1 -> 0.1,0.9

Quick Tangent

Without any further heuristics and assumptions, the computational complexity makes it hard to deanonymize large naive coinjoins, since the complexity of the best known algorithms for solving the subset sum problem is exponential. The paper even provided a lower bound estimation:

Image for post
Image for post

GCD Mixing

We could mix with the greatest common divisor of the outputs, which is 0.1BTC. In this case the outputs would look like this:

0.1 (Bob), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Alice), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Dr.Ken), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike), 0.1(Mike)

While it would be interesting to see the analysis on this transaction, unfortunately my software cannot produce it due to the computational complexity involved in solving the subset sum problem on this transaction. However we know that it will produce a lot of valid subset sums. Does that mean GCD Mixing is so much better than naive coinjoins?

No, because what really matters is not the number of subset sum, but how many input-input, input-output and output-output links were broken. The paper provides a model to calculate these values. For us, let’s move on to a more easily analyzable coinjoin instead.

Ad-hoc Mixing

What happens if Mike takes his 3BTC output and makes it 0.9BTC and 2.1BTC instead?

Sub mappings:
5,1 -> 2,0.1,0.9,0.9,2.1
5 -> 2,0.9,2.1 | 1 -> 0.1,0.9
5 -> 2,0.9,2.1 | 1 -> 0.1,0.9

We got 3 sub mappings. That’s getting better.
Notice that while it seems like 5 -> 2,0.9,2.1 | 1 -> 0.1,0.9 line is duplicated, it is not the case, because there are 2 distinct 0.9BTC coins, it’s just I don’t want to uglify my software output with introducing coin identifiers.

Knapsack Mixing

The idea of Knapsack mixing is taking the difference between the inputs: 5-1=4 and realizing subsum on the larger output set, which is the 2btc and 3btc outputs.

Image for post
Image for post
Sub mappings:
5,1 -> 2,2,1,0.1,0.9
5 -> 2,2,0.1,0.9 | 1 -> 1
5 -> 2,2,1 | 1 -> 0.1,0.9

Knapsack mixing created 1 more sub-mapping in expense of one extra output, just like our ad-hoc mixing. However, as we figured during GCD mixing, the number of sub mapping is not the ultimate score for a coinjoin, rather input-input, input-output and output-output links are what matter.

input-input, input-output and output-output links

NAIVE COINJOINSub mappings:
5,1 -> 2,3,0.1,0.9
5 -> 2,3 | 1 -> 0.1,0.9
Input match probabilities:
5 - inputs: 1(0.5) | outputs: 2(1) 3(1) 0.1(0.5) 0.9(0.5)
1 - inputs: 5(0.5) | outputs: 2(0.5) 3(0.5) 0.1(1) 0.9(1)
Output match probabilities:
2 - inputs: 3(1) 0.1(0.5) 0.9(0.5) | outputs: 5(1) 1(0.5)
3 - inputs: 2(1) 0.1(0.5) 0.9(0.5) | outputs: 5(1) 1(0.5)
0.1 - inputs: 2(0.5) 3(0.5) 0.9(1) | outputs: 5(0.5) 1(1)
0.9 - inputs: 2(0.5) 3(0.5) 0.1(1) | outputs: 5(0.5) 1(1)

In the naive coinjoin, the 5BTC input has a 1(=100%) probability of matching the outputs 2BTC and 3BTC , because both sub-mappings map the 5BTC input to the 2BTC and 3BTC outputs.
It has 0.5(=50%) probability to match the 0.1BTC output, because 1 out of 2 sub mappings maps the 5BTC input to the 0.1BTC output.

While the Knapsack paper mostly stops in its evaluation here, I want to introduce a CoinJoin Ambiguity metric: CJA instead to be able to compare the mixes with each other. CJA for this transaction would be (5BTC*1BTC)/0.5 + (5BTC*2BTC)/1 + (5BTC*3BTC)/1 + ... = 102

Let’s compare this to our ad-hoc and knapsack algorithms:

CJA(naive)=102
CJA(ad-hoc)=116
CJA(knapsack)=115

To my surprise I got a better CoinJoin Ambiguity score for the ad-hoc mixing than what I got for Knapsack. Let’s not let this discourage us from progressing forward though. For completeness here’re the full analysis’s of the 3 discussed mixing:

NAIVESub mappings:
5,1 -> 2,3,0.1,0.9
5 -> 2,3 | 1 -> 0.1,0.9
Input match probabilities:
5 - inputs: 1(0.5) | outputs: 2(1) 3(1) 0.1(0.5) 0.9(0.5)
1 - inputs: 5(0.5) | outputs: 2(0.5) 3(0.5) 0.1(1) 0.9(1)
Output match probabilities:
2 - inputs: 3(1) 0.1(0.5) 0.9(0.5) | outputs: 5(1) 1(0.5)
3 - inputs: 2(1) 0.1(0.5) 0.9(0.5) | outputs: 5(1) 1(0.5)
0.1 - inputs: 2(0.5) 3(0.5) 0.9(1) | outputs: 5(0.5) 1(1)
0.9 - inputs: 2(0.5) 3(0.5) 0.1(1) | outputs: 5(0.5) 1(1)
CoinJoin Ambiguity Score: 102AD-HOCSub mappings:
5,1 -> 2,0.1,0.9,0.9,2.1
5 -> 2,0.9,2.1 | 1 -> 0.1,0.9
5 -> 2,0.9,2.1 | 1 -> 0.1,0.9
Input match probabilities:
5 - inputs: 1(0.33) | outputs: 2(1) 0.1(0.33) 0.9(0.67) 0.9(0.67) 2.1(1)
1 - inputs: 5(0.33) | outputs: 2(0.33) 0.1(1) 0.9(0.67) 0.9(0.67) 2.1(0.33)
Output match probabilities:
2 - inputs: 0.1(0.33) 0.9(0.67) 0.9(0.67) 2.1(1) | outputs: 5(1) 1(0.33)
0.1 - inputs: 2(0.33) 0.9(0.67) 0.9(0.67) 2.1(0.33) | outputs: 5(0.33) 1(1)
0.9 - inputs: 2(0.67) 0.1(0.67) 0.9(0.33) 2.1(0.67) | outputs: 5(0.67) 1(0.67)
0.9 - inputs: 2(0.67) 0.1(0.67) 0.9(0.33) 2.1(0.67) | outputs: 5(0.67) 1(0.67)
2.1 - inputs: 2(1) 0.1(0.33) 0.9(0.67) 0.9(0.67) | outputs: 5(1) 1(0.33)
CoinJoin Ambiguity Score: 116KNAPSACKSub mappings:
5,1 -> 2,2,1,0.1,0.9
5 -> 2,2,0.1,0.9 | 1 -> 1
5 -> 2,2,1 | 1 -> 0.1,0.9
Input match probabilities:
5 - inputs: 1(0.33) | outputs: 2(1) 2(1) 1(0.67) 0.1(0.67) 0.9(0.67)
1 - inputs: 5(0.33) | outputs: 2(0.33) 2(0.33) 1(0.67) 0.1(0.67) 0.9(0.67)
Output match probabilities:
2 - inputs: 2(1) 1(0.67) 0.1(0.67) 0.9(0.67) | outputs: 5(1) 1(0.33)
2 - inputs: 2(1) 1(0.67) 0.1(0.67) 0.9(0.67) | outputs: 5(1) 1(0.33)
1 - inputs: 2(0.67) 2(0.67) 0.1(0.33) 0.9(0.33) | outputs: 5(0.67) 1(0.67)
0.1 - inputs: 2(0.67) 2(0.67) 1(0.33) 0.9(1) | outputs: 5(0.67) 1(0.67)
0.9 - inputs: 2(0.67) 2(0.67) 1(0.33) 0.1(1) | outputs: 5(0.67) 1(0.67)
CoinJoin Ambiguity Score: 115

CoinJoin Efficiency

Notice that the CoinJoin Ambiguity score does not factor in the blockspace used. In order to really compare coinjoins with each other we need to factor in that. However between the comparision of my ad-hoc mixing and the knapsack mixing we don’t care, since they created the same number of outputs.

Splitting Active Outputs

Notice that sometimes we want to split active outputs, which in practice is difficult. I can generate multiple change addresses for myself, but I cannot generate multiple addresses for the receiver. For someone who I want to send to. There are some techniques out there, like Stealth Addresses and Payment Codes, but they did not gain adoption, because of their tradeoffs.

Non-Derived Mapping

Notice that, in every analysis there’s a mapping that does not have sub-transactions. This is the transaction that we see on the blockchain. This is the coinjoin. In order to create more accurate comparision between the naive coinjoin and other mixes this mapping should be ruled out as invalid and should not weigh into the input-input, input-output, output-output link probability matrixes, and of course neither to derived metrics, like the CoinJoin Ambiguity score.

Coordination

One issue with constructing coinjoins is the question how to coordinate them in a trustless manner. CoinShuffle/Chaumian CoinJoin/TumbleBit/Xim/etc… work because there are equal outputs.
I don’t think this is an impossible job though, but we cannot go ahead an do this research just yet, because first we have to figure out what is the most blockspace efficient algorithm for mixing, and then implement a trustless scheme based on that.

Other Mixing Techniques

The Knapsack paper provided another improved Knapsack mixing technique, which would give a 126 CJA score, which is the best so far:

Sub mappings:
5,1 -> 1,1,3,0.1,0.9
5 -> 1,3,0.1,0.9 | 1 -> 1
5 -> 1,1,3 | 1 -> 0.1,0.9
5 -> 1,3,0.1,0.9 | 1 -> 1
Input match probabilities:
5 - inputs: 1(0.25) | outputs: 1(0.75) 1(0.75) 3(1) 0.1(0.75) 0.9(0.75)
1 - inputs: 5(0.25) | outputs: 1(0.5) 1(0.5) 3(0.25) 0.1(0.5) 0.9(0.5)
Output match probabilities:
1 - inputs: 1(0.5) 3(0.75) 0.1(0.5) 0.9(0.5) | outputs: 5(0.75) 1(0.5)
1 - inputs: 1(0.5) 3(0.75) 0.1(0.5) 0.9(0.5) | outputs: 5(0.75) 1(0.5)
3 - inputs: 1(0.75) 1(0.75) 0.1(0.75) 0.9(0.75) | outputs: 5(1) 1(0.25)
0.1 - inputs: 1(0.5) 1(0.5) 3(0.75) 0.9(1) | outputs: 5(0.75) 1(0.5)
0.9 - inputs: 1(0.5) 1(0.5) 3(0.75) 0.1(1) | outputs: 5(0.75) 1(0.5)
CoinJoin Ambiguity Score: 126

Felix, one of the author of the paper shared another improvement on the mixing technique with us, which I have yet to look into. For now I just link the repository here: https://gitlab.com/maufl/cja/

CoinJoin Analysis Tool

The software I used to produce the above analyses is a small CoinJoin analysis tool I wrote:

Update

The CoinJoin Ambiguity score was incorrectly calculated, because when we are examining the distance between two coins, we want the sum of the two coins and not the multiplication of them, because the relevant metric here is the total amount of coins being disassociated from each other. With this in mind, this is how the scores change:

NAIVE
CoinJoin Ambiguity Score: 120
AD-HOC
CoinJoin Ambiguity Score: 162
KNAPSACK
CoinJoin Ambiguity Score: 148
KNAPSACK 2
CoinJoin Ambiguity Score: 161

As you can see the CJA now became the highest in the ad-hoc algorithm.

Get Best Software Deals Directly In Your Inbox

Image for post
Image for post
Image for post
Image for post

Coinmonks

Coinmonks is a non-profit Crypto educational publication.

Sign up for Coinmonks

By Coinmonks

A newsletter that brings you week's best crypto and blockchain stories and trending news directly in your inbox, by CoinCodeCap.com Take a look

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

 by the author.

nopara73

Written by

nopara73

https://www.youtube.com/watch?v=QiySI4-MWww

Coinmonks

Coinmonks

Coinmonks is a non-profit Crypto educational publication. Follow us on Twitter @coinmonks Our other project — https://coincodecap.com

nopara73

Written by

nopara73

https://www.youtube.com/watch?v=QiySI4-MWww

Coinmonks

Coinmonks

Coinmonks is a non-profit Crypto educational publication. Follow us on Twitter @coinmonks Our other project — https://coincodecap.com

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store