Introducing Boltzmann

“Nothing is more practical than a good theory.” — Ludwig Boltzmann

Today, I’m thrilled to share Boltzmann, a tool which (I hope) will be useful to all developers working on Bitcoin wallets and services. It should help them better quantify the degree of privacy associated with the transactions built by their service (especially for those implementing privacy-friendly features like CoinJoin or BIP126).

What is Boltzmann?

Boltzmann is a Python script that computes a set of metrics for a Bitcoin transaction:
* entropy of the transaction: a metric measuring how many mappings of inputs to outputs are possible given the values,
* link probability between an input and an output: the probability that an input has sent some funds to an output,
* linkability matrix of the transaction: a matrix storing the link probabilities between the inputs and outputs of the transaction.
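As a rough sketch of the first metric, assuming the entropy is simply the base-2 logarithm of the number of valid interpretations (which matches the 1.585 bits reported for 3 interpretations further down), it could be written as follows. The function name `entropy_bits` is made up for this illustration and is not taken from Boltzmann's code.

```python
import math


def entropy_bits(n_interpretations: int) -> float:
    """Entropy in bits, assuming it is log2 of the number of valid
    input/output mappings (hypothetical helper, not Boltzmann's API)."""
    if n_interpretations < 1:
        raise ValueError("a transaction has at least one interpretation")
    return math.log2(n_interpretations)


print(entropy_bits(1))            # 0.0: one interpretation, fully deterministic
print(round(entropy_bits(3), 3))  # 1.585
```

Under this assumption, a transaction with a single interpretation has an entropy of 0 bits, and every doubling of the number of interpretations adds 1 bit.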

The idea of building a tool like Boltzmann comes from a discussion with Greg Maxwell on the famous thread “CoinJoin: Bitcoin privacy for the real world” and from prior work by Kristov Atlas on CoinJoin Sudoku. Kudos to both of them!

Boltzmann was implemented in early 2015 and has since been a component of the OXT platform.

Probabilities of a link between a selected input and the outputs of a transaction as displayed in OXT

As part of the work done for the Open Bitcoin Privacy Project, I also used Boltzmann in 2015 to estimate the number of CoinJoin-like transactions (spoiler: 2–3% of all transactions). This figure was later confirmed by the Bitfury Group.

Why do we need this entropy and linkability stuff? We already have taint analysis.

Taint analysis usually does a “pretty good job” in the financial industry. The main reason is that it works fine as long as you deal with deterministic flows (e.g. a transaction can be interpreted with 100% certainty as entity A sending money to entity B).

While this condition was almost true for Bitcoin during its early years (at least in practice), the situation has started to change with the introduction of privacy-enhancing features like CoinJoin.

CoinJoin (and its successors) allows several entities to merge their transactions into a single one, and it adds a “little trick” to the recipe: several outputs have the same amount.

A CoinJoin transaction with 2 payers. Who sent what to whom?

This simple “little trick” breaks the assumption that a deterministic link always exists between the inputs and the outputs of a transaction: it becomes impossible to map the outputs to the inputs with 100% certainty.
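To make the ambiguity concrete, here is a minimal brute-force sketch. Everything in it is an assumption made for illustration: the amounts are invented, fees are taken to be zero, and the function names are mine (this is not Boltzmann's actual code). It lists every way of splitting a transaction into independent sub-transactions whose inputs and outputs balance exactly.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a tuple, including the empty one."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def interpretations(in_vals, out_vals):
    """List every split of the transaction into balanced sub-transactions.

    Each interpretation is a list of (input_indexes, output_indexes) groups.
    Assumes zero fees, so a group is valid only if its sums match exactly.
    """
    def rec(ins, outs):
        if not ins:
            # Valid only if every output has been assigned too.
            return [[]] if not outs else []
        first, rest = ins[0], ins[1:]
        found = []
        # The first remaining input always belongs to the group being built,
        # so each partition is generated exactly once.
        for extra in subsets(rest):
            group_in = (first,) + extra
            total = sum(in_vals[i] for i in group_in)
            for group_out in subsets(outs):
                if not group_out:
                    continue
                if sum(out_vals[o] for o in group_out) != total:
                    continue
                left_in = tuple(i for i in rest if i not in extra)
                left_out = tuple(o for o in outs if o not in group_out)
                for tail in rec(left_in, left_out):
                    found.append([(group_in, group_out)] + tail)
        return found

    return rec(tuple(range(len(in_vals))), tuple(range(len(out_vals))))


# Hypothetical 2-payer CoinJoin: two equal outputs of 3, plus two change outputs.
for mapping in interpretations([10, 8], [3, 7, 3, 5]):
    print(mapping)
```

With these made-up amounts the search finds three balanced splits: each payer can be matched with either of the two equal outputs, or both inputs can be read as belonging to a single sender. The search is exponential in the number of inputs and outputs, so it is only usable on tiny transactions.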

The true beauty of CoinJoin is that it reveals a disruptive aspect of Bitcoin:

Intrinsically, the link between the sender and the receiver of a Bitcoin payment isn’t deterministic but probabilistic. This is engraved in the inner mechanisms of the protocol. You can’t have Bitcoin and its UTXOs without this probabilistic nature.

This is the reason why taint analysis applied to Bitcoin will become more and more obsolete as we see a growing adoption of privacy-enhancing features.

How does it work?

Let’s check that with this CoinJoin transaction

Yep. It’s still the same CoinJoin transaction

Boltzmann returns the following results

Did I mention Boltzmann is a command line tool?

The measured entropy (1.585 bits, i.e. log2(3)) is typical of basic CoinJoin transactions with 2 participants and corresponds to 3 possible interpretations:

  • This is a real CoinJoin transaction with 2 participants. Input 1 is linked to outputs 1 and 2, …
[(Input1) => (Output1, Output2), (Input2) => (Output3, Output4)]
  • This is a real CoinJoin transaction with 2 participants. Input 1 is linked to outputs 2 and 3, …
[(Input1) => (Output3, Output2), (Input2) => (Output1, Output4)]
  • This is a “fake” CoinJoin transaction sent by a single (humorous) entity… All inputs are linked to all outputs.
[(Input1, Input2) => (Output1, Output2, Output3, Output4)]

Now, if you observe these 3 interpretations, you’ll notice that Input1 is always associated with Output2 (same phenomenon with Input2 and Output4). Boltzmann detects this too and returns a link probability equal to 1 for these inputs and outputs (i.e. a deterministic link).

It means that not all links are obfuscated by this CoinJoin transaction. Actually, it’s a characteristic shared by many implementations of CoinJoin: the change output remains deterministically linked to the inputs.
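The link probabilities can be derived by hand from the three interpretations listed above. The sketch below assumes that every interpretation is equally likely and that, within a group, each input is linked to each output; the layout (one row per input, one column per output) and the names are my own choices for this illustration, not necessarily what Boltzmann prints.

```python
from fractions import Fraction

# The three interpretations from the text, as (inputs, outputs) groups.
INTERPRETATIONS = [
    [({1}, {1, 2}), ({2}, {3, 4})],
    [({1}, {2, 3}), ({2}, {1, 4})],
    [({1, 2}, {1, 2, 3, 4})],
]


def link_probabilities(interpretations, n_inputs, n_outputs):
    """Return matrix[i][o]: share of interpretations linking input i+1 to output o+1."""
    counts = [[0] * n_outputs for _ in range(n_inputs)]
    for interpretation in interpretations:
        for ins, outs in interpretation:
            for i in ins:
                for o in outs:
                    counts[i - 1][o - 1] += 1
    total = len(interpretations)
    return [[Fraction(c, total) for c in row] for row in counts]


matrix = link_probabilities(INTERPRETATIONS, n_inputs=2, n_outputs=4)
for row in matrix:
    print([str(p) for p in row])
# ['2/3', '1', '2/3', '1/3']
# ['2/3', '1/3', '2/3', '1']
```

The two entries equal to 1 are the deterministic links (Input1 to Output2, Input2 to Output4), while the two equal-valued outputs stay at 2/3 for both inputs.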

How to use it?

Easy. Go to the GitHub repo. Everything is explained in the README file.

For additional background on the metrics, check these gists:
* Bitcoin Transactions & Privacy (part 1)
* Bitcoin Transactions & Privacy (part 2)
* Bitcoin Transactions & Privacy (part 3)

What’s next?

If you’re the developer of a Bitcoin wallet, I encourage you to implement BIP47, BIP69, BIP126 and other features improving the privacy of your users. And don’t forget to check the results with Boltzmann or OXT.

If you’re a developer or a CS student interested in these metrics, I encourage you to build a better version of this algorithm. I usually call Boltzmann “the worst implementation in the world” because it implements a very stupid brute-force method which limits its capabilities, and I know that there’s room for improvement (optimization, memoization of intermediate results, parallelization, rewriting in a faster language, generalized form of the metrics, etc.).

If you’re a Bitcoin user, I encourage you to lobby the developers of your favorite wallet for the implementation of privacy-enhancing features.

Lastly, don’t forget to join the Open Bitcoin Privacy Project.

My 2 satoshis.
lauremtmt