Bitcoin Privacy Landscape In 2017 — Beyond Coin Mixing, General Guidelines And Research
My work on TumbleBit highlighted the need for research into the question: “What is the lightest-weight wallet implementation that does not compromise the user’s privacy?”
Even if TumbleBit, JoinMarket, or a Bitcoin mixer perfectly “breaks the chain” of transactions on the Bitcoin blockchain, there is no guarantee that the user is not compromised in other ways, for example by someone examining the P2P network.
First, I will arm you with some rules of thumb on how to protect your privacy on the blockchain, so the busy and the lazy can stop after reading them.
If you cannot stick to these principles, don’t know how to, or are just generally interested in the topic, read further.
Next, I introduce some theory on the three dimensions of privacy in Bitcoin that one should tackle.
Then I elaborate on my rules of thumb.
Finally, I look into what the ideal architecture of the lightest-weight privacy-preserving wallet would be.
5 Rules of Privacy on the Blockchain
1. Be over Tor at all times.
2. Use a full node.
3. Operate at all times, in the belief that all your addresses belonging to the same wallet are already linked together by a third party.
4. Never share networks between your wallets. In other words, never use the same IP address with multiple wallets of yours.
5. Between wallets only move coins by utilizing a mixing technique.
3 Dimensions of Privacy to Tackle
1. Clustering Strategies
Third parties analyze the blockchain, trying to find connections that reveal which addresses belong to the same wallet. Here is a simple example of how it happens.
Imagine you are a blockchain observer and you see two transactions on the blockchain:
addr1 spends all its coins to addr2 and addr3. At this point we have no idea whether addr2 or addr3 is in the same wallet as addr1. Or do we? The first clustering strategy we apply is to suspect that either addr2 or addr3 is the change that comes back to the user’s wallet. We don’t know which one, though. To figure it out, we have to look at the next transactions.
It still seems hopeless to figure out which one is the change. However, as our next clustering strategy, we can look at fee patterns. We might find that some transactions use the same fee-calculation algorithm, so we can guess that they were generated by the same wallet, and therefore identify two addresses that belong to the same wallet:
In this case the red transactions use the same fee pattern, therefore addr1 and addr3 likely belong to the same wallet.
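The fee-pattern strategy can be sketched as grouping transactions by a fee fingerprint. Real wallets compute fees in many different ways; the fingerprint below (fee rate in satoshis per byte, rounded to one decimal) and the `Tx` record are hypothetical stand-ins chosen only to show the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    txid: str
    fee_sat: int     # total fee paid, in satoshis
    size_bytes: int  # serialized transaction size, in bytes


def fee_fingerprint(tx: Tx) -> float:
    # Hypothetical fingerprint: fee rate in sat/byte, rounded to 0.1.
    return round(tx.fee_sat / tx.size_bytes, 1)


def group_by_fee_pattern(txs):
    """Group transactions that share the same fee fingerprint."""
    groups = defaultdict(list)
    for tx in txs:
        groups[fee_fingerprint(tx)].append(tx.txid)
    # Only fingerprints shared by 2+ transactions hint at a common wallet.
    return {fp: ids for fp, ids in groups.items() if len(ids) > 1}


txs = [
    Tx("tx_a", fee_sat=22600, size_bytes=226),  # 100.0 sat/byte
    Tx("tx_b", fee_sat=37400, size_bytes=374),  # 100.0 sat/byte
    Tx("tx_c", fee_sat=10000, size_bytes=225),  # ~44.4 sat/byte
]
print(group_by_fee_pattern(txs))  # {100.0: ['tx_a', 'tx_b']}
```

A match is only a hint: many unrelated wallets can land on the same fee rate, which is why an observer combines this with the other strategies.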
We can apply our next clustering strategy when an address doesn’t hold enough money to pay for the transaction the user wants. In this case the wallet joins it with an output of another transaction as an additional input, so we can identify that output as belonging to the same wallet, too.
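This joined-inputs strategy (often called the common-input-ownership heuristic) is naturally expressed with a union-find structure: every address that appears as an input of the same transaction is merged into one cluster. A minimal sketch, assuming each transaction is given simply as the list of its input addresses; the function names are hypothetical.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # register unseen addresses
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_by_common_inputs(transactions):
    """transactions: list of input-address lists, one list per transaction."""
    uf = UnionFind()
    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # single-input transactions still register the address
        for addr in inputs[1:]:
            uf.union(first, addr)  # co-spent inputs: assume one wallet
    clusters = {}
    for addr in list(uf.parent):
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return sorted(sorted(c) for c in clusters.values())


txs = [["addr3", "addr4"], ["addr4", "addr5"], ["addr6"]]
print(cluster_by_common_inputs(txs))  # [['addr3', 'addr4', 'addr5'], ['addr6']]
```

Note how addr3 and addr5 end up together although they never appear in the same transaction: the link is transitive through addr4.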
In fact, if the stars align, we can apply yet another clustering strategy to identify which address is the change of that transaction. Let’s say the joined inputs of the last transaction hold 0.5 and 0.7 BTC, and the unspent outputs hold 1 and 0.2 BTC. Then we can be almost sure that the address with 0.2 BTC is the change: a wallet paying only 0.2 BTC would not have needed both inputs, so that output belongs to the same wallet.
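The change-detection reasoning in the 0.5 + 0.7 → 1 + 0.2 example can be sketched as an "unnecessary input" check: an output that some strict subset of the inputs could already have paid is unlikely to be the payment. This is a simplified sketch that ignores the fee; amounts are in satoshis to avoid floating-point issues, and the function name is hypothetical.

```python
from itertools import combinations


def likely_change_outputs(inputs, outputs):
    """Return outputs that a strict subset of the inputs could have covered."""
    change = []
    for out in outputs:
        # If fewer inputs would already cover this amount, a wallet paying
        # only this output would not have joined all inputs, so the output
        # is more likely the change than the payment.
        for k in range(1, len(inputs)):
            if any(sum(combo) >= out for combo in combinations(inputs, k)):
                change.append(out)
                break
    return change


inputs = [50_000_000, 70_000_000]    # 0.5 BTC and 0.7 BTC
outputs = [100_000_000, 20_000_000]  # 1 BTC and 0.2 BTC
print(likely_change_outputs(inputs, outputs))  # [20000000]
```

If more than one output qualifies, the heuristic is inconclusive; it only helps when exactly one candidate remains.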
It gets worse: we might have already identified a transaction chain and its addresses, so now we just have to connect the pieces:
This is clustering. What can you do against it? Use services that hold your private keys instead of you. Quite a contradiction. Those services can not only breach your privacy, but also steal from you. What can you do if you are using a real Bitcoin wallet? Not much at this point. I am not aware of any wallet that is even remotely successful at tackling this. You could make every transaction of yours with JoinMarket. That is a pain in the ass, though.
How well does it actually work in practice? I have to admit I don’t know exactly. But according to Jonas Nick, it works almost perfectly when combined with network analysis, which is the second dimension of privacy to tackle:
2. Network Analysis
Imagine all your balances are queried from, and your transactions are broadcast by, Blockchain.info or Coinbase. You also connect to them over Tor and fake your KYC. In this case you have perfectly tackled the network analysis issue: no third party can connect your addresses together, except your service provider. Quite a contradiction again, right? Reliance on a service provider could be thought of as another dimension of privacy, but let’s not complicate this further.
How about full nodes, SPV wallets and exotic architectures like Electrum? Hold on, the rest of this article is about them, but first let’s look at the third dimension of privacy:
3. Bitcoin Mixing
You might have noticed that these dimensions are interconnected. Mixing can even be thought of as a countermeasure against clustering, but let’s assume you don’t want to mix every transaction you make, and mixing is only something you do occasionally.
When? Because of clustering and network analysis, it is a good idea never to send a direct transaction from one of your wallets to another. Rather, mix the coins into the other wallet with a mixer, with JoinMarket, or soon with TumbleBit.
I am not going to talk about mixing techniques here; rather, I will assume from now on that mixing is a perfect way to obfuscate coins. Nothing could be further from the truth, but it helps me concentrate on how you can safely send coins to and receive coins from the mix, in case you are not able to buy multiple computers and run Bitcoin Core over Tor on each one, which I suspect is the case for everyone.
General guidelines — connecting the dots
If you take away anything from the above theory, it should be this: blockchain analysis combined with network analysis is the killer app for blockchain surveillance companies. So what can we do against it?
Against network analysis, we can choose the right wallet and run it behind Tor, which is a harder job than one would expect.
Against blockchain analysis, we do not have much ammunition. We could mix every transaction we make, but let’s be real, we won’t. So far we have all been thinking of Bitcoin as pseudonymous at the level of addresses. It might be better to think of pseudonymity at the level of wallets, since third parties are not examining individual transactions, but rather our transaction chains.
1–2. A full node in pruned mode over Tor is the lightest-weight privacy-preserving wallet that exists today
- Today, it can take up to 2 weeks to sync before you can use it.
- If you don’t turn it on for a week or so, it will take a few hours of syncing before you are able to use it.
- In pruned mode, instead of 100+ GB of storage, you can lower the requirement to only a few GB.
- If bandwidth or laptop requirements are your concern, you can edit your bitcoin.conf file. This will make your node effectively useless for the network and your setup less secure overall, but at least you will be able to run a full node.
Here are a few settings to consider in order to make Core lighter. This is not a complete list of hacks. Many of these are advised against and should only be used as a last resort:
# Accept connections from outside (default: 1 if no -proxy or -connect)
# How many blocks to check at startup (default: 288, 0 = all)
# How thorough the block verification of -checkblocks is (0–4, default: 3)
# Maintain a full transaction index, used by the getrawtransaction rpc call (default: 0)
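The comments above only describe the options. As a hedged illustration, here is a small hypothetical Python helper that writes a lighter-weight bitcoin.conf combining pruning, Tor and the settings above. The option names (`prune`, `proxy`, `listen`, `txindex`, `checkblocks`, `checklevel`) are real Bitcoin Core settings; the `checkblocks` and `checklevel` values are illustrative assumptions, not recommendations.

```python
# (option, value, comment) triples for a lighter-weight bitcoin.conf.
SETTINGS = [
    ("prune", 550, "Prune block files to ~550 MiB (the minimum prune target)"),
    ("proxy", "127.0.0.1:9050", "Route connections through Tor's default SOCKS port"),
    ("listen", 0, "Do not accept connections from outside"),
    ("txindex", 0, "No full transaction index (incompatible with pruning)"),
    ("checkblocks", 6, "Illustrative value: check fewer blocks at startup"),
    ("checklevel", 1, "Illustrative value: less thorough startup verification"),
]


def render_conf(settings):
    """Render (key, value, comment) triples as bitcoin.conf text."""
    lines = []
    for key, value, comment in settings:
        lines.append(f"# {comment}")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_conf(SETTINGS), end="")
```

Only `prune` and `proxy` are needed for the "pruned full node over Tor" setup; the rest trade network usefulness or verification for lighter resource use.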
3. Operate at all times, thinking that all your addresses belonging to the same wallet are already linked together by a third party
Do not misread this as meaning it’s OK to use only one address per wallet; that is not the case. What it actually means is that, with clustering strategies, by looking at how you use your coins on the blockchain, a third party might find many of your addresses with high likelihood from knowing only one address you have used. This likelihood grows further if you are not using a full node. With Core, you pull the whole blockchain to your disk and read it from there.
Every wallet that is not a full node retrieves information from someone. This enables network analysis. If you ask a service what the balance of 3AdorkDobnYKNfAfgNgLBa7aafJMfqSXzp is, then that service or a man in the middle is going to suspect that the address is connected to you somehow.
With SPV wallets that use BIP37 Bloom filters, which means most SPV wallets, EVERY blockchain surveillance company already knows all your addresses, as Jonas Nick and this paper found.
Wallets that use an HTTP API, like JoinMarket in Blockr.io’s HTTP API mode or my HiddenWallet with the QBitNinja HTTP API, are different. The central server (Blockr.io in JoinMarket’s case, QBitNinja in HiddenWallet’s case) knows all your addresses. With BIP37 SPV wallets, however, you are basically sharing all your addresses with any surveillance company.
In the case of Electrum, which uses the Stratum protocol to query central servers run by random people, you share all your addresses with every server you connect to.
Jonas Nick’s opinion is that using an HTTP API (or another central-server solution) might be slightly more private, since you are only sharing your addresses with one entity, while with SPV, for example, you are sharing them with many entities. And indeed, this is the reason why JoinMarket does not implement an SPV wallet.
4. Never share networks between wallets
For one, you always want to use Tor with any wallet that you do not want associated with you. However, you should also take care not to use the same Tor circuit for two wallets.
If you have two Electrum wallet files, you should at least restart Tor every time you switch wallet files. The same applies to other wallets that are not full nodes.
In the case of Bitcoin Core, since blockchain information is retrieved from disk, you would think you don’t have to worry about network analysis at all, but when you broadcast a transaction, you do after all.
A possible, not fine-tuned solution is to use two wallet.dat files: one you mix from and one you mix to. Keep your node behind Tor and, every time before you broadcast from a different wallet.dat, restart Tor so you get a new IP.
But we can do better. This can be handled in code, by the way: you could change the Tor circuit every time you broadcast a transaction, which further mitigates the risk of your addresses being connected together.
I wouldn’t be surprised if Core already had a solution for this; I could not find one, though.
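One way to get a fresh circuit per broadcast from code, without restarting Tor, is SOCKS stream isolation: Tor’s `IsolateSOCKSAuth` flag (on by default for `SocksPort`) keeps streams that present different SOCKS5 username/password credentials off the same circuit. The sketch below opens such a stream with random credentials using only the standard library (SOCKS5 per RFC 1928, username/password auth per RFC 1929). The function names are hypothetical, it assumes Tor listens on 127.0.0.1:9050, and speaking the Bitcoin P2P protocol over the returned socket is out of scope.

```python
import secrets
import socket


def socks5_greeting() -> bytes:
    # VER=5, NMETHODS=1, METHODS=[0x02] (username/password, RFC 1929)
    return bytes([0x05, 0x01, 0x02])


def socks5_auth_request(username: str, password: str) -> bytes:
    # RFC 1929 sub-negotiation: VER=1, ULEN, UNAME, PLEN, PASSWD
    u, p = username.encode(), password.encode()
    if not (1 <= len(u) <= 255 and 1 <= len(p) <= 255):
        raise ValueError("username and password must be 1-255 bytes")
    return bytes([0x01, len(u)]) + u + bytes([len(p)]) + p


def socks5_connect_request(host: str, port: int) -> bytes:
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), host, port
    h = host.encode()
    return bytes([0x05, 0x01, 0x00, 0x03, len(h)]) + h + port.to_bytes(2, "big")


def fresh_isolation_credentials() -> tuple:
    # New random credentials for every broadcast, so that (with
    # IsolateSOCKSAuth) Tor does not reuse a circuit across broadcasts.
    return secrets.token_hex(8), secrets.token_hex(8)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        buf += chunk
    return buf


def open_isolated_stream(host: str, port: int, proxy=("127.0.0.1", 9050)):
    """Open a TCP stream to host:port through Tor on an isolated circuit."""
    user, pwd = fresh_isolation_credentials()
    sock = socket.create_connection(proxy)
    try:
        sock.sendall(socks5_greeting())
        if _recv_exact(sock, 2) != b"\x05\x02":
            raise ConnectionError("proxy did not accept username/password auth")
        sock.sendall(socks5_auth_request(user, pwd))
        if _recv_exact(sock, 2)[1] != 0x00:
            raise ConnectionError("SOCKS5 authentication failed")
        sock.sendall(socks5_connect_request(host, port))
        reply = _recv_exact(sock, 4)  # VER, REP, RSV, ATYP
        if reply[1] != 0x00:
            raise ConnectionError(f"SOCKS5 CONNECT failed (code {reply[1]})")
        addr_len = {0x01: 4, 0x04: 16}.get(reply[3])
        if addr_len is None:  # ATYP=3: length-prefixed domain name
            addr_len = _recv_exact(sock, 1)[0]
        _recv_exact(sock, addr_len + 2)  # skip BND.ADDR and BND.PORT
        return sock
    except Exception:
        sock.close()
        raise
```

A wallet following this idea would call `open_isolated_stream(peer, 8333)` once per broadcast, send the transaction over it, and close it, instead of reusing long-lived peer connections.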
5. Good and bad mixing examples
Keeping my points above in mind, and assuming mixing works nearly perfectly, consider going through my examples as an exercise:
Good, if the two wallets never broadcast transactions from the same IP. This also applies to any other wallet that uses a full node, for example JoinMarket in full node mode or Armory. Simply switching wallet files works, too.
Wrong. Clustering can establish the connection.
Maybe OK, if the two wallets never share the same IP, for example if one of the wallets uses Tor at all times. It also applies the other way around: from Mycelium to Core. You can apply this to other SPV or hybrid wallets too, like Electrum, where network analysis is near perfect.
Probably wrong, even if the two wallets never share the same IP and both are over Tor. This applies to Electrum to Electrum, MultiBit to MultiBit, Electrum to Mycelium and so on. Assume all your wallet addresses are already identified by a third party. It can then see 1.8 bitcoin disappear from one of your wallets, go to a mixing service, and appear some time later in another wallet of yours.
Wrong. You should never mix directly from a KYC exchange. Send your coins out to a throwaway wallet, make a few transactions so the exchange stops tracking your money, then send them to the mixer, and from the mixer to Bitcoin Core, which should be over Tor.
Probably wrong. Even if you don’t share your network, that is, you connect to Blockchain.info through different Tor circuits, you cannot fool Blockchain.info easily. While Blockchain.info does not hold your private keys, it knows exactly what you send where. Therefore it can re-establish the link with a little work. You are completely dependent on the service in this case.
Maybe OK. If a third party’s clustering does not work perfectly on your full node’s transactions, it might not be able to re-establish the link, even if it obtains all your transactions from Blockchain.info.
The lightest-weight wallet possible that does not compromise your privacy against network analysis
It seems the million-dollar question is how to make the lightest wallet that doesn’t compromise your privacy.
Right now it is pruned Bitcoin Core with some hacks. But can we do better?
Jonas Schnelli can, and his work will most likely be integrated into Bitcoin Core: https://github.com/bitcoin/bitcoin/pull/9483
This is the complete patch-set for the hybrid full block SPV mode.
If one enables the SPV mode with -spv=1 it does...
…first sync all headers (no block downloads during that phase)
…requests and persist all blocks that are relevant for the wallet (down to the dept of the older wallet key)
…scan the block for relevant transactions and flag them with validated = false (visible in listtransactions etc).
… continue with IBD (initial block download) after all wallet relevant blocks have been processed
Pure full block SPV mode is possible by setting -autorequestblocks=0, in that mode, no blocks for validating the chain will be downloaded, resulting in a SPV only mode.
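The order of work described in the PR text can be modeled as a download plan. This is a hypothetical Python model of the quoted steps, not code from the pull request: it assumes headers are already synced (so the tip height is known), that wallet-relevant blocks are those from the oldest wallet key’s height to the tip, and that the later IBD skips blocks already fetched, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class DownloadPlan:
    wallet_blocks: list  # fetched first; wallet txs shown with validated = false
    ibd_blocks: list     # fetched afterwards for full validation (may be empty)


def plan_hybrid_spv(tip_height: int, oldest_key_height: int,
                    autorequestblocks: bool = True) -> DownloadPlan:
    # Step 1 (assumed done): all headers up to tip_height are synced.
    # Step 2: request every block from the oldest wallet key to the tip.
    wallet_blocks = list(range(oldest_key_height, tip_height + 1))
    # With -autorequestblocks=0 nothing else is downloaded: SPV-only mode.
    if not autorequestblocks:
        return DownloadPlan(wallet_blocks, [])
    # Step 4: continue with IBD for the remaining blocks.
    fetched = set(wallet_blocks)
    ibd_blocks = [h for h in range(0, tip_height + 1) if h not in fetched]
    return DownloadPlan(wallet_blocks, ibd_blocks)


plan = plan_hybrid_spv(tip_height=10, oldest_key_height=7)
print(plan.wallet_blocks, plan.ibd_blocks)
```

The privacy point is visible in the model: the node downloads whole blocks, so peers never learn which transactions or addresses the wallet cares about.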
Before you get too excited: it is not an SPV wallet that you can put on your smartphone. Emphasis on full-block SPV. In its lightest mode, it syncs headers first, then downloads blocks like a full node, starting from the creation of the wallet. The question arises: what do you gain compared to a pruned full node?
- Initial sync time. It will probably complete within an hour instead of weeks.
- The downloaded block data will grow just like an unpruned full node’s: about 50 GB per year, but (a) you can delete it and start a new wallet when it gets too big, and (b) I believe pruning compatibility will be implemented as well.
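The "about 50 GB per year" figure follows from simple arithmetic, assuming every block is full at the 1 MB block size limit and one block arrives roughly every 10 minutes (both are assumptions; real blocks vary in size).

```python
def yearly_growth_gb(block_size_mb: float = 1.0, minutes_per_block: float = 10) -> float:
    blocks_per_day = 24 * 60 / minutes_per_block  # 144 blocks per day
    return block_size_mb * blocks_per_day * 365 / 1000


print(f"{yearly_growth_gb():.1f} GB/year")  # 52.6 GB/year
```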
The problems? Bandwidth, CPU, storage, memory, a long sync time if you fail to turn it on regularly, and so on. There is still no way to even remotely consider running it on a mobile device.
You can find more information about this in the logs of the weekly Core IRC meetings.
This is the end of our journey through what we can do today to find the lightest network-analysis-resistant Bitcoin wallet. The rest of the article is about building an even lighter one.
To Bloom filter or not to Bloom filter?
To BIP37 or not to BIP37?
This BIP adds new support to the peer-to-peer protocol that allows peers to reduce the amount of transaction data they are sent.
Up to that point, SPV wallets requested concrete information from other nodes, which meant sharing all their addresses with every node they connected to. Bloom filtering was introduced to address this. The idea was to let the wallet configure its Bloom filter to balance available resources against privacy. Implementations of these wallets, without exception, favored resources over privacy. As it later turned out, it doesn’t even matter that much, since privacy is pretty much screwed anyway, but I’m getting ahead of myself.
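To make the resource/privacy knob concrete, here is a minimal Python sketch of a BIP37-style Bloom filter. The sizing formulas, the MurmurHash3 seed `i * 0xFBA4C795 + nTweak`, the bit layout and the 36,000-byte / 50-function caps follow BIP37 as far as I can tell; the class and method names and the `max(1, ...)` guards are my own assumptions, and the code has not been checked for byte-for-byte compatibility with bitcoinj or Bitcoin Core.

```python
import math

MAX_BLOOM_FILTER_SIZE = 36000  # bytes (BIP37 limit)
MAX_HASH_FUNCS = 50            # BIP37 limit
LN2 = math.log(2)
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int) -> int:
    """MurmurHash3 x86_32, the hash function BIP37 specifies."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    rounded = len(data) & ~3
    for i in range(0, rounded, 4):  # 4-byte little-endian blocks
        k = int.from_bytes(data[i:i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail, k = data[rounded:], 0
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    h ^= len(data)  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


class Bip37BloomFilter:
    def __init__(self, n_elements: int, fp_rate: float, tweak: int = 0):
        # A higher fp_rate gives a smaller filter (less bandwidth) and more
        # false positives (more cover traffic for the wallet's real items).
        n_bits = -1 / (LN2 ** 2) * n_elements * math.log(fp_rate)
        n_bytes = max(1, int(min(n_bits, MAX_BLOOM_FILTER_SIZE * 8)) // 8)
        self.data = bytearray(n_bytes)
        funcs = n_bytes * 8 / n_elements * LN2
        self.n_hash_funcs = max(1, int(min(funcs, MAX_HASH_FUNCS)))
        self.tweak = tweak & MASK32

    def _bit_indices(self, item: bytes):
        n_bits = len(self.data) * 8
        for i in range(self.n_hash_funcs):
            seed = (i * 0xFBA4C795 + self.tweak) & MASK32
            yield murmur3_32(item, seed) % n_bits

    def insert(self, item: bytes) -> None:
        for idx in self._bit_indices(item):
            self.data[idx >> 3] |= 1 << (idx & 7)

    def contains(self, item: bytes) -> bool:
        return all(self.data[idx >> 3] & (1 << (idx & 7))
                   for idx in self._bit_indices(item))


if __name__ == "__main__":
    bf = Bip37BloomFilter(n_elements=1000, fp_rate=0.0001, tweak=5)
    for i in range(100):
        bf.insert(f"pubkeyhash-{i}".encode())
    strangers = (f"stranger-{i}".encode() for i in range(100_000))
    print(len(bf.data), bf.n_hash_funcs, sum(bf.contains(s) for s in strangers))
```

A remote peer sees only `data`, `n_hash_funcs` and `tweak`, but it can test any address it likes against them, which is exactly what the attacks discussed below exploit.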
For understanding Bloom filters and SPV wallets, see:
Mastering Bitcoin - The Bitcoin Network
Bitcoin Developer Guide - SPV, Bloom Filters
Balance lookup filters
Balance lookup filters are a Private Information Retrieval (PIR) method employed by non-full node wallet clients to obscure which Bitcoin address’s balance information the client is querying a neighbor for. To date, most discussion in this area has revolved around Address Bloom filters.
For a general overview of the topic, the Open Bitcoin Privacy Project provides a great introduction.
They categorize BIP37 as a technique that uses address Bloom filters. Another method is prefix filters, a simpler technique that suffers from similar problems. They also introduce block Bloom filters, which could be an interesting research topic for the future. To my knowledge, they have been neither fully explored nor implemented. A great technical conversation on the topic can be found on the Bitcoin Developer Mailing List.
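A prefix filter can be sketched in a few lines: the client reveals only the first few hex characters of a hash of its address, and the server returns everything under that prefix. This generic sketch assumes the lookup key is the SHA-256 of the address string; the exact scheme the Open Bitcoin Privacy Project describes may differ, and all names are hypothetical.

```python
import hashlib


def address_key(address: str) -> str:
    # Assumption: the server indexes addresses by hex SHA-256 of the string.
    return hashlib.sha256(address.encode()).hexdigest()


def prefix_query(address: str, prefix_hex_chars: int) -> str:
    """What the client actually reveals to the server."""
    return address_key(address)[:prefix_hex_chars]


def server_matches(all_addresses, prefix: str):
    """Everything the server returns; the real address hides among these."""
    return [a for a in all_addresses if address_key(a).startswith(prefix)]


addresses = [f"addr{i}" for i in range(10_000)]
mine = "addr42"
matches = server_matches(addresses, prefix_query(mine, 2))
# Each revealed hex character shrinks the anonymity set by ~16x:
# about 10_000 / 16**2, i.e. roughly 39 candidates here.
print(len(matches), mine in matches)
```

The weakness mirrors that of address Bloom filters: the same prefix is revealed on every query, so the server can narrow the candidates over time, for example by looking at which of them transact with each other.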
To bloom filter
Before I go into details, I would like to mention a proposal on the dev mailing list, “Committed bloom filters for improved wallet performance and SPV security”, which seems to incorporate the research I am about to present and to provide a solution. I cannot form an opinion on it, since I have not read it, but I need to mention it for completeness.
The reason bitcoinj doesn’t use the obfuscation capabilities of the Bloom filtering protocol is that lying consistently is hard. I mentioned this to Jonas a few days ago at the Bitcoin meetup I attended. Let’s elaborate on what this means.
The Bloom filtering protocol lets bitcoinj lie about what it’s interested in from a remote node. But anyone who ever watched a cop show knows that lying is one thing, but lying without getting caught is something else entirely. Usually in these shows, the detective cleverly puzzles out whodunnit from inconsistencies and mistakes in the suspect’s story.
Common problems that let the detective catch the bad guy include: constantly changing their story, telling different lies to different observers, telling lies that contain elements of the truth and so on.
The concrete problems with BIP37
Two research efforts found that making BIP37 private is very difficult, maybe impossible.
BIP 37 states:"Privacy: Because Bloom filters are probabilistic, with the false positive rate chosen by the client, nodes can trade off precision vs bandwidth usage. A node with access to lots of bandwidth may choose to have a high fp rate, meaning the remote peer cannot accurately know which transactions belong to the client and which don’t."This has created a misunderstanding between what is ideally possible with Bloom filters and how the reality looks like. I’ll focus on BitcoinJ because it is the most widely used implementation of BIP 37, but similar vulnerabilities might exist in other implementations as well. Unfortunately, in the current BitcoinJ implementation Bloom filters are just as bad for your privacy as broadcasting your pubkeys directly to your peers.
- Arthur Gervais, Ghassan O. Karame, Damian Gruber, Srdjan Capkun — On the Privacy Provisions of Bloom Filters in Lightweight Bitcoin Clients
- Jonas Nick: Video presentation, blog post, research paper
- Mike Hearn’s reply on the mailing list and on Reddit
To sum up, the concrete problems the researchers found were:
(a) The pubkey and the pubkey hash should not both be put into the Bloom filter.
(b) The Bloom filter should never be changed.
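Both problems give the spy two views of the same wallet. With (a), a real address matches on both its pubkey and its pubkey hash; with (b), a real address matches both the old and the new filter. A false positive, however, has to get lucky in every view. Assuming the views are independent (an idealization), a non-wallet item survives k views with probability about p^k, which this back-of-the-envelope calculation illustrates; the candidate count and rate are made-up example numbers.

```python
def expected_false_positives(candidates: int, fp_rate: float, views: int = 1) -> float:
    """Expected non-wallet items that match all `views` independent filters."""
    return candidates * fp_rate ** views


candidates = 1_000_000  # addresses the spy tests against the filter(s)
p = 0.001               # false positive rate chosen by the wallet
print(expected_false_positives(candidates, p, 1))  # ~1000 decoys: real keys hidden
print(expected_false_positives(candidates, p, 2))  # ~1 decoy: real keys exposed
```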
Based on the suggestions of the first research paper, Nicolas Dorier implemented a proof-of-concept SPV wallet that solves both problems (a) and (b), plus a little more.
The implementation of TrackerBehavior is privacy friendly. All the wallets are sharing the same bloom filter, the bloom filter is preloaded with 1000 keys per wallet and never updated. Every 10 minutes, it disconnects from peers and reconnect to new ones with the same filter. I followed this paper, and improved on it.
This paper was oblivious to the fact that filters need to be reloaded periodically since at every false positive, the filter matches more objects. But if the filter is renewed on the same peer, then by doing a differential of the two filters, a malicious peer can find out which coins belongs to you.
If the bloom filter need to be reloaded (for generating a new batch of 1000 keys), then the connections to the current peers are purged, and new nodes are found.
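The peer policy in this quote boils down to one invariant: no peer ever sees two different filters, so no peer can compute a differential. Here is a hypothetical Python model of that policy; it is not NBitcoin’s TrackerBehavior API, just the rules from the quote expressed as code.

```python
class FilterPeerPolicy:
    """Model: one shared filter, periodic peer rotation, and no peer is
    ever given a second, different filter."""

    def __init__(self, filter_id: str):
        self.filter_id = filter_id
        self.peers = set()
        self.seen = {}  # peer -> id of the filter that peer was given

    def _connect(self, peer) -> bool:
        # Refuse peers that already saw another filter: they could diff it.
        if self.seen.get(peer, self.filter_id) != self.filter_id:
            return False
        self.seen[peer] = self.filter_id
        self.peers.add(peer)
        return True

    def rotate_peers(self, new_peers):
        # Every ~10 minutes: drop all peers, reconnect with the SAME filter.
        self.peers.clear()
        return [p for p in new_peers if self._connect(p)]

    def reload_filter(self, new_filter_id, new_peers):
        # New batch of keys: purge current peers and use only fresh ones.
        self.peers.clear()
        self.filter_id = new_filter_id
        return [p for p in new_peers if self._connect(p)]


policy = FilterPeerPolicy("filter-1")
policy.rotate_peers(["p1", "p2"])
policy.rotate_peers(["p2", "p3"])                  # same filter, p2 is fine
print(policy.reload_filter("filter-2", ["p1", "p4"]))  # ['p4']
```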
However, there was another problem, which Mike Hearn brought up in his answer: subgraph traversal, which Nicolas’ implementation does not solve. This basically means that “false positive addresses the spy finds can be eliminated by seeing that they dont make any transactions to any other addresses from the filter”, as belcher, JoinMarket’s creator, put it.
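The subgraph traversal attack belcher describes can be sketched as a graph filter: keep only the filter matches that transact with at least one other match, and discard the isolated ones as likely false positives. A simplified, hypothetical sketch, assuming the spy has a list of (sender address, receiver address) pairs from the blockchain.

```python
def prune_isolated_matches(matched, tx_edges):
    """matched: addresses that hit the filter.
    tx_edges: (from_addr, to_addr) pairs seen on the blockchain.
    Returns matches linked by a transaction to another match."""
    matched = set(matched)
    linked = set()
    for a, b in tx_edges:
        if a != b and a in matched and b in matched:
            linked.update((a, b))
    return linked


matched = {"w1", "w2", "w3", "fp1", "fp2"}
edges = [("w1", "w2"), ("w2", "w3"), ("fp1", "x9"), ("x7", "fp2")]
print(sorted(prune_isolated_matches(matched, edges)))  # ['w1', 'w2', 'w3']
```

The heuristic is not perfect (a wallet address that only receives from outsiders would also be dropped), but repeated over many transactions it strips most of the false positives a wallet relies on for cover.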
Finding the lightest possible network-analysis-resistant wallet is critical for the future of Bitcoin from a privacy point of view. There is a chance that it is not actually possible to do better than a full-block SPV wallet.
In my opinion, people smarter than me could direct their attention to the following topics in order to make progress on this issue:
- Nicolas’ solution could be tested. There is a slight chance subgraph traversal is too theoretical to work.
- The subgraph traversal problem could be researched further, and maybe a solution could be found.
- Different balance lookup filtering methods could be researched.
- Block Bloom filters could be further researched, implemented and tested.