Attacker Collection of IP Metadata

IP metadata analysis isn’t a new problem. People have worried for years about leaking the IP address their transactions come from. Bitcoin Core offers a simple mechanism to connect through Tor, and several Bitcoin forks made a name for themselves by providing Tor privacy features.

Apparently there was a coin called TorCoin, which isn’t surprising. It’s not listed on any exchanges.

Tor is only one component of cryptocurrency privacy. It is generally far more important to hide information on the blockchain layer, since this information stays accessible to everyone indefinitely. Meanwhile, information leaked when broadcasting your transaction is revealed only to a select few parties, and only attackers (independent actors, ISPs, nation-states) or researchers would log it. If I make an in-person payment at a coffee shop with Bitcoin or another transparent cryptocurrency, Tor isn’t going to help me hide my balance from the merchant.

Furthermore, there is a lack of ground truth when viewing IP metadata. Suppose User A first sends a transaction to Node B, which passes it along to Node C. Node C only receives Node B’s IP address, and it does not know if Node B is the one that sent the transaction. Likewise, Node B does not know if the transaction came from another node, or if it came from the user directly.

The situation is different, of course, if you connect to remote nodes. In this case, the remote node unquestionably knows the transaction came from your IP. If you care about your privacy when using remote nodes, you had better use another anonymization technique or trust the remote node to destroy your data.

I’m not going to talk about the best mechanisms to protect yourself from Monero IP metadata analysis today (you can use this guide from Whonix, this guide for Qubes, or this guide from fireice_uk). Instead, I will talk about the number of nodes that need to be controlled by attackers to connect directly to a certain proportion of nodes on the network. These techniques are built for analysis on Monero, but they can be applied to any cryptocurrency. I include a table for Bitcoin at the end of the article.

How Many Nodes Do Attackers Need to Connect to Most Users?

Monero has approximately 1700 nodes at the time of writing (edit: manual scans suggest closer to 3200 nodes). By default, the Monero daemon software connects to 8 nodes. We can therefore run a simple probability calculation to estimate the proportion of nodes an attacker will connect to with a specific number of malicious nodes. For a given user, we calculate the chance that 1 or more of their 8 connections is to a malicious node.

I used the following formula in Excel, though someone with more experience may know a trick to make it look neater:

=1-BINOM.DIST(1,8,PROPORTION CONTROLLED BY ATTACKER,TRUE)+BINOM.DIST(1,8,PROPORTION CONTROLLED BY ATTACKER,FALSE)
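
For readers who prefer code to spreadsheets, here is a minimal Python sketch of the same calculation. It assumes each of the 8 connections independently lands on an attacker node with probability equal to the attacker’s share of the network, and the function names are my own, chosen for illustration. It also shows that the spreadsheet formula reduces to the neater form 1 - (1 - p)^8, which is the chance that at least one of the 8 peers is malicious.

```python
from math import comb

CONNECTIONS = 8  # default number of connections quoted in the article


def binom_pmf(k, n, p):
    """Probability of exactly k attacker peers among n connections."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def chance_excel_style(p, n=CONNECTIONS):
    """Mirror of the spreadsheet formula: 1 - P(X <= 1) + P(X = 1)."""
    cumulative_up_to_1 = binom_pmf(0, n, p) + binom_pmf(1, n, p)
    return 1 - cumulative_up_to_1 + binom_pmf(1, n, p)


def chance_at_least_one(p, n=CONNECTIONS):
    """Neater equivalent: 1 minus the chance that no peer is malicious."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    for p in (0.05, 0.10, 0.20, 0.50):
        print(f"{p:.0%} controlled -> {chance_at_least_one(p):.1%} of users reached")
```

The two functions agree because subtracting the cumulative probability of 0 or 1 malicious peers and then adding back the probability of exactly 1 leaves only the probability of 0 subtracted from 1.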

You can see here the proportion of nodes that the attacker needs to control (first column) to connect to a given proportion of all nodes on the network (fourth column). Column 2 shows the number of nodes this corresponds to if the attacker’s nodes are already part of the current network. Column 3 shows the number they would need to spin up if they were starting from scratch (if they currently control 0 nodes), since every node they add also enlarges the network. The proportions stay the same, so the larger the network becomes, the more nodes an attacker needs to control.
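
The columns can be approximated with a short Python sketch. This is my reading of the table rather than the exact spreadsheet: it assumes 1700 existing nodes, 8 independent connections per node, and that nodes spun up from scratch are added on top of the existing 1700. All function names are hypothetical.

```python
from math import ceil

NETWORK_NODES = 1700  # approximate Monero node count quoted in the article
CONNECTIONS = 8       # default connections per node


def share_needed(target, n=CONNECTIONS):
    """Attacker share p such that 1 - (1 - p)**n equals the target reach."""
    return 1 - (1 - target) ** (1 / n)


def nodes_if_existing(p, total=NETWORK_NODES):
    """Attacker nodes when they are part of the current total (column 2)."""
    return p * total


def nodes_from_scratch(p, honest=NETWORK_NODES):
    """New nodes x to add so that x / (honest + x) == p (column 3)."""
    return p * honest / (1 - p)


if __name__ == "__main__":
    print("share  existing  from_scratch  reach")
    for target in (0.50, 0.75, 0.90, 0.99):
        p = share_needed(target)
        print(f"{p:5.1%}  {ceil(nodes_if_existing(p)):8d}  "
              f"{ceil(nodes_from_scratch(p)):12d}  {target:5.0%}")
```

Changing NETWORK_NODES to 3200 reproduces the same proportions with larger absolute node counts, which is the scaling effect described above.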

Since the attacker only needs to hit at least 1 out of 8 connections for a node, they can gain a sizeable amount of information with a relatively small number of nodes.

How Many Independent Attackers Can Reasonably Learn Information?

I then ran a different test to find the maximum number of attackers that could each have a certain level of oversight over transaction IP data. Suppose two non-colluding attackers want to have as much access to the network as possible and spin up thousands of nodes. Ironically, these two attackers are competing against each other. If no one else controlled any nodes, then they would each control half of the nodes, which would still give each of them visibility over most of the nodes (see the first figure; it’s over 99%).

Suppose 10 equal attackers were competing against each other. In this case, each would only have visibility over a little under 60% of the nodes. At most 4 independent attackers can each directly connect to roughly 90% of the nodes. In general, the more motivated attackers there are, the less information each can see individually. This is an interesting phenomenon. It would be impossible for every country in the world, for instance, to have high visibility over IP metadata unless they colluded to share information. Active attackers directly compete against each other, though passive attackers (e.g., ISPs) could have more oversight.
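
A small Python sketch makes the competition effect concrete. It assumes the whole network is split evenly between k non-colluding attackers and that each node makes 8 independent connections; the function name is hypothetical.

```python
CONNECTIONS = 8  # default connections per node


def reach_per_attacker(attackers, n=CONNECTIONS):
    """Share of nodes one attacker connects to when `attackers` equal
    parties split the entire network between them."""
    share = 1 / attackers
    return 1 - (1 - share) ** n


if __name__ == "__main__":
    for k in (2, 3, 4, 5, 10, 20):
        print(f"{k:2d} equal attackers -> each reaches {reach_per_attacker(k):.1%}")
```

This is the best case for the attackers: any honest nodes on the network shrink each attacker’s share further, so real-world visibility would be lower.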

Why Not Reduce the Nodes Users Connect To?

We could solve this issue by simply connecting to only 1 node, right? While this would limit the attack on paper, there are a number of limitations that make this impractical.

The more nodes a client connects to, the faster its transactions are broadcast to the network. Connecting to more nodes also lowers the chance that a user ends up connected only to malicious nodes that attempt to block their transactions from being propagated or feed the user malicious blocks.
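
The trade-off can be sketched in Python. Under the same simplifying assumption of independent connections, the chance that every one of a user’s peers is malicious (so that nothing stops them from withholding transactions) falls quickly as connections are added, while the chance of touching at least one malicious peer rises. The function names and the 20% attacker share are assumptions for illustration.

```python
def chance_all_peers_malicious(p, connections):
    """Probability that every connection lands on an attacker node."""
    return p ** connections


def chance_any_peer_malicious(p, connections):
    """Probability that at least one connection lands on an attacker node."""
    return 1 - (1 - p) ** connections


if __name__ == "__main__":
    p = 0.20  # assumed attacker share of the network
    for c in (1, 2, 4, 8):
        print(f"{c} connections: any malicious {chance_any_peer_malicious(p, c):.1%}, "
              f"all malicious {chance_all_peers_malicious(p, c):.4%}")
```

With a single connection the two probabilities are equal, which is why connecting to only 1 node trades a smaller metadata leak for a much larger risk of being fully surrounded.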

What About Bitcoin?

Bitcoin has about 10,000 nodes at the time of writing. The second figure doesn’t change, but the first figure does.

Since Bitcoin Core also makes 8 outgoing connections by default, an attacker would need to spin up about 1000 nodes to connect to half of all the nodes directly.
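
As a rough check of this figure, here is a short Python sketch. It assumes 10,000 existing honest nodes, 8 independent connections per node, and that the attacker’s nodes are added on top of the existing network; the names are hypothetical.

```python
BITCOIN_NODES = 10_000  # approximate node count quoted in the article
CONNECTIONS = 8         # default outgoing connections


def nodes_to_spin_up(target, honest=BITCOIN_NODES, n=CONNECTIONS):
    """New attacker nodes needed to connect directly to `target` of nodes."""
    share = 1 - (1 - target) ** (1 / n)   # required attacker share
    return share * honest / (1 - share)   # solve x / (honest + x) == share


if __name__ == "__main__":
    for target in (0.50, 0.75, 0.90):
        print(f"reach {target:.0%}: about {nodes_to_spin_up(target):.0f} new nodes")
```

Under these assumptions the 50% case comes out at roughly 900 nodes, in line with the rounded figure above.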

Conclusion

This is just a fun little experiment I worked on today. Users should be aware of what an attacker needs in order to learn IP metadata. Server hosting companies like Amazon and DigitalOcean likely host a noticeable percentage of all network nodes, so they could have insight into IP metadata (though those nodes wouldn’t be configured to connect to so many others by default). Of course, keep in mind that this is just a heuristic. Even if an attacker connects to most other nodes directly, they still do not know for sure whether transactions are made by these node operators directly or are made by others and propagated. However, these attacks become powerful when attackers control about 20% or more of the total nodes. Users who are worried about leaking this metadata should take steps to protect their IP.


Justin Ehrenhofer is a senior at the University of Minnesota studying finance and management information systems. He is an advocate for privacy in distributed systems. He is a moderator of the r/CryptoCurrency subreddit, which is the most active cryptocurrency community.

Twitter: @JEhrenhofer