Bitcoin Users Reveal More Private Information Than They Realize

Most people think bitcoin is anonymous. In fact, new technologies trace BTC transactions, attempting to identify bitcoin users.

A number of startups have raised money to explore these new possibilities.

Given these are private services, an explanation is needed.

What Exactly Are These Companies Doing?

Most of them are offering a service called clusterization.

Clusteriza-what-now?

To keep it simple: clusterization means monitoring all transactions on the blockchain and grouping related ones together. The goal is to keep tabs on organizations, companies and people using cryptocurrencies.

Why Would Anyone Want That?

Knowing the source and destination of funds is necessary to fulfill a number of regulatory requirements. This is required for any service engaged in money transmission.

When dealing with fiat currencies this is typically done in several ways. One is through customer interrogation by asking things like, “where did you get these funds?” Another is digital data verification, for example a bank determining the origin of a wire transfer.

In the realm of digital assets stored on a blockchain, this becomes difficult.

Assets on digital currency networks are “pseudo-anonymous”: addresses and transactions are public, but they are not directly tied to real-world identities. While collecting information directly from customers is possible, digital verification is less straightforward. Software and services are needed to fill the gaps in knowledge.

Clustering to the Rescue?

Clusterization allows for deep data analysis of transactions on the blockchain. It correlates transactions so the source and destination of funds can be better determined.

This decreases a digital asset’s fungibility, the ability of one asset to be substituted for another. It also enables better monitoring of attempts to launder assets.

When properly implemented, the following data can be obtained via clusterization:

  • A set of addresses controlled by the same entity
  • Transfers of funds between two known entities
  • Correlation of two different entities
  • The history of incoming funds via past clusters
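As a rough illustration of the second item in the list above, the sketch below sums transfers between known entities once addresses have been assigned to clusters. The transaction format, the `address_to_entity` mapping and the function name are all assumptions made for this example; they do not come from Bitiodine or any other tool.

```python
from collections import Counter


def entity_flows(transactions, address_to_entity):
    """Sum the amounts moved between labeled clusters.

    Hypothetical input format (an assumption for this sketch):
      transactions: list of {"inputs": [address, ...],
                             "outputs": [(address, satoshis), ...]}
      address_to_entity: dict mapping address -> entity label
    """
    flows = Counter()
    for tx in transactions:
        senders = {address_to_entity.get(a, "unknown") for a in tx["inputs"]}
        # Skip transactions whose inputs map to more than one entity;
        # this sketch cannot attribute them to a single sender.
        if len(senders) != 1:
            continue
        sender = next(iter(senders))
        for address, amount in tx["outputs"]:
            receiver = address_to_entity.get(address, "unknown")
            # Outputs returning to the sender's own cluster look like change.
            if receiver != sender:
                flows[(sender, receiver)] += amount
    return dict(flows)


labels = {"a1": "exchange", "a2": "exchange", "a3": "exchange", "b1": "merchant"}
txs = [
    {"inputs": ["a1", "a2"], "outputs": [("b1", 50000), ("a3", 10000)]},
    {"inputs": ["b1"], "outputs": [("c1", 20000)]},
]
print(entity_flows(txs, labels))
```

Outputs that land back in the sender's own cluster are treated as change and ignored, which is a simplification.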

There are research efforts looking at this specific problem. One such effort is the open source project “Bitiodine” by Michele Spagnuolo, now working at Google. You can read the Bitiodine paper here.

An open source implementation can be tested at Bitiodine.net, which is currently hosted on a 512GB RAM 40-Core bare metal server donated to Bitaccess. Fast clusterization takes a lot of computing power.

How Does Clustering Work?

There are many ways to cluster transactions, and the basic ones are not complicated. One really simple method is to look at the inputs of a transaction.

In the transaction above, you can see that inputs (on the left) are used from multiple addresses to send to a destination (on the right). Bitcoin clients use multi-input transactions to avoid having to send multiple transactions.

This also means that whenever a transaction has multiple input addresses, we can usually assume those addresses belong to the same wallet.

This is because the private key of each of these addresses was needed to sign the transaction. Extrapolating this data further along the blockchain allows for the formation of clusters of addresses. The resulting clusters draw on historical data and are far larger than any single transaction. CoinJoin exploits this same assumption: by combining inputs from several unrelated users into one transaction, it confuses clusterization and obfuscates the source of the coins.
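The multi-input method described above can be sketched in a few lines of Python. This is a minimal illustration using a union-find structure, not the implementation used by Bitiodine or any commercial service. The input format (one list of input addresses per transaction) and the function name are assumptions for the example.

```python
from collections import defaultdict


def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of a transaction.

    `transactions` is an iterable of lists of input addresses
    (a hypothetical format chosen for this sketch).
    """
    parent = {}

    def find(address):
        # Register unseen addresses as their own cluster root.
        parent.setdefault(address, address)
        while parent[address] != address:
            # Path compression keeps later lookups fast.
            parent[address] = parent[parent[address]]
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)
        # Assume all inputs of one transaction share an owner.
        for other in inputs[1:]:
            union(first, other)

    clusters = defaultdict(set)
    for address in list(parent):
        clusters[find(address)].add(address)
    return sorted(sorted(members) for members in clusters.values())


txs = [["A", "B"], ["B", "C"], ["D"]]
print(cluster_addresses(txs))  # [['A', 'B', 'C'], ['D']]
```

Because address B appears in two transactions, it links A and C into one cluster even though A and C never appear together. This is how clusters grow well beyond a single transaction. A CoinJoin transaction would wrongly merge unrelated users under this sketch.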

Let’s Look at Some Examples

One neat service that is readily available is WalletExplorer. This is a free block explorer that tries to cluster addresses. Scorechain offers a similar service with a free tier.

For example, you can look at all the addresses of a common merchant processor here. Its cluster currently contains around 172,000 addresses.

Here is another interesting one. It is the last known wallet cluster of a dark market similar to Silk Road. It has around 500,000 addresses in its cluster.

The implications of this analysis are that just about every publicly accessible service can be very easily monitored. Transaction volume, transactions per day, etc. are all publicly available. Not only that, but the addresses of their customers are also easily accessible.

Here is an example of one such customer. Using a few simple searches, we can see that this customer is using both a merchant processor and a dark market quite regularly.

It seems that some of this customer’s transactions direct funds from a dark market cluster to the merchant cluster. While the identity of the customer is not publicly known, it is clear the customer is using both services.
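The kind of search described above can be approximated with a simple set intersection. The sketch below assumes two clusters are already known and finds outside addresses that both received funds from the first and sent funds to the second. The data format, addresses and function name are hypothetical, and this is not how WalletExplorer works internally.

```python
def shared_counterparties(transactions, cluster_a, cluster_b):
    """Find addresses that received from cluster_a and paid into cluster_b.

    `transactions` is a list of (input_addresses, output_addresses) pairs,
    a hypothetical format chosen for this sketch.
    """
    received_from_a = set()
    sent_to_b = set()
    for inputs, outputs in transactions:
        # Funds leaving cluster A: record the outside recipients.
        if any(address in cluster_a for address in inputs):
            received_from_a.update(o for o in outputs if o not in cluster_a)
        # Funds entering cluster B: record the outside senders.
        if any(address in cluster_b for address in outputs):
            sent_to_b.update(i for i in inputs if i not in cluster_b)
    return received_from_a & sent_to_b


dark_market = {"dm1", "dm2"}
merchant = {"mp1", "mp2"}
txs = [
    (["dm1"], ["cust1", "dm2"]),  # dark market pays cust1, change to dm2
    (["cust1"], ["mp1"]),         # cust1 pays the merchant processor
    (["other"], ["mp2"]),         # unrelated merchant customer
    (["dm2"], ["cust2"]),         # dark market pays cust2
]
print(shared_counterparties(txs, dark_market, merchant))  # {'cust1'}
```

Only `cust1` touches both clusters. Nothing here reveals who `cust1` is, only that the same address deals with both services.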

What Does This Mean For Users and Companies?

The first thing this indicates is users are divulging more private information about themselves than they may be aware of.

In general, users are indirectly divulging which other services they are customers of.

For companies, this can be useful for a few purposes. Companies can set up internal policies to refuse funds they feel do not adhere to their terms or policies.

Exchanges, wallets and brokerages can build a better picture of who their customers are, how often they transact, what their income is, and what services they use. It can also be used for support. If a customer has a support request, a service can have a better idea of what wallet they are using, such as Xapo, and can better assist them.

Unfortunately, access to these tools is restricted to companies and governments who pay for clusterization services.

This means that the bitcoin community is somewhat underserved. Most bitcoin users don’t have access to the latest set of clusterization tools. As a result, they may not be aware of what information they are indirectly disclosing.

The capabilities of private tools far exceed those of the publicly available ones mentioned in this article.

So, What’s Next?

  1. The Bitcoin community needs to be more aware of privacy implications of this type of analysis. Bitcoin is not anonymous. Customers need to be aware of what privacy bitcoin affords, and what it does not.
  2. More open-source development is needed in blockchain analysis. Currently, there are few active open source projects on this subject.
  3. Companies need to understand the capabilities of such tools and cater their policies accordingly.

Overall, clusterization is a good thing. It allows an interesting trade-off between privacy and compliance. However, bitcoin users need to be aware of this, and companies and governments need to be knowledgeable about its limitations.


Originally published at blog.bitaccess.co on April 4, 2016.