How to Find Bitcoin’s Best Traders — Deep into the Bitcoin Blockchain

Alex Woodstock
5 min readMay 25, 2024

--

Find and Copy Best Bitcoin Traders

As of today, there are 1.3 billion unique Bitcoin addresses registered in the network. However, only 52 million of them currently have a non-zero balance. Nevertheless, this is quite a large dataset. The original idea was simple. The Bitcoin blockchain is a public network. Absolutely all btc transactions are visible and transparent. In theory, we can trace each bitcoin address, identify its transactions, and ultimately find those btc addresses that trade successfully and profitably.

Great, let’s put the theory into practice. We can access bitcoin blockchain data through various means. If you have infrastructure on Amazon, you can utilize its dataset. For a more robust product approach, running your own bitcoin node is likely the best option. Additionally, there are various API providers available. However, I haven’t found any budget-friendly solutions. As for me, I acquired a database dump from non-public sources for a nominal fee.

For those who haven’t previously encountered the inner workings of Bitcoin, ts principles may seem a bit unconventional.

In Bitcoin, inputs and outputs are fundamental components of transactions:

Inputs: Inputs are references to previous unspent transaction outputs (UTXOs) that are being spent as part of a transaction. Each input includes a reference to a UTXO (the transaction ID and the output index), along with a cryptographic signature that proves ownership of the UTXO. Inputs specify where the funds for the transaction are coming from.

Outputs: Outputs specify where the funds being spent in the transaction will go. Each output includes an amount of bitcoin and a locking script that defines the conditions under which the funds can be spent in the future. Outputs represent the recipients of the transaction and typically include addresses or locking scripts that require specific conditions to be met to spend the funds.

Together, inputs and outputs determine how bitcoin is transferred from one entity to another in a transaction. Inputs reference previous UTXOs as the source of funds, and outputs specify where the funds will be transferred and under what conditions they can be spent in the future.

If you’re interested in more detailed information about how Bitcoin works, I recommend consulting specialized resources. This information is only needed at the stage of converting Bitcoin transactions into a familiar format. In this case, such information is not as relevant to us.

In the end, we can consolidate all transactions into one table with columns:

transaction hash
time transaction
bitcoin address
bitcoin size of transaction
usd size of transaction

If we are bitcoin buyers then bitcoin_size will be positive value and usd_size negative value, if we are bitcoin sellers then bitcoin_size < 0/ usd_size > 0.

Since Databricks fully open-sourced their Delta Table, I primarily use it in most cases. In this case, the data was approximately 600 GB in size. If you are the lucky owner of a cluster with around 2 TB of memory, then in principle, you can keep everything in one table. Unfortunately, my setup was much more modest, so the only way to handle dataset was to partition the data. I divided the data into 100 partitions based on the hash of the address. All further calculations were performed only within one partition.

So, by what criteria will we determine the best traders? The first thing that comes to mind is the level of profit. Ok, lets check.

If readers are interested, I can conduct some comparative tests using Spark, Pandas, or Polars. I can say that the main time is spent on data loading, so the performance of the framework is not so critical. But to put it briefly, Polars was the fastest. Besides, I simply like it; I prefer to use it whenever possible.

def get_stats_by_address(all_transactions):    
query = 'select address, sum(btc_size) as delta_btc, sum(usd_size) as delta_usd)
from frame GROUP BY address'
df = pl.SQLContext(frame=all_transactions).execute(query).collect()
df = df.with_columns((pl.col('delta_btc')*btc_last_price + pl.col('delta_usd')).alias('profit'))
return df

Variable btc_last_price is just bitcoin last price) We need it to calc value for our bitcoins. Column [profit] will contain our profit in USD.

Ok. We’ve got bitcoin address with profit about 60 mln dollars. Is it good investor? Let’s see at equity

This address bought Bitcoin back in 2017 and essentially hasn’t touched it since. All the profit came from the appreciation of these old coins. This address experienced the entire Bitcoin downturn 2020 & 2022 years. This is likely not the type of investing we are looking for. We don’t want to endure significant drawdowns.

Ok, lets find address with a little drawdown.

Rising equity without drawdowns but this address participated in the market for only a few months, simply entering and exiting successfully once.

Okay, I want to identify addresses that are consistently active in transactions. Let’s target addresses that have engaged in transactions in at least 4 different years, with the latest transaction occurring no later than 2023.


def get_active_addresses(all_transactions):
query = 'SELECT address, COUNT(DISTINCT year) AS count_years, MAX(year) as last_year FROM frame GROUP BY address'
df = pl.SQLContext(frame=all_transactions).execute(query).collect()
return df.filter(pl.col('last_year') >= 2023). filter(pl.col('count_years') >= 4)

df1 = get_stats_by_address(all_transactions)
df2 = get_active_addresses(all_transactions)

df1 = df1.join(df2, on='address', how='inner').filter(pl.col('profit') > 10000)

For one partition

Total addresses - about 11 mln
Profit addresses with profit > 10K USD - about 46.000

Profit addresses with profit > 10K USD and (4 years activity, with last transactions at least 2023 year) - only 700 addresses meet this condition

There are addresses that give a nice equity

However there are also ones like this

So we have to additional filters.

Nevertheless, everything seemed acceptable until I learned about change addresses and other features of Bitcoin anonymity)) But that’s a different story(jumping ahead, I’ve found a solution to this problem). If you enjoyed the topic, please support the article with likes and shares. It’s interesting to discuss and exchange opinions.

--

--

Alex Woodstock

Individual investor and IT professional. 20+ years of experience in investing. Quantitive approach. Options strategies for stocks & commodities.