Guide to Query Bitcoin Balances Using BigQuery

Vivien Chua
Coinmonks
3 min readDec 9, 2022

--

In this article, we will show how to query balances for Bitcoin addresses in Google BigQuery.

In particular, we will show how to use LEAD, UNNEST and GENERATE_DATE_ARRAY to fill the gap on dates that there were no transactions on a blockchain address.

Extract on-chain Bitcoin data

Step 1

We will use the accounting concept of debits and credits to build a double entry book of all the transactions on the blockchain. This follows the ideas described on the Google Cloud website.

WITH double_entry_book AS (
-- debits
SELECT array_to_string(inputs.addresses, ",") AS address, -inputs.value AS value,
DATE(block_timestamp) AS date
FROM `bigquery-public-data.crypto_bitcoin.inputs` AS inputs
UNION ALL
-- credits
SELECT array_to_string(outputs.addresses, ",") AS address, outputs.value AS value,
DATE(block_timestamp) AS date
FROM `bigquery-public-data.crypto_bitcoin.outputs` AS outputs
),

Step 2

The entries are grouped by date and address.

The unit of ‘value’ is converted to bitcoin by dividing by 10^8.

double_entry_book_grouped AS (
SELECT date, address, SUM(value / POWER(10,8)) AS value
FROM double_entry_book
GROUP BY date, address
),

Step 3

For those dates that did not have any transactions, the time series shows up as completely missing records.

  • LEAD — returns the date after the gap for each address
daily_balances_gappy AS (
SELECT date, address,
SUM(value) OVER (PARTITION BY address ORDER BY date) AS balance,
LEAD(date, 1, CURRENT_DATE()) OVER (PARTITION BY address ORDER BY date) AS next_date
FROM double_entry_book_grouped
),

Step 4

In order to create synthetic rows, we will use UNNEST to convert an array into a table with one row for each element in the array, and GENERATE_DATE_ARRAY to create an array of dates from ‘2009–01–01' and current date.


all_dates AS (
SELECT date FROM UNNEST(GENERATE_DATE_ARRAY('2009-01-01', CURRENT_DATE())) AS date
),

Step 5

Next, we create a table that contains the gapless series of daily balances. The synthetic rows are JOIN onto the original time series to fill in the gaps.

daily_balances_gapless AS (
SELECT address, all_dates.date, balance
FROM daily_balances_gappy
JOIN all_dates ON daily_balances_gappy.date <= all_dates.date AND all_dates.date < daily_balances_gappy.next_date
),

Step 6

The daily balances for all addresses is computed.


all_daily_balances AS (
SELECT DISTINCT date, SUM(balance) OVER(PARTITION BY date) AS balance
FROM daily_balances_gapless
)

Step 7

The following query fetches values of all the columns in ‘all_daily_balances’ table.


SELECT * FROM all_daily_balances
ORDER BY date

Conclusion

The full query is shown below. The query ran in BigQuery using 218.6 GB.

WITH double_entry_book AS (
-- debits
SELECT array_to_string(inputs.addresses, ",") AS address, -inputs.value AS value,
DATE(block_timestamp) AS date
FROM `bigquery-public-data.crypto_bitcoin.inputs` AS inputs
UNION ALL
-- credits
SELECT array_to_string(outputs.addresses, ",") AS address, outputs.value AS value,
DATE(block_timestamp) AS date
FROM `bigquery-public-data.crypto_bitcoin.outputs` AS outputs
),

double_entry_book_grouped AS (
SELECT date, address, SUM(value / POWER(10,8)) AS value
FROM double_entry_book
GROUP BY date, address
),

daily_balances_gappy AS (
SELECT date, address,
SUM(value) OVER (PARTITION BY address ORDER BY date) AS balance,
LEAD(date, 1, CURRENT_DATE()) OVER (PARTITION BY address ORDER BY date) AS next_date
FROM double_entry_book_grouped
),

all_dates AS (
SELECT date FROM UNNEST(GENERATE_DATE_ARRAY('2009-01-01', CURRENT_DATE())) AS date
),

daily_balances_gapless AS (
SELECT address, all_dates.date, balance
FROM daily_balances_gappy
JOIN all_dates ON daily_balances_gappy.date <= all_dates.date and all_dates.date < daily_balances_gappy.next_date
),

all_daily_balances AS (
SELECT DISTINCT date, SUM(balance) OVER(PARTITION BY date) AS balance
FROM daily_balances_gapless
)

SELECT * FROM all_daily_balances
ORDER BY date

In the next article, we will show how to use this query to compute Bitcoin balances on exchanges.

Thank you for reading!

If you liked the article and would like to see more, consider following me. I post regularly on topics related to on-chain analysis, machine learning and BigQuery. I try to keep my articles simple but precise, providing code, examples and simulations whenever possible.

--

--

Vivien Chua
Coinmonks

I invest in companies. CIO and co-founder at Meadowfield Capital. Stanford PhD.