Live Ethereum and Bitcoin Data in Google BigQuery and Pub/Sub
--
The crypto_ethereum and crypto_bitcoin datasets in BigQuery are now updated using the streaming technology. You can also subscribe to public Pub/Sub topics that feed those tables.
The overall architecture is depicted below:
The following blockchains are covered:
- Ethereum
- Bitcoin
- ZCash
- Litecoin
- Doge
- Dash
We added delays for each blockchain that prevent streaming orphaned blocks resulting from chain reorganisations. You can look up how many blocks we lag behind the tip of the chain in the LAG_BLOCKS
parameter in the configuration files in the Github repository https://github.com/blockchain-etl-streaming. Those values were calculated based on the longest orphaned chains within the last year, multiplied by a safety factor. For Ethereum we ended up with an offset equivalent to approximately 4-minute lag. For Bitcoin this value is equivalent to a 30-minute lag.
We are currently working on a solution that will handle chain reorganisations using additional message types and will allow subscribing to the most up-to-date data.
Subscribing to Live Data Feeds
> curl https://sdk.cloud.google.com | bash
> exec -l $SHELL
> gcloud init
Create a Pub/Sub subscription for Ethereum internal transactions:
> gcloud pubsub subscriptions create crypto_ethereum.traces.test --topic=crypto_ethereum.traces --topic-project=crypto-public-data
Read a single message from the subscription to test it works:
> gcloud pubsub subscriptions pull crypto_ethereum.traces.test
Now you can run a subscriber and process the messages in the subscription, using this Python script:
subscribe.py
import timefrom google.cloud import pubsub_v1# TODO project_id = "Set Your Google Cloud Project ID Here"
subscription_name = "crypto_ethereum.traces.test"subscriber = pubsub_v1.SubscriberClient()
# The `subscription_path` method creates a fully qualified identifier
# in the form `projects/{project_id}/subscriptions/{subscription_name}`
subscription_path = subscriber.subscription_path(
project_id, subscription_name)def callback(message):
print('Received message: {}'.format(message))
message.ack()subscriber.subscribe(subscription_path, callback=callback)# The subscriber is non-blocking. We must keep the main thread from
# exiting to allow it to process messages asynchronously in the background.
print('Listening for messages on {}'.format(subscription_path))
while True:
time.sleep(60)
Install the dependencies and run the script:
> pip install google-cloud-pubsub==1.0.1
> python subscribe.pyListening for messages...
Received message: Message {
data: b'{"type": "trace", "transaction_index": 158, "from_...'
attributes: {
"item_id": "trace_call_0xce2ce80594f7601726d03114366161a0050d4a0beedd8628655f1a19319f203d_"
}
}
Received message: Message {
data: b'{"type": "trace", "transaction_index": 159, "from_...'
attributes: {
"item_id": "trace_call_0x7d121090c65c93ac6ba99764bae40ef384c388a139578194c503b92228ccfb3d_"
}
}
...
You can also use Go, Java, Node.js or C#: https://cloud.google.com/pubsub/docs/pull.
The first 10GB of data in Pub/Sub is free, after that you’ll pay ~$40/TB. There is on average 40GB of Ethereum traces per month, which would amount to ~$1/month.
You can also subscribe to topics with Bitcoin blocks and transactions and Ethereum blocks, transactions, logs, contracts, and tokens. The topic names follow the naming convention of the BigQuery tables so you can easily locate them: projects/crypto-public-data/topics/crypto_{chain}.{table_name}
, where:
chain
can be one ofethereum
,bitcoin
,zcash
,litecoin
,dogecoin
, ordash
.table_name
can be one ofblocks
ortransactions
. Additionally for Ethereum:logs
,token_transfers
,traces
,contracts
, andtokens
.
Next up is an article about a Twitter bot that posts anomalous transactions using Dataflow and Cloud Functions.
Also read:
- How to Query Balances for all Ethereum Addresses
- Calculating Gini Coefficient in BigQuery
- Ethereum in BigQuery: how we built this dataset
- Real-time Ethereum Notifications for Everyone for Free
Follow us on Twitter: https://twitter.com/BlockchainETL