Live Ethereum and Bitcoin Data in Google BigQuery and Pub/Sub

The crypto_ethereum and crypto_bitcoin datasets in BigQuery are now updated using the streaming technology. You can also subscribe to public Pub/Sub topics that feed those tables.

The overall architecture is depicted below:

Blockchain ETL architecture

The following blockchains are covered:

  • Ethereum
  • Bitcoin
  • ZCash
  • Litecoin
  • Doge
  • Dash

We added delays for each blockchain that prevent streaming orphaned blocks resulting from chain reorganisations. You can look up how many blocks we lag behind the tip of the chain in the LAG_BLOCKS parameter in the configuration files in the Github repository https://github.com/blockchain-etl-streaming. Those values were calculated based on the longest orphaned chains within the last year, multiplied by a safety factor. For Ethereum we ended up with an offset equivalent to approximately 4-minute lag. For Bitcoin this value is equivalent to a 30-minute lag.

We are currently working on a solution that will handle chain reorganisations using additional message types and will allow subscribing to the most up-to-date data.

Subscribing to Live Data Feeds

Install Google Cloud SDK:

> curl https://sdk.cloud.google.com | bash
> exec -l $SHELL
> gcloud init

Create a Pub/Sub subscription for Ethereum internal transactions:

> gcloud pubsub subscriptions create crypto_ethereum.traces.test --topic=crypto_ethereum.traces --topic-project=crypto-public-data

Read a single message from the subscription to test it works:

> gcloud pubsub subscriptions pull crypto_ethereum.traces.test

Now you can run a subscriber and process the messages in the subscription, using this Python script:

subscribe.py

import timefrom google.cloud import pubsub_v1# TODO project_id = "Set Your Google Cloud Project ID Here"
subscription_name = "crypto_ethereum.traces.test"
subscriber = pubsub_v1.SubscriberClient()
# The `subscription_path` method creates a fully qualified identifier
# in the form `projects/{project_id}/subscriptions/{subscription_name}`
subscription_path = subscriber.subscription_path(
project_id, subscription_name)
def callback(message):
print('Received message: {}'.format(message))
message.ack()
subscriber.subscribe(subscription_path, callback=callback)# The subscriber is non-blocking. We must keep the main thread from
# exiting to allow it to process messages asynchronously in the background.
print('Listening for messages on {}'.format(subscription_path))
while True:
time.sleep(60)

Install the dependencies and run the script:

> pip install google-cloud-pubsub==1.0.1
> python subscribe.py
Listening for messages...
Received message: Message {
data: b'{"type": "trace", "transaction_index": 158, "from_...'
attributes: {
"item_id": "trace_call_0xce2ce80594f7601726d03114366161a0050d4a0beedd8628655f1a19319f203d_"
}
}
Received message: Message {
data: b'{"type": "trace", "transaction_index": 159, "from_...'
attributes: {
"item_id": "trace_call_0x7d121090c65c93ac6ba99764bae40ef384c388a139578194c503b92228ccfb3d_"
}
}
...

You can also use Go, Java, Node.js or C#: https://cloud.google.com/pubsub/docs/pull.

The first 10GB of data in Pub/Sub is free, after that you’ll pay ~$40/TB. There is on average 40GB of Ethereum traces per month, which would amount to ~$1/month.

You can also subscribe to topics with Bitcoin blocks and transactions and Ethereum blocks, transactions, logs, contracts, and tokens. The topic names follow the naming convention of the BigQuery tables so you can easily locate them: projects/crypto-public-data/topics/crypto_{chain}.{table_name}, where:

  • chain can be one of ethereum, bitcoin, zcash, litecoin, dogecoin, or dash.
  • table_name can be one of blocks or transactions. Additionally for Ethereum: logs, token_transfers, traces, contracts, andtokens.

Next up is an article about a Twitter bot that posts anomalous transactions using Dataflow and Cloud Functions.

Also read:

Follow us on Twitter: https://twitter.com/BlockchainETL

--

--

--

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Evgeny Medvedev

Evgeny Medvedev

Creator of https://github.com/blockchain-etl, Co-founder of https://d5.ai and https://nansen.ai, Google Cloud GDE, AWS Certified Solutions Architect

More from Medium

How we are building Multi-Cloud Portable Workspaces at NetBook

Send Messages From Pub/Sub To BigQuery Cheaper with Cloud Run

Triggering a Cloud Function at a specified time using Cloud Scheduler in GCP.

How to do product mix optimization in real-time