Supercharge Ethereum Data Extraction With ChatGPT!

Mantle Network
0xMantle
Published in
4 min readMay 24, 2023

In the realm of blockchain, data is the fuel that powers insights. Having the ability to extract information from the blockchain is a valuable skill. Thankfully, the combination of ChatGPT and Google’s BigQuery presents a game changing solution.

In this article we will explore how we can use ChatGPT and Google BigQuery to streamline and enhance the process of extracting data from the blockchain. The article will also provide you with a simple example on how to combine ChatGPT and BigQuery.

ChatGPT

ChatGPT (more generally known as Large Language Models, or LLMs), is just a tool, and like any other tool, knowing how to use it efficiently can and will x10 your productivity. The true power of ChatGPT is its ability to parse unstructured data, extract the relevant bits, and convert it into a standard, structured format.

The secret lies in providing ChatGPT with as much context as possible, due to their nature as simulators.

BigQuery

BigQuery is a fully managed data warehouse solution which allows you to query massive datasets. A lot of people don’t know this, but there is a publicly managed database for Ethereum on BigQuery.

What that means is that you can perform massively parallel data analysis on Ethereum simply by writing some Structured Query Language (SQL). And what does ChatGPT excel at? Structured outputs!

Simplified Data Extraction

Traditionally, writing SQL code for data extraction is a very time consuming and intricate task — oftentimes requiring dedicated domain knowledge. But, we can leverage the power of ChatGPT and streamline it.

Let’s begin with a concrete example — finding all possible Masterchef contracts. We can do this by filtering out all contracts that contain the function signature “migrate(address)”, which yields us the function signature 0xce5494bb.

Problem is, how do we do this in BigQuery?

SQL With ChatGPT

The first thing we need to do is extract the schema of the BigQuery contract database:

This will result in the beginning of our ChatGPT query like so:

# ChatGPT query prompt
Given a google bigquery SQL tables, with the schema:

bigquery-public-data.crypto_ethereum:
contracts(
address: string,
bytecode: string,
function_sighashes: array[string],
is_erc20: boolean,
is_erc721: boolean,
block_timestamp: timestamp,
block_number: integer,
block_hash: string
)

We will then provide it more context via the following prompt:

As a senior analyst who can only communicate in bigquery sql, given the above schemas and data, write a detailed and correct bigquery sql query to answer the analytical question, without any explanations. I repeat, do NOT give any explanations:

Find me 100 contracts which contains the function_sighash “0xce5494bb”

When supplied to ChatGPT:

You get the following SQL code, when passed into BigQuery nets you:

With a few simple prompts, we are now about to filter out 100 of the oldest contracts that contain the function signature “migrate(address)”. Note that the output from BigQuery might not indicate that it is a masterchef contract, but rather a contract that contains the specific function signature.

We can also modify the query to filter out contracts that have been deployed after 2020.

For more complex queries like token transfers with the token name “ObamaShibaInu” between X date and Y date, you’ll need to manually write out the BigQuery table schema into ChatGPT, but that is beyond the scope of this simple tutorial.

Conclusion

Efficient data extraction is the key to informed decision making. By leveraging ChatGPT and BigQuery, users can rapidly extract massive amounts of data from the blockchain to provide insight and identify patterns.

--

--

Mantle Network
0xMantle

Mantle | Mass adoption of decentralized & token-governed technologies. With Mantle Network, Mantle Treasury, and token holder-governed products initiatives.