Effortlessly Pulling Ethereum Block Data into Azure Machine Learning
Azure Machine Learning, Microsoft’s managed machine learning service, has multiple mechanisms for pulling in data for processing. One of them is downloading a file from a remote URL. In this article we will use that mechanism by creating a simple Azure Function that accepts a block number and returns a JSON Lines file listing the transactions in that block.
We will be creating an Azure Function, so Azure Functions Core Tools should be installed on the workstation. The example GitHub repository linked below also includes a Terraform-based infrastructure deployment script, which requires both Terraform and the Azure CLI to be installed.
A complete example of the Azure Function and a Terraform script that can be used to deploy the infrastructure required to test the function is available in the following GitHub repository:
cladular/azure-func-pull-ethr (on GitHub)
We will start by creating a new C# HTTP trigger based Azure Function called GetTransactions, using the func CLI that ships with Azure Functions Core Tools:
func init --worker-runtime dotnet
Note that the project name is taken from the name of the containing directory.
func new -l C# -n GetTransactions
Next, we will add a reference to Nethereum, a .NET library for interacting with Ethereum-based networks:
dotnet add .\[PROJECT-NAME].csproj package Nethereum.Web3
Replace [PROJECT-NAME] with the name of the project you created.
Now we can write the function’s logic, which will include:
- Accepting a block number as part of the URL route.
- Connecting to Cloudflare’s Ethereum gateway.
- Retrieving the transaction information for the requested block.
- Iterating over the transactions and reading each one's to address and value, which is the amount of Ether that was transferred.
- Encoding the results as JSON Lines and sending them as an application/octet-stream response (a file).
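The C# implementation itself lives in the repository linked above. To make the last two steps concrete, here is a minimal Python sketch of the same transformation, written only as an illustration and not as the function's actual code. It assumes the transactions arrive in the Ethereum JSON-RPC shape, where value is a hex string in wei, and the output field names to and value are assumptions.

```python
import json
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18  # 1 Ether = 10^18 wei


def transactions_to_json_lines(transactions):
    """Encode transactions as JSON Lines: one JSON object per line."""
    lines = []
    for tx in transactions:
        # JSON-RPC encodes quantities as hex strings, e.g. "0xde0b6b3a7640000".
        value_in_ether = Decimal(int(tx["value"], 16)) / WEI_PER_ETHER
        # "to" is None for contract-creation transactions; JSON encodes it as null.
        lines.append(json.dumps({"to": tx["to"], "value": float(value_in_ether)}))
    return "\n".join(lines) + "\n"


# Made-up sample data, not taken from a real block.
sample = [
    {"to": "0x00000000000000000000000000000000000000aa", "value": "0xde0b6b3a7640000"},
    {"to": None, "value": "0x0"},
]
print(transactions_to_json_lines(sample), end="")
```

Each output line is a complete JSON document on its own, which is what allows a consumer to read the file as rows without parsing it as one large array.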
And once we are done writing the code, we can build and publish the function to a local folder:
dotnet publish --configuration Release /property:GenerateFullPaths=true /consoleloggerparameters:NoSummary
Testing the Function
The easiest way to test the function is to use the Terraform script included in the example GitHub repository mentioned above. First log in to Azure with the Azure CLI (az login), and then run the Terraform script:
terraform apply -auto-approve .\eng\
Once the infrastructure is deployed, we can archive the contents of the published Azure Function directory (should be /src/bin/Release/netcoreapp3.1/publish) and deploy the archive to the newly created function app.
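If you prefer to script the archiving step, one possible approach is sketched below in Python. The helper name archive_publish_dir and the output name are hypothetical; the only requirement it tries to satisfy is that the published files sit at the root of the zip file rather than inside a nested folder.

```python
import shutil
from pathlib import Path


def archive_publish_dir(publish_dir, archive_base):
    """Zip the contents of publish_dir and return the path of the .zip file.

    archive_base is the output path without the .zip extension and should
    point outside publish_dir so the archive does not include itself.
    """
    publish_dir = Path(publish_dir)
    if not publish_dir.is_dir():
        raise FileNotFoundError(f"publish directory not found: {publish_dir}")
    # root_dir places the files at the root of the archive instead of
    # nesting them under the publish folder.
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(publish_dir))
```

The resulting file can then be pushed to the function app, for example with the Azure CLI's az functionapp deployment source config-zip command, passing your own resource group, function app name, and the zip path.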
Note that if you are using Visual Studio Code you can use tasks that were created as part of the GitHub repository to deploy (and later remove) both the included example function and the infrastructure in one command:
tf-apply: Builds and publishes the function, then creates the infrastructure and deploys the function.
tf-destroy: Removes all the infrastructure resources that were created.
Now we launch the Azure Machine Learning Studio using the “Launch studio” button in the machine learning resource that was created, and go to the “Datasets” screen:
Create a new web files dataset:
With the following basic info:
- Web URL (will be using block 12,000,000, a nice round number):
Adjust the function URL to match your environment.
- Name: Transactions
- Dataset Type: Tabular
- Check “Skip data validation” as the size of the data might cause the validation to fail
Hit “Next”, change the file format to JSON Lines, hit “Next” two more times, and then hit “Create” in the confirmation step.
You now have an entry called “Transactions” in the datasets page. Click on it and go to the “Explore” tab, where you should see the list of transactions for Ethereum block 12,000,000:
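If the dataset looks wrong in the studio, it can help to check the function's output on its own first. The Python sketch below parses JSON Lines text the way a tabular reader conceptually does, one row per line. The helper name parse_json_lines and the to and value fields in the sample are assumptions, and the sample rows are made up.

```python
import json


def parse_json_lines(text):
    """Parse JSON Lines text into a list of rows, skipping blank lines."""
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as error:
            # Report which line is malformed instead of failing on the whole file.
            raise ValueError(f"invalid JSON on line {line_number}: {error}") from error
    return rows


# Made-up sample in the assumed output shape, not real block data.
sample = '{"to": "0x00000000000000000000000000000000000000aa", "value": 1.5}\n{"to": null, "value": 0.0}\n'
rows = parse_json_lines(sample)
print(len(rows), sum(row["value"] for row in rows))
```

In practice you would save the response of the function URL to a file and pass its contents to the helper instead of the inline sample.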
The function in this article filters the data to Ether transactions only, but it can be extended to pull much more elaborate data (such as smart contract event data), which in turn can be processed by Azure Machine Learning. More generally, the service's file URL option lets us write small wrappers like this one around practically any external data source.