Guide: data in data out — how you might edit the dataset for your worker nodes

CST🟢(🔴🟦🟣)
Official Allora Community
6 min read · Sep 1, 2024

Following the successful release of Testnet V2 — the upgrade that brought a lot of interest to the Allora network — many crypto and AI enthusiasts now run worker nodes on the Allora network to collect Allora points in the Points Campaign.

[read about the point campaign here: https://www.allora.network/blog/launching-phase-2-of-the-allora-points-program-30832]

In a previous article, we discussed how to swap the classic, yet often insufficient, Linear Regression for other regression models provided by scikit-learn:

[https://medium.com/official-allora-community/guide-a-simple-way-to-customise-your-worker-node-on-the-allora-network-e698aa2916fc]

This time, we will discuss how you may edit the dataset that goes into the model. This exercise will involve dealing with two key files:

  • model.py — this is where dataset downloading, formatting, and model training functions are defined
  • app.py — since we will modify the input dataset for model training, we must make sure that the input we give the model at prediction time matches in both features and format, otherwise the request will result in an error

Once again, I will use the template basic-coin-prediction as an example for the step-by-step guide but the concept can be applied to any other setup.

Here are the easy steps:

1️⃣Change into the basic-coin-prediction-node directory (this assumes you have already configured everything and verified that the node runs correctly)

cd $HOME/basic-coin-prediction-node

2️⃣Open the model.py file — I use vim, but you can use any editor you like

vim model.py

3️⃣Next, we will scroll down to the download_data() function

This function contains several parameters you should be familiar with, even though they are not the main focus of this article.

token — this is an addition of mine, so that we only need to change the token name once; it is reused by the next parameter, symbols

symbols — this is the name of the trading pair that will be downloaded from the Binance free repository, which is the default data source for the basic-coin-prediction node

intervals — this is the time interval between each data point. I usually think of it as the time span of each candlestick you see on a chart; with a "1d" interval, for example, each candlestick represents one day. Binance supports various intervals such as ["1m", "5m", "1h", "1d", "1w", "1M"].

years/months — these two parameters define the years and months of the historical dataset you want to download from the Binance free repository.

Below the monthly data download section, you will also find the daily data download section, which fetches the latest available data. Please bear in mind that the Binance free repository only provides data up to D-1, so it does not include today's real-time price movement.
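To make the parameters above concrete, here is a small sketch of how the monthly archive URLs are assembled from token, symbols, intervals, years, and months. The helper function name is hypothetical, but the URL pattern follows the layout of Binance's public data repository:

```python
# Hypothetical helper illustrating how download_data()'s parameters map to
# monthly kline archive URLs in the Binance public data repository.
def monthly_kline_urls(token="ETH", intervals=("1d",),
                       years=("2024",), months=("01", "02")):
    symbols = f"{token}USDT"  # trading pair built from the token name
    base = "https://data.binance.vision/data/spot/monthly/klines"
    return [
        f"{base}/{symbols}/{interval}/{symbols}-{interval}-{year}-{month}.zip"
        for interval in intervals
        for year in years
        for month in months
    ]

print(monthly_kline_urls()[0])
# https://data.binance.vision/data/spot/monthly/klines/ETHUSDT/1d/ETHUSDT-1d-2024-01.zip
```

The actual download_data() function loops over every (symbol, interval, year, month) combination, downloads the zip archives, and then adds the daily files for the current month.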

4️⃣Then, we will look at the format_data() function

While we will not make any changes to this function, it contains key information you need before the next step. The df.columns attribute lists the names of the features in the dataset. From this list, you can see which features are available for your model when using the Binance dataset.
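As a quick sketch (with illustrative values; the exact column names come from format_data() in your copy of the template), you can inspect df.columns and see how the template derives the "price" feature as the mean of the open, high, low, and close prices:

```python
import pandas as pd

# Illustrative formatted dataframe — column names follow Binance kline fields,
# values are made up for demonstration.
df = pd.DataFrame({
    "open": [100.0], "high": [110.0], "low": [95.0], "close": [105.0],
    "volume": [1234.0], "n_trades": [56],
})
print(list(df.columns))  # the feature names you can pick for your model

# The template derives "price" as the mean of open, high, low and close:
df["price"] = df[["open", "high", "low", "close"]].mean(axis=1)
print(df["price"].iloc[0])  # 102.5
```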

5️⃣The highlight is in the train_model() function

Originally, the x parameter contains only df["date"]. Here I add five other features from the list in step 4️⃣. To complete this, we also have to edit the list of features used to build the x parameter so it contains all the features defined above, and change the reshape from the original (-1, 1) to (-1, n), where n is the number of features in x.

In this example, I have also shifted the x parameter's rows one step forward by deleting its last row, and shifted the y parameter's rows one step backward by deleting its first row. The purpose is to use the information from the previous time step (time T-1) to predict the price at the step of interest (time T). This is why I have included .values[:-1] before reshaping in the formatting of the x parameter, and .values[1:] in the formatting of the y parameter.
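A minimal sketch of this change, assuming the feature names from step 4️⃣ (the dataframe values here are illustrative):

```python
import pandas as pd

# Features selected for the model — "date" plus five extras from step 4.
features = ["date", "volume", "n_trades", "high", "low", "price"]
n = len(features)  # the n in reshape(-1, n)

# Toy formatted dataframe standing in for the downloaded Binance data.
df = pd.DataFrame({
    "date":     [1, 2, 3, 4],
    "volume":   [10.0, 11.0, 12.0, 13.0],
    "n_trades": [5, 6, 7, 8],
    "high":     [110.0, 111.0, 112.0, 113.0],
    "low":      [90.0, 91.0, 92.0, 93.0],
    "price":    [100.0, 101.0, 102.0, 103.0],
})

# Use features at time T-1 to predict price at time T:
# drop x's last row and y's first row, then reshape x to (-1, n).
x = df[features].values[:-1].reshape(-1, n)
y = df["price"].values[1:].reshape(-1, 1)

print(x.shape, y.shape)  # (3, 6) (3, 1)
```

Each row of x is now paired with the price from one time step later, which is exactly the T-1 → T alignment described above.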

This step concludes the changes we will make in the model.py file.

It is crucial to note that whatever features you decide to include here, you must be able to provide the same features when requesting inferences in the app.py file.

6️⃣In the next step, we will open the app.py file

vim app.py

The get_eth_inference() function is where we will make the necessary changes to ensure that the input used for model training matches the input used when serving inference requests.

As mentioned a few times earlier, we need feature inputs to make predictions, and ideally these would be real-time data. However, that cannot be obtained from the Binance dataset, since Binance only provides data up to D-1. So in this example, I use the latest available values from the dataset we have already downloaded (except for the timestamp feature, for which we can simply use the current timestamp).

The required steps are as follows:

  1. Load training_price_data into a pandas DataFrame
  2. Extract the last value of each selected feature (in this case: volume, n_trades, high, low, and price)
  3. Since the price feature used for training is the mean of the open, close, high, and low prices, compute the same average here
  4. Put all features together and reshape them into the format required by the model
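The four steps above can be sketched as follows (the dataframe contents stand in for the downloaded training data, and the feature order matches the train_model() example; in the real node, the dataframe would come from reading the training price data file):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the downloaded training data.
# In the node this would be: df = pd.read_csv(training_price_data_path)
df = pd.DataFrame({
    "volume":   [10.0, 13.0],
    "n_trades": [5, 8],
    "open":     [99.0, 102.0],
    "high":     [110.0, 113.0],
    "low":      [90.0, 93.0],
    "close":    [101.0, 104.0],
})

# 2. take the last available value of each selected feature
last = df.iloc[-1]

# 3. reproduce the "price" feature as the mean of open, high, low, close
price = last[["open", "high", "low", "close"]].mean()

# 4. assemble the features (using the current timestamp for "date")
#    and reshape into the (1, n) shape the model expects
now = pd.Timestamp.now(tz="UTC").timestamp()
x_pred = np.array([now, last["volume"], last["n_trades"],
                   last["high"], last["low"], price]).reshape(1, -1)

print(x_pred.shape)  # (1, 6)
```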

With this done, we have completed all the steps required to edit the dataset for model training and for requesting inferences.

7️⃣Rebuild the Docker image and restart the containers with

docker compose build
docker compose up -d

8️⃣If all went well, check your inferences

curl http://localhost:8000/inference/<token>

Further improvement

Obviously, the approach shown in step 6️⃣ is not ideal: we want the latest data for prediction. This could be achieved by fetching real-time updates from other sources; however, that may require a paid account and usually comes with a limited number of API calls. CoinGecko API calls via a free account, as used in the Hugging Face example, could be a stop-gap option, but it also has limitations.

About the Allora Network

Allora is a self-improving decentralized AI network.

Allora enables applications to leverage smarter, more secure AI through a self-improving network of ML models. By combining innovations in crowdsourced intelligence, reinforcement learning, and regret minimization, Allora unlocks a vast new design space of applications at the intersection of crypto and AI.

To learn more about Allora Network, visit the Allora website, X, Blog, Discord, and Developer docs.
