Revamped Federated Analytics for Ravenverse!

Unnikrishnan Menon
Published in RavenProtocol
Sep 20, 2022

On top of Distributed Computing, the Ravenverse now also has support for Federated Analytics (FA), a technique useful for conducting data analysis without directly observing a client’s private data. You can read more on FA here.

This post takes a closer look at how our improved FA release works.

Requester and Provider

In Ravenverse, the Requester is the entity that wants to conduct analysis of statistical metrics across data that is hosted on devices across the globe. They can define certain constraints and filters that the data must conform to. The Ravenverse bridges the gap between the Requesters and these devices by allowing only those devices with valid data (satisfying the constraints) to participate without exposing any private data.

On the other hand, Providers are entities that host cleaned data locally. For instance, it could be the medical records of patients in a hospital.

For the sake of this demo, we will be considering an arbitrary dataset of Office data that consists of 4 columns hosted on the Provider.

Figure: Sample Office Data

Getting your Ravenverse Token

Visit the Ravenverse Website (preferably on Chrome) and log in using your MetaMask wallet credentials. More details on account creation can be found in this article.

Copy your Private Ravenverse Token. This token will be required by all Raven libraries to authenticate transactions and data transfers.

Setup

Clone the Ravenverse Repository

git clone https://github.com/ravenprotocol/ravenverse.git

We recommend creating a conda virtual environment with Python 3.8 for Ravenverse.

Now, inside the cloned folder, install the Python dependencies:

pip install -r requirements.txt

Requester Side

Inside the Requester subfolder, you should find a .env file in which you need to paste your Ravenverse token.

Running the Script

The Requester’s script (Ravenverse/Requester/create_federated_analytics.py) utilizes the Ravop library to compile and execute FA ops. In this demo, we will be finding the mean, variance and standard deviation of the Office Data hosted on the Provider.

cd Requester/
python create_federated_analytics.py

Let’s dive deeper into the script.

Importing the libraries

import os
from dotenv import load_dotenv
load_dotenv()
import json
import ravop as R

First, we initialize the requester with their Ravenverse token, after which we clear any previous/existing graphs using the R.flush() method. Note that the token comes from the .env file: load_dotenv() loads it into the environment, and os.environ.get("TOKEN") passes it to Ravop's initialize function.

R.initialize(ravenverse_token=os.environ.get("TOKEN")) 
R.flush()
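Because os.environ.get returns None when a variable is missing, a forgotten or empty .env entry would only surface later as an authentication problem. The sketch below is a small optional guard that fails early with a readable message. The helper name require_token is hypothetical and not part of Ravop; the variable name TOKEN is taken from the script above.

```python
import os


def require_token(var_name: str = "TOKEN") -> str:
    """Return the token stored in the environment, or raise a clear error.

    Hypothetical helper, not part of Ravop. It assumes load_dotenv() has
    already copied the contents of the .env file into os.environ.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set. Paste your Ravenverse token into the "
            ".env file and make sure load_dotenv() runs before this call."
        )
    return token
```

With this in place, the initialization line could read R.initialize(ravenverse_token=require_token()).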

Next, we define an FA graph (along with the required data constraints), which groups the forthcoming Ops together. This graph is now owned by the Requester.

R.Graph(name="Office Data", approach="federated",
        rules=json.dumps(
            {"rules": {"age": {"min": 18, "max": 80},
                       "salary": {"min": 1, "max": 5},
                       "bonus": {"min": 0, "max": 10},
                       "fund": {}
                       },
             "max_clients": 1}))
  • name: The name for the graph set by the requester. Preferably a meaningful name that allows clients to identify the type of dataset desired by the requester.
  • approach: Set to ‘federated’.
  • rules: The rules dictionary must contain the names of all the columns of data required by the requester for aggregation and their corresponding constraints as shown above. The clients will then be able to filter their data accordingly. Note: An empty dictionary for a column signifies no constraints. All values in that column shall be considered.
  • max_clients: The number of clients whose data must be aggregated and returned to the requester.
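To make the meaning of the rules dictionary concrete, here is a minimal sketch of how a client might filter its rows against such rules. This is an illustration only, not Ravpy's actual filtering code: the helper names are hypothetical, the sample rows are made up, and it assumes that bounds are inclusive and that a row is dropped if any of its columns violates a constraint.

```python
from typing import Dict, List

Rules = Dict[str, Dict[str, float]]
Row = Dict[str, float]


def row_satisfies(row: Row, rules: Rules) -> bool:
    """Check one row against every column constraint (bounds assumed inclusive)."""
    for column, bounds in rules.items():
        if column not in row:
            return False  # a required column is missing from the row
        value = row[column]
        if "min" in bounds and value < bounds["min"]:
            return False
        if "max" in bounds and value > bounds["max"]:
            return False
    return True  # an empty bounds dict ({}) never rejects a value


def filter_rows(rows: List[Row], rules: Rules) -> List[Row]:
    """Keep only the rows that satisfy all constraints."""
    return [row for row in rows if row_satisfies(row, rules)]


# Same constraints as in the graph definition above.
rules = {
    "age": {"min": 18, "max": 80},
    "salary": {"min": 1, "max": 5},
    "bonus": {"min": 0, "max": 10},
    "fund": {},
}

# Made-up rows for illustration.
rows = [
    {"age": 25, "salary": 3, "bonus": 2, "fund": 100},
    {"age": 17, "salary": 3, "bonus": 2, "fund": 50},  # age below the minimum
    {"age": 40, "salary": 9, "bonus": 1, "fund": 70},  # salary above the maximum
]

valid_rows = filter_rows(rows, rules)
```

Here only the first row survives, since the other two each break one constraint.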

Next, we define Ops for our statistical analysis.

mean = R.federated_mean()
variance = R.federated_variance()
standard_deviation = R.federated_standard_deviation()
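For some intuition on how such statistics can be computed without the Requester ever seeing raw rows, the sketch below shows one common approach: each client shares only a count, a sum and a sum of squares, and these summaries are combined centrally. This is not Ravenverse's implementation. The function names are hypothetical, the numbers are made up, and the choice of population variance (dividing by n) is an assumption.

```python
import math
from typing import Dict, List


def local_summary(values: List[float]) -> Dict[str, float]:
    """Computed on the client: only these three numbers leave the device."""
    return {
        "count": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }


def aggregate(summaries: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine client summaries into mean, population variance and std deviation."""
    n = sum(s["count"] for s in summaries)
    if n == 0:
        raise ValueError("no data points were contributed")
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    # E[x^2] - (E[x])^2, clamped at zero to absorb floating-point rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return {
        "mean": mean,
        "variance": variance,
        "standard_deviation": math.sqrt(variance),
    }


# Two hypothetical clients holding one numeric column each.
client_a = local_summary([2.0, 4.0, 4.0, 4.0])
client_b = local_summary([5.0, 5.0, 7.0, 9.0])
stats = aggregate([client_a, client_b])
```

For the eight values above, the combined mean is 5, the variance is 4 and the standard deviation is 2, which matches what you would get by pooling the raw data.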

Next, we set our Ops to persist. Ops that have been set to persist can be retrieved once the execution of the graph is complete.

Note: Make sure that the name parameter for each persisting Op is unique within a graph so that later it can be retrieved.

mean.persist_op(name="mean")
variance.persist_op(name="variance")
standard_deviation.persist_op(name="standard_deviation")

Once all Ops of the graph have been defined, the requester must activate their graph. This step completes the compilation procedure and makes the graph ready for execution. No more Ops can be added to the graph after this.

R.activate()

Upon activation, the Ravenverse compiles the defined Ops and notifies the Requester of how many Raven Tokens it will cost to execute their graph.

Next, we execute the FA graph. On execution, the Provider nodes will be allowed to participate if they possess the relevant data on their systems. The requester can also track the progress of the graph.

R.execute()
R.track_progress()

Figure: Compilation and Execution of FA graph

Once executed, the Requester can fetch the computed results of the Ops that they had previously set to persist.

mean_output = R.fetch_persisting_op(op_name="mean")
print("mean: ", mean_output)

variance_output = R.fetch_persisting_op(op_name="variance")
print("variance: ", variance_output)

standard_deviation_output = R.fetch_persisting_op(op_name="standard_deviation")
print("standard_deviation: ", standard_deviation_output)

These fetched ops will contain a dictionary of values corresponding to each row in the dataset.

Figure: The output of fetched FA ops

Provider Side

Inside the Provider subfolder, you should find a .env file in which you need to paste your Ravenverse token.

Running the Script

The Provider’s script (Ravenverse/Provider/run_federated_provider.py) utilizes the Ravpy library to view and participate in existing FA graphs. The Office Data (shown earlier) is already included in the Provider subfolder (Ravenverse/Provider/data/data1.csv).

cd Provider/
python run_federated_provider.py

Let’s dive deeper into this script…

Importing the libraries

import os
from dotenv import load_dotenv
load_dotenv()
from ravpy.utils import list_graphs
from ravpy.federated.participate import participate
from ravpy.initialize import initialize

First, we initialize the provider with their Ravenverse token. Note that the token comes from the .env file: load_dotenv() loads it into the environment, and os.environ.get("TOKEN") passes it to ravpy's initialize function.

client = initialize(os.environ.get("TOKEN"))

Next, we list the currently active graphs in which the provider can potentially participate.

list_graphs(approach="federated")

Figure: Listing currently active graphs

After viewing the rules, the Provider can participate in the graph execution with their local dataset. Note: The dataset must be a .csv file.

participate(graph_id=1, file_path="data/data1.csv")

Figure: Participation Logs
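Before participating, a Provider may want to confirm that their .csv file actually contains the columns named in the graph's rules. The sketch below is one way to do that with Python's standard csv module. The helper missing_columns is hypothetical and not part of Ravpy, and the in-memory sample stands in for a real file.

```python
import csv
import io
from typing import Iterable, List, TextIO


def missing_columns(csv_file: TextIO, required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [column for column in required if column not in header]


# Column names taken from the rules of the "Office Data" graph.
required = ["age", "salary", "bonus", "fund"]

# In-memory stand-in for a real file, with made-up values.
sample = io.StringIO("age,salary,bonus,fund\n25,3,2,100\n")
missing = missing_columns(sample, required)
```

For a real dataset, the same helper could be called on a file opened with open("data/data1.csv", newline=""). An empty result means every required column is present.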

After participation, the Provider is rewarded with Raven Tokens, which they can view on their Ravenverse Website dashboard and withdraw into their MetaMask wallet.

Join Our Community

Raven’s GitHub Repositories welcome contributions from developers. We’ll be releasing new versions of Ravenverse and related libraries on a regular basis.

Join our Discord server to get updates on what comes next.

Join us on Telegram
