Announcing Support for Federated Analytics in Raven Distribution Framework (RDF)

Unnikrishnan Menon
RavenProtocol
Feb 15, 2022

Federated Analytics is the latest feature added to the Raven Distribution Framework (RDF). It allows statistics such as mean, variance, and standard deviation to be aggregated safely and dynamically across data that is held privately on several clients. RDF’s Ravop library now supports federated operations, which developers can use to run analyses without directly observing a client’s private data.

What is Federated Analytics?

Federated analytics is an approach to data analysis in which key statistics like mean, variance, and standard deviation are calculated across multiple private datasets without compromising privacy. Like federated learning, it runs local calculations over each client device’s data and makes only the aggregated findings available to developers, never the data from a specific device. Sensitive data such as medical records, financial transactions, and employee data can be analyzed without leaving the premises.

How does RDF facilitate Federated Analytics?

The latest RDF version (v0.3) contains an updated Ravop library that supports the creation of federated operations. These operations differ fundamentally from the distributed computation ops that RDF already supported. In distributed computing, an op is sent to the contributing client along with the data it needs; a federated op instead uses data that is hosted locally on the client’s system.

Developers can use Ravop to define a set of “rules” that a client’s data must satisfy, so that data irrelevant to the developer’s requirement is filtered out. RDF now integrates with a new Python client (Ravpy), which calculates the required statistics locally without exposing its private data. The client’s dataset is first validated against the “rules” defined by the developer. A data silo is then prepared that holds only the required columns, with values inside the defined ranges. Finally, the client evaluates the required statistics locally on this data silo.
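The validation and local evaluation steps can be sketched in plain Python. This is an illustration only, not Ravpy's actual implementation: the helper names build_silo and local_stats are hypothetical, and the sketch assumes that a whole row is dropped whenever any ruled column falls outside its range and that the variance is the population variance.

```python
import statistics

# Same shape as the "rules" mapping the developer attaches to the graph.
RULES = {
    "age": {"min": 18, "max": 80},
    "salary": {"min": 1, "max": 5},
    "fund": {},  # empty dict: no constraint on this column
}


def build_silo(rows, rules):
    """Keep only the ruled columns of rows whose values satisfy every range."""
    silo = []
    for row in rows:
        ok = True
        for column, limits in rules.items():
            value = row[column]
            if "min" in limits and value < limits["min"]:
                ok = False
            if "max" in limits and value > limits["max"]:
                ok = False
        if ok:
            silo.append({column: row[column] for column in rules})
    return silo


def local_stats(silo, column):
    """Statistics for one column, computed where the data lives."""
    values = [row[column] for row in silo]
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),  # assumption: population variance
    }


rows = [
    {"age": 25, "salary": 3, "fund": 100, "name": "a"},
    {"age": 17, "salary": 2, "fund": 50, "name": "b"},   # age below minimum
    {"age": 40, "salary": 9, "fund": 70, "name": "c"},   # salary above maximum
    {"age": 35, "salary": 4, "fund": 300, "name": "d"},
]

silo = build_silo(rows, RULES)
age_stats = local_stats(silo, "age")
print(silo)
print(age_stats)
```

Only the summary in age_stats would need to leave the client; the rows themselves, including the unruled name column, stay local.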

Before transmitting their local stats for aggregation, the clients have the option to add a layer of homomorphic encryption for enhanced security.
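To illustrate why homomorphic encryption suits this step, here is a toy sketch of an additively homomorphic scheme (Paillier). The article does not say which scheme RDF uses, so the choice of Paillier, the key sizes, the fixed-point SCALE, and all function names are assumptions made for illustration. The primes are far too small for real security.

```python
import math
import secrets

# Toy key material: two Mersenne primes. Far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                      # public key
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # private: valid because the generator is N + 1

SCALE = 1000  # fixed-point factor: floats are encoded as integers


def encrypt(value):
    """Encrypt a non-negative number under the public key N."""
    m = round(value * SCALE)
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


def decrypt(ciphertext):
    """Recover the plaintext with the private key (LAM, MU)."""
    m = ((pow(ciphertext, LAM, N_SQ) - 1) // N) * MU % N
    return m / SCALE


# Three clients encrypt their local sums; the aggregator never sees them in clear.
local_sums = [152.5, 98.25, 201.0]
ciphertexts = [encrypt(s) for s in local_sums]

combined = ciphertexts[0]
for c in ciphertexts[1:]:
    combined = add_encrypted(combined, c)

total = decrypt(combined)
print(total)
```

The aggregating party only ever multiplies ciphertexts, so it can combine contributions without being able to read any single client's value. Who holds the private key in an RDF deployment is not covered by this sketch.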

These local stats are then uploaded onto Raven’s FTP Server (Ravftp). The central server Ravsock then fetches and aggregates them with values received from the other participating clients. The maximum number of participating clients in a federated analytics graph is declared within the “rules” set by the developer. Once enough clients have participated, the aggregated results are returned to the developer.
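The aggregation step can be pictured with the following sketch. It is not Ravsock's code: the Aggregator class is hypothetical, and it assumes each client shares a count, a sum, and a sum of squares for a column, which is one common way to combine means and variances exactly without seeing the raw values.

```python
import math


class Aggregator:
    """Collects one summary per client and combines them once enough arrive."""

    def __init__(self, max_clients):
        self.max_clients = max_clients
        self.summaries = {}

    def submit(self, client_id, count, total, total_sq):
        """Store a client's (count, sum, sum of squares); a resubmission overwrites."""
        if len(self.summaries) < self.max_clients or client_id in self.summaries:
            self.summaries[client_id] = (count, total, total_sq)

    def result(self):
        """Return aggregated statistics, or None while clients are still missing."""
        if len(self.summaries) < self.max_clients:
            return None
        n = sum(s[0] for s in self.summaries.values())
        total = sum(s[1] for s in self.summaries.values())
        total_sq = sum(s[2] for s in self.summaries.values())
        mean = total / n
        # Population variance; clamp tiny negative values from rounding.
        variance = max(total_sq / n - mean**2, 0.0)
        return {"mean": mean, "variance": variance, "std": math.sqrt(variance)}


agg = Aggregator(max_clients=3)
agg.submit("client-a", 3, 6, 14)    # local values 1, 2, 3
agg.submit("client-b", 2, 9, 41)    # local values 4, 5
pending = agg.result()              # still waiting for a third client
agg.submit("client-c", 1, 6, 36)    # local value 6
final = agg.result()
print(pending)
print(final)
```

The combined result matches what would be computed over the pooled values 1 through 6, even though the aggregator only ever received three small summaries.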

Usage

1. Configure RDF

Make sure RDF is configured correctly and the Ravsock server is up and running.

2. Developer Side

Create a federated analytics graph by providing its name, approach, and rules to which clients must adhere.

import json

import ravop as R

# Each rule gives the allowed range for a column; {} means no constraint.
graph = R.Graph(name="Office Data", approach="federated",
                rules=json.dumps(
                    {"rules": {"age": {"min": 18, "max": 80},
                               "salary": {"min": 1, "max": 5},
                               "bonus": {"min": 0, "max": 10},
                               "fund": {}
                               },
                     "max_clients": 3}))

  • Name: The name for the graph set by the developer. Preferably a meaningful name that allows clients to identify the type of dataset desired by the developer.
  • Approach: Set to ‘federated’.
  • Rules: The rules dictionary must contain the names of all the columns of data required by the developer for aggregation and their corresponding constraints as shown above. The clients will then be able to filter their data accordingly. Note: An empty dictionary for a column signifies no constraints. All values in that column shall be considered.
  • Max Clients: The number of clients whose data must be aggregated and returned to the developer.

Creation of Federated Ops

The following code snippet creates and adds ops to the previously declared graph.

mean = R.federated_mean()
variance = R.federated_variance()
standard_deviation = R.federated_standard_deviation()

The results of aggregation can be fetched by calling the aforementioned ops.

# Wait for the results
print("\nAggregated Mean: ", mean())
print("\nAggregated Variance: ", variance())
print("\nAggregated Standard Deviation: ", standard_deviation())

The results will be ready once max_clients number of clients have participated.

Note: The proper way of wrapping up ops in a graph is by calling graph.end() at the end of the code. This checks for any failed ops and lets the developer know.

Sample Test Code

python federated_test.py

3. Client Side

As of now, Federated Analytics is natively supported by Raven’s Python Clients (Ravpy).

Upon configuration, RDF ensures that Ravpy gets properly installed.

For a client to view the available pending graphs and their corresponding data rules:

python run_client.py --action list

The client must note the graph_id for the graph in which it wants to participate.

For the client to participate in its desired graph:

python run_client.py --action participate --cid 123 --federated_id <graph_id>

Note: The cid argument is a unique username provided by the client.

The terminal will then prompt the client to provide the path for its dataset.

The data can be placed inside the /ravpy/data/ folder. It must be a .csv file that contains at least all the columns mentioned in the graph's rules, in any order.
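Before participating, a client may want to confirm that its file has every required column. The check below is a hypothetical helper built on Python's standard csv module, not part of Ravpy; the rules mapping mirrors the developer-side example, and the in-memory files stand in for real .csv files.

```python
import csv
import io


def missing_columns(csv_file, rules):
    """Return the ruled columns that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return sorted(set(rules) - present)


rules = {"age": {"min": 18, "max": 80}, "salary": {"min": 1, "max": 5},
         "bonus": {"min": 0, "max": 10}, "fund": {}}

# Column order does not matter, and extra columns are allowed.
good = io.StringIO("fund,name,age,bonus,salary\n100,a,25,2,3\n")
bad = io.StringIO("age,salary\n25,3\n")

good_missing = missing_columns(good, rules)
bad_missing = missing_columns(bad, rules)
print(good_missing)
print(bad_missing)
```

For a file on disk, the same function can be called with open(path, newline="") in place of the io.StringIO objects.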

Conclusion

You can now set up RDF and test our new secure federated analytics feature on your custom datasets. Developers are welcome to contribute to Raven’s GitHub repositories. We will be actively releasing new versions of RDF and its libraries.

Join our Discord server to get updates on what comes next.

Join us on Telegram
