Infrastructure, FL workflow

Federated Learning with Fed-BioMed: Infrastructure and Workflow

A short presentation on how Fed-BioMed works

Yannick Bouillard @ Fed-BioMed
Fed-BioMed


Federated learning is a promising, relatively recent machine learning technique that allows several parties (like hospitals!) to collaboratively train a statistical model while keeping their data private. As you can imagine, it is well suited to healthcare, where data are highly sensitive and must be protected. This article focuses on Fed-BioMed, a powerful federated learning framework oriented towards healthcare applications. For an introduction to the Fed-BioMed framework and to federated learning, please check this Medium article. In what follows, I will present the Fed-BioMed architecture, its main components, and its workflow.

Fed-BioMed architecture

A federated learning framework typically involves two main components: one Server and several Clients (also called Nodes). The topology is quite similar to the classical client-server architecture (see picture below), where a server centralizes the information and communicates with one or several clients, which request information from the server. Federated learning is not that different: the Clients hold the data needed to train a statistical model, whereas the Server aggregates the local models provided by the Clients into a global model.

Client-Server architecture in computer network theory. A central Server serves content that can be accessed over a network. Clients send Requests to the Server to access this content, and the Server responds to them. Centralized federated learning uses this client-server architecture. Images from Those Icons & Freepik (Flaticon).

Nodes, Network, and Researcher

Fed-BioMed implements this architecture in a production-ready yet simple and user-friendly framework, with three main components:

  • Nodes, the entities which hold the data (under strong privacy constraints);
  • Researcher, the orchestrator, in charge of setting up and monitoring the training of a model in a federated fashion, selecting the appropriate strategy (e.g. how to sample Nodes), and aggregating local models into a global consensus model;
  • Network, the entity that connects a Researcher to several Nodes. In Fed-BioMed, the Network relies on several means of communication (depending on the size of the data or messages) to exchange messages and models between the Nodes and the Researcher.
Fed-BioMed architecture example with a Researcher (in charge of the Experiment), three Nodes holding clinical data on which they train their local models, and a Network in the middle enabling the exchange of messages and models between the Nodes and the Researcher. Image from Fed-BioMed.
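
To make the division of roles concrete, here is a minimal in-process sketch of a Network relaying a Researcher's request to several Nodes. It is not Fed-BioMed code: the Node and Network classes, the dict-based message format and the "ping" command are hypothetical, and a real deployment exchanges messages over an actual network rather than through method calls.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical message format: a plain dict with a "command" key.
Message = Dict[str, object]


@dataclass
class Node:
    """Holds private data and only answers requests; the records never leave it."""
    node_id: str
    n_samples: int

    def handle(self, message: Message) -> Message:
        if message.get("command") == "ping":
            # Reply with metadata only (here, the dataset size), never raw records.
            return {"node_id": self.node_id, "n_samples": self.n_samples}
        return {"node_id": self.node_id, "error": "unknown command"}


@dataclass
class Network:
    """Relays messages between the Researcher and the registered Nodes."""
    nodes: Dict[str, Node] = field(default_factory=dict)

    def register(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def broadcast(self, message: Message) -> Dict[str, Message]:
        # Forward the Researcher's message to every Node and collect the replies.
        return {node_id: node.handle(message) for node_id, node in self.nodes.items()}


# The Researcher side only ever talks to the Network, never to the data.
network = Network()
for node_id, size in [("hospital_A", 120), ("hospital_B", 80), ("hospital_C", 200)]:
    network.register(Node(node_id, size))

replies = network.broadcast({"command": "ping"})
print(replies["hospital_B"])  # {'node_id': 'hospital_B', 'n_samples': 80}
```

The design point this illustrates is that the Researcher addresses the Network, and each Node decides what it is willing to send back.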

In the next section, we will see how these three components interact during the training of a federated learning model.

A brief overview of Fed-BioMed workflow and its illustration through an example

To explain the Fed-BioMed workflow (and make it less boring!), I will illustrate it through an example involving clinical trials.

Please note that this example is meant for illustration and understanding only: it is oversimplified and does not reflect the reality and difficulty of conducting a clinical trial, let alone obtaining drug approval.

Let us consider a clinical trial performed in several hospitals to test the efficacy of a drug under development. In this context, as the Researcher leading the trial, you want to train a logistic regression model to study whether there is any association between the administration of the drug and an improvement in the patients’ health condition.
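
For readers who want the model written out, one common way to specify it is shown below (an illustrative assumption, not the only option). Here y_i is 1 if the condition of patient i improved and 0 otherwise, t_i is 1 if the patient received the drug and 0 for the placebo, and x_i1, …, x_ip are hypothetical covariates such as age or baseline severity.

```latex
\log \frac{P(y_i = 1 \mid t_i, \mathbf{x}_i)}{1 - P(y_i = 1 \mid t_i, \mathbf{x}_i)}
  = \beta_0 + \beta_T \, t_i + \sum_{j=1}^{p} \beta_j \, x_{ij},
\qquad
\mathrm{OR}_{\text{drug}} = e^{\beta_T}
```

Because the left-hand side is the log-odds of improvement, e^{β_T} is the odds ratio of improvement for treated versus placebo patients with the covariates held fixed: a value above 1 suggests the drug is associated with improvement.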

To do so, you designate a clinician in each hospital who will lead the clinical trial locally and collect patient data.

Clinical trial, from drug discovery to an approved treatment. Clinical trials typically involve two arms (an arm is a group of patients receiving either a specific treatment or a placebo), which makes it possible to study the efficacy of a treatment. Icons from Vitaly Gorbachev (Flaticon), inspired by mrctcenter.

Once the clinical trial is done in each hospital, you may first be tempted to gather all the data in a single place. But then you realize that patient records must be kept confidential and cannot leave the hospital where they were collected, whether for patient privacy or for other legal or practical reasons. As an alternative, you can train your model with Fed-BioMed, a federated learning framework suited to clinical datasets, so that patient data stay private.

1. Load data into Nodes

The first step for each clinician is to load their clinical dataset (patient records) into their respective Node. Fed-BioMed comes with tools to load healthcare datasets and associate them with a specific tag (or set of tags). In Fed-BioMed, tags are a very convenient way for the Researcher to select a dataset on a Node and train a model on it.
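
The idea behind tags can be sketched as a simple catalogue lookup. The snippet below is a minimal illustration assuming each Node keeps a list of dataset entries (name, tags, description, size) and that a dataset matches when it carries all the requested tags; DatasetEntry, search_by_tags and the tag names are hypothetical and are not Fed-BioMed's API. Only metadata is searched, so no patient record is touched.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical metadata a clinician attaches when loading a dataset on a Node."""
    name: str
    tags: frozenset
    description: str
    n_samples: int


def search_by_tags(catalogues: Dict[str, List[DatasetEntry]],
                   requested: Iterable[str]) -> Dict[str, DatasetEntry]:
    """Return, for each Node, the first dataset carrying *all* requested tags."""
    wanted: Set[str] = set(requested)
    matches: Dict[str, DatasetEntry] = {}
    for node_id, entries in catalogues.items():
        for entry in entries:
            if wanted <= entry.tags:  # subset test: every requested tag is present
                matches[node_id] = entry
                break
    return matches


catalogues = {
    "hospital_A": [DatasetEntry("trial_A", frozenset({"drug-x-trial", "tabular"}),
                                "Patient records, drug X trial", 120)],
    "hospital_B": [DatasetEntry("imaging", frozenset({"mri"}), "Brain MRI scans", 300),
                   DatasetEntry("trial_B", frozenset({"drug-x-trial", "tabular"}),
                                "Patient records, drug X trial", 80)],
    "hospital_C": [DatasetEntry("registry", frozenset({"tabular"}), "General registry", 500)],
}

selected = search_by_tags(catalogues, ["drug-x-trial"])
print(sorted(selected))  # ['hospital_A', 'hospital_B']
```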

Scheme presenting how datasets are loaded into a Node in Fed-BioMed. After loading a medical dataset, the clinician is asked to specify one or several tags and a dataset description, which the Researcher can then access to ease the selection of appropriate datasets for training a model. Image from Fed-BioMed.

2. Create your Training Plan and your Experiment

2.1. Training Plans

Now that datasets are loaded on the Nodes and thus reachable, the second step, performed by the Researcher, is to create a Training Plan. A Training Plan describes how the model will be trained on each Node: it contains the model architecture (the logistic regression model in our example), but also the instructions for loading and preparing the data before passing them to the model. Do you want to select only some of the features provided by the clinicians? Handle missing data? Compute extra features or filter on them? No problem, Fed-BioMed can handle all of that through the Training Plan!
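
As a rough illustration of what a Training Plan bundles together, here is a hypothetical one written in plain Python for the logistic regression of our example. The class name, the method names (prepare_data, train), the per-Node mean imputation and the gradient-descent settings are all assumptions made for this sketch; Fed-BioMed's own Training Plan classes follow a different interface described in its documentation.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[float]]  # one patient: column name -> value (None if missing)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticTrainingPlan:
    """Hypothetical Training Plan: data preparation plus local logistic regression training."""

    def __init__(self, features: Sequence[str], target: str, lr: float = 0.5, epochs: int = 100):
        self.features = list(features)  # feature selection: only these columns are used
        self.target = target
        self.lr = lr
        self.epochs = epochs

    def prepare_data(self, records: List[Record]) -> Tuple[List[List[float]], List[int]]:
        # Drop records whose outcome is missing.
        kept = [r for r in records if r.get(self.target) is not None]
        # Impute missing feature values with the mean of the observed ones on this Node.
        means = {}
        for f in self.features:
            observed = [r[f] for r in kept if r.get(f) is not None]
            means[f] = sum(observed) / len(observed) if observed else 0.0
        X = [[r[f] if r.get(f) is not None else means[f] for f in self.features] for r in kept]
        y = [int(r[self.target]) for r in kept]
        return X, y

    def train(self, weights: List[float], X: List[List[float]], y: List[int]) -> List[float]:
        # weights[0] is the intercept; full-batch gradient descent on the mean log-loss.
        w = list(weights)
        n = len(X)
        for _ in range(self.epochs):
            grad = [0.0] * len(w)
            for xi, yi in zip(X, y):
                err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
                grad[0] += err
                for j, xj in enumerate(xi):
                    grad[j + 1] += err * xj
            w = [wj - self.lr * g / n for wj, g in zip(w, grad)]
        return w


plan = LogisticTrainingPlan(features=["treated", "severity"], target="improved")
records = [  # toy, made-up records
    {"treated": 1, "severity": 0.2, "improved": 1},
    {"treated": 0, "severity": None, "improved": 0},  # missing severity: imputed
    {"treated": 1, "severity": 0.6, "improved": 1},
    {"treated": 0, "severity": 0.4, "improved": 0},
    {"treated": 1, "severity": 0.5, "improved": None},  # missing outcome: dropped
]
X, y = plan.prepare_data(records)
local_weights = plan.train([0.0, 0.0, 0.0], X, y)
```

Keeping preparation and training in one object is what lets the same recipe run unchanged on every Node.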

2.2. Experiment setup

In Fed-BioMed, an Experiment is an object that orchestrates the overall federated learning pipeline; it needs to be created before running the federated training. It takes as input not only a Training Plan, but also the aggregation method (Federated Averaging, FedProx, Scaffold, …) and the Strategy (how to sample Nodes, how to react when Nodes disconnect, …). Once the Experiment has been created, the federated model training can start!
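
To picture how these ingredients fit together, the sketch below models a Strategy that samples a fraction of the connected Nodes each Round and refuses to proceed when too many have disconnected, plus a container grouping what the Researcher must provide. SamplingStrategy, Experiment and all their fields are hypothetical names chosen for illustration; they do not reproduce Fed-BioMed's Experiment class or its parameters.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class SamplingStrategy:
    """Hypothetical Strategy: sample a fraction of the connected Nodes each Round."""
    fraction: float = 1.0
    min_nodes: int = 1
    seed: int = 0
    _rng: random.Random = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)  # seeded for reproducible sampling

    def select(self, connected: Sequence[str]) -> List[str]:
        # Refuse to run a Round if too many Nodes have disconnected.
        if len(connected) < self.min_nodes:
            raise RuntimeError(f"only {len(connected)} Node(s) connected, need {self.min_nodes}")
        k = max(self.min_nodes, round(self.fraction * len(connected)))
        return sorted(self._rng.sample(list(connected), k))


@dataclass
class Experiment:
    """Hypothetical container for the three ingredients the Researcher must provide."""
    training_plan: object
    aggregator: Callable[[List[List[float]], List[int]], List[float]]
    strategy: SamplingStrategy
    tags: List[str]
    rounds: int = 10


strategy = SamplingStrategy(fraction=0.5, min_nodes=2, seed=42)
nodes = ["hospital_A", "hospital_B", "hospital_C", "hospital_D"]
picked = strategy.select(nodes)  # two of the four hospitals take part in this Round

experiment = Experiment(
    training_plan=None,  # placeholder for a Training Plan object
    aggregator=lambda weights, sizes: weights[0],  # placeholder aggregation rule
    strategy=strategy,
    tags=["drug-x-trial"],
    rounds=20,
)
```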

Main elements needed to configure an Experiment: the TrainingPlan, the Aggregator, and the Strategy. The Researcher (you) has to specify them in order to run the Experiment. Image from Fed-BioMed.

3. Run the federated training

Now everything is ready for the federated training! What will happen is the following iterative process:

  1. The global model is sent to the Nodes through the Network. The model’s architecture is defined in a Training Plan, and weights are contained in a specific file exchanged over the Network;
  2. Each Node trains the model on the available local data;
  3. The resulting optimized local models are sent back to the Researcher through the Network;
  4. The Aggregator combines the shared local models into a new global model.
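
Step 4 can be made concrete with Federated Averaging, one of the aggregation methods mentioned above. In FedAvg, the new global weights are the average of the local weights, each weighted by the number of samples the Node trained on: w = Σ_k (n_k / n) w_k. The plain-Python function below is an illustrative sketch, not Fed-BioMed's Aggregator implementation; the function name and the dict-of-lists weight format are assumptions.

```python
from typing import Dict, List


def federated_average(local_weights: Dict[str, List[float]],
                      n_samples: Dict[str, int]) -> List[float]:
    """FedAvg-style aggregation: average local weights, weighted by local dataset size."""
    total = sum(n_samples[node] for node in local_weights)
    dim = len(next(iter(local_weights.values())))
    global_weights = [0.0] * dim
    for node, weights in local_weights.items():
        share = n_samples[node] / total  # Nodes with more patients weigh more
        for j, w in enumerate(weights):
            global_weights[j] += share * w
    return global_weights


new_global = federated_average(
    {"hospital_A": [1.0, 2.0], "hospital_B": [3.0, 4.0]},
    {"hospital_A": 100, "hospital_B": 300},
)
print(new_global)  # [2.5, 3.5]
```

Only weight vectors and sample counts reach the aggregation step, which is why the Researcher never needs the records themselves.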

In Fed-BioMed, steps 1 to 4 form a Round. Rounds are repeated until the global model converges. You then get your trained logistic regression model and can check whether the drug has an effect on patient health (for instance by extracting odds ratios), all without ever accessing a single patient’s record!
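
The whole loop can be simulated end to end in a single process. In the sketch below, two hypothetical hospitals hold toy, made-up records (not real trial data). Each Round runs a few local gradient steps on every Node and then applies sample-weighted federated averaging. The loop stops when the global weights stop moving. The stopping threshold, learning rate, number of local steps and data are all assumptions of this sketch. The last lines read the odds ratio off the treatment coefficient as exp(beta_drug).

```python
import math
from typing import Dict, List, Tuple

Data = List[Tuple[float, int]]  # (drug: 1.0 / placebo: 0.0, improved: 1 / not: 0)


def make_data(drug_yes: int, drug_no: int, placebo_yes: int, placebo_no: int) -> Data:
    # Toy records built from made-up counts of improved / not improved patients per arm.
    return ([(1.0, 1)] * drug_yes + [(1.0, 0)] * drug_no
            + [(0.0, 1)] * placebo_yes + [(0.0, 0)] * placebo_no)


def local_update(w: List[float], data: Data, lr: float = 0.5, steps: int = 5) -> List[float]:
    # Step 2: a few gradient-descent steps on the local log-loss (w = [intercept, beta_drug]).
    b, beta = w
    n = len(data)
    for _ in range(steps):
        g_b = g_beta = 0.0
        for t, y in data:
            err = 1.0 / (1.0 + math.exp(-(b + beta * t))) - y
            g_b += err
            g_beta += err * t
        b, beta = b - lr * g_b / n, beta - lr * g_beta / n
    return [b, beta]


def run_federated(nodes: Dict[str, Data], max_rounds: int = 1000,
                  tol: float = 1e-6) -> Tuple[List[float], int]:
    w = [0.0, 0.0]
    total = sum(len(d) for d in nodes.values())
    for round_idx in range(1, max_rounds + 1):
        # Steps 1-3: send the global weights, train locally, collect the local weights.
        local = {name: local_update(w, data) for name, data in nodes.items()}
        # Step 4: FedAvg aggregation weighted by local sample counts.
        new_w = [sum(len(nodes[k]) / total * local[k][j] for k in nodes) for j in range(2)]
        # Hypothetical stopping rule: the weights barely move between two Rounds.
        if max(abs(new - old) for new, old in zip(new_w, w)) < tol:
            return new_w, round_idx
        w = new_w
    return w, max_rounds


nodes = {"hospital_A": make_data(30, 10, 15, 25), "hospital_B": make_data(20, 10, 12, 18)}
weights, n_rounds = run_federated(nodes)
odds_ratio = math.exp(weights[1])  # odds of improving, drug vs placebo
print(f"stopped after {n_rounds} rounds, odds ratio = {odds_ratio:.2f}")
```

Only weights cross the (here, simulated) Network; the per-patient tuples never leave the Node that owns them. With several local steps per Round, FedAvg can settle slightly away from the model a pooled analysis would give when the hospitals' data differ.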

Congrats, the results of the logistic regression model show that your new drug tremendously improved patients’ health condition! You are one step closer to getting approval from the local authorities! Photo by Christina Victoria Crafton on Unsplash.

Conclusion

Through this example, you now know the basics of conducting a federated learning experiment with the Fed-BioMed framework! Besides clinical trials, Fed-BioMed can be used for a range of other healthcare use cases. For more information on the Fed-BioMed architecture, and on Fed-BioMed in general, you may want to visit our website.

There is so much to discover about Fed-BioMed! Stay tuned!
