Meet OpenFL: The Federated Learning Framework Reinventing Project Power and Security
Get access to more data without sending yours anywhere.
Presented by Ezequiel Lanza
Artificial intelligence applications are changing the world as we know it, but they require mountains of data to learn. More data isn’t necessarily better, though. Data also needs to be diverse. Otherwise, your model may amplify bias, resulting in AI-based financial services and healthcare systems that discriminate based on race, gender, or marital status when making important decisions. Moreover, if your data is stored across multiple sites or even multiple countries, it may be too big or too costly to send elsewhere, and local regulations may prevent you from sharing sensitive data, such as patient information.
Federated learning is an AI framework that can help organizations solve these problems. By enabling companies to collaboratively train models without sending data to a centralized site, federated learning is helping institutions with sensitive data improve the accuracy of their AI models by securely tapping into more data.
At this year’s Toronto Machine Learning Summit* (TMLS), Intel Open Source Evangelist Ezequiel Lanza shared an overview of federated learning and introduced Open Federated Learning (OpenFL), a framework for federated learning originally developed by Intel. Watch the full talk here.
How Federated Learning Works
Traditional machine learning trains a model on data gathered in one location. For instance, three separate institutions that want to train a shared model must each send their data to a central aggregation site, where the model is trained.
In a federated learning approach, the data never leaves the institution. Separate models are trained in their own locations and sent to an aggregation server, which synthesizes the models into one new model. The new model is then sent back to each institution so it can be applied to the data.
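The aggregation step described above is commonly implemented as federated averaging (FedAvg): the server takes a weighted mean of each institution’s model parameters, weighted by how much data each one trained on. Here is a minimal sketch in plain Python and NumPy; the function name, arrays, and sample counts are made up for illustration and are not OpenFL’s internal API:

```python
import numpy as np

def federated_average(models, sample_counts):
    """Combine per-institution model weights into one global model.

    models: one list of layer-weight arrays per institution.
    sample_counts: training examples each institution used, so that
    institutions with more data carry proportionally more weight.
    """
    total = sum(sample_counts)
    coeffs = [n / total for n in sample_counts]
    # Weighted sum, layer by layer.
    return [
        sum(c * layers[i] for c, layers in zip(coeffs, models))
        for i in range(len(models[0]))
    ]

# Three toy "institutions", each with a single two-element weight vector.
models = [
    [np.array([1.0, 2.0])],
    [np.array([3.0, 4.0])],
    [np.array([5.0, 6.0])],
]
global_model = federated_average(models, sample_counts=[100, 100, 200])
print(global_model[0])  # [3.5 4.5]
```

The third institution contributed half the total samples, so the global weights land closer to its values than a plain unweighted mean would.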
Testing Federated Learning
In one study, the University of Pennsylvania and Intel launched the largest medical federation to date, the Federated Tumor Segmentation (FeTS) initiative. Using confidential MRI data from 71 healthcare institutions across six continents, the FeTS medical imaging model was trained to identify brain tumors.
Compared with models trained using local data, the federated model increased accuracy by up to 33 percent. The test helps demonstrate that access to more-diverse data sets improves AI accuracy and that even large sites can benefit from collaboration. Read more about the project.
OpenFL: An Open Source Library for Federated Learning
FeTS ran on a platform called Open Federated Learning (OpenFL), an open source, Python* 3 framework for federated learning. Though Intel originally developed OpenFL for FeTS, OpenFL became a Linux Foundation* project in March 2023 and is now a use-case-agnostic framework that can be used across industries.
OpenFL offers advantages over other available frameworks because it was built with security at its core. For instance, transport layer security (TLS) support is embedded, enabling secure communication without additional configuration. Additionally, OpenFL is compatible with multiple training frameworks, such as Keras*, TensorFlow*, and PyTorch*, even allowing you to connect models from institutions that trained them with different frameworks.
Because OpenFL is an open source framework, you can take advantage of the resources in the GitHub* repository, download it as a container from Docker* Hub, and easily install it from PyPI*.
Extra Security Where You Need It Most
While federated learning keeps data sets from having to travel across networks, the models must travel from the institutions to the aggregation server, leaving them exposed to several types of attacks. The most common are poisoning attacks: an attacker who intercepts a model in transit can poison it by altering its weights, or even recover information about the underlying training data by analyzing how the model applies those weights. Intellectual property (IP) theft is also a primary concern; an attacker who steals a model from one site effectively steals the full federated model and all the training behind it.
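To see why altered weights matter, consider what a single poisoned update does to a plain average, and how a robust aggregator such as a coordinate-wise median limits the damage. (The median defense is a common technique from the federated learning literature; the source doesn’t say which defenses OpenFL itself ships with, and the numbers below are invented for illustration.)

```python
import numpy as np

# Honest updates cluster near the true values; one was intercepted.
updates = np.array([
    [1.0, 2.0],      # institution A (honest)
    [1.1, 2.1],      # institution B (honest)
    [1.0, 1.9],      # institution C (honest)
    [100.0, -50.0],  # poisoned in transit
])

mean_agg = updates.mean(axis=0)          # dragged to [25.775, -11.0]
median_agg = np.median(updates, axis=0)  # stays near [1.05, 1.95]

print(mean_agg)
print(median_agg)
```

A single attacker can move the plain mean arbitrarily far, while the coordinate-wise median stays close to the honest cluster until a majority of updates are compromised.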
On top of OpenFL’s built-in security features, like its TLS implementation, Intel® Software Guard Extensions (Intel® SGX) adds another layer of security to prevent attackers from stealing the model or extracting training data from it. While encryption helps protect data at rest and in transit, data is vulnerable the moment the hardware processes it. Intel SGX enables confidential computing, creating a protected memory enclave between the hardware and the application to ensure that only verified applications can access the code.
Let’s look at an example of OpenFL architecture. The aggregator creates a plan, which includes instructions about how the model should weight the data, and shares the plan with each institution, or node. Models are protected by TLS and a certificate authority (CA) on each node. For users without deep security experience, OpenFL includes a vanilla CA configuration you can easily download and use.
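The mutual TLS described here can be sketched with Python’s standard ssl module: the aggregator and every node present certificates signed by the federation’s CA, and each side refuses unverified peers. This is an illustration of the mechanism, not OpenFL’s actual implementation, and the certificate file paths are hypothetical placeholders:

```python
import ssl

def aggregator_tls_context():
    """Server-side context that requires each collaborator node to
    present a certificate signed by the federation's CA (mutual TLS)."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject unauthenticated nodes
    # Hypothetical paths; in a real federation these come from CA setup:
    # ctx.load_cert_chain("aggregator.crt", "aggregator.key")
    # ctx.load_verify_locations(cafile="federation_ca.crt")
    return ctx

ctx = aggregator_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```

Requiring client certificates is what turns ordinary TLS into mutual TLS: an attacker cannot join the federation or intercept model updates without a key signed by the CA.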
Who Uses OpenFL?
Here are a few examples of how companies are using OpenFL.
· Montefiore Health System* used OpenFL to simultaneously tap data from multiple hospitals to predict the likelihood of acute respiratory distress syndrome (ARDS) and death in COVID-19 patients.
· VMware* used OpenFL for microservice applications and contributed EDEN, a new compression pipeline designed for federated learning, to OpenFL.
· Launched in 2021, the FeTS Challenge was the first federated learning competition, focused on the task of brain tumor segmentation.
Use Case: A Real-Life Space Federation
In an increasingly connected world, where sharing data keeps getting faster and easier, you might assume moving data between sites is simple. However, when the Frontier Development Lab (FDL)* wanted to study the effects of cosmic radiation on astronauts, it needed to share data across multiple institutions around the world, and in space. Though each institution had the right to use the data, the data was private, and transmitting it to a spacecraft was costly. Using OpenFL, FDL was able to train the model aboard the spacecraft using data from NASA* and Mayo Clinic* without having to send the data back to Earth. Read more here.
Get Involved
There are many ways to get involved with OpenFL, such as trying tutorials, reading blog posts that explain how to train a model using OpenFL, and checking out online documentation that can help you launch your first federation.
If you’re already an expert, we encourage you to contribute to the community by solving issues or writing a blog post. You can also join our monthly virtual community meetings in your region. You’ll find all the info in the GitHub* repo.
About the Presenter
Ezequiel Lanza, Open Source Evangelist. Passionate about helping people discover the exciting world of artificial intelligence, Ezequiel is a frequent AI conference presenter and the creator of use cases, tutorials, and guides that help developers adopt open source AI tools like TensorFlow* and Hugging Face*. Find him on Twitter at @eze_lanza.