Meet OpenFL: The Federated Learning Framework Reinventing Project Power and Security

Ezequiel Lanza · Published in Intel Tech · Aug 16, 2023

Get access to more data without sending yours anywhere.



Artificial intelligence applications are changing the world as we know it, but they require mountains of data to learn. More data isn’t necessarily better, though. Data also needs to be diverse. Otherwise, your model may exacerbate bias, resulting in AI-based financial services and healthcare systems that discriminate based on race, gender, or marital status when making important decisions. Moreover, if your data is stored across multiple sites or even multiple countries, it may be too big or too costly to send elsewhere, and local regulations may prevent you from sharing sensitive data, such as patient information.

Federated learning is an AI framework that can help organizations solve these problems. By enabling companies to collaboratively train models without sending data to a centralized site, federated learning is helping institutions with sensitive data improve the accuracy of their AI models by securely tapping into more data.

At this year’s Toronto Machine Learning Summit* (TMLS), Intel Open Source Evangelist Ezequiel Lanza shared an overview of federated learning and introduced Open Federated Learning (OpenFL), a framework for federated learning originally developed by Intel. Watch the full talk here.

How Federated Learning Works

Traditional machine learning trains models using data in one location. For instance, if a data set is spread across three separate institutions, each institution must send its data to an aggregation site, where the model is trained.

In a standard machine learning approach, all data must be sent to a central site, where the model is then trained.

In a federated learning approach, the data never leaves the institution. Separate models are trained in their own locations and sent to an aggregation server, which synthesizes the models into one new model. The new model is then sent back to each institution so it can be applied to the data.

Federated learning enables models to distribute training across multiple devices and locations.
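
To make this concrete, here’s a minimal sketch of federated averaging (the classic FedAvg scheme) in Python. It is illustrative only and is not OpenFL’s API; the function names, the stand-in training step, and the weighting-by-sample-count scheme are assumptions made for the example.

    import numpy as np

    def local_training(weights, local_data):
        # Placeholder for one round of training at an institution.
        # In practice this would run SGD on the institution's own data.
        update = np.random.randn(*weights.shape) * 0.01  # stand-in gradient
        return weights - update

    def federated_average(local_models, num_samples):
        # Aggregate local models, weighting each by its data set size.
        total = sum(num_samples)
        return sum(w * (n / total) for w, n in zip(local_models, num_samples))

    # One federated round across three institutions; only the model
    # weights travel, while the data never leaves each institution.
    global_weights = np.zeros(10)
    local_models = [local_training(global_weights, data)
                    for data in (None, None, None)]  # data stays local
    global_weights = federated_average(local_models, num_samples=[100, 250, 75])

In a real federation, rounds like this repeat until the global model converges, and the aggregated weights are sent back to every institution, just as described above.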

Testing Federated Learning

In one study, the University of Pennsylvania and Intel launched the largest medical federation to date, the Federated Tumor Segmentation (FeTS) initiative. Using confidential data from MRI scans at 71 healthcare institutions across six continents, the FeTS medical imaging model was trained to identify brain tumors.

FeTS securely connected 71 sites across six continents.

Compared with models trained using local data, the federated model increased accuracy by up to 33 percent. The test helps demonstrate that access to more-diverse data sets improves AI accuracy and that even large sites can benefit from collaboration. Read more about the project.

The FeTS project showed that the federated model (“full federation consensus”) achieves higher accuracy than models trained using only local data (“public initial model”).

OpenFL: An Open Source Library for Federated Learning

FeTS ran on a platform called Open Federated Learning (OpenFL), an open source, Python* 3 framework for federated learning. Though Intel originally developed OpenFL for FeTS, OpenFL became a Linux Foundation* project in March 2023 and is now a use-case-agnostic framework that can be used across industries.

OpenFL offers advantages over other available frameworks because it was built around security. For instance, a transport layer security (TLS) implementation is embedded, enabling secure environments without any additional configuration. Additionally, OpenFL is compatible with multiple training frameworks, such as Keras*, TensorFlow*, and PyTorch*, and even lets you connect models from institutions that were trained using different frameworks.
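
To picture how models trained in different frameworks can be combined, it helps to reduce each one to a common numeric representation. The sketch below is a simplified illustration assuming all collaborators share the same model architecture; it is not how OpenFL represents models internally.

    import numpy as np

    def keras_weights(model):
        # Keras exposes weights directly as a list of NumPy arrays.
        return model.get_weights()

    def torch_weights(model):
        # PyTorch tensors must be detached and moved to the CPU first.
        return [p.detach().cpu().numpy() for p in model.parameters()]

    def average_layerwise(weight_lists):
        # Element-wise mean across collaborators, one layer at a time.
        return [np.mean(np.stack(layers), axis=0)
                for layers in zip(*weight_lists)]

Once every model is expressed as plain arrays, the aggregation step no longer cares which framework produced them.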

Because OpenFL is an open source framework, you can take advantage of the resources in the GitHub* repository, download it as a container from Docker* Hub, and easily install it from PyPI* (pip install openfl).

· Visit the GitHub repo

· Go to the Docker Hub

Extra Security Where You Need It Most

While federated learning spares data sets from having to travel across networks, the models themselves must travel from the institutions to the aggregation server, leaving them exposed to several types of attacks. The most common is the poisoning attack: if someone intercepts a model in transit, they can poison it by altering its weights, or even recover information about the original data set by analyzing how the model applies its weights. Intellectual property (IP) theft is also a primary concern; if an attacker steals a model from one site, they’re effectively stealing the full federated model and all associated training information.
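
A toy example shows why a single poisoned update is so dangerous under naive averaging. This illustrates the attack concept only; it is not OpenFL’s aggregation logic, and the numbers are invented.

    import numpy as np

    honest_updates = [np.array([1.0, 1.0]), np.array([0.9, 1.1])]
    poisoned_update = np.array([100.0, -100.0])  # one attacker-scaled model

    clean = np.mean(honest_updates, axis=0)
    attacked = np.mean(honest_updates + [poisoned_update], axis=0)

    print(clean)     # ~[0.95, 1.05]
    print(attacked)  # ~[33.97, -32.63]: one node skews the whole model

Because the aggregate moves wherever the largest update pulls it, a single compromised node can dominate the federation unless updates are authenticated and sanity-checked.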

On top of OpenFL’s built-in security features, like its TLS implementation, Intel® Software Guard Extensions (Intel® SGX) adds a layer of security to prevent attackers from stealing the model or extracting training data from it. While encryption helps protect data at rest and in transit, data is vulnerable while an application is being processed by the hardware. Intel SGX enables confidential computing, which creates a protected memory enclave between the hardware and the application to ensure that only verified applications can access the code.

Let’s look at an example of OpenFL’s architecture. The aggregator creates a plan, which includes instructions about how the model should weight the data, and shares the plan with each institution, or node. On each node, models are protected by TLS and a certificate authority (CA). For users without deep security experience, OpenFL includes a vanilla CA configuration you can easily download and use.
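
As a rough picture of what TLS plus a CA buys you, here is a generic mutual-TLS setup using Python’s standard ssl module. This is a sketch of the concept, not OpenFL’s actual configuration; the certificate file names are hypothetical.

    import ssl

    # Aggregator side: present a certificate and require that every
    # connecting collaborator presents one signed by the federation's CA.
    server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    server_ctx.load_cert_chain(certfile="aggregator.crt", keyfile="aggregator.key")
    server_ctx.load_verify_locations(cafile="federation_ca.crt")
    server_ctx.verify_mode = ssl.CERT_REQUIRED  # mutual TLS: reject unknown nodes

    # Collaborator side: verify the aggregator against the same CA.
    client_ctx = ssl.create_default_context(cafile="federation_ca.crt")
    client_ctx.load_cert_chain(certfile="collab1.crt", keyfile="collab1.key")

With mutual TLS in place, a node that can’t present a CA-signed certificate can neither join the federation nor impersonate the aggregator to intercept models in transit.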

Who Uses OpenFL?

Here are a few examples of how companies are using OpenFL.

· Montefiore Health System* used OpenFL to simultaneously tap data from multiple hospitals to predict the likelihood of acute respiratory distress syndrome (ARDS) and death in COVID-19 patients.

· VMware* used OpenFL for microservice applications and contributed EDEN, a new compression pipeline designed for federated learning, to OpenFL.

· Launched in 2021, the FeTS Segmentation Challenge was the first federated learning competition, focusing on the task of brain tumor segmentation.

Use Case: A Real-Life Space Federation

In an increasingly connected world where sharing data is becoming faster and easier, you might think it can’t be difficult to share data between sites. However, when the Frontier Development Lab (FDL)* wanted to study the effects of cosmic radiation on astronauts, it needed to share data across multiple institutions around the world and in space. Though each institution had the right to use the data, the data was private, and transmitting it to a spacecraft was costly. Using OpenFL, FDL was able to train the model using data from NASA* and the Mayo Clinic* inside the spacecraft without having to send the data to Earth. Read more here.

Get Involved

There are many ways to get involved with OpenFL, such as trying tutorials, reading blog posts that explain how to train a model using OpenFL, and checking out online documentation that can help you launch your first federation.

If you’re already an expert, we encourage you to contribute to the community by solving issues or writing a blog post. You can also join our monthly virtual community meetings in your region. You’ll find all the info in the GitHub* repo.

About the Presenter

Ezequiel Lanza, Open Source Evangelist. Passionate about helping people discover the exciting world of artificial intelligence, Ezequiel is a frequent AI conference presenter and the creator of use cases, tutorials, and guides that help developers adopt open source AI tools like TensorFlow* and Hugging Face*. Find him on Twitter at @eze_lanza.
