A Path Towards Secure Federated Learning

Bring Hardened Enclaves to your Federated workflow with OpenFL 1.3

Mansi Sharma
Openfl
6 min read · Apr 13, 2022


Open Federated Learning (OpenFL) is a deep learning framework-agnostic library for federated learning, developed at Intel®, that lets developers train statistical models on sharded datasets distributed across several nodes (if you are new to OpenFL, refer to the introductory OpenFL Medium article). With the release of OpenFL 1.3, we incorporated many exciting features, such as flexible task assignment in the interactive API; new support and examples for Hugging Face transformers, PyTorch Lightning, MXNet, and NumPy; and new aggregation algorithms like FedCurv, FedYogi, and FedAdam. But today we focus on a new dimension for our framework: bringing together hardware and software for privacy-preserving AI using Intel® Software Guard Extensions (Intel® SGX) and Gramine.

OpenFL was created to address the challenge of maintaining data privacy while bringing together insights from many disparate, confidential, or regulated datasets. However, training a model this way introduces new challenges around intellectual property and how it is used. As a data owner, for example, how do I know that a model created by someone else isn't revealing something about my data distribution? Or, as a model author, how do I know that my model isn't being copied by someone else? One option is trust established at a personal level, where all participants have a preexisting relationship that is vetted ahead of time. While this can work for a few participants, as a federation grows these trust relationships become harder to manage (because each participant needs to trust everyone else). Another way to build trust is computationally, through software and/or hardware. In software, the OpenFL team has encouraged the use of mTLS since our initial release to maintain the privacy of in-flight data, along with the use of secure aggregation algorithms. Today, with the release of OpenFL 1.3, we take our first step towards hardware-level security with an example of OpenFL running within an Intel® SGX enclave, and we discuss how we intend to build on this example to enable end-to-end secure federated learning.

Before going further, let's talk a bit about Trusted Execution Environments (TEEs, also called hardened enclaves) and why they're necessary. Normally, when a process is launched, the entirety of that program resides in unencrypted memory. If a local attacker knows what they are looking for, it is possible to dump the memory contents and extract this information. Intel® SGX is one example of a TEE: it offers hardware-based memory encryption that isolates specific application code and data in memory and enforces access to it in hardware. It allows user-level code to allocate private regions of memory, called enclaves, which are designed to be protected from processes running at higher privilege levels. At runtime, Intel® SGX instructions build and execute the enclave in a special encrypted memory region with restricted entry/exit locations defined by the developer, which helps prevent data leakage. Enclave data written to memory is encrypted and its integrity is checked, helping provide assurance that no unauthorized access or memory snooping of the enclave occurs.

Within hardened enclaves, applications are isolated from the host, and their memory is completely isolated from the operating system, hypervisor, BIOS, and anything else on that machine.

Intel® SGX has been available in Intel® Xeon® CPUs since 2018, and two exciting recent developments motivate its use in OpenFL today. The first is the release of the 3rd Generation Intel® Xeon® Scalable processors, which feature large enclaves (up to 512 GB per processor). The second is a new open-source confidential computing package: the Gramine LibOS. Because developing applications directly for Intel® SGX enclaves can be complex, we need a solution that lets developers run their applications without any modifications. Enter the Gramine Library OS, or simply Gramine: a lightweight library OS that lets developers run applications in Intel® SGX seamlessly (more on Gramine here). Taken together, with minimal configuration, users can for the first time use OpenFL to train large deep learning models entirely within encrypted memory.
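To give a feel for what "seamlessly" means in practice, here is a minimal sketch of the standard Gramine command-line flow for running an unmodified application inside an enclave. The manifest template name and Python script are illustrative placeholders, not part of the OpenFL workflow described later in this article.

# One-time setup: generate an enclave signing key.
gramine-sgx-gen-private-key

# Render the manifest template into a concrete manifest for this host.
gramine-manifest python.manifest.template python.manifest

# Sign the manifest, producing the enclave measurement.
gramine-sgx-sign --manifest python.manifest --output python.manifest.sgx

# Launch the unmodified application inside an Intel® SGX enclave.
gramine-sgx python my_training_script.py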

Threat model of Intel® SGX: SGX helps protect applications from three types of attacks: in-process attacks from outside of the enclave, attacks from the OS or hypervisor, and attacks from off-chip hardware. Image by authors: Chia-Che Tsai, Stony Brook University; Donald E. Porter, University of North Carolina at Chapel Hill and Fortanix; Mona Vij, Intel® Corporation

Gramine SGX architecture: Gramine starts with an untrusted Platform Adaptation Layer (pal-sgx), which calls the Intel® SGX drivers to initialize the enclave. An important component of loading the enclave is the manifest file, which specifies the enclave's attributes and its loadable binaries. After enclave initialization, the loader continues loading additional libraries, whose SHA-256 hashes are checked against the manifest before they are loaded into the enclave.
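To make the manifest concrete, here is a heavily abridged sketch of what a Gramine manifest can contain. The entrypoint, enclave size, thread count, and file paths are illustrative placeholders; in the OpenFL flow described below, the manifest is generated for you.

# Application to run inside the enclave.
libos.entrypoint = "/usr/bin/python3"

# Enclave attributes: 3rd Gen Xeon® Scalable parts support large enclaves.
sgx.enclave_size = "16G"
sgx.thread_num = 32

# Binaries and libraries whose SHA-256 hashes are recorded at build time
# and verified before being loaded into the enclave.
sgx.trusted_files = [
  "file:/usr/bin/python3",
  "file:/usr/lib/python3.8/",
]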

Now we have the perfect recipe for running deep learning training workloads with OpenFL in encrypted memory: Intel® SGX provides the hardware protection, and Gramine lets us use it without any code modifications.

For a user-friendly flow that requires minimal installation, we pack all prerequisites into a Dockerfile and run the application in a Docker container. To get acquainted with the flow, run an OpenFL-Gramine example or simply use the OpenFL integration test script on an Intel® SGX-enabled machine. The data scientist creates an OpenFL experiment workspace and initializes a plan. The next step is to build and save the experiment Docker image, containing all the files required to start an experiment inside an enclave, with the help of the 'fx' command below.

fx workspace graminize -s $KEY_LOCATION/key.pem

This command is new in OpenFL 1.3. It differs from the existing 'fx workspace dockerize' command in that it packs the required Gramine dependencies and measures the contents of the workspace before packing the OpenFL workspace into a Docker image. The newly built OpenFL-Gramine image can then be distributed to the aggregator and collaborator machines to start a federation experiment. These machines import the Docker image and exchange PKI certificates before running the image, which launches Intel® SGX enclaves inside their respective Docker containers.
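Putting the pieces together, the outline below sketches what this flow can look like end to end, from the data scientist's workspace to a collaborator machine. The workspace name, template, image name, and SGX device paths are illustrative and may differ on your system; the OpenFL-Gramine example has the exact invocations.

# Data scientist: create a workspace from a template and initialize the plan.
fx workspace create --prefix ./my_federation --template keras_cnn_mnist
cd my_federation
fx plan initialize

# Build the Gramine-enabled Docker image, signing the enclave with your key.
fx workspace graminize -s $KEY_LOCATION/key.pem

# On each aggregator/collaborator machine: import the distributed image and,
# after exchanging PKI certificates, run it with the SGX devices exposed.
docker load -i my_federation.tar
docker run --device=/dev/sgx_enclave --device=/dev/sgx_provision my_federation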

As excited as we are to take the lid off the OpenFL-Gramine example today, a key capability needed to mitigate model extraction threats in a production environment, called remote attestation, is still out of scope for the OpenFL 1.3 release; we plan to address it in future releases of OpenFL. Remote attestation allows a remote process running in an Intel® SGX enclave to prove that it is running the code it should be running, so the integrity of the application can be trusted even when the user of the system is not. Additional confidential computing infrastructure allows the enclave to be attested and verified, giving a remote party the means to tell whether it is communicating with a genuine enclave or with a potentially compromised process run by a malicious user.

In conclusion, running deep learning applications on a secure and trustworthy framework has become more critical than ever. OpenFL helps institutions across the globe collaborate and train their models in a federated manner while better protecting sensitive information with the help of Intel® SGX and Gramine.

Are you interested in the intersection of federated learning and security? Join the OpenFL community by getting in touch on Slack, or star us on GitHub to stay in the loop on exciting updates coming soon! Check out our tutorial that will be run in conjunction with the MICCAI conference!

Thanks to Patrick Foley for significantly contributing to the article!

Notices & Disclaimers

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.


Mansi Sharma is an AI Frameworks Engineer in life sciences at Intel.