Privacy-Preserving Training/Inference of Neural Networks, Part 3

This is the final part of a series of 3 posts. Part 1 is here, and Part 2 is here.

Daniel Escudero
The Sugar Beet: Applied MPC
8 min readJul 20, 2020


In my previous two posts, I described the tasks of securely training and evaluating a neural network. I also discussed several academic results that achieve these tasks efficiently. We learned that performance is becoming less of an issue, as research pushes the boundaries of efficiency. The goal of this final post is to present some projects that deploy these technologies in the real world.

Before I present the list, let me clarify a few things. First, it should not come as a surprise that several of these projects are for-profit/paid and closed source. This is OK: at the end of the day, deploying these technologies takes money and human resources, and the fact that organizations show interest in investing in them highlights their relevance, pushes research further, and paves the way for public recognition. Nevertheless, many of the projects below are non-profit and open source, bringing these technologies within reach of institutions and less tech-savvy users who want to perform analytics while preserving privacy.

Second, I have no personal or professional relationship with any of these companies and organizations. Opinions are my own. Please do your own research before deploying these technologies in production.

Finally, although most of the projects I list below are directly related to Privacy-Preserving Training/Inference of Neural Networks, not all of them fall squarely within this category. However, all projects are related to the more general topic of Privacy-Preserving Machine Learning.

TF-Encrypted (Dropout Labs)

TF-Encrypted (TFE), from the Dropout Labs team, is “a library for doing secure computation directly in TensorFlow”. Its code is open source and licensed under Apache License 2.0.

One of the main goals of TFE is to make secure inference much easier to use and deploy. This is achieved by implementing a library that runs on top of TensorFlow (or Keras), leveraging users’ familiarity with these widely used machine learning frameworks to seamlessly execute certain multiparty computation protocols under the hood. The main insight of TFE is that TensorFlow, as a platform, already supports many of the building blocks needed to deploy an MPC solution. For example, in TensorFlow one can define a machine to be a “worker”, in charge of executing certain steps of the computation graph. The programmer can specify which parts of the graph are executed by which worker, and TensorFlow takes care of coordinating the necessary communication among these workers.

MPC is a particular type of distributed application in which coordinated communication is required among several nodes/machines. TFE builds on the worker feature above to remove the burden of coordinating the nodes from the MPC programmer. In this setting the workers are the parties running the protocol, and the computation graph is modified so that a specific MPC protocol can be applied (SPDZ, SecureNN, and ABY3 are the protocols used here). For example, the ML programmer may specify a graph representing a model, but now a multiplication node will not correspond to a “usual” multiplication; instead, it will correspond to a “subgraph” representing the MPC subprotocol that handles a secure multiplication.
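To make the “subgraph” idea concrete, here is a minimal Python sketch of the kind of subprotocol such a multiplication node expands into: additive secret sharing combined with a Beaver triple, the core multiplication trick behind SPDZ-style protocols. This is an illustration only (the “trusted dealer” generating the triple is a toy stand-in for SPDZ’s offline phase), not TFE’s actual implementation:

```python
import random

P = 2**31 - 1  # public prime modulus; all values live in Z_P

def share(x, n=3):
    """Split x into n additive shares that sum to x mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

def beaver_mul(x_sh, y_sh, triple):
    """Compute shares of x*y from shares of x and y, using a precomputed
    Beaver triple (a, b, c) with c = a*b mod P, itself secret-shared."""
    a_sh, b_sh, c_sh = triple
    # The parties open d = x - a and e = y - b; these leak nothing,
    # since a and b are uniformly random masks.
    d = reconstruct([(x - a) % P for x, a in zip(x_sh, a_sh)])
    e = reconstruct([(y - b) % P for y, b in zip(y_sh, b_sh)])
    # x*y = c + d*b + e*a + d*e, so each party computes its share locally...
    z_sh = [(c + d * b + e * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    # ...and one designated party adds the public term d*e.
    z_sh[0] = (z_sh[0] + d * e) % P
    return z_sh

# Toy "trusted dealer" generating the triple.
a, b = random.randrange(P), random.randrange(P)
triple = (share(a), share(b), share(a * b % P))

print(reconstruct(beaver_mul(share(7), share(6), triple)))  # 42
```

Note that only one interaction pattern (the openings of d and e) requires communication; everything else is local arithmetic, which is precisely what makes it expressible as a computation graph.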

TF-Encrypted is well regarded by the community, and its contributors are very active in the codebase. The project has developed rapidly over the last year, and more is to be expected.

To sum up, some of the features present in TFE include:

  • Open-source code;
  • Secure training and inference using SPDZ and SecureNN;
  • User-friendly high-level interface that interacts nicely with TensorFlow (and Keras).

PySyft (OpenMined)

OpenMined is “an open-source community whose goal is to make the world more privacy-preserving by lowering the barrier-to-entry to private AI technologies’’. PySyft, an open-source project that is part of OpenMined, is “a Python library for secure, private machine learning. PySyft extends PyTorch, Tensorflow, and Keras with capabilities for remote execution, federated learning, differential privacy, homomorphic encryption, and multi-party computation.”

OpenMined follows the paradigm of TF-Encrypted: extending existing popular platforms such as TensorFlow and PyTorch with an extra layer of privacy. As described in the whitepaper, PySyft makes use of MPC and differential privacy in a federated learning context. The MPC layer implements the SPDZ protocol, with some variations that make it more asymmetric, owing to the nature of the federated learning scenario (in which one party is assumed to own the final model).
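To give a flavor of how computation on masked data works in a federated setting, here is a minimal sketch of pairwise-masking secure aggregation — a related technique, not PySyft’s actual SPDZ code, and all names here are my own. Each pair of clients agrees on a random mask that one adds and the other subtracts, so the server learns only the sum of the model updates:

```python
import random

MOD = 2**32  # updates are integer-encoded and summed mod 2^32

def pairwise_masks(n_clients, dim, seed):
    """Every pair (i, j) with i < j agrees on a random mask vector;
    client i adds it and client j subtracts it, so all masks cancel
    in the sum across clients."""
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            m = [rng.randrange(MOD) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + m[k]) % MOD
                masks[j][k] = (masks[j][k] - m[k]) % MOD
    return masks

def mask_update(update, mask):
    """What a client actually sends: its model update plus its mask."""
    return [(u + m) % MOD for u, m in zip(update, mask)]

# Three clients with integer-encoded gradient updates.
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(3, 3, seed=2020)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]

# The server sums what it receives: individual updates stay hidden,
# but the aggregate is exact.
print([sum(col) % MOD for col in zip(*masked)])  # [111, 222, 333]
```

In a real deployment the pairwise seeds would be established with key agreement rather than a shared RNG, and dropped-out clients must be handled; the point is only that the server never sees an unmasked update.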

Overall, PySyft includes the following features:

  • Open-source code;
  • Secure training and inference using SPDZ;
  • Support for the PyTorch API.

nGraph-HE and nGraph-HE2 (Intel)

This open-source project by Intel focuses on enabling data scientists to use familiar deep learning frameworks such as TensorFlow, MXNet, and PyTorch together with homomorphic encryption technologies. This is achieved by leveraging the Intel nGraph Compiler to transform neural networks developed in high-level frameworks such as TensorFlow into an intermediate representation that can then be executed homomorphically, with the help of the Simple Encrypted Arithmetic Library (SEAL) from Microsoft Research.

This project, like the ones described so far, aims at bridging the gap between theory and practice by allowing data scientists to use privacy-preserving techniques without drastically changing their current workflow.
We note, however, that its goal is not to be production-ready. As stated in their readme: “This project is meant as a proof-of-concept to demonstrate the feasibility of HE on local machines. The goal is to measure performance of various HE schemes for deep learning. This is not intended to be a production-ready product, but rather a research tool.’’

nGraph-HE is the original compiler that uses nGraph and SEAL to run inference of deep neural networks homomorphically. The framework itself is rather limited, however, as it only supports models with polynomial activations. nGraph-HE2, on the other hand, augments the original framework with several optimizations that allow the execution of a much wider range of neural networks, including in particular the MobileNetV2 family of networks, which are suitable for multiple image-recognition tasks.
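To see why the restriction to polynomial activations matters, consider the following plaintext sketch. HE schemes such as those implemented in SEAL evaluate only additions and multiplications on ciphertexts, so an HE-friendly model must replace activations like ReLU with polynomials (CryptoNets famously used f(x) = x²). The `tiny_net` function and its parameters are illustrative, not part of nGraph-HE’s API:

```python
def relu(x):
    return max(0.0, x)  # comparison-based: NOT expressible with +/* alone

def square(x):
    return x * x        # a single ciphertext multiplication under HE

def tiny_net(inputs, weights, activation):
    """One dense layer followed by an activation: the building block an
    HE compiler has to lower to pure ciphertext arithmetic."""
    pre = sum(w * x for w, x in zip(weights, inputs))
    return activation(pre)

x = [0.5, -1.0, 2.0]
w = [1.0, 1.0, 1.0]
print(tiny_net(x, w, square))  # 2.25 -- HE-friendly model
print(tiny_net(x, w, relu))    # 1.5  -- needs nGraph-HE2's extra machinery
```

Bridging the gap between these two worlds — either approximating ReLU by polynomials or evaluating it with auxiliary protocols — is exactly the kind of capability nGraph-HE2 adds on top of the original compiler.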

From their papers: “We evaluate our contributions on both small, single-operation tests, and on larger neural networks. In particular, we demonstrate state-of-the-art performance on the CryptoNets network, with a throughput of 1,998 images/s. Our contributions also enable the first, to our knowledge, homomorphic evaluation of a network on the ImageNet dataset, MobileNetV2, with 60.4%/82.7% top-1/top-5 accuracy and amortized runtime of 381 ms/image. This is the first work showing the privacy-preserving execution of a full production-level deep neural network.”

Inpher

Inpher is a company that works in the secure computing space. From their website: “Inpher’s Secret Computing® products enable data scientists and analysts to unlock sensitive data for their functions and machine learning models without ever exposing or transferring the underlying sensitive data in the process. Don’t choose between data privacy and data usability — you can finally have both!”

The company’s products consist of a fully homomorphic encryption library and an MPC platform. The FHE library, TFHE, is an open-source project with appealing features and reasonable performance (details can be found in this paper). Inpher’s main commercial asset, on the other hand, is the XOR Secret Computing® Engine, built from the passively secure protocol described in this work. The claimed benefits of this engine are that it is commercially ready, requires no trusted third parties, eliminates the tradeoff between data usability and data privacy, is GDPR and sovereign data privacy compliant, achieves high accuracy, and is quantum-safe.

Inpher describes many potential use-cases for their products, including commercial banking, investment banking and hedge funds, credit and payments, and insurance: “With the ability to guarantee data security in-use, organizations can explore new opportunities not previously possible. For example, there is no longer a need to anonymize and centralize data for analytics, which is highly susceptible to re-identification and misuse, and also reduces the available features to train your model. Keep the data where it is and take advantage of emerging cryptographic magic:

  • Conduct privacy-compliant analytics on your organization’s sensitive data across departments, jurisdictions and regulatory bodies. Legal opinion for GDPR compliance available upon request.
  • Train Machine Learning models from multiple private data sources with all features intact to improve predictions.
  • Monetize the insight of data without giving it away.”

Aircloak

This company makes use of a patented and proven anonymization method that allows users to gather statistics from a database without leaking individual records: “Aircloak’s anonymization is based on a combination of time-tested ideas like K-Anonymity, low-count suppression, top and bottom coding, and differential privacy noise, as well as patented open concepts developed jointly by Aircloak and Max Planck Institute for Software Systems (MPI-SWS), including Sticky Layered Noise and safe SQL filtering.” See also this paper.

The approach followed by Aircloak is a dynamic anonymization technique similar to differential privacy. However, the method developed by the company claims to overcome one of the issues with DP, the so-called privacy budget: “Aircloak’s approach has engineered away the need for a privacy budget by producing tailored pseudo-random noise values that do not average away. Repeated or semantically equivalent queries produce the same noise values, which in turn leads to the ability to ask as many queries of your dataset as you desire.”
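The core of the “noise that does not average away” idea can be sketched in a few lines: derive the noise deterministically from the query itself, so asking the same question twice returns the same answer. This is a toy of my own, not Aircloak’s Sticky Layered Noise — in particular, real systems need far deeper query normalization than the lowercasing and whitespace collapsing used here:

```python
import hashlib
import random

def seeded_noise(query, scale=1.0):
    """Derive noise deterministically from a canonical form of the query,
    so repeating the query yields the same noisy answer and the noise
    cannot be averaged away by re-asking."""
    canonical = " ".join(query.lower().split())
    seed = int.from_bytes(hashlib.sha256(canonical.encode()).digest()[:8], "big")
    return random.Random(seed).gauss(0.0, scale)

def noisy_count(true_count, query):
    return true_count + seeded_noise(query)

q = "SELECT count(*) FROM patients WHERE age > 40"
print(noisy_count(1000, q) == noisy_count(1000, q))          # True: same query, same noise
print(noisy_count(1000, q) == noisy_count(1000, q.upper()))  # True: canonical form ignores case
```

Contrast this with plain differential privacy, where fresh noise per query forces a privacy budget precisely because averaging many answers would wash the noise out.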

The security of the techniques used by Aircloak has been analyzed in the literature, and a bounty program was put in place to analyze the security of the framework as a whole, which has surfaced several potential attacks.

Oasis Labs

Oasis Labs is a blockchain startup focused on enabling privacy-preserving computation “on the blockchain”. Not all of its code is open source, so it is difficult to accurately describe the technologies Oasis Labs uses. Its whitepaper states that the components Oasis Labs provides can be applied to “training machine learning models across diverse datasets without leaking information about the datasets”, which is why I am including them in this survey: “We use secure computing techniques — including the use of secure enclaves, multiparty computation, and zero knowledge proofs — so nodes are unable to view sensitive data when running computations and storing data.”

Leap Year

Not many details can be obtained from Leap Year’s website. The company presents itself as a platform for developing secure, intelligent systems that unlock value from sensitive information: “Protect confidential, regulated, and proprietary data assets with mathematically proven security. Differential privacy, the highest standard of data protection, is embedded into every computation, so developers and analysts are never exposed to sensitive information.”
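As a reminder of what “differential privacy embedded into every computation” means at the level of a single query, here is the textbook Laplace mechanism (a standard sketch, not Leap Year’s implementation): a counting query has sensitivity 1, so adding Laplace noise with scale 1/ε makes the answer ε-differentially private.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample from Laplace(0, scale) by inverse transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon):
    """A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace(1/epsilon) noise suffices
    for epsilon-differential privacy."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)
answers = [private_count(1000, epsilon=0.5) for _ in range(5)]
print([round(a, 1) for a in answers])  # five noisy answers scattered around 1000
```

Smaller ε means more noise and stronger privacy; the price, as discussed in the Aircloak section above, is that each fresh answer spends privacy budget.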

In this post I did my best to list some of the projects (for-profit and non-profit) related to the topic of privacy-preserving machine learning. Of course, the list cannot be exhaustive, and I have probably left out some notable efforts! If you know of any other project aiming to take these technologies into practice, please add it in the comments. More than promoting them, the goal is to showcase the relevance and practicality of these methods.
