In our efforts to build a flexible, easy-to-use interface for privacy-preserving machine learning, we are happy to announce TF SEAL, a bridge between Microsoft SEAL, a state-of-the-art homomorphic encryption (HE) library, and TensorFlow, one of the most popular machine learning frameworks.
While already useful in itself, this is also the first step in integrating HE into TF Encrypted towards our goal of building a generic interface for machine learning on encrypted data powered by different kinds of secure computation technologies.
Finally, TF SEAL serves as a useful example of how secure computation projects can be easily integrated into TensorFlow, paving the way for more integrations in the future.
The current support in TF Encrypted for computing on encrypted data is powered by secure multi-party computation (MPC) protocols, but MPC and HE each have pros and cons that make supporting both highly relevant. In a nutshell, MPC is an interactive way of computing on encrypted data while HE is non-interactive; the former adds a non-trivial communication overhead to computations while the latter incurs a significant computational overhead.
This means that homomorphic encryption wins in three key areas. First, there is no interaction during the computation. One problem with MPC is that a user submitting an encrypted input to a model needs to stay connected to the other parties until the computation has finished. This is not a problem with HE: the model owner can queue up a batch of computations and asynchronously send results back to the users. Second, you don’t need to deal with collusion. If you’re running a three-party protocol, two of the parties could theoretically collude to break the protocol and leak information. In HE, there are only ever two parties, so there is no one to collude with. The third win also comes from the two-party setting: HE follows the standard client/server architecture, so it is much clearer how to build a client prediction service.
While homomorphic encryption sounds better at first glance, it has some major downsides. The ciphertexts that HE operates on are so large that every operation takes many CPU cycles. Another big downside is that some operations are very expensive, or simply impractical, to compute with HE. While this can also be a problem with MPC, it is an even bigger problem with HE.
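To make the idea of non-interactive computation on encrypted data concrete, here is a toy sketch in plain Python. It is not a real HE scheme (and is not secure); it only illustrates the key property: ciphertexts can be added without decrypting them or talking to the key holder, and the result still decrypts correctly.

```python
import random

# Toy "additively homomorphic" scheme, for illustration only -- NOT secure.
# Enc(m) = (m + k) mod N for a secret key k. Summing t ciphertexts yields
# an encryption of the sum under the accumulated key t*k, so the key
# holder can still decrypt; no interaction happens during the computation.
N = 2**61 - 1  # modulus for the toy scheme

def keygen():
    return random.randrange(N)

def encrypt(k, m):
    return (m + k) % N

def add_ciphertexts(c1, c2):
    # Homomorphic addition: operates entirely on encrypted values.
    return (c1 + c2) % N

def decrypt(k, c, num_additions=1):
    # After summing t ciphertexts, subtract t*k to recover the plaintext sum.
    return (c - num_additions * k) % N

k = keygen()
c = add_ciphertexts(encrypt(k, 20), encrypt(k, 22))
assert decrypt(k, c, num_additions=2) == 42
```

A real scheme like BFV or CKKS supports multiplication as well, at the cost of the much larger ciphertexts and slower operations described above.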
For a deeper look at homomorphic encryption, we recommend the Microsoft SEAL examples mentioned below, Stephen Hardy’s illustrated primer, the Cryptography Boot Camp from the Simons Institute, or the ongoing standardization process.
Microsoft SEAL is a state-of-the-art homomorphic encryption library implementing two basic HE schemes. A lot of work has gone into making SEAL as efficient as possible, and one of the biggest contributions is efficient parallelization through batching. When an array of data is encrypted, it is packed into a single ciphertext, which is then operated on in a SIMD fashion at the ciphertext level rather than one number at a time. You can build on this by packing your data efficiently inside the ciphertext.
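The batching idea can be sketched in a few lines of plain Python (this is a conceptual illustration, not SEAL's implementation): a whole vector lives inside one "ciphertext" object, and a single homomorphic operation acts on every slot at once.

```python
# Toy illustration of SIMD-style batching: one ciphertext object holds
# many plaintext slots, and each operation updates all slots at once.
class BatchedCiphertext:
    def __init__(self, slots):
        self.slots = list(slots)  # stand-in for one packed ciphertext

    def add(self, other):
        # A single ciphertext-level op touches every slot simultaneously.
        return BatchedCiphertext(a + b for a, b in zip(self.slots, other.slots))

    def multiply(self, other):
        return BatchedCiphertext(a * b for a, b in zip(self.slots, other.slots))

x = BatchedCiphertext([1, 2, 3, 4])
y = BatchedCiphertext([10, 20, 30, 40])
print(x.add(y).slots)       # [11, 22, 33, 44] -- one op, four slots
print(x.multiply(y).slots)  # [10, 40, 90, 160]
```

In SEAL the number of slots is determined by the encryption parameters, which is why packing your data to fill those slots matters so much for throughput.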
There have been many research papers built on top of SEAL that solve concrete problems. Here’s a sample of some of the interesting ones:
- Logistic regression over encrypted data from fully homomorphic encryption uses SEAL to attempt a task from the 2017 iDASH secure genome analysis competition.
- Labelled PSI from Fully Homomorphic Encryption with Malicious Security uses SEAL to create private set intersection protocols.
- PIR with compressed queries and amortized computation uses SEAL to implement efficient private information retrieval, and comes with an open-source implementation as well.
Microsoft SEAL comes with heavily documented examples that cover all the basics you need to use SEAL: how to use the two schemes (BFV and CKKS), how to encode, encrypt, and decrypt numbers, and how to choose encryption parameters. We recommend checking them out here.
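To give a flavor of what "encoding" means for a scheme like CKKS, here is a toy sketch in plain Python of the underlying idea (this is not SEAL's encoder API): real numbers are scaled up to integers before encryption, and the scale is tracked so results can be rescaled after homomorphic operations.

```python
# Toy sketch of CKKS-style fixed-point encoding (illustration only, not
# SEAL's API): reals are scaled to integers, and the scale is tracked.
SCALE = 2**20  # precision parameter, analogous to a CKKS scale

def encode(x, scale=SCALE):
    return round(x * scale)

def decode(n, scale=SCALE):
    return n / scale

a, b = encode(3.14), encode(2.5)
total = a + b     # addition keeps the same scale
product = a * b   # multiplication multiplies the scales
print(decode(total))                   # ~5.64
print(decode(product, SCALE * SCALE))  # ~7.85
```

The growth of the scale under multiplication is one reason parameter selection matters: CKKS must "rescale" periodically, and the encryption parameters bound how many such levels a computation can consume.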
Bridging with TensorFlow
Integrating with TensorFlow can be done in a myriad of ways. In Python, you can implement new layers and high-level operations using the functions exposed by the TensorFlow Python API. If you need to add lower-level operations, TensorFlow provides a C++ API for creating “custom ops”. You might need custom operations when you need the speed of C++, are creating new algorithms for GPUs, or need to integrate an external library, such as SEAL, into TensorFlow.
The process of integrating SEAL into TensorFlow came with some interesting technical challenges. The first challenge was that SEAL is written in modern C++17, whereas TensorFlow is written in C++11 and the official releases (as of v1.14.0) are compiled with an older version of GCC (on Linux). The second challenge was how to allow SEAL objects to be passed around inside TensorFlow.
In general, linking C++ code compiled against different standards should work, but problems arise when linking against machine code generated by older compilers that have no notion of (or only unstable implementations of) features from the newer standard. The official TensorFlow releases are compiled with an older GCC version (in the 4.x range) that has no notion of C++17, so when you attempt to link Microsoft SEAL code against the TensorFlow shared object file, the ABI incompatibilities prevent it from running. See this StackOverflow question for more details on linking code compiled against different standards. Our current solution is to build TensorFlow from source with a newer compiler and with the C++17 standard turned on, which required some minor code changes.
The other challenge was figuring out how to let SEAL manage its own memory while still allowing TensorFlow to treat the data as a regular tensor. After some investigation, we found a tensor type called a variant tensor, which can store any C++ object. These tensors can serve as inputs and outputs to custom operations and be moved around much like any other tensor in TensorFlow. Most interestingly, variant tensors can be used as a general solution for integrating other external libraries where it’s best to have the library manage its own memory.
Here’s a simple example of how variant tensors are used and then combined with SEAL to produce an “add” operation:
The interface for this basic “add” operation is quite simple: it takes the inputs and the output as variants. One downside of this approach is that there is no type checking at the framework level; it must instead happen inside the implementation of the operation. As we will see below, this can be somewhat mitigated when using the op from Python.
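The pattern behind this can be sketched language-agnostically in plain Python (the real code is a TensorFlow C++ custom op; all names here are illustrative): the framework moves an opaque box around, and each op must unwrap it and check the payload’s type itself at runtime.

```python
# Sketch of the variant-tensor pattern: the framework sees only an
# opaque container, so type checking happens inside each operation.
class Variant:
    def __init__(self, payload):
        self.payload = payload  # any object, invisible to the framework

class ToyCiphertext:
    def __init__(self, value):
        self.value = value

def seal_add(a, b):
    # Type checking inside the op, not at the framework level.
    for v in (a, b):
        if not isinstance(v.payload, ToyCiphertext):
            raise TypeError("seal_add expects variants holding ciphertexts")
    return Variant(ToyCiphertext(a.payload.value + b.payload.value))

out = seal_add(Variant(ToyCiphertext(2)), Variant(ToyCiphertext(3)))
print(out.payload.value)  # 5
```

The cost of the opacity is exactly the downside noted above: passing the wrong payload is only caught at runtime, inside the op.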
Once we had the custom operations finished in C++, we needed a convenient way to access them from Python. TensorFlow comes with a handy way to create your own tensor classes and have the TensorFlow Python API treat them like regular tensors.
For example, you can pass them directly into the `session.run` call. This hides implementation details from the user and lets you add convenience methods such as `__add__` to the custom tensor class. This neat feature is achieved by registering conversion functions that convert between regular TensorFlow tensors and the custom tensors.
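A minimal sketch of this convenience layer, in plain Python with illustrative names (not TF SEAL's actual classes), shows the two pieces working together: operator overloading on the custom tensor class, plus a conversion function so plain values can be used wherever a custom tensor is expected.

```python
# Sketch of a custom tensor class with operator overloading and a
# conversion function, mirroring how TensorFlow lets custom tensor
# types behave like regular tensors. Names are illustrative.
class ToySealTensor:
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        other = convert_to_toy_seal_tensor(other)
        return ToySealTensor(a + b for a, b in zip(self.data, other.data))

def convert_to_toy_seal_tensor(value):
    # Conversion function: accept plain sequences where a custom
    # tensor is expected, much like TF's tensor conversion mechanism.
    if isinstance(value, ToySealTensor):
        return value
    return ToySealTensor(value)

t = ToySealTensor([1, 2, 3]) + [10, 20, 30]
print(t.data)  # [11, 22, 33]
```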
Logistic Regression Example
The next step in proving out the first version of TF SEAL was to create a logistic regression prediction example. The example starts with a simple matrix multiplication where the input is encrypted sensitive data and the “weights” are unencrypted. We then complete the example with an approximation of the sigmoid nonlinearity. Because the arithmetic involved in a sigmoid is not directly possible in homomorphic encryption, we have to approximate it; MPC makes a similar approximation to compute nonlinearities.
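The idea behind such an approximation can be sketched as follows: replace the sigmoid with a low-degree polynomial, since additions and multiplications are exactly the operations HE supports. The specific coefficients below are a commonly used illustrative fit for inputs in a small range around zero; TF SEAL's actual approximation may differ.

```python
import math

# Sketch: approximating sigmoid with a degree-3 polynomial, as is common
# in HE and MPC settings. Coefficients here are illustrative, reasonable
# only for inputs roughly in [-5, 5].
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_approx(x):
    # Only additions and multiplications -- computable under HE.
    return 0.5 + 0.197 * x - 0.004 * x**3

for x in [-2.0, 0.0, 2.0]:
    print(x, sigmoid(x), sigmoid_approx(x))
```

The trade-off is accuracy for computability: the polynomial tracks the sigmoid closely near zero but diverges for large inputs, so inputs must be kept in the fitted range.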
You can find the full example on GitHub.
About Dropout Labs
We’re a team of machine learning engineers, software engineers, and cryptographers spread across the United States, France, and Canada. We’re working on secure computation to enable training, validation, and prediction over encrypted data. We see a near future where individuals and organizations will maintain control over their data, while still benefiting from cloud-based machine intelligence.
If you’re passionate about data privacy and AI, we’d love to hear from you.