Privacy-Preserving Training/Inference of Neural Networks, Part 2

This is Part 2 of a series of 3 posts. Part 1 is here, and Part 3 is here.

Daniel Escudero
The Sugar Beet: Applied MPC
19 min read · Jul 5, 2020


It’s been almost half a year since the first post. First it was bureaucracy and then it was COVID, but now I’m back on track! In this second post I describe some of the technologies in the field of privacy-preserving machine learning (PPML), or more specifically, private training and inference of neural networks. It is a short overview of some of the approaches. Please note: some sections contain technical protocol details. The target audience of this post is product managers and software developers who plan to implement a secure machine learning approach and need to evaluate the different trade-offs in terms of speed and security.

The efficiency figures quoted below should be taken with a grain of salt. The goal of presenting concrete numbers is to give an idea of what to expect from each work under the specific setting considered. The reader is encouraged to review the presented approaches in more depth to understand exactly what these numbers represent. Relevant references are provided where possible.

The field of PPML has received a lot of attention over the last decade due to the increasingly ubiquitous presence of ML in our day-to-day lives. However, just like the term Machine Learning represents a broad set of tools and techniques for solving many different problems in a wide variety of scenarios, the concept of PPML is on its own at least as general as ML, and, in fact, it introduces an additional dimension due to the large number of techniques that can be applied to securely solve a specific ML problem.

In spite of the above, one can still categorize works in PPML by the problem they study. We first distinguish secure training and secure inference.

Training a Neural Network

It is safe to say that neural networks (NNs) and convolutional neural networks (CNNs) are among the most widely studied ML models, and this status naturally carries over to the PPML domain. A common paradigm for training them is the feed-forward/back-propagation approach, in which the weights of the network are repeatedly updated in a back-propagation phase, based on the result of evaluating the current weights on the training data, which is done in the feed-forward phase.
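To fix terminology, here is a minimal plaintext sketch of one feed-forward pass followed by one back-propagation update, written in Python with NumPy. The two-layer architecture, squared loss and learning rate are just illustrative choices of mine; the secure protocols below compute exactly these kinds of matrix products and activations, but on secret data.

```python
import numpy as np

# Minimal plaintext sketch of feed-forward / back-propagation training
# for a two-layer network with a ReLU hidden layer and squared loss.
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 8))        # batch of 32 records, 8 features
y = rng.standard_normal((32, 1))        # targets
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((16, 1))
lr = 0.01

for step in range(100):
    # Feed-forward phase: evaluate the current weights on the batch.
    h = np.maximum(X @ W1, 0)           # ReLU activation
    pred = h @ W2
    loss = np.mean((pred - y) ** 2)

    # Backward-propagation phase: update the weights based on the error.
    grad_pred = 2 * (pred - y) / len(X)
    grad_W2 = h.T @ grad_pred
    grad_h = grad_pred @ W2.T
    grad_W1 = X.T @ (grad_h * (h > 0))
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
```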

What it means to train a neural network securely is not completely obvious. In general, one would like to ensure the privacy of the data used for training, which may be held by different entities to begin with. One can think of the training data as a matrix, with the rows being the different records and the columns corresponding to the different attributes/features. This data may be horizontally partitioned, meaning that each data owner holds some rows of this matrix, or vertically partitioned, meaning that each data owner holds some columns of it. One can also consider a mixture of these.
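As a toy illustration (the 6×4 shape and the split points are arbitrary choices for the example), the two partitionings of such a matrix look as follows:

```python
import numpy as np

# Rows are records, columns are features.
data = np.arange(24).reshape(6, 4)      # 6 records, 4 features

# Horizontal partitioning: each owner holds some of the records (rows).
owner_A_rows, owner_B_rows = data[:3, :], data[3:, :]

# Vertical partitioning: each owner holds some of the features (columns).
owner_A_cols, owner_B_cols = data[:, :2], data[:, 2:]
```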

Most of the techniques in the upcoming sections are set in the client-server model. Data owners secret-share their data among a set of computation servers, who perform the training on this data while revealing only the final model, provided a certain security assumption holds (e.g. no more than a certain threshold of the servers is passively/actively corrupt). One of the main advantages of this model is scalability: an arbitrary number of clients can provide input to a fixed number of servers. Its main downside, however, is the trust assumption: clients must be willing to believe that no more than a certain threshold of the servers is corrupt (either passively or actively, depending on the underlying protocol).
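The way clients typically hand their data to the servers in this model is via additive secret sharing. Here is a minimal sketch, assuming additive shares modulo 2^64 and two servers; the function names are mine and do not come from any particular framework.

```python
import secrets

# Additive secret sharing over Z_{2^64}: each share on its own is a
# uniformly random value and reveals nothing about the client's input.
MOD = 2 ** 64

def share(x, n_servers=2):
    """Split an integer x into n additive shares modulo 2^64."""
    shares = [secrets.randbelow(MOD) for _ in range(n_servers - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

client_input = 42
s1, s2 = share(client_input)            # sent to server 1 and server 2
assert reconstruct([s1, s2]) == client_input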

It is also important to note that there are many other, quite different approaches to secure training. For example, a common technique is secure aggregation (see for example [1]), in which the training algorithm is not executed securely as a whole; instead, only the phases that involve the training data are computed with the help of the data owners, which reduces the communication and computation complexity of the algorithm. A bit more concretely, instead of requiring each individual record from the training set, training algorithms typically only require a certain weighted sum of these. With this observation in mind, one can run a small secure computation protocol for this specific step, guaranteeing that only the weighted sum is leaked (hence the term secure aggregation).
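As a rough sketch of the mask-based idea behind secure aggregation (in the spirit of [1], but omitting the key agreement and dropout handling that make the real protocol robust; the client names and values below are made up):

```python
import secrets

MOD = 2 ** 32
updates = {"alice": 10, "bob": 20, "carol": 33}   # each client's local sum
clients = list(updates)

# Every pair of clients agrees on a random mask; one adds it, the other
# subtracts it, so the masks cancel in the aggregate but hide each update.
masks = {(i, j): secrets.randbelow(MOD)
         for a, i in enumerate(clients) for j in clients[a + 1:]}

def masked_update(name):
    m = updates[name]
    for (i, j), r in masks.items():
        if i == name:
            m += r
        elif j == name:
            m -= r
    return m % MOD

# The aggregator only ever sees masked values; their sum is the true sum.
aggregate = sum(masked_update(c) for c in clients) % MOD
assert aggregate == sum(updates.values()) % MOD
```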

Finally, I would like to remark that, just like many other MPC-based solutions, most of the techniques presented here protect the intermediate computation from leaking, but they are not concerned with what the output itself (in this case, the model) reveals about the inputs. In particular, it is important to pay attention to so-called model-inversion attacks, which aim at recovering information about the data used for training from the model itself.

SecureML ([2] S&P 2017)

This is one of the earliest and most popular frameworks for training and evaluating neural networks securely. The authors consider the client-server model with two servers, who receive secret-shared data from multiple clients, which is then used for secure training. This approach considers a semi-honest adversary that may corrupt one of the two servers.

From the paper:

“We also implement the first privacy preserving protocols for logistic regression and neural networks training with high efficiency. For example, on a dataset of size 60000 with 784 features, our privacy preserving logistic regression has a total running time of 29s while our privacy-preserving protocol for training a neural network with 3 layers and 266 neurons runs in 21000s. Our protocols are naturally divided into a data-independent offline phase and a much faster online phase. When excluding the offline phase, the protocols are even more competitive with plaintext training. For instance, for a dataset with 60000 samples and 784 features, and in the LAN setting, the linear regression protocol runs in 1.4s, the logistic regression in 8.9s, and the neural network training in 653.0s.”

ABY³ ([3] CCS 2018)

This is a three-party framework for training and evaluating neural networks, as well as linear and logistic regression models. The main novelty of this work is that it converts between different share representations: Arithmetic shares over Z_{2^k} (the ring of integers modulo 2^k), Binary shares, and Yao’s garbled-circuit shares, all in the setting of 3 parties (hence the name ABY³). The framework also includes many efficient building blocks for secure comparison and secure truncation, which are essential for handling machine learning models securely (a small fixed-point sketch illustrating why truncation is needed appears at the end of this section). Their protocols operate in the semi-honest setting, that is, where no party is assumed to deviate from the protocol, although they can be extended to the malicious scenario, in which parties may deviate arbitrarily from the protocol specification, without hurting security. According to the paper:

“Our protocol particularly stands out when working with neural networks. The first network we consider (NN) is for the MNIST dataset and contains three fully connected layers consisting of 128, 128, and 10 nodes respectively. Between each layer, the ReLU activation function is applied using our piecewise polynomial technique. When training the NN network, our implementation is capable of processing 10 training iterations per seconds, with each iteration using a batch size of 32 examples. Proportionally, when using a batch size of 128, our protocol performs 2.5 iterations per second. As such, an accuracy of 94% can be achieved in 45 minutes (15 epochs). Compared to [SecureML], which achieves the same accuracy, our online running time is 80x faster while the overall running time is 55000x faster.”

This work also considers convolutional neural networks:

“We also consider a convolutional neural net (CNN) with 2 hidden layers … This network applies a convolutional layer which maps the 784 input pixels to a vector of 980 features. Two fully connected layers with 100 and 10 nodes are performed with the ReLU activation function … For ease of implementation, we overestimate the running time by replacing the convolutional kernel with a fully connected layer. Our protocol can process 6 training iterations per second with a batch size of 32, or 2 iterations per second with a batch size of 128. We estimate, if the convolutional layer was fully implemented, that our training algorithm would achieve an equivalent accuracy as a plaintext model … of 99% in less than one hour of training time.”
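To make the role of secure truncation concrete: fixed-point values are integers scaled by 2^f, so a product carries 2f fractional bits and has to be shifted back down, which ABY³ does directly on secret-shared values. Below is a plaintext-only sketch for k = 64 and f = 16; these parameters are illustrative, and the sketch ignores negative numbers as well as the probabilistic error of shared truncation.

```python
K, F = 64, 16
MOD = 1 << K

def encode(x):                 # real value -> fixed-point element of Z_{2^k}
    return int(round(x * (1 << F))) % MOD

def decode(v):                 # fixed-point -> real (positive values only)
    return v / (1 << F)

a, b = encode(1.5), encode(2.25)
prod = (a * b) % MOD           # the product now carries 2F fractional bits
trunc = prod >> F              # truncation: back to F fractional bits
print(decode(trunc))           # 3.375, as expected
```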

SecureNN ([4] PETS 2019)

This work considers three- and four-party protocols for securely training and evaluating a neural network, with information-theoretic security against a passively corrupt adversary. SecureNN introduces many novel building blocks to deal with non-linear activations such as ReLU, which are frequent in the ML domain and which are expensive in secret-sharing-based MPC protocols, since these are better suited for arithmetic computations involving additions and multiplications. The authors also show how to efficiently perform expensive operations such as matrix multiplication.
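The core decomposition SecureNN exploits can be stated in one line: ReLU(x) = drelu(x) · x, where drelu(x) is the bit “x ≥ 0”. In the protocol both the bit and the product are computed on shares; the plaintext sketch below only shows the decomposition itself, not the secure comparison.

```python
import numpy as np

def drelu(x):
    # The "derivative of ReLU" bit: 1 if x >= 0, else 0.
    return (x >= 0).astype(x.dtype)

def relu(x):
    # ReLU written as a product of a bit and the value itself.
    return drelu(x) * x

x = np.array([-3, -1, 0, 2, 5])
assert np.array_equal(relu(x), np.maximum(x, 0))
```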

Concrete timings per the paper:

“… we implement training on 3 neural networks: (A) a DNN with three layers from SecureML, (B) A CNN from MiniONN and (C) A 4-layer LeNet network.
We train all the networks on the MNIST data-set. The overall execution time of our MPC protocol for Network A in the 3-server model over a LAN network is under an hour (roughly 52.8 minutes), while cleartext evaluation takes around 3 minutes on a similar machine. This illustrates that the cost of secure computation for neural network training can be within 20x of the no-security baseline. For our largest CNN network (Network C) with 99.15% accuracy, our protocols in the 3-server case executes in roughly 42.51 hours, while the corresponding cleartext execution time is 2.11 hours. We show how to modify our protocols in the 4-party setting to provide even better performance. For example, our 3-layer network has an execution time of 46.4 minutes and the 4-layer LeNet network has an execution time of 27.5 hours in the LAN setting.”

Quotient ([5] CCS 2019)

This paper presents “a new method for discretized training of DNNs, along with a customized secure two-party protocol for it”. The authors introduce QUOTIENT as a novel way to train DNNs taking into account “techniques necessary for modern DNNs training such as normalization & adaptive gradient methods, instead relying on vanilla SGD with constant step size”.

It makes use of a mix of additive secret sharing (for both Boolean and arithmetic values) and garbled circuits. It considers two parties with passive security and leverages the quantization scheme of Wu et al. [6] in order to simplify training in MPC without a significant cost in accuracy.

The authors run experiments in both the LAN and WAN settings. I invite the reader to inspect section 5.2 in their paper to get a complete idea of their performance results. One important snippet:

“We report accuracy and timings at 1, 5 and 10 epochs. We note that the training time grows roughly linearly with the number of epochs. In all cases except the largest MNIST model, 10 epochs finish in under 12 days. Note that standard large deep models can easily take this long to train.

Our training protocols port nicely to the WAN settings. On average, our networks are only about 5x slower over WAN than over LAN. This is due to the low communication load and round-trip of our protocols.”

As mentioned above, the authors get running times that are on the order of days, depending on the complexity of the neural network. We remark that these results are for neural networks for simple learning tasks such as MNIST.

Evaluating a Neural Network

Now I explore approaches that aim at solving the problem of evaluating a neural network whilst preserving the privacy of both the input and the model.¹ This setting is very natural. For example, imagine a company that has made a considerable investment training a model, with the goal of profiting from it by running, say, Machine Learning as a Service (MLaaS). The company would not want its model to be leaked as it constitutes a valuable asset, so clients send their inputs to the company for inference. However, depending on the application, these inputs may be sensitive and clients may not be willing to give away their data. The techniques presented below aim at resolving precisely this issue.

I would like to remark that, as in the previous section on secure training, one must be careful about what the output may leak about the model or the input. For example, so-called model-stealing attacks aim at recovering a model (or an equivalent one) from black-box access to inference queries (that is, inference without looking at the internals of the model, which is precisely what the methods below provide).

Finally, it is important to note that many of the techniques below would benefit substantially if the model could be public, leading to much more efficient figures. This setting may seem unnatural at first: if the model is public, why not let the clients perform the evaluation on their own? One may argue that, if the model is too complex, clients may want to outsource its evaluation. However, in that case, if a model is too complex to run in the clear on a (possibly resource-constrained) client device, then it is natural to expect that its secure evaluation would be impractical. On the other hand, a more convincing, natural scenario is one where the data on which the prediction is performed is already secret-shared, or “encrypted”. In this case there is no party who could run the evaluation in the clear. A good example is a service that keeps e-mails in secret-shared or encrypted form, but still wishes to use an ML model for, say, spam detection.

CryptoNets ([7] ICML 2016)

CryptoNets uses Leveled Homomorphic Encryption (LHE) to perform neural network inference on encrypted data. In a nutshell, the input owner, or client, encrypts its input using an appropriate LHE scheme and sends the ciphertexts to the model owner, or server. The server then evaluates the model homomorphically on the encrypted data (a toy sketch of this flow appears after the quote below). Quoting the paper:

“We show that when CryptoNets are applied to the MNIST dataset, an accuracy of 99% can be achieved with a throughput of 58982 predictions per hour on a single PC, and a latency of 250 seconds. Note that a single prediction takes 250 seconds to complete, however, the same process can make 4096 predictions simultaneously with no added cost. Therefore, over an hour, our implementation can make 58982 predictions on average.”
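The flow can be sketched as follows. Note that `ToyLHE` below performs no encryption at all; it is only a stand-in I wrote for the interface of some leveled HE scheme, not a real library API. The other point worth noticing is that the non-polynomial activation (ReLU) is replaced by squaring, since only polynomials can be evaluated under LHE.

```python
class ToyLHE:
    """Insecure stand-in that only mimics a leveled HE interface."""
    def encrypt(self, x): return x             # would produce a ciphertext
    def decrypt(self, ct): return ct           # client-side only
    def add(self, a, b): return a + b          # ciphertext + ciphertext
    def mul(self, a, b): return a * b          # ciphertext * ciphertext (costs a level)
    def mul_plain(self, ct, w): return ct * w  # ciphertext * plaintext weight

def one_layer_inference(ct_inputs, weights, bias, lhe):
    # Linear part: ct_y = b + sum_i w_i * ct_x_i, with w_i and b known to
    # the server in the clear (the model is not hidden from the server).
    ct_y = lhe.encrypt(bias)
    for ct_x, w in zip(ct_inputs, weights):
        ct_y = lhe.add(ct_y, lhe.mul_plain(ct_x, w))
    # Square activation instead of ReLU.
    return lhe.mul(ct_y, ct_y)

lhe = ToyLHE()
ct_inputs = [lhe.encrypt(v) for v in (1.0, 2.0)]        # client encrypts its input
ct_out = one_layer_inference(ct_inputs, [0.5, -1.0], 0.25, lhe)  # server computes
print(lhe.decrypt(ct_out))                               # client decrypts: 1.5625
```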

MiniONN ([8] CCS 2017)

This work introduces techniques for “transforming an existing neural network to an oblivious neural network supporting privacy-preserving predictions with reasonable efficiency”, with the crucial feature that no modifications to how the models are trained are required (which is not the case with e.g. SecureML). The paper considers passive security with two parties, where one owns the model and the other owns the input (unlike the client-server scenario). Security is achieved via additive secret sharing, with (dot-product) multiplication triples preprocessed using homomorphic encryption coupled with SIMD operations (a toy triple-based multiplication sketch follows below). The authors also use garbled-circuit techniques to handle non-linear activation functions.

Their concrete results show inference of MNIST models in 3.58s of offline time and 5.74s of online time. Secure inference of more complex models with around 7 layers for CIFAR-10 takes 472s in the offline phase and 72s in the online phase.
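The preprocessed triples mentioned above are multiplication (Beaver) triples; MiniONN generates the dot-product variant with homomorphic encryption, but the single-multiplication case already shows how the fast online phase works. Here is a plaintext-verifiable sketch of mine, with additive shares modulo 2^32 (a modulus chosen only for the example):

```python
import secrets

MOD = 1 << 32

def share(x):
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD

# Preprocessing: a random triple (a, b, c) with c = a * b, secret-shared.
a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
c = (a * b) % MOD
a0, a1 = share(a)
b0, b1 = share(b)
c0, c1 = share(c)

# Online phase: multiply secret-shared x and y using the triple.
x, y = 7, 9
x0, x1 = share(x)
y0, y1 = share(y)
# The parties open d = x - a and e = y - b (these reveal nothing about x, y).
d = (x0 - a0 + x1 - a1) % MOD
e = (y0 - b0 + y1 - b1) % MOD
# Shares of x*y = c + d*b + e*a + d*e (the public term d*e added by one party).
z0 = (c0 + d * b0 + e * a0 + d * e) % MOD
z1 = (c1 + d * b1 + e * a1) % MOD
assert (z0 + z1) % MOD == (x * y) % MOD
```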

Chameleon ([9] AsiaCCS 2018)

This framework allows for secure inference of neural networks with three parties, assuming at most one of them is passively corrupt. This is achieved by mixing garbled circuits with additive secret sharing, together with a primitive known as oblivious transfer to deal with multiplications. The reported numbers range from 2.7s in the LAN setting for a small MNIST model to 7.8s in the WAN setting. This work also evaluates SVMs.

DeepSecure ([10] DAC 2018)

DeepSecure is fully based on garbled circuits. It considers a 2-party setting with semi-honest security, where the party who owns the model plays the garbler and the party who owns the input plays the evaluator. The running times for the online phase of this protocol vary depending on the neural network; for example, it takes 24.37s for the LeNet network on MNIST. See Section 4.5 in the paper for details.

Gazelle ([11] USENIX 2018)

This work combines homomorphic encryption with garbled circuits in order to securely evaluate a neural network, achieving a much lower communication complexity than MiniONN:

“On a neural network trained on the CIFAR-10 dataset, the most efficient of these three protocols [MiniONN, SecureML and DeepSecure], namely MiniONN, has an online bandwidth cost of 6.2GB whereas GAZELLE has an online bandwidth cost of 0.3GB. In fact, we observe across the board a reduction of 20–80x in the online bandwidth per inference which gets better as the networks grow in size. In the LAN setting, this translates to an end-to end latency of 3.6s versus the 72s for MiniONN.”

The setting considered in Gazelle is the two-party scenario where one party owns the model and the other party owns the input. This work assumes that at most one party is passively corrupt, i.e. the adversary adheres to the specifications of the protocol. For the concrete runtimes of this approach we refer the reader to the GAZELLE paper, Section VIII. We can observe that relatively small models for the MNIST dataset can be run securely in the LAN setting with a runtime that ranges between 0.03s and 0.8s, depending on the size of the model. For prediction on the CIFAR-10 task, the running time is about 12.9s.

XONN ([12] USENIX 2019)

This work leverages the concept of Binarized Neural Networks (NNs whose weights are +/- 1) in order to boost the efficiency of garbled-circuit-based protocols for secure inference. These networks rely heavily on the XNOR operation, which, fortunately, can be done almost “for free” in garbled circuits (see the sketch after the quote below). When comparing to previous work:

“We perform extensive evaluations on different datasets. Compared to the Gazelle (the prior best solution) and MiniONN frameworks, we achieve 7x and 93x lower inference latency, respectively. XONN outperforms DeepSecure (prior best GC-based framework) by 60x and CryptoNets, an HE-based framework, by 1859x. Moreover, our solution renders a constant round of interactions between the client and the server, which has a significant effect on the performance on oblivious inference in Internet settings.”
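The reason XNOR is so convenient is that a dot product of ±1 vectors reduces to an XNOR followed by a popcount, and XOR-type gates are nearly free under the free-XOR garbling optimization. A small sketch, encoding +1 as the bit 1 and −1 as the bit 0 (the threshold/batch-norm details of real binarized layers are omitted):

```python
import numpy as np

def binary_dot(w_bits, x_bits):
    """Dot product of +/-1 vectors, with +1 encoded as 1 and -1 as 0."""
    xnor = ~(w_bits ^ x_bits) & 1          # 1 where the signs agree
    popcount = int(xnor.sum())
    return 2 * popcount - len(w_bits)      # the +/-1 dot product

w = np.array([1, 0, 1, 1], dtype=np.uint8)  # +1, -1, +1, +1
x = np.array([1, 1, 0, 1], dtype=np.uint8)  # +1, +1, -1, +1
# Plain +/-1 dot product: 1*1 + (-1)*1 + 1*(-1) + 1*1 = 0
assert binary_dot(w, x) == 0
```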

CrypTFlow ([13] 2019)

This work, done by researchers at Microsoft Research India, aims at performing secure evaluation of neural networks with as little effort as possible, removing the need, present in most previous works, to manually re-implement ML models so that they fit the protocol’s specifications. To this end, CrypTFlow develops a compiler from TensorFlow to certain secure computation engines, which allows the user to benchmark, in principle, any model trained in TensorFlow. This work also aims at evaluating large networks, unlike previous works, which typically studied smaller toy networks for simple tasks such as MNIST or CIFAR-10.

Another distinguishing point is that CrypTFlow proposes actively secure counterparts of its 2- and 3-party protocols. However, this is done by introducing hardware assumptions (such as Intel SGX), which results in high efficiency when compared to more “pure crypto” approaches, but also in a stronger trust assumption.²

To achieve their results, the authors consider two passively secure protocols for the 2 and 3-party settings (ABY [14] for the 2-party case and an improved version of SecureNN [4] for the 3-party case). As we already hinted above, the 2-party case with active security is obtained by introducing hardware assumptions.

The benchmarks the authors run include large networks for image classification: ResNet50, DenseNet121 and SqueezeNet (I refer the reader to section VI in the paper for details on these networks). This work reports, for the LAN and 3-party setting, a running time of 36s for the most complex network (DenseNet121) with passive security, and 112.9s for active security using Intel SGX.

Garbled Neural Networks ([15] 2019)

This work builds on top of the garbling scheme of Ball, Malkin & Rosulek (ACM CCS 2016), which is set in the arithmetic setting (unlike most garbled-circuit techniques, which work in the Boolean setting), presenting many optimizations along the way. This is the only work here that is fully based on arithmetic garbling, which implies that it is constant-round and therefore very well suited for WAN settings. The authors evaluate their approach on multiple neural networks for the MNIST and CIFAR-10 tasks, and I refer the reader to section 8 in their paper for details on their performance results.

Quantized Neural Networks ([16] PETS 2020)

I would like to include a work I did with Marcel Keller (DATA61) and Anders Dalskov (Aarhus University). Our work considers the evaluation of neural networks that are quantized under the scheme of Jacob et al. [17]. Such models can be produced by TensorFlow Lite (https://www.tensorflow.org/lite), an extension of TensorFlow that produces smaller models, which, as we identify in our work, are also more “MPC-friendly”.
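For intuition, the scheme of Jacob et al. [17] represents each real value as real ≈ scale · (q − zero_point), with q an 8-bit integer, so that layer arithmetic can run on integers plus a final rescaling. A small sketch (the scale and zero point below are made-up parameters chosen just to cover the range [−1, 2]):

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Affine quantization: round to the nearest representable 8-bit code.
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.int32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 2.0])
scale, zero_point = 3.0 / 255, 85
q = quantize(x, scale, zero_point)
print(q, dequantize(q, scale, zero_point))   # small rounding error only
```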

Unlike most of the works I have discussed so far, we evaluate much more complex neural networks. More specifically, we focus on evaluating the networks in the MobileNets family [18], which are networks of 28 hidden layers with 1000 outputs, and are used to classify images. For comparison, most of the models for the MNIST and CIFAR-10 tasks considered in other works have about 5 layers, at most.

On top of supporting “off-the-shelf” ML models (i.e. those that can be trained with TensorFlow and then converted to their Lite version), our work considers four different security settings: 2 parties (dishonest majority) with passive and active security, and 3 parties (honest majority) with passive and active security. Naturally, our results vary depending on the scenario, and I refer the reader to section IV in the paper for details (specifically Table II). Our most efficient results are obtained in the honest-majority scenario:

“The most efficient inference can be performed using a passive honest majority protocol which takes between 0.2 and 3.5 seconds, depending on the size of the model; for active security and an honest majority, inference is possible between 5.3 and 76.9 seconds.”

On the other hand, the dishonest-majority setting is considerably less efficient, with timings that vary between 2 and 50 minutes for passive security depending on the model, while active security can reach runtimes as large as 7.5 hours! As a result, paying the “security cost” of adding a third party in order to achieve an honest majority may be the best approach when it comes to efficiency.

Delphi ([19] USENIX 2020)

This is a recent work by researchers at UC Berkeley that aims at improving the state of the art for secure inference involving only two parties, assuming semi-honest security. This work achieves outstanding performance in the online phase, mostly thanks to a combination of two clever observations: (1) a given model is likely to be evaluated on many different inputs, which allows any computation that depends only on the model to be treated as preprocessing, enabling a very fast online phase for the linear layers, and (2) many expensive non-linear activation functions can be replaced by more MPC-friendly ones, such as squaring, without compromising accuracy by much.

In order to determine the optimal way to “transform” the given model into one that is more amenable to MPC computation, the authors present a planner that “automatically discovers which ReLUs to replace with quadratic approximations so as to maximize the number of approximations used while still ensuring that accuracy remains above a specified threshold”.
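For intuition only: the coefficients below are a generic quadratic fit to ReLU on [−1, 1] that I chose for illustration, not one of the approximations Delphi’s planner would actually select. The point is that a degree-2 polynomial costs a single secure multiplication, whereas an exact ReLU requires a far more expensive comparison protocol.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def quad_relu(x, a=0.25, b=0.5, c=0.125):   # illustrative coefficients
    return a * x ** 2 + b * x + c

xs = np.linspace(-1.0, 1.0, 201)
print("max |ReLU - quadratic| on [-1, 1]:",
      np.max(np.abs(relu(xs) - quad_relu(xs))))   # about 0.125
```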

The concrete performance for the online phase in Delphi is significantly better than that of previous works:

“…using DELPHI for cryptographic prediction on ResNet-32 requires just 3.8 seconds and 60MB communication in the online phase, improving upon GAZELLE by 22× and 9× respectively.”

Other Resources

Another survey of techniques in the PPML domain can be found in [20]. The readers can also check the posts from Dropout Labs.

In this post I tried to cover many of the relevant works in the area of privacy-preserving machine learning. I feel a bit guilty for not being able to provide a more exhaustive list, nor to give more details about the internals of at least some of the protocols I discussed. I sincerely apologize to the authors of the works I have left out.

In the next and final post I will discuss some of the companies and projects that are taking privacy-preserving machine learning from research to practice. This is an exciting field, and many results are reaching the boundaries of practicality, to the point where products can start to be considered for deployment in certain specific scenarios. See you then!

Footnotes

  1. Most of the works below actually leak the architecture of the model (e.g. number and type of layers, as well as their dimensions), given that this information is usually needed to carry out the computation. Only purely HE-based approaches avoid this leakage, since in that case the evaluation of the model is done at the model owner’s end.
  2. For example, some companies in Europe or China may not be willing to rely on Intel’s SGX for the security of their models, given that Intel is an American company.

References

  • [1] Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., … & Seth, K. (2017, October). Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (pp. 1175–1191).
  • [2] Mohassel, P., & Zhang, Y. (2017, May). SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP) (pp. 19–38). IEEE.
  • [3] Mohassel, P., & Rindal, P. (2018, January). ABY3: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (pp. 35–52).
  • [4] Wagh, S., Gupta, D., & Chandran, N. (2018). SecureNN: Efficient and Private Neural Network Training. IACR Cryptology ePrint Archive, 2018, 442.
  • [5] Agrawal, N., Shahin Shamsabadi, A., Kusner, M. J., & Gascón, A. (2019, November). QUOTIENT: two-party secure neural network training and prediction. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (pp. 1231–1247).
  • [6] Wu, S., Li, G., Chen, F., & Shi, L. (2018). Training and inference with integers in deep neural networks. arXiv preprint arXiv:1802.04680.
  • [7] Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., & Wernsing, J. (2016, June). CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning (pp. 201–210).
  • [8] Liu, J., Juuti, M., Lu, Y., & Asokan, N. (2017, October). Oblivious neural network predictions via MiniONN transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (pp. 619–631).
  • [9] Riazi, M. S., Weinert, C., Tkachenko, O., Songhori, E. M., Schneider, T., & Koushanfar, F. (2018, May). Chameleon: A hybrid secure computation framework for machine learning applications. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security (pp. 707–721).
  • [10] Rouhani, B. D., Riazi, M. S., & Koushanfar, F. (2018, June). DeepSecure: Scalable provably-secure deep learning. In Proceedings of the 55th Annual Design Automation Conference (pp. 1–6).
  • [11] Juvekar, C., Vaikuntanathan, V., & Chandrakasan, A. (2018). GAZELLE: A low latency framework for secure neural network inference. In 27th USENIX Security Symposium (USENIX Security 18) (pp. 1651–1669).
  • [12] Riazi, M. S., Samragh, M., Chen, H., Laine, K., Lauter, K., & Koushanfar, F. (2019). XONN: XNOR-based oblivious deep neural network inference. In 28th USENIX Security Symposium (USENIX Security 19) (pp. 1501–1518).
  • [13] Kumar, N., Rathee, M., Chandran, N., Gupta, D., Rastogi, A., & Sharma, R. (2019). CrypTFlow: Secure TensorFlow inference. arXiv preprint arXiv:1909.07814.
  • [14] Demmler, D., Schneider, T., & Zohner, M. (2015, February). ABY: A framework for efficient mixed-protocol secure two-party computation. In NDSS.
  • [15] Ball, M., Carmer, B., Malkin, T., Rosulek, M., & Schimanski, N. (2019). Garbled Neural Networks are Practical. IACR Cryptology ePrint Archive, 2019, 338.
  • [16] Dalskov, A., Escudero, D., & Keller, M. (2019). Secure evaluation of quantized neural networks. arXiv preprint arXiv:1910.12435.
  • [17] Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., … & Kalenichenko, D. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2704–2713).
  • [18] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., … & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
  • [19] Mishra, P., Lehmkuhl, R., Srinivasan, A., Zheng, W., & Popa, R. A. (2020). DELPHI: A cryptographic inference service for neural networks. In 29th USENIX Security Symposium (USENIX Security 20).
  • [20] Tanuwidjaja, H. C., Choi, R., & Kim, K. (2019, September). A Survey on Deep Learning Techniques for Privacy-Preserving. In International Conference on Machine Learning for Cyber Security (pp. 29–46). Springer, Cham.
