Practical Tools for Privacy Preserving Deep Learning (PPDL)

Praguna Manvi
4 min read · Jun 11, 2022


Deep learning is a data-intensive technique applied with great success in many domains. However, practical problem-solving in medicine, banking, and other fields that handle sensitive information demands privacy. PPDL is a set of approaches that address this issue for deep learning, borrowing seminal work from cryptography and applying it in AI.

Privacy Preserving Deep Learning

An ideal PPDL system preserves data and model privacy while maintaining accuracy and low latency. Homomorphic encryption allows a server to operate on encrypted data sent from a client, but it becomes very expensive as models grow deeper, and results can be decrypted only on the client side. Federated learning is already widely used in modern applications such as Gboard: training happens locally on the client device, and only encrypted model updates are shared with the service provider. Differential privacy prevents leakage of sensitive information the model might have memorized by adding random noise during training. Secure enclaves provide hardware-based protection for code and data against other software running on the platform.

Homomorphic Encryption[1]
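
The article does not tie homomorphic encryption to any particular library, but the core idea of computing directly on ciphertexts can be sketched with the additively homomorphic Paillier scheme. The snippet below uses the third-party phe package, which is an illustrative assumption and not something the article prescribes.

!pip install phe
from phe import paillier

# generate a Paillier keypair; Paillier is additively homomorphic
public_key, private_key = paillier.generate_paillier_keypair()

# the client encrypts its data before sending it to the server
enc_a = public_key.encrypt(3.5)
enc_b = public_key.encrypt(1.5)

# the server can add ciphertexts, or scale them by plaintext constants,
# without ever decrypting
enc_sum = enc_a + enc_b
enc_scaled = enc_a * 2

# only the private key holder (the client) can decrypt the results
print(private_key.decrypt(enc_sum))    # 5.0
print(private_key.decrypt(enc_scaled)) # 7.0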

Recently, SMC (secure multi-party computation) has gained a lot of attention. It operates on data secret-shared among any number of clients without revealing the data or model weights during computation, and it provides security even against adversaries with unbounded computational power. SMC offers the capacity to train a model on the parties' combined data, and unlike the methods above, there are no private keys to maintain. Though SMC guarantees accuracy, severe communication overhead remains its main challenge on deeper models.

Secure multiparty computation[1]
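
To make the secret-sharing idea concrete, here is a toy sketch of additive secret sharing in plain Python, the primitive that libraries like CrypTen build on. The modulus and the three-party setting are illustrative choices, not taken from any specific implementation.

import random

P = 2**61 - 1  # a large prime modulus (illustrative choice)

def share(secret, n=3):
    # split an integer secret into n additive shares mod P
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    # recombine the shares; any n-1 of them reveal nothing about the secret
    return sum(shares) % P

a_shares = share(5)
b_shares = share(7)

# each party adds its local shares; no single party ever sees a or b
sum_shares = [(x + y) % P for x, y in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # 12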

CrypTen, built on PyTorch, implements SMC. It provides an imperative API, and torch tensors can easily be converted to CrypTen tensors.

!pip install crypten
!pip install torch==1.9.0

import torch
import crypten

crypten.init()

x = torch.tensor([1.0, 2.0, 3.0])
x_enc = crypten.cryptensor(x)        # encrypt a torch tensor
x_dec = x_enc.get_plain_text()       # decrypt it back

y_enc = crypten.cryptensor([2.0, 3.0, 4.0])
sum_xy = x_enc + y_enc               # add two encrypted tensors
sum_xy_dec = sum_xy.get_plain_text() # decrypt the sum
print(sum_xy_dec)
Output:
tensor([3., 5., 7.])

Further examples of training and inference are available here.
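
As a rough idea of what encrypted inference looks like in CrypTen, a plaintext PyTorch model can be converted with crypten.nn.from_pytorch and secret-shared before being applied to encrypted inputs. The tiny network below is an illustrative assumption, not a model from the article or the CrypTen tutorials.

import torch
import crypten

crypten.init()

# a small plaintext PyTorch model (hypothetical architecture)
torch_model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)

# convert it to a CrypTen model; the dummy input fixes the graph shapes
dummy_input = torch.empty(1, 4)
model = crypten.nn.from_pytorch(torch_model, dummy_input)
model.encrypt()  # secret-share the model weights
model.eval()

# encrypt a sample and run inference without revealing data or weights
x_enc = crypten.cryptensor(torch.randn(1, 4))
y_enc = model(x_enc)
print(y_enc.get_plain_text())  # output is revealed only on explicit decryption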

tf-encrypted, a library that compiles secure computations into the TensorFlow graph, also supports learning on encrypted data. An example of computing the average of five inputs is shown below.

!pip install tf-encrypted

import logging
import sys

import tensorflow as tf
import tf_encrypted as tfe

@tfe.local_computation(name_scope="provide_input")
def provide_input() -> tf.Tensor:
    # pick a random tensor to be averaged
    return tf.random_normal(shape=(10,))

@tfe.local_computation("result-receiver", name_scope="receive_output")
def receive_output(average: tf.Tensor) -> tf.Operation:
    # simply print the average
    return tf.print("Average:", average)

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)

    # get input from the inputters as private values
    inputs = [
        provide_input(player_name="inputter-0"),  # pylint: disable=unexpected-keyword-arg
        provide_input(player_name="inputter-1"),  # pylint: disable=unexpected-keyword-arg
        provide_input(player_name="inputter-2"),  # pylint: disable=unexpected-keyword-arg
        provide_input(player_name="inputter-3"),  # pylint: disable=unexpected-keyword-arg
        provide_input(player_name="inputter-4"),  # pylint: disable=unexpected-keyword-arg
    ]

    # sum all inputs and divide by their count
    result = tfe.add_n(inputs) / len(inputs)

    # send the result to the receiver
    result_op = receive_output(result)

    # run the secure computation in a session
    with tfe.Session() as sess:
        sess.run(result_op, tag="average")
Output:
Average: [0.45496113397347965 0.62231367169638774 -0.23807346441091298 ... -0.88690748361530258 0.22709952751105014 0.22085048010973937]

Further tutorials on using this library are available here.

Although a few prototyping libraries exist today, training even simple models like LeNet-5 on MNIST takes days, and inference runs into many hours. In the future, we can anticipate hybrid approaches that combine homomorphic encryption and SMC, parallelization, new types of neural networks designed to be efficient for PPDL, and specialized hardware.

Thanks for reading, and have a nice day!

References:

  • H. C. Tanuwidjaja, R. Choi, S. Baek and K. Kim, “Privacy-Preserving Deep Learning on Machine Learning as a Service — a Comprehensive Survey,” in IEEE Access, vol. 8, pp. 167425–167447, 2020, doi: 10.1109/ACCESS.2020.3023084.
  • Knott, B., Venkataraman, S., Hannun, A., Sengupta, S., Ibrahim, M., & Maaten, L. (2021). CrypTen: Secure Multi-Party Computation Meets Machine Learning. In Advances in Neural Information Processing Systems (pp. 4961–4973). Curran Associates, Inc.
  • Kumar, N., Rathee, M., Chandran, N., Gupta, D., Rastogi, A., & Sharma, R. (2020). CrypTFlow: Secure TensorFlow Inference. In IEEE Symposium on Security and Privacy. IEEE.
  • https://github.com/tf-encrypted/tf-encrypted
  • https://github.com/facebookresearch/CrypTen
