Privacy and Machine Learning

Vedashree Patil
Published in dotStar
Jun 21, 2019

In today’s world of technology, it can safely be presumed that data rules supreme! And with fields like deep learning having a conspicuous presence in almost every industry, acquiring the right data (in both quality and quantity) becomes even more important for organisations. With this comes the paramount task of defining coherent policies around privacy (remember GDPR!).

In this post we will explore certain aspects of privacy in deep learning and, using PySyft, a Python library for secure deep learning, demonstrate a simple example of serving a model that makes secure and private predictions.

Privacy in Deep Learning

A traditional deep learning pipeline follows the standard drill: accumulate data on a server, train the model on the server, make inferences using this model, and repeat. This may, however, set off alarm bells amongst data privacy crusaders, especially if the data being sent is private or sensitive (like financial information or healthcare records).

PySyft is a Python library that uses techniques like Federated Learning, Differential Privacy and Secure Multi-Party Computation (SMPC) to ensure privacy. Let’s touch upon each briefly using an example.

Federated Learning

By analogy, FL is like ordering dinner in instead of going to the restaurant yourself (what?). So, for example, if Alice refuses to send her data to train Bob’s model sitting on a cloud server, citing privacy issues, then a copy of Bob’s model is sent to her instead! This copy trains on Alice’s device and only the weights are sent back to the server, where the model is updated accordingly. This means that Alice still owns her data and can rest assured that it has not been disclosed to or by Bob.
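
To make the flow concrete, here is a minimal sketch of one federated round in plain NumPy. The helper names local_update and federated_average are made up for illustration; this shows the idea, not PySyft’s API.

import numpy as np

def local_update(global_weights, local_data, lr=0.1):
    # "Training" here is a dummy gradient step so the sketch stays
    # self-contained; in reality this is a few epochs of SGD on-device.
    fake_gradient = np.mean(local_data) * np.ones_like(global_weights)
    return global_weights - lr * fake_gradient

def federated_average(weight_list):
    # The server only ever sees weights, never the raw data.
    return np.mean(weight_list, axis=0)

# Bob's global model weights live on the server
global_weights = np.zeros(4)

# Alice and Carol train on data that never leaves their devices
alice_weights = local_update(global_weights, np.array([1.0, 2.0, 3.0]))
carol_weights = local_update(global_weights, np.array([4.0, 5.0, 6.0]))

# Only the updated weights travel back to the server
global_weights = federated_average([alice_weights, carol_weights])
print(global_weights)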

There’s a flaw here though! Even if Alice has sent only the weights to Bob’s server, Bob may be able to recreate Alice’s data by reverse engineering them (since the weight updates can reveal details about the data).

In another situation, Evil Ed got a copy of the model from Bob’s server under the pretext of training it on his own data, but instead used the model for his own profit. So, in essence, Bob lost ownership of the model he had so painstakingly built!

A common approach here is to encrypt both the data and the model, and then train the encrypted model on the encrypted data.

Differential Privacy

Say Bob’s model is predicting fraud using financial data; there should be no way of knowing whether a particular data sample belongs to Alice. In simple terms, Bob’s model should learn what it needs to learn (to detect fraud through the data) and nothing else (like Alice’s financial characteristics). That means the model, while training, should not learn or disclose user-specific features.
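
A common way to enforce this is to add carefully calibrated noise to anything computed from the data, so that no single person’s record noticeably changes the answer. Below is a minimal sketch of the Laplace mechanism for a counting query; it is purely illustrative (the noisy_count helper is made up and this is not how PySyft exposes differential privacy).

import numpy as np

def noisy_count(records, predicate, epsilon=0.5):
    # A counting query has sensitivity 1: adding or removing one person
    # changes the true count by at most 1, so Laplace noise with scale
    # 1/epsilon gives epsilon-differential privacy for this query.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

transactions = [{"user": "alice", "fraud": True},
                {"user": "carol", "fraud": False},
                {"user": "dave", "fraud": True}]

# Bob learns roughly how many fraudulent records there are, but the noisy
# answer does not reveal whether Alice's record is one of them.
print(noisy_count(transactions, lambda r: r["fraud"]))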

Secure Multi-Party Computation (SMPC)

Let’s say Ed, Ned and Eddie found a treasure map and tore it into three pieces so that no one could get hold of the entire map at once. The map would make sense only when all three pieces were put together. Likewise, in SMPC, multiple parties agree to carry out a computation, and the input data and model weights are divided into shares distributed among the parties. All the shares are required to reconstruct the original values, which makes each share a private key of sorts.
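
The simplest flavour of this is additive secret sharing: a value is split into random-looking shares that only reconstruct the original when all of them are added together modulo a large number. Here is a minimal sketch (the share and reconstruct helpers are made up for illustration and are not PySyft’s internals).

import random

Q = 2**31 - 1  # a large prime; all arithmetic happens modulo Q

def share(secret, n_parties=3):
    # Give n-1 parties random numbers and make the last share whatever is
    # needed so that all shares add up to the secret modulo Q.
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

ed, ned, eddie = share(42)
print(ed, ned, eddie)                 # individually these look like random noise
print(reconstruct([ed, ned, eddie]))  # 42, but only with all three shares

# Shares can even be added without ever revealing the underlying values
a_shares, b_shares = share(10), share(32)
sum_shares = [(a + b) % Q for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))        # 42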

Serving a secure model using PySyft

PySyft currently supports secure serving for both PyTorch and Keras/TensorFlow models. Let’s create and securely serve a model using the Keras API. (PySyft uses TF Encrypted under the hood for TensorFlow models.)

Note: I have used the MNIST tutorial from the PySyft GitHub repository as a reference.

Step 1: Publicly train and save a simple CNN model for image classification on the CIFAR10 dataset.

# import libraries
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import SGD
num_classes = 10
input_shape = (1, 32, 32, 3)
# Define the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', batch_input_shape=input_shape))
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(num_classes, name='logit'))
# We are assuming the model has already been trained and the
# weights are saved in a .h5 file
# load the pretrained weights
pre_trained_weights = 'cifar10.h5'
model.load_weights(pre_trained_weights)

Step 2: Import PySyft and use the KerasHook() function.

import tensorflow as tf
import syft as sy
hook = sy.KerasHook(tf.keras)

This will make three new methods available on the model:

  • share: securely splits the model among the workers and enables predictions on encrypted data
  • serve: starts a queue that accepts prediction requests (you can limit the number of requests using the num_requests parameter)
  • shutdown_workers: shuts down the serving workers after the requests have been handled

Step 3: Create three workers (TFEWorker) which will launch TensorFlow servers for performing private predictions. (The AUTO flag indicates whether these instances are managed manually or by the workers.) Since we are using the same machine, all three workers will be on ‘localhost’. This can of course be changed if a distributed setup is being used.

AUTO = True
worker_1 = sy.TFEWorker(host='localhost:5000', auto_managed=AUTO)
worker_2 = sy.TFEWorker(host='localhost:5001', auto_managed=AUTO)
worker_3 = sy.TFEWorker(host='localhost:5002', auto_managed=AUTO)

Step 4: Now call the share() method, which will convert the model into an encrypted Keras model, followed by the serve() method to serve prediction requests.

model.share(worker_1, worker_2, worker_3)
model.serve(num_requests=5) # limit the number of requests to 5

The server is now all set to accept requests and make predictions using the encrypted model.

Make Private Predictions

Next, create a client and connect to the model using the connect_to_model() function. In addition, you need to point the client at the same three workers we created earlier.

# import libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import load_img, img_to_array
import syft as sy
# the model expects a single 32x32x3 image and returns 10 class logits
input_shape = (1, 32, 32, 3)
output_shape = (1, 10)
# create a client
client = sy.TFEWorker()
worker_1 = sy.TFEWorker(host='localhost:5000')
worker_2 = sy.TFEWorker(host='localhost:5001')
worker_3 = sy.TFEWorker(host='localhost:5002')
# connect to the secure model
client.connect_to_model(input_shape, output_shape, worker_1, worker_2, worker_3)

Now you can easily query the model to obtain predictions securely.

# prepare an image for prediction
def prepare_image(filename):
    img = load_img(filename, target_size=(32, 32))
    img = img_to_array(img)
    img = img.reshape(1, 32, 32, 3)
    img = img.astype('float32')
    img = img / 255.0
    return img

filenames = ['horse.jpg', 'bird.jpg', 'car.jpg']
actual_labels = [7, 2, 1]
# Query the model to obtain private predictions
for i, filename in enumerate(filenames):
    img = prepare_image(filename)
    res = client.query_model(img)
    print(f"predicted class for {filename}: {np.argmax(res)} and actual class: {actual_labels[i]}")

The output obtained is:

predicted class for horse.jpg: 7 and actual class: 7
predicted class for bird.jpg: 2 and actual class: 2
predicted class for car.jpg: 1 and actual class: 1

Finally, the server can be shut down by calling model.shutdown_workers().
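
On the serving side that is a single call (and if you launched the workers with auto_managed=False, you would also need to stop the TensorFlow server processes yourself):

model.shutdown_workers()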

There! Our model seems to be doing just fine! Moreover, these predictions are made securely: the model server never sees the actual data, and the client never sees the model’s weights. The inferences are made on encrypted data using an encrypted model!

Thanks for reading!

References:

https://github.com/OpenMined/PySyft

https://blog.openmined.org/encrypted-deep-learning-classification-with-pysyft/

https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
