Private ML with TensorFlow Privacy

Fabiana Clemente
YData
Jun 1, 2020

Building your first privacy-preserving model with TF-Privacy

Training on private data while your model has to be deployed publicly? Worried that your model might memorize some of the specifics of the training data and end up exposing information that was supposed to stay private?

This is where privacy-preserving machine learning kicks in. Modern machine learning is increasingly applied to create new technologies and user experiences, many of which involve training machines to learn responsibly from sensitive data, such as personal photos, email, or text conversations. Ideally, the parameters of trained machine-learning models should encode general patterns rather than memorizing specific training examples. Nevertheless, there is always a risk of exposing unwanted information.

To ensure strong privacy guarantees when training with sensitive data, several interesting methods are already available, based on the theory of differential privacy or on encrypted learning. Today we will focus on differential privacy and TensorFlow Privacy.

Differential privacy (DP) is a framework for measuring the privacy guarantees provided by an algorithm. Through the lens of differential privacy, we can design machine learning algorithms that responsibly train models on private data. Learning with differential privacy provides provable guarantees of privacy, mitigating the risk of exposing sensitive training data in machine learning. Intuitively, a model trained with differential privacy should not be noticeably affected by any single training example, or any small set of training examples, in its dataset.

The basic idea of this approach, called differentially private stochastic gradient descent (DP-SGD), is to modify the gradients used in stochastic gradient descent (SGD), which lies at the core of almost all deep learning algorithms. Models trained with DP-SGD provide provable differential privacy guarantees for their input data. There are two modifications made to the vanilla SGD algorithm:

  1. First, the sensitivity of each gradient needs to be bounded. In other words, we need to limit how much each individual training point sampled in a minibatch can influence gradient computations, and the resulting updates applied to model parameters. This can be done by clipping each gradient computed on each training point.
  2. Random noise is sampled and added to the clipped gradients, making it statistically impossible to know whether or not a particular data point was included in the training dataset by comparing the updates SGD applies with and without that data point. (A minimal sketch of these two steps follows this list.)
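
To make these two modifications concrete, here is a minimal NumPy sketch of a single DP-SGD step on a batch of per-example gradients. It is purely illustrative (TensorFlow Privacy implements this logic inside its optimizers), and the function and argument names are my own:

import numpy as np

def dp_sgd_step(params, per_example_grads, l2_norm_clip, noise_multiplier, learning_rate):
    # 1. Clip each per-example gradient so its L2 norm is at most l2_norm_clip.
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, l2_norm_clip / (norm + 1e-12)))
    # 2. Add Gaussian noise, scaled by the clipping bound, to the summed
    #    gradients, then average over the batch.
    noise = np.random.normal(0.0, noise_multiplier * l2_norm_clip, size=params.shape)
    noisy_grad = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    # 3. Apply the usual SGD update with the noisy, clipped gradient.
    return params - learning_rate * noisy_grad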

In this blog post, we’ll see how to implement differential privacy on a machine learning model in TF 2.0 on a tabular dataset (like Iris).

Importing and Installing Dependencies

Before importing the modules, make sure you install TensorFlow Privacy, TensorFlow's library for training models with differential privacy, using the following pip command.

pip install tensorflow_privacy

Next, we import the dataset we'll be working on and one-hot encode the labels.
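
A minimal sketch of that step, using the copy of Iris that ships with scikit-learn (an assumption; any Iris source works), might look like this. The 80/20 split leaves 120 training points, matching the privacy analysis later on:

import numpy as np
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset: 150 samples, 4 features, 3 classes.
features, labels = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    features.astype(np.float32), labels, test_size=0.2, random_state=42)

# One-hot encode the integer labels (e.g. class 2 -> [0, 0, 1]).
y_train = tf.keras.utils.to_categorical(y_train, num_classes=3)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=3)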

Define and tune learning model hyperparameters

DP-SGD has three privacy-specific hyperparameters and one existing hyperparameter that you must tune (example values are sketched right after the list):

  1. l2_norm_clip (float) - The maximum Euclidean (L2) norm of each gradient that is applied to update model parameters. This hyperparameter is used to bound the optimizer's sensitivity to individual training points.
  2. noise_multiplier (float) - The amount of noise sampled and added to gradients during training. Generally, more noise results in better privacy (often, but not necessarily, at the expense of lower utility). In other words, the amount of noise added sets the tradeoff between privacy and accuracy.
  3. microbatches (int) - Each batch of data is split into smaller units called micro-batches. By default, each micro-batch should contain a single training example. This allows us to clip gradients on a per-example basis rather than after they have been averaged across the minibatch. This, in turn, decreases the (negative) effect of clipping on signal found in the gradient and typically maximizes utility. However, computational overhead can be reduced by increasing the size of micro-batches to include more than one training example. The average gradient across these multiple training examples is then clipped. The total number of examples consumed in a batch, i.e., one step of gradient descent, remains the same. The number of micro-batches should evenly divide the batch size.
  4. learning_rate (float) - This hyperparameter already exists in vanilla SGD. The higher the learning rate, the more each update matters. If the updates are noisy (such as when the additive noise is large compared to the clipping threshold), a low learning rate may help the training procedure converge.
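
As a concrete starting point, the values below are consistent with the privacy analysis at the end of this post (batch size 10, noise multiplier 1.3, 15 epochs); l2_norm_clip and learning_rate are illustrative choices that you will likely want to tune:

# Privacy-specific hyperparameters
l2_norm_clip = 1.0
noise_multiplier = 1.3
num_microbatches = 10   # one example per micro-batch; must evenly divide batch_size

# Regular training hyperparameters
learning_rate = 0.25
batch_size = 10
epochs = 15

assert batch_size % num_microbatches == 0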

Build the learning model

To define the model, we'll use the tf.keras API.
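
A plausible minimal architecture for the four Iris features and three classes might look like this (the layer sizes here are illustrative assumptions):

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3)  # logits; the softmax is folded into the loss below
])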

Define the optimizer and loss function for the learning model. Compute the loss as a vector of per-example losses rather than as the mean over a minibatch, to support gradient manipulation over each training point.
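
A sketch of how this can look with TensorFlow Privacy's Keras optimizer wrapper (the exact import path may vary slightly between tensorflow_privacy versions):

from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import DPKerasSGDOptimizer

# Differentially private counterpart of the Keras SGD optimizer.
optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=l2_norm_clip,
    noise_multiplier=noise_multiplier,
    num_microbatches=num_microbatches,
    learning_rate=learning_rate)

# reduction=NONE keeps a loss value per example, so gradients can be
# clipped per training point before they are averaged.
loss = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)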

Compile and train the learning model
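
With the pieces above, compiling and training look just like standard Keras; this sketch reuses the names defined earlier:

model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          epochs=epochs,
          batch_size=batch_size)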

That's it! You've successfully implemented differential privacy in your neural network using TensorFlow. One question that comes to mind after doing this is: how do we evaluate the privacy of the model? In other words, how can we make sure that our model suits the privacy needs of our product? Let's see.

Measuring the privacy guarantee

In this section, we’ll perform a privacy analysis to measure the differential-privacy (DP) guarantee achieved by a training algorithm. Knowing the level of privacy achieved enables the objective comparison of two training runs to determine which of the two is more privacy-preserving. In other words, the privacy analysis measures how much a potential adversary can improve their guess about properties of any individual training point by observing the outcome of our training procedure (e.g., model updates and parameters).

This guarantee is sometimes referred to as the privacy budget. A lower privacy budget bounds more tightly an adversary’s ability to improve their guess. This ensures a stronger privacy guarantee. Intuitively, this is because it is harder for a single training point to affect the outcome of learning: for instance, the information contained in the training point cannot be memorized by the ML algorithm and the privacy of the individual who contributed this training point to the dataset is preserved.

In this blog post, the privacy analysis is performed in the framework of Rényi Differential Privacy (RDP), a relaxation of differential privacy that is well suited to analyzing the repeated Gaussian noise additions of DP-SGD.

Two metrics are used to express the DP guarantee of an ML algorithm:

  • Delta (𝛿) — Bounds the probability of the privacy guarantee not holding. A rule of thumb is to set it to be less than the inverse of the size of the training dataset. In this blog post, it is set to 10^-2, since we train on 120 of the Iris dataset's 150 points.
  • Epsilon (𝜖) — This is the privacy budget. It measures the strength of the privacy guarantee by bounding how much the probability of a particular model output can vary by including (or excluding) a single training point. A smaller value for 𝜖 implies a better privacy guarantee. However, the 𝜖 value is only an upper bound, and a large value could still mean good privacy in practice. (The formal definition that ties 𝜖 and 𝛿 together follows this list.)
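
Formally, a randomized training algorithm M is (𝜖, 𝛿)-differentially private if, for any two training datasets D and D′ that differ in a single example, and for any set of possible outputs S:

Pr[M(D) ∈ S] ≤ e^𝜖 · Pr[M(D′) ∈ S] + 𝛿

The smaller 𝜖 and 𝛿 are, the closer these two probabilities must be, and the less any single training point can influence what an observer of the model can learn.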

TensorFlow Privacy provides a tool, compute_dp_sgd_privacy, to compute the value of 𝜖 given a fixed value of 𝛿 and the following hyperparameters from the training process:

  • The total number of points in the training data, n.
  • The batch_size.
  • The noise_multiplier.
  • The number of epochs of training.

from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy

compute_dp_sgd_privacy.compute_dp_sgd_privacy(n=120, batch_size=10, noise_multiplier=1.3, epochs=15, delta=1e-2)

Conclusion

In this blog, we learned about differential privacy (DP) and how we can implement DP principles in existing ML algorithms to provide privacy guarantees for training data. In particular, we learned how to:

  • Wrap existing optimizers (e.g., SGD, Adam) into their differentially private counterparts using TensorFlow Privacy
  • Tune hyperparameters introduced by differentially private machine learning
  • Measure the privacy guarantee provided using analysis tools included in TensorFlow Privacy

You've reached the end! As a final note: in this rising age of AI and machine learning, we are bound to use AI-assisted models on a daily basis. To protect the privacy of each individual, building a privacy framework into our ML development will not only make future development more secure but also keep our personal data more private.

Fabiana Clemente is Chief Data Officer at YData.

Making data available with privacy by design.

YData helps data science teams deliver ML models, simplifying data acquisition, so data scientists can focus their time on things that matter.
