Enhancing the Tools for Differential Privacy in Google’s TensorFlow
In a world where the risks and costs associated with privacy are on the rise, and privacy issues are leading to broader questions around trust and AI, we believe that differential privacy offers a solution. Differential privacy allows machine learning teams to create products without risking the privacy of any individual’s data.
With the right tools, we think that companies of all sizes can take advantage of advanced privacy techniques such as differential privacy. Google's initial release of TensorFlow Privacy, an open-source library that makes it easier for developers and researchers to train machine learning models with differential privacy guarantees, made this a reality for deep learning.
We’re excited to announce that, in collaboration with Google, we have made our differential privacy software product available through TensorFlow Privacy (GitHub). Our contribution to TensorFlow Privacy adds support for two common machine learning techniques: logistic regression and linear support vector machines (SVMs).
Developers at companies of all sizes can access these libraries and start using them today.
What is differential privacy?
Differential privacy is a mathematical definition of the privacy loss that occurs when private information is used to create a data product. Specifically, it measures how effective a particular privacy technique, such as inserting random noise into a dataset, is at protecting the privacy of individual data records within that dataset.
Solutions that satisfy differential privacy inject noise into a dataset, or into the output of a machine learning algorithm, without introducing significant adverse effects on data analysis or model performance. They achieve this by calibrating the noise level to the sensitivity of the algorithm, that is, how much its output can change when any single record is added or removed.
The result is a differentially private dataset or model that an attacker cannot reverse-engineer to recover the underlying records. This makes it impossible to identify any individual record, such as a specific customer or patient, within the dataset with certainty.
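To make the idea of calibrating noise to sensitivity concrete, here is a minimal, illustrative Python sketch (not taken from TensorFlow Privacy) of the classic Laplace mechanism applied to a counting query. A count has sensitivity 1, because adding or removing any one person changes the answer by at most one, so Laplace noise with scale 1/epsilon yields an epsilon-differentially private result.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Release a differentially private count of records matching `predicate`.

    A counting query has sensitivity 1: adding or removing any single record
    changes the true count by at most 1. Adding Laplace noise with scale
    sensitivity / epsilon makes the released count epsilon-differentially private.
    """
    true_count = sum(1 for record in data if predicate(record))
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: a hypothetical query over patient records.
patients = [{"age": 70}, {"age": 34}, {"age": 81}, {"age": 55}]
print(laplace_count(patients, lambda r: r["age"] > 65, epsilon=0.5))
```

Smaller values of epsilon mean more noise and stronger privacy; larger values mean more accurate answers and weaker guarantees.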
For example, Apple uses differential privacy to analyze the typing patterns of iPhone users without being able to identify any individual customer. In this example, macro-level questions about text patterns — such as predicting which word is most likely to be used next for predictive typing — can be answered without compromising the individual privacy of the people contributing the data.
Software and Differential Privacy at Georgian Partners
At Georgian Partners, we’re building software to help our companies further accelerate their adoption of disruptive technologies. Our software is directly tied to applied research in our investment thesis areas — Trust, Applied AI and Conversational AI. Each software tool we develop is designed to help push breakthrough technology to our companies.
We developed our differential privacy software product in collaboration with some of our companies. They faced a common challenge: even though they had varying degrees of freedom to use their customers’ data, they hesitated to aggregate data across customers.
Without aggregated data, it can be challenging to deliver value to new customers of machine learning products because they have not yet amassed enough data of their own. This is known as the cold-start problem. You can solve it, improve onboarding times, and reduce time-to-value for new customers by using aggregate data and machine learning models from existing customers. However, to convince customers to share their data or models, you need to guarantee the privacy of their information and earn their trust. This is where differential privacy helps: it provides the privacy guarantee needed to earn that trust.
Our Approach to Differential Privacy
There are several ways to approach differential privacy, depending on the nature of the problem and the availability of data. For our cold-start use cases, we had limited labeled data available, and we wanted to support some of the more common machine learning algorithms first. After exploring different methods, we identified the Bolt-on method as the best approach for our use cases.
The Bolt-on method is easy to implement, and it relies on stochastic gradient descent (SGD), a generic optimization technique that can be applied to a range of convex machine learning models, including logistic regression and linear support vector machines (SVMs).
In the Bolt-on method, SGD is treated as a black box: training runs exactly as usual, and noise is added only to the algorithm’s output, the learned model weights, at the end of the optimization process. In our research, we observed that only a small amount of noise is needed to achieve reasonable privacy guarantees.
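As a rough illustration of this output-perturbation idea, the sketch below trains a logistic regression model with plain SGD and adds Laplace noise to the learned weights only at the end. The function name, the simplified noise calibration, and the assumption that feature vectors have unit norm are ours for illustration; the Bolt-on implementation contributed to TensorFlow Privacy follows the more careful analysis in the Bolt-on paper.

```python
import numpy as np

def private_logistic_regression(X, y, epsilon, reg_lambda=0.01,
                                lr=0.1, epochs=20):
    """Illustrative Bolt-on-style output perturbation (simplified).

    Step 1: run plain SGD on an L2-regularized logistic loss; the optimizer
    is treated as a black box and sees no noise during training.
    Step 2: add noise, calibrated to the sensitivity of the learned weights,
    only once at the end. For a strongly convex, L-Lipschitz objective the
    sensitivity is roughly 2L / (n * reg_lambda); the exact calibration in
    the Bolt-on paper (and in TensorFlow Privacy) is more careful than this.
    """
    n, d = X.shape
    w = np.zeros(d)

    # --- Step 1: ordinary (non-private) SGD, labels y in {-1, +1} ---
    for _ in range(epochs):
        for i in np.random.permutation(n):
            margin = y[i] * X[i].dot(w)
            grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + reg_lambda * w
            w -= lr * grad

    # --- Step 2: output perturbation, applied once after training ---
    lipschitz = 1.0                               # assumes ||X[i]|| <= 1
    sensitivity = 2.0 * lipschitz / (n * reg_lambda)
    w += np.random.laplace(scale=sensitivity / epsilon, size=d)
    return w
```

Because the optimizer itself never sees any noise, existing SGD-based training code can be reused as-is, which is a large part of what makes the method easy to adopt.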
Getting started with TensorFlow Privacy
If you’re working on a similar cold-start use case, check out the examples and tutorials in the TensorFlow Privacy GitHub repository.
To learn more about differential privacy, you can read our CEO’s Guide to Differential Privacy. Finally, to learn how privacy is integral to building and leveraging customer trust, read the CEO’s Guide to Trust.