We’re pleased to introduce Dropout Labs, a company focused on secure, privacy-preserving machine learning. Our work is driven by three beliefs:
- Access to sensitive data will create better AI, but should not compromise privacy.
- Owner control over data will help align AI with the values of individuals and society.
- Shared ownership of data and AI will enable new business models.
Today, data privacy and artificial intelligence are at odds because training a model requires massive amounts of data, and much of that data is sensitive. This poses a challenge for organizations that want to use AI to extract the most value from their data while complying with data privacy regulations, and for consumers who want intelligent products and services but have concerns about how organizations use their data.
In the future, our most useful apps will rely on our most sensitive data.
Our most sensitive data is the most valuable to AI. Medical records, financial statements, location history, and voice transcripts can all be processed by AI to provide services that improve our lives. Unfortunately, our sensitive data can also be used against us. Naturally, we draw the line somewhere between privacy and utility when it comes to sharing our data with third parties.
It’s no secret that the apps we use collect our data; this is the hidden cost of many free apps and services. But why is this the case? It isn’t simply because ad-based business models process our data for targeting. It’s also because of a limitation in encryption technology.
The State of Encryption
Encryption technology is used widely today to secure our apps in all but one important area.
- Secure networking, like HTTPS, is used on most of the world’s top websites and apps. It protects our information as it travels across the internet.
- Secure storage is used by our phones, laptops, and cloud services. It protects our saved data, like photos and passwords.
- Secure computation, however, is still emerging. It can protect our data when it is being processed by apps and services.
Unfortunately, the products we use today rely on plaintext (unencrypted) access to our data in order to process it. While our data is encrypted in transit and at rest, it is unencrypted in use, and secure computation hasn’t progressed enough to be used at scale. Data in use is the weakest link in the data privacy model of the internet today.
Imagine if we could use the products we love without having to worry about our personal data. We could extract utility from our sensitive data, improving our lives and society, while maintaining ownership over our information. We need to improve secure computation in order for people and organizations to have this level of control over their privacy.
What is secure computation and what problems can it solve?
The Millionaires’ Problem
In the 1980s, Andrew Yao introduced a problem: how can two people determine who is richer without revealing their net worth and without using a trusted third party? Yao provided the first solution himself by developing a protocol for secure multi-party computation, or simply secure computation.
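To make the problem concrete, here is a toy Python sketch modeled on the public-key comparison protocol Yao described for it, assuming each person's wealth is a whole number from 1 to N (here N = 10). Bob encrypts a random secret x under Alice's public key and sends a shifted ciphertext; Alice decrypts N candidates (exactly one of which decrypts to x), reduces them modulo a random prime, and adds 1 to every value past her own wealth; Bob then checks whether the value at his position still matches x. The tiny RSA key, the prime range, and all function names are hypothetical choices for illustration and provide no real security.

```python
import random

# Toy RSA key pair for Alice (textbook-sized, illustrative only).
RSA_N = 61 * 53   # public modulus, 3233
RSA_E = 17        # public exponent
RSA_D = 2753      # private exponent: RSA_E * RSA_D == 1 (mod lcm(60, 52))

N = 10  # wealth is a whole number from 1 to N


def is_prime(k):
    return k > 1 and all(k % f for f in range(2, int(k ** 0.5) + 1))


PRIMES = [k for k in range(100, 1000) if is_prime(k)]


def bob_start(j, rng):
    """Bob (wealth j) encrypts a random x and sends a shifted ciphertext."""
    x = rng.randrange(RSA_N)
    k = pow(x, RSA_E, RSA_N)  # encrypt under Alice's public key
    return x, (k - j + 1) % RSA_N


def well_separated(z, p):
    # Every pair must differ by at least 2 modulo p (with wrap-around).
    return all(min((a - b) % p, (b - a) % p) >= 2
               for idx, a in enumerate(z) for b in z[idx + 1:])


def alice_respond(i, m, rng):
    """Alice (wealth i) decrypts N candidates; the one at Bob's index is x."""
    y = [pow((m + u) % RSA_N, RSA_D, RSA_N) for u in range(N)]
    primes = PRIMES[:]
    rng.shuffle(primes)
    for p in primes:
        z = [v % p for v in y]
        if well_separated(z, p):
            # Leave the first i values alone; add 1 to the rest.
            return p, [zu if u < i else (zu + 1) % p for u, zu in enumerate(z)]
    return None  # no suitable prime: Bob restarts with a fresh x


def bob_finish(j, x, p, values):
    # The j-th value still equals x mod p exactly when i >= j.
    return values[j - 1] == x % p


def alice_at_least_as_rich(i, j, seed=0):
    rng = random.Random(seed)
    while True:
        x, m = bob_start(j, rng)
        reply = alice_respond(i, m, rng)
        if reply is not None:
            p, values = reply
            return bob_finish(j, x, p, values)


print(alice_at_least_as_rich(7, 4))  # True: Alice (7) is at least as rich as Bob (4)
print(alice_at_least_as_rich(3, 9))  # False
```

In this sketch only Bob learns the outcome, which he would then report to Alice, and Alice's work grows linearly with the number of possible wealth values N.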
Performance has historically been the main obstacle to the adoption of secure computation: it was simply too slow to be practical. After many years of research, parts of secure computation are now feasible, and certain operations can be computed securely with practical performance. Given these constraints, is there a killer application for secure computation today?
Secure Machine Learning
It turns out that the math secure computation can do efficiently, chiefly additions and multiplications, is the same math that is commonly used in machine learning. Given the improvements in secure computation protocols over the last 30 years and the advances in neural networks, we decided to put together a team of cryptographers and machine learning engineers to work on secure, privacy-preserving machine learning.
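As a minimal sketch of why linear algebra suits secure computation, the snippet below uses additive secret sharing, one common building block (assumed here for illustration; this is not the tf-encrypted API). A private value is split into random shares that sum to it modulo Q, so no single share reveals anything about it. Adding two shared values, or multiplying one by a public constant, happens locally on each party's shares with no communication, which is enough to evaluate a linear layer with public weights. The modulus Q and the helper names are hypothetical.

```python
import secrets

Q = 2 ** 64  # ring modulus for shares (an assumption of this sketch)


def share(value, n_parties=2):
    """Split an integer into n random shares that sum to value mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def reconstruct(shares):
    return sum(shares) % Q


def add_shared(x_shares, y_shares):
    # Each party adds the two shares it holds; no messages are exchanged.
    return [(a + b) % Q for a, b in zip(x_shares, y_shares)]


def scale_shared(c, x_shares):
    # Multiplying by a public constant is also purely local.
    return [(c * a) % Q for a in x_shares]


def public_linear(weights, bias, shared_inputs):
    """Compute shares of sum(w * x) + bias for public weights and shared x."""
    n_parties = len(shared_inputs[0])
    acc = [0] * n_parties
    for w, x_shares in zip(weights, shared_inputs):
        acc = add_shared(acc, scale_shared(w, x_shares))
    acc[0] = (acc[0] + bias) % Q  # a public constant is added by one party only
    return acc


inputs = [share(v) for v in [3, 5, 7]]  # hypothetical private inputs
print(reconstruct(public_linear([2, 1, 4], 10, inputs)))  # 2*3 + 1*5 + 4*7 + 10 = 49
```

Multiplying two secret values, such as a private input by private weights, needs an extra interactive step between the parties.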
Our co-founder, Morten Dahl, started a series of blog posts exploring the convergence between deep learning and secure computation. Since then, we’ve turned our focus to making it easier for cryptography and machine learning researchers to develop new secure machine learning protocols, and for data scientists to introduce privacy into their workflow.
We decided to develop a community-driven, open source framework for experimenting with private machine learning on top of TensorFlow, called tf-encrypted.
tf-encrypted enables training, validation, and prediction over encrypted data. Data remains encrypted during the entire data science workflow. With secure prediction, machine learning models can be hosted in the cloud without decrypting the inputs or outputs of the query. This means users can benefit from cloud-based machine intelligence while protecting the privacy of their data.
For more technical details please see our paper, Private Machine Learning in TensorFlow using Secure Computation.
We believe privacy-preserving technology will help AI responsibly transform verticals like healthcare, finance, and transportation by managing the complexities of data privacy.
How Does Secure Machine Learning Work?
Let’s say we want to identify the person whose face appears in a photo. This is a common machine learning task. Using secure machine learning, we can encrypt both the model and the query.
We take the image, encrypt it, and send it to the server. The server runs the prediction securely, meaning everything stays in an encrypted state. So while the server learns nothing, it still produces a result. Only the end user can decrypt the prediction.
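The flow above can be sketched in Python under explicit assumptions: "encryption" is modeled as additive secret sharing across two non-colluding servers (rather than the single server described above), a trusted dealer supplies Beaver multiplication triples, and a single linear score stands in for a real face-recognition model. The user and the model owner each split their values into shares; the servers compute shares of the prediction without seeing the input, the weights, or the result; and only the user, who collects both result shares, can reconstruct it. This illustrates the general technique, not the tf-encrypted API; the ring size, fixed-point scale, features, and weights are all hypothetical.

```python
import secrets

Q = 2 ** 128      # ring modulus (assumed)
SCALE = 2 ** 16   # fixed-point scaling factor (assumed)


def encode(x, scale=SCALE):
    """Map a real number to a fixed-point ring element."""
    return round(x * scale) % Q


def decode(v, scale):
    v %= Q
    if v >= Q // 2:  # the upper half of the ring encodes negative numbers
        v -= Q
    return v / scale


def share(v):
    r = secrets.randbelow(Q)
    return [r, (v - r) % Q]  # one share per server


def reconstruct(shares):
    return sum(shares) % Q


def dealer_triple():
    """Trusted dealer: random a, b and c = a * b, each split into shares."""
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    return share(a), share(b), share(a * b % Q)


def beaver_mul(x_sh, y_sh):
    """Shares of x * y from shares of x and y, using one Beaver triple."""
    a_sh, b_sh, c_sh = dealer_triple()
    # The servers open e = x - a and f = y - b; since a and b are uniformly
    # random, these masked values reveal nothing about x or y.
    e = reconstruct([(x - a) % Q for x, a in zip(x_sh, a_sh)])
    f = reconstruct([(y - b) % Q for y, b in zip(y_sh, b_sh)])
    # x * y = c + e*b + f*a + e*f; the public term e*f is added by server 0.
    z_sh = [(c + e * b + f * a) % Q for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + e * f) % Q
    return z_sh


def secure_predict(x_shares, w_shares, bias_shares):
    """Servers compute shares of w . x + bias from shares alone."""
    acc = list(bias_shares)
    for xs, ws in zip(x_shares, w_shares):
        acc = [(s + t) % Q for s, t in zip(acc, beaver_mul(xs, ws))]
    return acc


# Hypothetical image features (user) and model parameters (model owner).
features = [0.5, -1.25, 2.0]
weights = [1.5, 0.75, -0.5]
bias = 0.25

x_shares = [share(encode(v)) for v in features]
w_shares = [share(encode(v)) for v in weights]
b_shares = share(encode(bias, SCALE ** 2))  # bias at the product's scale

result_shares = secure_predict(x_shares, w_shares, b_shares)
score = decode(reconstruct(result_shares), SCALE ** 2)  # only the user does this
print(score)  # -0.9375
```

Because each product of two fixed-point values carries the scale twice, the sketch decodes the result at SCALE ** 2 instead of running a secure truncation step; deeper models would also need truncation and secure approximations of non-linear activations.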
There are personal, competitive, and regulatory borders that sit between data and intelligence, and for these reasons the most valuable data is locked up today. Secure machine learning can enable access to that data while respecting these borders. In other words, secure machine learning preserves the privacy of sensitive data. What are some examples?
- Personally, I haven’t taken advantage of genetic testing yet because I’m uncomfortable with the risk of my data being misused. What if my genetic profile and any resulting analysis were visible only to me? If it were mathematically proven that my data remained secret during the entire process, the pros would outweigh the cons. I could take advantage of the incredible progress geneticists have made without the risk of a privacy breach.
- Hospitals could start to take advantage of cloud-based AI while managing the complexity of data privacy regulations and the natural sensitivity of healthcare data. Imagine assisting ophthalmologists by scanning retinal images for diabetic retinopathy, or pathologists by scanning lymph node biopsies for the spread of breast cancer. Breakthroughs in AI tasks like these will help transform healthcare, but we need to be extremely careful with this sort of data. We need secure machine learning to process these tasks in a privacy-preserving manner.
- Secure computation can even lead to entirely new business models. Imagine multiple large banks pooling their data to train a fraud-detection model that is more accurate than what any one bank could develop on its own. Today, the competitive and regulatory liabilities would be too great. With secure machine learning, however, each bank could maintain control over its own data. One bank could choose to revoke its data at a later date, and the others could continue without it. No data would be leaked. No control would be lost.
Consumers, enterprises, and organizations should have a choice over how their data is used. From apps to partnerships, users should maintain control over the data they bring to the table. Custodians of our data should be able to extend this control when dealing with third parties, like researchers and app developers, while maintaining full confidence they can revoke it in the event of misuse.
Secure AI Everywhere
In the early days of the web, HTTPS was only used for sensitive things like payments and banking. It enabled new use cases and expanded the scope of what we were comfortable doing online. Over the years, the technology matured through improved performance, reduced cost, and easier implementation. Today, HTTPS is everywhere, from payments and banking to blogs. While secure machine learning may be limited to new use cases today, we believe that within 10 to 15 years our apps and services will be privacy-preserving by default.
In the future, our most useful apps will rely on our most sensitive data, and we’ll be completely comfortable with it.
About Dropout Labs
We’re a team of machine learning engineers, software engineers, and cryptographers spread across the United States, France, and Canada. We’re working on secure computation to enable training, validation, and prediction over encrypted data. We see a near future where individuals and organizations will maintain control over their data, while still benefiting from cloud-based machine intelligence.
If you’re passionate about data privacy and AI, we’d love to hear from you.