How Zero Knowledge Proofs will enable the Next Wave of AI Applications
Privacy concerns have long been a barrier to fully unlocking AI’s potential in sensitive industries.
Adding privacy-preserving technologies to AI systems will unlock the greatest potential of generative algorithms for humankind.
Introduction
In this blog post we use AI in healthcare as a running example to sketch the future synergy between AI and Zero Knowledge Proofs (ZKPs).
In particular, we conjecture that the security of AI will become a new frontier, both for protecting against attackers with something to gain from targeting mission-critical and sensitive AI systems, and for regulated users and custodians of private data.
For both, ZK is going to be key for progress. In this piece we put forth two areas of AI/ML in which the incorporation of ZKP seems inevitable:
- Federated Learning
- Machine Learning as a Service (MLaaS)
In each case we argue that ZK can replace trust assumptions with verifiable math, contributing to further breakthroughs in AI tech.
Today’s Baseline
A reality of AI systems is that they are only as good as the datasets they’re trained on, and the algorithms used to process that data.
This is of little concern when asking ChatGPT to write a song about inflation in the style of Snoop Dogg, or when creating an image of a humanoid robot with Midjourney. It becomes a real concern when AI systems require private or confidential data in order to provide personalized and critical services.
An industry that would benefit greatly from privacy-preserving AI is medicine and healthcare.
Today, medical records are held across siloed healthcare organizations, from hospitals to insurance agencies, each with their own set of security protocols.
This causes problems for patients who receive care from multiple health providers and frustration for medical researchers who cannot access scattered data.
It also prevents this valuable real-world data from being used in Machine Learning datasets.
Enter Privacy Preserving Technologies
Zero Knowledge Proofs (ZKPs) are protocols for proving that a statement about some information is true, without revealing anything about that information beyond the truth of the statement.
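To make the idea concrete, here is a minimal sketch of one classic ZKP, a Schnorr proof of knowledge made non-interactive with a hash-derived challenge. It is our illustration rather than a scheme used elsewhere in this post: the prover shows it knows a secret exponent behind a public key without revealing the secret. The group parameters `P`, `Q`, `G` and all function names are assumptions chosen for readability, and the group is far too small to be secure.

```python
import hashlib
import secrets

# Toy parameters (assumption, insecure): G has prime order Q modulo P.
P, Q, G = 23, 11, 2

def challenge(public_key, commitment):
    """Derive the challenge by hashing the public values (Fiat-Shamir)."""
    data = f"{G}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def prove(secret):
    """Prove knowledge of `secret` such that public_key = G^secret mod P."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(Q)           # fresh randomness hides the secret
    commitment = pow(G, nonce, P)
    c = challenge(public_key, commitment)
    response = (nonce + c * secret) % Q
    return public_key, commitment, response

def verify(public_key, commitment, response):
    """Check G^response == commitment * public_key^c (mod P)."""
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P

public_key, commitment, response = prove(secret=7)
print(verify(public_key, commitment, response))
```

The verifier only ever sees the public key, the commitment, and the response. Production systems use much larger groups and, for statements about models and datasets, general-purpose proof systems rather than this single-purpose protocol.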
Fully Homomorphic Encryption (FHE) is a type of encryption that allows data to be processed while remaining encrypted.
In the context of healthcare, ZKPs and FHE can be used to liberate medical data while preserving patient privacy.
Using FHE, encrypted medical data hosted on secure servers could be used to develop machine learning models, and interested parties could run inferences on those models, all without the data being decrypted.
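The core trick of computing on ciphertexts can be sketched with a toy version of the Paillier cryptosystem. Note the substitution: Paillier is only additively homomorphic, not fully homomorphic, so it supports sums and multiplication by a known constant but not arbitrary computation. We use it here because it fits in a few lines. The primes and function names below are assumptions for illustration, and the key size is far too small for real use.

```python
import math
import secrets

# Toy key material (assumption, insecure): two small primes.
P_PRIME, Q_PRIME = 1009, 1013
N = P_PRIME * Q_PRIME
N_SQ = N * N
LAMBDA = (P_PRIME - 1) * (Q_PRIME - 1) // math.gcd(P_PRIME - 1, Q_PRIME - 1)
MU = pow(LAMBDA, -1, N)  # modular inverse, valid because the generator is N + 1

def encrypt(message):
    """Encrypt an integer 0 <= message < N under the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, message, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(ciphertext):
    """Decrypt with the private values LAMBDA and MU."""
    u = pow(ciphertext, LAMBDA, N_SQ)
    return ((u - 1) // N) * MU % N

def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ

def scale_encrypted(ciphertext, k):
    """Raising a ciphertext to k multiplies the plaintext by k."""
    return pow(ciphertext, k, N_SQ)

# A server sums two encrypted readings without ever seeing them.
encrypted_total = add_encrypted(encrypt(72), encrypt(80))
print(decrypt(encrypted_total))  # 152
```

A fully homomorphic scheme extends this idea so that ciphertexts can also be multiplied together, which is what makes evaluating a whole model on encrypted inputs possible.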
Zero Knowledge Proofs could be used to authenticate that a claimed model or dataset was indeed used for training or predictions of the model.
Interested parties could submit queries to the data without actually seeing the underlying information. The server would only return the results of the query, rather than the underlying data itself.
A ZKP could also be used to prove that the query results were computed from a specific dataset, identified by a public commitment.
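One common way to build such a public commitment is a Merkle tree over the records, with the root published as the commitment. The sketch below is an assumption on our part about how the commitment could be formed, and the function names are hypothetical. It is also not zero knowledge by itself: the membership check reveals the record. A ZKP would prove the same check without revealing the record or the path.

```python
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def build_tree(records):
    """Return all tree levels, leaves first; the last level holds the root."""
    if not records:
        raise ValueError("cannot commit to an empty dataset")
    level = [h(b"leaf:" + r) for r in records]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]

def prove_membership(levels, index):
    """Collect (sibling hash, node_is_left) pairs from leaf to root."""
    path = []
    for level in levels[:-1]:
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path

def verify_membership(root, record, path):
    """Recompute the root from a record and its path."""
    node = h(b"leaf:" + record)
    for sibling, node_is_left in path:
        pair = node + sibling if node_is_left else sibling + node
        node = h(b"node:" + pair)
    return node == root

records = [b"record-1", b"record-2", b"record-3"]  # placeholder data
levels = build_tree(records)
root = levels[-1][0]                               # the public commitment
path = prove_membership(levels, 2)
print(verify_membership(root, b"record-3", path))
```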
These are examples of Privacy Preserving Machine Learning (PPML), a subset of Machine Learning with an emphasis on preservation of privacy and confidentiality of data.
PPML would alleviate privacy concerns, but what about the issue of data being scattered across various protected repositories?
Federated Learning with ZKP
Federated learning is a Machine Learning methodology for training an algorithm on decentralized information. It enables multiple parties to build a model without sharing data.
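A minimal sketch of the idea, assuming the widely used federated averaging approach and a one-variable linear model: each party trains locally on its own records and shares only model weights, which a coordinator averages in proportion to dataset size. The datasets are synthetic placeholders and every name below is hypothetical.

```python
def local_update(weights, data, lr=0.01, epochs=20):
    """Run a few gradient-descent steps on one party's private data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Average the parties' weights, weighted by their dataset sizes."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def train(party_datasets, rounds=50):
    """Only weights leave each party; the raw records never do."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in party_datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in party_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights

# Synthetic data following y = 2x + 1, split across two parties.
party_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
party_b = [(3.0, 7.0), (4.0, 9.0)]
print(train([party_a, party_b]))
```

Note that the coordinator has to take each update on faith, since nothing in this loop shows that an update was computed from genuine data.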
Today, Federated Learning comes with concerns about trustworthiness of data provided by each contributing member. How can distributed teams be 100% certain that accurate data is being provided by the other parties?
Using a ZKP, each team could contribute its model update to the global model, along with a proof that the underlying data was collected and processed using the agreed-upon parameters, without revealing anything about the data itself.
Furthermore, a ZKP could prove that each team’s model has not been tampered with by third parties.
By taking advantage of ZK Federated Learning, ML engineers could train a “Doctor AI” model using patient records from across the medical data landscape, secure in the knowledge that each team’s training data was provided according to the agreed-upon framework.
Machine Learning as a Service (MLaaS)
Today, Machine Learning as a Service providers include OpenAI, Microsoft, Google, AWS, and others. This opens the door for questions about consumer protections. How can customers verify that their inputs are truly processed by the MLaaS models, and not routed through third parties or compromised systems? Privacy-sensitive customers may also want a guarantee that their inputs and outputs are not leaked to outside parties.
Using Zero-Knowledge Proofs, it is possible to trustlessly verify model predictions, while ensuring queries have gone through the advertised models. Because the proof reveals nothing about the model itself, MLaaS providers do not have to expose the weights of their models.
A ZKP could verify that inputs and outputs arrived only at their intended destinations, and that the model is tamper-free.
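It can help to spell out the statement such a proof would attest to. The sketch below evaluates that statement in the clear; it is not a proof system. In a real deployment the weights would be the prover's private witness, and only the commitment, the input, and the output would be public. The tiny integer model, the hash-based commitment, and all names are assumptions for illustration. A bare hash is also not hiding for guessable weights, so a real commitment would add randomness.

```python
import hashlib
import json

def commit_to_model(weights):
    """Publish a hash of the canonically serialized weights."""
    encoded = json.dumps(weights, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()

def predict(weights, features):
    """A tiny integer linear classifier standing in for the advertised model."""
    score = sum(w * x for w, x in zip(weights["coefficients"], features))
    return 1 if score + weights["bias"] > 0 else 0

def inference_relation(commitment, features, claimed_output, weights):
    """The statement a proof would attest to:
    'I know weights matching this commitment that map features to claimed_output.'
    """
    return (commit_to_model(weights) == commitment
            and predict(weights, features) == claimed_output)

weights = {"coefficients": [2, -1, 3], "bias": -4}  # private to the provider
commitment = commit_to_model(weights)               # published once
print(inference_relation(commitment, [3, 1, 1], 1, weights))
```

A proof system would let the provider convince a customer that `inference_relation` holds without handing over `weights`.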
The Emergence of Confidential and Verifiable Generative AI
The combination of FHE and ZKP will be a game-changer for Privacy Preserving Machine Learning (PPML).
FHE means that datasets can be analyzed without decryption. It guarantees privacy of input to AI and ML models because the proffered data is encrypted at the source.
By combining FHE and ZKP, users are able to trust that their prompts remain private and secure throughout the AI interaction.
A medical MLaaS provider could use ZK to ensure the veracity of its models and FHE to protect private prompts, so that queries remain encrypted even while processed by the server. The same ZK proof could show that the encrypted data was run homomorphically on the committed model.
ZK proofs can also attest that a piece of data was generated by AI, so that recipients can take its origin into account and treat the results as potentially inaccurate.
A Future with Privacy Preserving AI
Imagine a doctor’s visit in the near future. Upon arrival, the doctor accesses your complete medical history, taken from a universal record base made possible by ZK Federated Learning. Every documented ache and pain from the time of birth until this moment is retrievable.
Using an FHE-enabled private prompt, an AI trained on the records of millions of people with similar profiles, medical histories, and treatment results processes the input. Utilizing ZK ML, it then runs predictive analytics and personalizes a treatment plan without ever accessing private data in the clear.
The returned information along with the prompt remains encrypted every step of the way, decrypted only upon its return to the point of origination. ZK proofs guarantee that each query was processed and run by the intended machine learning algorithm.
As AI continues to evolve, incorporating privacy-preserving technologies will be crucial for maximizing its benefits across medicine and healthcare, along with a spectrum of other sensitive use cases.
Follow Ingonyama
Twitter: https://twitter.com/Ingo_zk
Github: https://github.com/ingonyama-zk
YouTube: https://www.youtube.com/@ingo_zk
LinkedIn: https://www.linkedin.com/company/ingonyama
Join us: https://www.ingonyama.com/careers