Generative AI from a Privacy Lens

Sudip Kar
Published in BerkeleyISchool · 6 min read · Jan 25, 2024
Image credit: iQoncept — Adobe Stock

Maybe it was just a coincidence that the National Institute of Standards and Technology (NIST) published an article about evaluating privacy protection techniques for the AI era on December 11, 2023, a day before two other MICS (Master of Information and Cybersecurity) students and I presented our project for the Privacy Engineering class in the Fall 2023 term. Our project, AI Privacy Reinforcement and Optimization (A.I.P.R.O.), explored applying differential privacy to Gen AI.

Privacy concerns around AI are familiar territory for cybersecurity practitioners these days, especially those working in privacy engineering and related fields. Throughout our privacy engineering class at UC Berkeley, we learned techniques for preventing identity, attribute, and membership disclosures in interactive and non-interactive databases. Building on what we learned in the course, I became interested in studying ways to protect privacy in Gen AI systems such as OpenAI’s ChatGPT.

Our idea stemmed from an intriguing article by Richard Carufel, in which he notes that most companies (93%) are aware of the risks associated with Gen AI, but only a few (9%) are prepared to manage those risks. He also cites data on the different risks associated with Gen AI use; most relatable for us, data privacy and cyber issues account for 65% of top ChatGPT and Gen AI risks.

Figure 1: Different risk vectors posed by ChatGPT and similar GenAI. From an article by Richard Carufel

Another influential paper, “From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy” by Gupta et al., details methods such as jailbreaking (using specific prompts to trick the AI model into producing responses it would otherwise avoid), reverse psychology (phrasing questions or statements in a way that indirectly prompts the AI to generate the desired response), prompt injection (supplying a malicious prompt to the LLM to extract sensitive information), and others (highlighted in the red box on the left of the diagram below) that can trick GenAI into leaking sensitive information.

Figure 2: Highlighted various types of attacks performed on GenAI. From IEEE paper by Gupta, Akiri, Aryal, Parker, Praharaj

Multiple research studies and our knowledge about privacy engineering from our MICS class led us to our project — A.I.P.R.O.

So, what are we doing?

Our model is motivated by the goal of preventing unwanted access to confidential information, which in our project meant medical records retrieved via the OpenAI API. Below is sample Python code that calls the OpenAI API together with the Pinecone API to get the medical records of 10 people:

Figure 3: Sample code to get medical records using OpenAI API. From Privacy Engineering project A.I.P.R.O by Sudip Kar, Iliya Tynan, Mario Borroto, UC Berkeley Fall 2023.
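Since the project code above is shown only as an image, here is a minimal sketch of what such a retrieval call can look like, assuming the openai (>= 1.0) and pinecone Python client libraries, a pre-populated Pinecone index named "medical-records" holding embedded patient records, and API keys in environment variables. The index name and metadata fields are illustrative, not taken from the project code.

```python
import os

from openai import OpenAI          # openai >= 1.0
from pinecone import Pinecone      # pinecone client >= 3.0

# Assumed setup: API keys in environment variables and a pre-populated
# Pinecone index named "medical-records" with embedded patient records.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("medical-records")

# Embed the (adversarial) request with OpenAI, then use the resulting
# vector to pull the 10 closest patient records from the vector database.
query = "Give me the medical records of 10 people with Name, Age, Sex, and Diagnosis"
embedding = client.embeddings.create(
    model="text-embedding-ada-002",
    input=query,
).data[0].embedding

results = index.query(vector=embedding, top_k=10, include_metadata=True)

for match in results.matches:
    print(match.metadata)   # e.g. name, age, sex, diagnosis for each record
```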

This demonstrates how easily personal information can be compromised by simulating an adversarial request, and how patient data can be retrieved through the OpenAI API. That is a serious privacy risk, as the returned records identify a person with a particular health condition with 100% certainty.

Although the data returned by the OpenAI API here is synthetic, the methods mentioned previously, such as jailbreaking and reverse psychology, can be used to gather real data from GenAI APIs through sophisticated attacks. Another effective attack method we came across is the divergence attack (asking Gen AI to repeat certain words or phrases endlessly), which causes the model to stray into other material that may include sensitive information and which, as shown in an article by Scott Ikeda, can be exploited by attackers.

Ok… well… How are we doing it?

We considered various methods to protect against such information disclosure and decided that the exponential mechanism of differential privacy could mitigate this risk. We selected this approach because it suits interactive databases (such as the vector databases behind GenAI) and because it is not limited to numeric data; the exponential mechanism also handles non-numeric outputs such as strings. Mathematically, the mechanism is defined by the expression below:

Figure 4: From Professor Daniel Aranki’s Privacy Engineering Course (UC Berkeley)

In the above expression:

  1. f is a query such as “Give me a dataset of medical diagnoses of 10 people with Name, Age, Sex, and Diagnosis”.
  2. Epsilon (ε) is the privacy parameter.
  3. u(D,r) is a utility score that rates each possible outcome r of the query on database D.
  4. Δu is the sensitivity of the utility function, defined by the expression below:
Figure 5: From Professor Daniel Aranki’s Privacy Engineering Course (UC Berkeley)
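Since both expressions appear above only as figure images, here is the standard textbook form of the exponential mechanism and of the sensitivity Δu, written in the usual notation; the course slides may differ slightly in notation.

```latex
% Exponential mechanism: probability of releasing output r for query f on database D
\Pr\big[\mathcal{M}_{u,\varepsilon}(D) = r\big] \;\propto\; \exp\!\left(\frac{\varepsilon \, u(D, r)}{2\,\Delta u}\right)

% Sensitivity of the utility function, taken over all outputs r and all
% neighboring databases D, D' (differing in a single record)
\Delta u \;=\; \max_{r} \; \max_{D, D'} \big| u(D, r) - u(D', r) \big|
```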

In our project, we implemented the exponential mechanism of differential privacy with the following assumptions:

  1. The value of epsilon was chosen to be 1. In practice, epsilon should be chosen very carefully: the lower the epsilon, the stronger the privacy, but at a cost to utility.
  2. We set the sensitivity Δu = 1. In practice, it should be calculated over neighboring databases using the formula shown in Figure 5.
  3. The mechanism is implemented on data exported from the API rather than inside the model itself.

These assumptions are specific to our project, which dealt with medical records; they will change depending on privacy needs.

The diagram below depicts a basic architectural design of A.I.P.R.O:

Figure 6: Design of A.I.P.R.O. From Privacy Engineering project A.I.P.R.O by Sudip Kar, Iliya Tynan, Mario Borroto, UC Berkeley Fall 2023.

In this design, the user submits a query to OpenAI. Our solution then takes the raw output of that query, applies differential privacy to it, and returns the privatized result to the user.

The implementation for the mechanism in Python is as follows:

Figure 7: Implementation of differential privacy to GenAI. From Privacy Engineering project A.I.P.R.O by Sudip Kar, Iliya Tynan, Mario Borroto, UC Berkeley Fall 2023.
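The project’s implementation is shown only as an image above; the following is a minimal, illustrative sketch of an exponential mechanism under the stated assumptions (ε = 1, Δu = 1), applied to candidate outputs derived from the exported records. The utility function (favoring more frequent diagnoses) and the record structure are placeholders, not the ones from our project code.

```python
import math
import random
from collections import Counter

def exponential_mechanism(candidates, utility, epsilon=1.0, sensitivity=1.0):
    """Sample one candidate with probability proportional to exp(eps * u / (2 * delta_u))."""
    scores = [utility(r) for r in candidates]
    # Subtract the max score before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    max_score = max(scores)
    weights = [math.exp(epsilon * (s - max_score) / (2 * sensitivity)) for s in scores]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    return random.choices(candidates, weights=probabilities, k=1)[0]

# Illustrative use: instead of returning exact records or counts from the
# exported data, release a noisy "most common diagnosis" answer.
records = [
    {"name": "A", "diagnosis": "Diabetes"},
    {"name": "B", "diagnosis": "Asthma"},
    {"name": "C", "diagnosis": "Diabetes"},
    # ... remaining exported records
]
counts = Counter(r["diagnosis"] for r in records)
candidates = list(counts)

# Utility of a diagnosis = how many records carry it. Adding or removing one
# record changes any count by at most 1, so the sensitivity here is 1.
noisy_answer = exponential_mechanism(candidates, lambda d: counts[d],
                                     epsilon=1.0, sensitivity=1.0)
print(noisy_answer)
```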

One point to note: disclosing the likelihood distribution itself can sometimes leak information about which output was most likely before differential privacy was applied. The implementation should account for those scenarios, which requires further research across a wide range of datasets.

Final Thoughts

With the above results, an adversary can no longer identify a person as having a certain health condition with 100% certainty, because our model only returns a likelihood over the population after applying differential privacy. Even knowing the population count does not reveal accurate information about how many people have a particular diagnosis.

This project is at a proof-of-concept/prototype stage, with a lot of work remaining to implement an effective differential privacy mechanism for GenAI while maintaining utility. The ultimate goal is to make the model more robust, in line with NIST guidelines, so it can be applied to GenAI more generally rather than to a single use case such as medical records.

Lastly, a big THANK YOU to Prof. Daniel Aranki and the School of Information for providing us with the opportunity to work on this project as part of our privacy engineering class.

Thank you for reading this post. If you like this article, please share it with your friends and colleagues. Also, if you have any questions or feedback, feel free to leave a comment below!

Sudip Kar is pursuing a Master of Information and Cybersecurity at UC Berkeley’s School of Information. This article was inspired by his project for the Privacy Engineering course.
