Easy PII replacement for OpenAI, Mistral and Anthropic APIs

with Sarus Arena open-source application

Nicolas Grislain
Sarus Blog

--

Working with commercial LLM APIs such as OpenAI, Mistral or Anthropic is an easy and powerful way to add AI features to your products, but it can pose challenges when your application deals with confidential data such as Personally Identifiable Information (PII).

In this post, we will see how Sarus Arena provides a simple, no-code solution to some of the problems posed by personal data.

The problem

You just built a nice AI-powered product, but as you consider pushing it to production, you realize you are not authorized to send customer information to OpenAI, Mistral or Anthropic. And even if you are, you may not be willing to take the risk of confidential information from your prompts leaking into some LLM responses.

One way to mitigate the problems posed by PII in your prompts is to filter it out. You can use solutions like LangChain and Microsoft Presidio to do so, but they require you to write the code yourself, put it in production, maintain it and, at some point, scale it. You may also want to track and improve the quality of the PII removal. Here is how the open-source Sarus Arena framework can help you.
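To give an idea of what "filtering out PII" means in practice, here is a minimal, self-contained sketch of replace-style redaction using regular expressions. This is purely illustrative: production tools like Microsoft Presidio rely on NER models and configurable recognizers rather than hand-written patterns, and the pattern names below are assumptions, not any library's API.

```python
import re

# Hypothetical, illustrative patterns: real PII detection is far more
# robust than a couple of regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s.-]{7,}\d\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

prompt = "Contact alice@example.com or call +33 6 12 34 56 78."
print(redact(prompt))
```

Even this toy version hints at the operational burden: you have to decide which entity types to cover, keep the detectors up to date, and monitor how often they miss or over-redact.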

Introducing Sarus Arena

Sarus Arena is an open-source application that provides:

  • LLM evaluation: AB-testing, user-feedback evaluation, formula-based evaluation and LLM as a Judge
  • LLM policing: Request and response filtering and redacting, evaluation-based routing (pre-alpha)
  • LLM distillation: Train your own model in one click based on your history of request-response-evaluation triples (pre-alpha)

You can easily deploy Arena on any Kubernetes cluster or try it as a SaaS product at: https://arena.sarus.app/.

Using Arena for PII removal

Sarus Arena can be used to remove PII from your interactions with OpenAI, Mistral or Anthropic.

Take your code:

import os
from openai import OpenAI

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Alice invited Bob for lunch. Bob will sit next to Alice. How many people are having lunch with Alice?"},
    ],
)

print(f"resp = {resp.choices[0].message.content}")

Then add a few lines to route it through Arena:

import os
from openai import OpenAI

# Import the arena client (install it with `pip install arena-client`)
import arena
from arena import LMConfig
# Alter the OpenAI client behavior
arena.decorate(OpenAI, mode='proxy')
# Set up arena: replace PII and enable LLM-as-a-Judge evaluation
arena.lm_config(lm_config=LMConfig(pii_removal="replace", judge_evaluation=True))

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Alice invited Bob for lunch. Bob will sit next to Alice. How many people are having lunch with Alice?"},
    ],
)

print(f"resp = {resp.choices[0].message.content}")
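Conceptually, a "replace" policy pseudonymizes the prompt before it leaves your infrastructure and maps the placeholders back in the model's answer. The sketch below illustrates that round trip; it is not Arena's actual implementation, and all function and placeholder names are assumptions made for the example.

```python
def pseudonymize(text: str, entities: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each known entity with a stable placeholder."""
    mapping = {}
    for i, name in enumerate(entities):
        placeholder = f"<PERSON_{i}>"
        mapping[placeholder] = name
        text = text.replace(name, placeholder)
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original entities back into the model's answer."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text

prompt = "Alice invited Bob for lunch. Bob will sit next to Alice."
safe_prompt, mapping = pseudonymize(prompt, ["Alice", "Bob"])
# safe_prompt is what would actually leave your infrastructure:
# "<PERSON_0> invited <PERSON_1> for lunch. <PERSON_1> will sit next to <PERSON_0>."
answer = "<PERSON_1> is having lunch with <PERSON_0>."  # stand-in for the LLM response
print(restore(answer, mapping))
```

The point of handing this to Arena rather than coding it yourself is that detection, replacement, restoration and quality tracking all happen in the proxy, with no change to your application logic.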

You can then monitor how PII is removed in the Arena web interface.

You can try Arena on Sarus public test instance: https://arena.sarus.app/

You can create a test user at: https://arena.sarus.app/api/v1/users/open?email=test@gmail.com&password=test&full_name=Test (make sure to use your own email address and password).

⚠️ Be careful: using the Arena public test instance will send your API token to that instance, so make sure it is short-lived, or revoke it right after trying Arena. For production or critical applications, you should install your own instance.

If you liked this post, feel free to star the GitHub repo, and follow us to make sure you don't miss the next ones.
