Privacy-Preserving GenAI at Scale: Anonymize Your Text without GPUs at a Hundredth of the Cost

Anshu · Published in ThirdAI Blog · 3 min read · May 14, 2024

Enterprise engineering teams are significantly enhancing productivity by leveraging the power of OpenAI and other generative AI services to process large volumes of unstructured text. They are creating and evaluating productivity tools for their customers, employees, and downstream businesses. Production-grade tools that utilize business-critical text require strict data compliance guardrails. Most enterprises have compliance policies regarding the type of text that can be shown to customers and what can be sent to third-party AI API services like OpenAI GPT-4.

Table 1: Cost analysis of building an NER (Named Entity Recognition) pipeline for identifying sensitive information. Popular open-source models like BERT are 100x costlier irrespective of the instance type. Cheaper options like spaCy perform poorly, reaching only ~65% accuracy.
Figure 1. Inference latency (response time) comparison of open-source BERT and ThirdAI.

The Growing Need for Privatizing Raw Unstructured Text Chunks

Whether building customized question-answering tools or customer-facing chatbots with RAG (Retrieval Augmented Generation) systems, we must ensure that any text leaving the secure environment or being shown to customers does not contain PII (Personally Identifiable Information) such as SSNs, credit card numbers, or emails. Before sending any text chunk to an external API, for example an embedding service or a GPT-4 prompt, we need to redact any PII from the chunks that leave the secure environment to ensure compliance.
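To make the redaction step concrete, here is a minimal sketch of what a pre-call guardrail can look like. Real pipelines use an NER model for this; the regex patterns below cover only a few obvious PII formats and are illustrative, not exhaustive.

```python
import re

# Illustrative patterns only: a regex baseline misses free-form PII
# (names, addresses), which is why NER models are used in practice.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

chunk = "Contact jane.doe@example.com, SSN 123-45-6789."
print(redact(chunk))  # Contact [EMAIL], SSN [SSN].
```

Only the redacted string would then be forwarded to the embedding service or LLM API.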

Existing Solutions: High GPU Use and Increased Latency

Named Entity Recognition (NER) is a well-known problem in text analysis, and there are several open-source models available for tackling it, such as those offered by Hugging Face. However, scaling up these models requires substantial GPU resources. For example, foundational models like BERT are not only resource-intensive — needing several hours on 4 V100 GPUs for fine-tuning (See Table 1) — but also operate slower during deployment, potentially slowing down application response times by about 50 times or more (See Figure 1). In contrast, cheaper alternatives such as spaCy, while less resource-demanding, offer significantly lower accuracy rates (65% versus the 93% achieved by more sophisticated models). This disparity in performance highlights the trade-offs between computational demands and accuracy in deploying NER solutions.

ThirdAI’s CPU-Only NER Models: Quick Fine-Tuning, Ultra-Fast Latency, SOTA Accuracy. Free your GPU cycles.

We’re excited to showcase our purpose-built pre-trained foundational models for NER, designed to deliver state-of-the-art (SOTA) accuracy with ultra-fast latency, 30–50x better than BERT or DistilBERT. Remarkably, even on inputs of up to 1,000 tokens, our models predict labels for all 1,000 tokens in about 50 ms on a single CPU core. (See Figure 1 and Table 1 for comparisons.)

Fine-tunability on Cheap CPU-only Instances: Our models can be fine-tuned on hundreds of thousands of labeled samples within minutes on affordable, readily available CPU-only instances, with consistent inference and fine-tuning performance across different processors.
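The exact fine-tuning API is covered in the notebooks linked below, but as an illustration of what "labeled samples" typically look like, here is a hypothetical data-preparation snippet producing parallel text/tag columns. The `source`/`target` column names and BIO-style tags are assumptions for illustration, not ThirdAI's actual schema.

```python
import csv
import io

# Hypothetical labeled samples: one tag per whitespace token.
rows = [
    ("My email is jane@example.com", "O O O B-EMAIL"),
    ("SSN 123-45-6789 on file", "O B-SSN O O"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["source", "target"])  # assumed column names
writer.writerows(rows)
print(buf.getvalue())
```

Keeping token and tag counts aligned per row is the one invariant that matters regardless of the specific training framework.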

For developers interested in leveraging our technology, we provide a simple script for deploying our pre-trained NER model, capable of identifying standard NER categories across languages. More details on the languages and categories supported can be found in the instructions. For fine-tuning on specialized multi-lingual datasets, use this alternative notebook.

Order of Magnitude Better TCO (Total Cost of Ownership)

Table 1 summarizes the cost of fine-tuning, and the latency difference in Figure 1 translates directly into the cost of deployment. With ThirdAI’s technology, the cost comes down by 100x while removing the constraint of having dedicated GPU resources. Privatize your data without breaking a sweat, anytime, anywhere, using ThirdAI.
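A back-of-the-envelope sketch of the fine-tuning cost comparison can be written in a few lines. The hourly rates below are placeholders, not actual cloud prices; substitute your provider's rates and the durations from Table 1.

```python
# Placeholder on-demand rates (USD/hour); replace with real prices.
GPU_HOURLY = 3.06   # assumed rate for a single-V100 instance
CPU_HOURLY = 0.34   # assumed rate for a comparable CPU-only instance

def finetune_cost(hours, hourly_rate, n_instances=1):
    """Total instance cost of one fine-tuning run."""
    return hours * hourly_rate * n_instances

# BERT: several hours on 4 V100s vs. minutes on one CPU-only box.
bert_cost = finetune_cost(hours=4, hourly_rate=GPU_HOURLY, n_instances=4)
cpu_cost = finetune_cost(hours=0.25, hourly_rate=CPU_HOURLY)
print(f"BERT fine-tune ~ ${bert_cost:.2f}, CPU-only ~ ${cpu_cost:.2f}")
```

The same arithmetic applied to deployment (instances needed to sustain a target queries-per-second at each model's latency) drives the rest of the TCO gap.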

Important Links

Here is the link to all Python notebooks and instructions.


Anshu is a Professor of Computer Science specializing in Deep Learning at Scale and Information Retrieval, and the Founder and CEO of ThirdAI. More: https://www.cs.rice.edu/~as143/