How to Assess Data Privacy in GenAI Products

Docq.AI
Feb 22, 2024

You need to know this about GenAI SaaS applications

There’s heightened concern about data security and privacy when it comes to generative AI software. We covered this briefly in our post “Using Generative AI with your private business data”. As with anything new, there’s a degree of fear of the unknown. Beyond that, is there something real to be concerned about? In short, yes. But why?

The Large Language Model (LLM) at the heart of Generative AI applications is the component we need to pay attention to. What it does is extremely impressive, but at a high level, how it runs is nothing special. Many companies offer LLMs-as-a-Service and products built on top of LLMs, and they are not all designed or run the same way.

This post will equip you to understand those differences when evaluating AI solutions. We’ll explain some foundational concepts, use them to show why we need to pay attention to how the LLM is hosted, and lay out the different hosting options to look out for. By the end, you should be able to decide which hosting option is best for your use case, and which questions to ask to evaluate whether a solution fits your needs.

GenAI Application Architecture

The simplest high-level GenAI software design has the following two components and runs as a single application (like Solitaire on your PC).

App = User Interface (UI) + LLM

An application here could be a desktop app, a web app, or a command line app. Whichever of these it is, and whatever technology is used to build it, the user interface code follows the same logic.

Let’s unpack this a little into how you interact with an app.

(user input) → user interface → LLM → user interface → (response to user)

When running, this design comes in two variants:

  • One running program - the LLM is embedded inside the app UI code.
  • Two running programs - the LLM and UI code run as separate programs and talk to each other over the network via an API (Application Programming Interface). In this setup, the LLM could be running on the same computer as the UI code or on a different computer.
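To make the two designs concrete, here is a minimal Python sketch using only the standard library. `toy_llm` is a hypothetical stand-in that just echoes the prompt. It is not a real model. The `/complete` path and the `{"prompt": ...}` / `{"completion": ...}` JSON shapes are also made up for illustration, and real LLM APIs define their own formats. In the first design, the UI calls the model as an ordinary function. In the second, the model runs behind an HTTP server and the UI sends it requests over the network, here the local loopback interface.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model: returns a canned completion."""
    return f"Echo: {prompt}"


# Design 1: one running program. The UI code calls the model in-process.
def ui_embedded(user_input: str) -> str:
    return toy_llm(user_input)


# Design 2: two running programs. The model sits behind an HTTP API.
class LLMHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        reply = json.dumps({"completion": toy_llm(body["prompt"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # silence the default request logging
        pass


def ui_over_api(user_input: str, url: str) -> str:
    """The UI side of design 2: send the input over the network, read the reply."""
    data = json.dumps({"prompt": user_input}).encode()
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["completion"]


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), LLMHandler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/complete"
    print(ui_embedded("hello"))
    print(ui_over_api("hello", url))
    server.shutdown()
```

Running the script prints the same reply twice, once from each design. Pointing `url` at a remote host instead of the loopback address is what moves the model onto a different computer.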

LLMs

There’s plenty of writing that explains the novel neural network technology behind LLMs. From a runtime perspective, we can simplify and ignore it. Put simply, an LLM is a piece of software that takes a sequence of words as input (strictly, pieces of words called tokens) and predicts the word most likely to come next. This process runs in a loop: each predicted word is appended to the input for the next prediction.

“dog” (input) → LLM (predict) → “barks” (output)

“dog barks” (input) → LLM (predict) → “loud” (output)
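The loop above can be sketched in a few lines of Python. The lookup table `NEXT_WORD` is a hypothetical toy that only looks at the last word. A real LLM conditions on the whole input and uses a neural network to score every possible next token. What the sketch does capture is the loop itself: each output is appended to the input and fed back in.

```python
# Hypothetical toy "model": for each word, the word most likely to follow it.
NEXT_WORD = {"dog": "barks", "barks": "loud", "loud": "<end>"}


def predict_next(words: list[str]) -> str:
    """Greedy next-word prediction. This toy looks only at the last word;
    a real LLM conditions on the whole sequence of tokens."""
    return NEXT_WORD.get(words[-1], "<end>")


def generate(prompt: str, max_new_words: int = 10) -> str:
    words = prompt.split()
    for _ in range(max_new_words):
        nxt = predict_next(words)
        if nxt == "<end>":
            break
        words.append(nxt)  # the output becomes part of the next input
    return " ".join(words)


print(generate("dog"))  # → dog barks loud
```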

All useful software takes input, does something, and gives output. That means we can even ignore the special sauce inside the LLM, which boils it down to a piece of software that runs on a computer.

But many of the latest and most capable LLMs are challenging to run because they are too big to fit on a single computer. Size here is measured in processing power and memory requirements. The largest versions of LLMs like Falcon, Llama 2, Mistral, Claude, GPT-4, and Gemini don’t fit on a single GPU, and some don’t fit even on one of the largest computers currently available. They need several GPUs, often spread across several computers connected in a special way called a cluster. The clustering technology required to run an LLM across several computers is complicated.
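A rough back-of-the-envelope calculation shows why. Just holding a model’s weights takes (number of parameters) × (bytes per parameter). The sketch below makes two assumptions. The weights are 16-bit (2 bytes each), and each GPU has 80 GB of memory, which is the capacity of the largest Nvidia A100 and H100 variants. It deliberately ignores the extra memory needed while the model runs, so real requirements are higher. The 70-billion-parameter figure matches the largest Llama 2 model.

```python
import math


def weights_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params: float, bytes_per_param: float, gpu_memory_gb: float) -> int:
    """Lower bound on the GPUs needed to hold the weights alone.
    Ignores activations, the KV cache and framework overhead, which all add more."""
    return math.ceil(weights_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


# A 70-billion-parameter model stored in 16-bit precision (2 bytes per parameter)
print(weights_memory_gb(70e9, 2))  # → 140.0
print(min_gpus(70e9, 2, 80))       # → 2
```

Even before any working memory is counted, a 70B model in 16-bit precision needs at least two such GPUs.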

They typically run on Graphics Processing Units (GPUs) for optimal performance. Nvidia is currently the leading GPU manufacturer and is struggling to produce chips fast enough to meet demand. Even companies that have the capital can’t easily get hold of them. Specialist AI chips from companies like AWS, AMD, Arm, Google, Meta, and Groq are a rising category. In short, the latest and most capable LLMs require scarce and expensive hardware.

What do we do when a resource is expensive and hard to acquire? Share it, as much as possible.

SaaS is Shared Software

SaaS is renting software.

Typically, a provider runs a single copy of the software in a data centre or cloud provider account they own. Each user gets a unique login account, and each user’s data is kept separate from other users’ data by tying it to that account. This is called logical partitioning. In enterprise software, users and data are also grouped under a unique organisation account. This flavour of SaaS is called multi-tenant, and it’s the most common model.
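Logical partitioning can be sketched as one shared store in which every record carries the owning organisation’s ID, and every read filters on that ID. The class and tenant names here are hypothetical. The sketch shows where the isolation lives: in a single line of application code. Forget the filter, as `query_buggy` does, and one customer sees another’s data.

```python
from dataclasses import dataclass


@dataclass
class Record:
    tenant_id: str  # the organisation account that owns this row
    text: str


class SharedStore:
    """One store shared by every customer (multi-tenant)."""

    def __init__(self) -> None:
        self._rows: list[Record] = []

    def add(self, tenant_id: str, text: str) -> None:
        self._rows.append(Record(tenant_id, text))

    def query(self, tenant_id: str) -> list[str]:
        # Isolation exists only because of this filter in application code.
        return [r.text for r in self._rows if r.tenant_id == tenant_id]

    def query_buggy(self, tenant_id: str) -> list[str]:
        # The same query with the filter forgotten: every tenant's data leaks.
        return [r.text for r in self._rows]


store = SharedStore()
store.add("acme", "Acme Q3 forecast")
store.add("globex", "Globex salary bands")
print(store.query("acme"))        # → ['Acme Q3 forecast']
print(store.query_buggy("acme"))  # → ['Acme Q3 forecast', 'Globex salary bands']
```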

The idea is to share as many resources across as many customers as possible to maximise efficiency and utilisation. Reduced costs and increased profits are great for business. At the extreme, everything is shared, as with a public website, but that is rarely possible with business software. Partitioning data by user and organisation accounts is considered “soft”. The separation is enforced by application logic rather than by physical boundaries, so it offers relatively weak inherent security guarantees: a single bug or misconfiguration can expose one customer’s data to another. Let’s look at the opposite extreme to understand this better.

The opposite extreme, where nothing is shared, is where every user has a dedicated copy of the software running on a dedicated computer that’s not connected to a network. Data is only saved locally. Nothing is shared. You can see how physical separation and isolation in this way gives security guarantees. There is no way your data could leak accidentally to another user if it’s on a physically separate computer.

Running advanced software in this highly isolated fashion is cost-prohibitive. Isolation also limits utility. So it’s a trade-off. The right answer depends on the situation and often lies somewhere between these extremes.

Data Isolation in LLMs

We established that:

  • Expensive things tend to be shared to share the cost.
  • SaaS is a resource-sharing/renting model for software.
  • User and organisation data separation is “soft” in SaaS.
  • LLMs are very expensive and complicated to host*.

Therefore it’s logical that many of the latest LLMs are provided as SaaS. It’s pretty normal for businesses to use SaaS even with sensitive data. So what’s the difference we’ve been implying?

Two things:

  • There’s no technique yet to logically or virtually partition the inside of an LLM by user or organisation, or to separate data within it using encryption.
  • LLMs are highly complex or even chaotic systems (as these terms are defined in the Cynefin framework). Their behaviour is non-deterministic and emergent, so you can’t predict with certainty what they will do.

The combination of these two is what elevates the risk of a shared LLM when used with confidential and sensitive data. So “Is the LLM shared or dedicated?” is an important question to consider and ask when evaluating solutions.
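One common source of the non-determinism mentioned in the second point is the way LLMs pick the next word. They usually don’t just take the top-scoring word. Instead, they sample from a probability distribution, and a temperature parameter controls how random the choice is. This Python sketch uses made-up scores (`LOGITS`) for words that might follow “dog barks”. It illustrates sampling only, not the emergent behaviour of a real model.

```python
import math
import random


def softmax(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw next-word scores into probabilities; temperature rescales them."""
    scaled = {w: s / temperature for w, s in logits.items()}
    top = max(scaled.values())
    exps = {w: math.exp(s - top) for w, s in scaled.items()}  # subtract max for numerical stability
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def next_word(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy: always the same answer
    probs = softmax(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()))[0]


# Hypothetical scores a model might assign to words following "dog barks".
LOGITS = {"loud": 2.0, "twice": 1.5, "softly": 1.0}

rng = random.Random()  # unseeded, so sampled results vary from run to run
print([next_word(LOGITS, 0, rng) for _ in range(5)])    # always 'loud'
print([next_word(LOGITS, 1.0, rng) for _ in range(5)])  # a mix that changes between runs
```

With temperature 0, the sketch always returns “loud”. With temperature 1.0, the same input produces different outputs from one run to the next.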

Four Flavours of LLM Hosting

  • Local* - embedded in the application and running on the user’s device, e.g. a native desktop app or a mobile app.
  • Self-hosting - hosting in your own data centre or public cloud account. Inherently, everything in the stack is dedicated. For example: Azure ML online endpoints or AWS SageMaker endpoints.
  • Provider-dedicated hosting - an LLM instance run just for you. Systems like the web API layer can be shared. For example: Azure OpenAI.
  • Shared hosting - multiple organisations share a single running instance of the LLM and other layers in the stack. For example: OpenAI.
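The four flavours can be summarised as data, which makes the key question, “Is the LLM shared or dedicated?”, explicit. The field names and the 1 to 4 isolation ranking are our own reading of the list above, not an industry standard. Provider-dedicated hosting says other layers can be shared, and this sketch conservatively models them as shared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostingOption:
    name: str
    llm_dedicated: bool    # is the LLM instance run only for you?
    stack_dedicated: bool  # are the other layers (e.g. the web API) dedicated too?
    isolation_rank: int    # 1 = least isolated, 4 = most (our reading of the list)


OPTIONS = [
    HostingOption("Shared hosting", llm_dedicated=False, stack_dedicated=False, isolation_rank=1),
    HostingOption("Provider-dedicated hosting", llm_dedicated=True, stack_dedicated=False, isolation_rank=2),
    HostingOption("Self-hosting", llm_dedicated=True, stack_dedicated=True, isolation_rank=3),
    HostingOption("Local", llm_dedicated=True, stack_dedicated=True, isolation_rank=4),
]


def most_isolated_first(options: list[HostingOption]) -> list[str]:
    """Order options from lowest to highest inherent risk."""
    return [o.name for o in sorted(options, key=lambda o: o.isolation_rank, reverse=True)]


def dedicated_llm_options(options: list[HostingOption]) -> list[str]:
    """Options that answer "dedicated" to: is the LLM shared or dedicated?"""
    return [o.name for o in options if o.llm_dedicated]


print(most_isolated_first(OPTIONS))
print(dedicated_llm_options(OPTIONS))
```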

*More on LLM Size

Earlier we focused on how large the latest LLMs are, and hence how difficult they are to run. To complete the picture, many smaller but still capable LLMs exist. They can run on a single computer, even a laptop. Even the latest large models have compressed versions, but these are less capable and often narrower in the use cases they are good at. For example, a large model may be highly capable at content generation, chat, code generation, and multilingual use cases, while a small version might only be good at chatting in English. So smaller models can be good at specialised tasks.
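The post doesn’t say how compressed versions are made. One common technique, among others such as distillation, is quantization, which stores each weight in fewer bits. Storing 7 billion parameters at 8 bits takes about 7 GB, compared with 14 GB at 16 bits, which brings the model within reach of a well-specified laptop. The Python sketch below applies a simple symmetric 8-bit scheme to a handful of made-up weights. Rounding loses a little precision, which hints at why compressed models tend to be less capable.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization: map floats onto integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid dividing by zero for all-zero input
    return [round(w / scale) for w in weights], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Approximately reconstruct the original floats."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.033, 0.91, -0.27]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)          # values fit in 8 bits (1 byte each in an int8 array) vs 2 bytes at fp16
print(max_error)  # small but non-zero: the compression is lossy
```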

Range of experience is another way to think about the differences between larger and smaller LLMs. The larger model has a greater amount and diversity of experience. Much as with humans, this results in a greater ability to adapt and perform well across a wider range of tasks.

In any case, if the model is embedded in an app on your personal computer, that computer probably needs to be fairly high-spec, with a GPU. No doubt this will change in time. Models will become more efficient, and processors will continue to get cheaper, as has been the story for decades.

Wrapping up

When evaluating a software solution that involves AI technology, first identify the data it will process and how sensitive that data is. If the data is sensitive, work out the solution’s architecture and therefore its level of isolation. Pay special attention to the LLM (or other machine learning model) component. The more isolated the hosting setup, the lower the inherent risk.


Docq.AI

Private & Secure ChatGPT alternative that unlocks knowledge in your business docs. Posts by Janaka Abeywardhana, CoFounder.