Private GPT: Pioneering the Path to Confidential Generative AI

Akriti Upadhyay
7 min read · Nov 23, 2023


Introduction

In the ever-evolving landscape of artificial intelligence, one of the key concerns that often arises is the trade-off between the power of models and the privacy of data.

Many industries, especially those dealing with sensitive information like healthcare and legal sectors, face a challenge in adopting generative AI due to these privacy concerns.

PrivateGPT is a groundbreaking project offering a production-ready solution for deploying Large Language Models (LLMs) in a fully private and offline environment, addressing these privacy concerns head-on.

Let’s begin!

Genesis of PrivateGPT

The story of PrivateGPT begins with a clear motivation: to harness the game-changing potential of generative AI while ensuring data privacy. The first version, launched in May 2023, set out to redefine how LLMs could be utilized in a completely offline way. This was achieved by leveraging existing technologies from the vibrant Open Source AI community, including LangChain, LlamaIndex, GPT4All, LlamaCpp, Chroma, and SentenceTransformers.

The original version quickly gained traction, becoming a go-to solution for privacy-sensitive setups. It laid the foundation for thousands of local-focused generative AI projects and serves as a simpler, educational implementation for understanding the essential concepts required to build fully local and, therefore, private ChatGPT-like tools.

Understanding the Architecture

PrivateGPT’s architecture is designed to be both powerful and adaptable. It consists of a High-level API and a Low-level API, providing users with a flexible set of tools to work with. Let’s delve into these components; a usage sketch follows the two lists below.

High-level API

  • Ingestion of Documents: Manages document parsing, splitting, metadata extraction, embedding generation, and storage internally.
  • Chat & Completions: Abstracts the complexities of a Retrieval Augmented Generation (RAG) pipeline implementation. Also, it handles context retrieval, prompt engineering, and response generation using information from ingested documents.

Low-level API

  • Embeddings Generation: Allows advanced users to generate embeddings based on a piece of text.
  • Contextual Chunks Retrieval: Given a query, returns the most relevant chunks of text from the ingested documents.
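
To make these layers concrete, here is a minimal sketch of exercising both APIs with curl, assuming the server is running on the default port 8001 and exposes the ingest, chat, embeddings, and chunks endpoints described in the project documentation (exact request shapes may differ across versions):

# High-level: ingest a document (multipart upload; the "file" field name is an assumption)
curl -s -F "file=@my-document.pdf" http://localhost:8001/v1/ingest

# High-level: chat against the ingested documents (RAG)
curl -s http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize the ingested document"}], "use_context": true}'

# Low-level: generate embeddings for a piece of text
curl -s http://localhost:8001/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "What does PrivateGPT do?"}'

# Low-level: retrieve the most relevant chunks for a query
curl -s http://localhost:8001/v1/chunks \
  -H "Content-Type: application/json" \
  -d '{"text": "data privacy requirements"}'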

The project also provides a Gradio UI client for testing the API, along with a set of useful tools such as a bulk model download script, an ingestion script, a documents folder watcher, and more. You can visit the project on GitHub: https://github.com/imartinez/privateGPT.
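
For example, bulk ingestion of a folder can be run from the repository root. This is a sketch: the script location scripts/ingest_folder.py is an assumption based on the repository layout and may change between versions.

# Bulk-ingest every supported document in a local folder (hypothetical path)
poetry run python scripts/ingest_folder.py /path/to/my-documents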

Behind the Scenes

Generative AI has the potential to revolutionize various industries, but its adoption is often hindered by concerns about data privacy. Companies, particularly in the healthcare and legal domains, cannot afford to compromise the confidentiality of their data. The primary motivation behind PrivateGPT is to bridge this gap by offering a solution that ensures data remains fully under the user’s control, even in scenarios without an internet connection.

Present and Future

PrivateGPT has come a long way since its first version. It is evolving into a gateway for generative AI models and primitives, encompassing completions, document ingestion, RAG pipelines, and other low-level building blocks. The aim is to empower developers to easily build AI applications and experiences, while providing an extensible architecture for the community to contribute to and refine.


To stay updated on the latest features and changes, users are encouraged to follow the project’s releases closely.

How to Dive In

For those eager to explore PrivateGPT, the documentation serves as a comprehensive guide. It covers installation, dependencies, configuration, running the server, deployment options, ingesting local documents, API details, and UI features. The documentation is the go-to resource for a detailed understanding of the project.

Installation Steps

For a quick start, follow these steps:

# Clone the repo
git clone https://github.com/imartinez/privateGPT
cd privateGPT

# Install Python 3.11
pyenv install 3.11
pyenv local 3.11

# Install dependencies
poetry install --with ui,local

# Download Embedding and LLM models
poetry run python scripts/setup

# (Optional) For a Mac with a Metal GPU, enable it. Check the Installation and
# Settings section to learn how to enable GPU support on other platforms
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

# Run the local server
PGPT_PROFILES=local make run

# Note: on a Mac with Metal you should see a ggml_metal_add_buffer log,
# stating the GPU is being used

# Navigate to the UI and try it out!
http://localhost:8001/

The base requirement for running PrivateGPT is to clone the repository and navigate into it:

git clone https://github.com/imartinez/privateGPT
cd privateGPT

Install the following before getting started (an example setup follows the list):

  • Python >=3.11 (earlier versions are not supported)
  • make (for scripts)
  • poetry (for dependencies)
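
If any of these are missing, a typical setup on macOS or Linux might look like the following sketch (the official poetry installer is used here; pipx install poetry works as well):

# Install poetry via its official installer
curl -sSL https://install.python-poetry.org | python3 -

# Verify the prerequisites are available
python3 --version
make --version
poetry --version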

Configuration

If you’re opting for a purely local setup, the default configurations of PrivateGPT are ready to use without any additional adjustments. Feel free to skip this section if you’re testing PrivateGPT locally, and return later to explore further configuration possibilities.

PrivateGPT’s configuration is managed through profiles, defined using yaml files, and selected via environment variables. Refer to settings.yaml for the comprehensive list of configurable properties.

Environment Variable: PGPT_SETTINGS_FOLDER
This variable designates the location of the settings folder, which defaults to the project’s root. The folder should include the default settings.yaml and any other settings-{profile}.yaml files.

Environment Variable: PGPT_PROFILES
By default, the profile definition in settings.yaml is loaded. With this environment variable, you can load additional profiles by providing a comma-separated list of profile names. This action merges settings-{profile}.yaml files on top of the base settings file.

For instance, setting PGPT_PROFILES=local,cuda will load settings-local.yaml and settings-cuda.yaml. Their contents will be merged, with properties from later profiles taking precedence over values defined in settings.yaml.
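
As a concrete sketch (assuming the corresponding settings-{profile}.yaml files exist in your settings folder):

# Merge settings-local.yaml and settings-cuda.yaml on top of settings.yaml
PGPT_PROFILES=local,cuda make run

# Point PrivateGPT at a non-default settings folder (hypothetical path)
PGPT_SETTINGS_FOLDER=/path/to/my-settings PGPT_PROFILES=local make run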

During testing, the test profile remains active alongside the default. Therefore, the settings-test.yaml file is essential for testing purposes.

Environment Variable Expansion

Configuration files in PrivateGPT are not static; they dynamically incorporate environment variables during runtime. This allows for flexible adjustments without the need for manual edits. The expansion pattern follows the format ${VARIABLE_NAME:default_value}.

For instance, consider the following configuration snippet:

server:
  port: ${PORT:8001}

In this example, the configuration uses the value of the PORT environment variable, defaulting to 8001 if the variable is not set.
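
In practice, this means the port can be overridden at launch time without editing the file. A minimal sketch:

# Override the port via the environment variable referenced in the config
PORT=8080 make run

# If PORT is not set, the server falls back to the default of 8001
make run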

Local LLM Requirements

PrivateGPT provides an option for local execution, but it comes with specific dependencies. To install these dependencies, run the following command:

poetry install --with local

For optimal performance, GPU acceleration is recommended. PrivateGPT supports local execution for models compatible with llama.cpp. Two known models that work well are provided for seamless setup:

1. TheBloke/Llama-2-7B-chat-GGUF
2. TheBloke/Mistral-7B-Instruct-v0.1-GGUF

To simplify installation, use the provided setup script:

poetry run python scripts/setup

Customizing Low-Level Parameters

If you need to customize parameters like the number of layers loaded into the GPU, modifications can be made in the llm_component.py file. Adjustments should be approached carefully, especially when dealing with potential out-of-memory errors.

Platform-Specific GPU Support

  • OSX GPU Support: For GPU support on macOS, llama.cpp needs to be built with Metal support. Execute the following command:
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
  • Windows NVIDIA GPU Support: Windows GPU support is achieved through CUDA. Follow the instructions on the llama.cpp repo to install the required dependencies. A few tips for NVIDIA cards and CUDA on Windows are provided for a smooth setup.
  • Linux NVIDIA GPU Support and Windows-WSL: Linux GPU support also relies on CUDA. Ensure an up-to-date C++ compiler is available, and follow the instructions for CUDA toolkit installation. Verify your installation with the commands shown below.
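
As a sketch, verifying a CUDA setup on Linux or Windows-WSL typically looks like the following (this assumes the NVIDIA driver and CUDA toolkit are already installed; -DLLAMA_CUBLAS=on was llama.cpp's CUDA switch at the time of writing):

# Confirm the CUDA compiler is on the PATH
nvcc --version

# Confirm the driver can see the GPU
nvidia-smi

# Rebuild llama-cpp-python with CUDA (cuBLAS) support
CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python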

Vectorstores

PrivateGPT supports Chroma and Qdrant as vectorstore providers, with Chroma being the default. To enable Qdrant, set the vectorstore.database property in the settings.yaml file and install the Qdrant extra:

poetry install --extras qdrant

Qdrant settings can be configured in the settings.yaml file, offering flexibility in customization.
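
For illustration, a minimal Qdrant configuration in settings.yaml might look like the following sketch (the path property for local, on-disk persistence is an assumption and may vary by version):

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant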

Environment variables serve as the linchpin in configuring PrivateGPT, enabling a seamless and dynamic adjustment of settings to meet specific requirements. Whether tweaking low-level parameters or ensuring GPU support, understanding the role of environment variables is essential for a smooth PrivateGPT experience.

Conclusion

PrivateGPT is not just a project; it’s a transformative approach to AI that prioritizes privacy without compromising the power of generative models.

As it continues to evolve, PrivateGPT holds the promise of unlocking the full potential of generative AI in diverse industries, enabling developers to build applications with confidence in the security and privacy of their data.

Enter the realm of PrivateGPT, where innovation meets privacy in the world of generative AI.


Written by Akriti Upadhyay

Building AI Solutions to solve real world business challenges