Building a private GPT with Haystack, part 1: why and how
This article outlines how you can build a private GPT with Haystack. A private GPT allows you to apply Large Language Models, like GPT-4, to your own documents in a secure, on-premise environment. We show some of the powerful capabilities this unlocks for enterprises. Read part 2 for more technical detail.
What is a private GPT and why should I want one?
With the explosion of Large Language Models (LLMs) in the last month, there has been incredible interest in so-called private GPTs. In a nutshell, a private GPT allows you to use Large Language Models on your own data/documents in a secure environment. These are distinct from your typical GPT chat interfaces in two key ways:
- You can run them on your own data/documents (although, for example, Anthropic’s Claude allows you to upload documents as well), or connect them to a knowledge base
- They are hosted locally, meaning that your sensitive data never leaves your servers (as it would, for example, when you make an API call to OpenAI)
This set-up comes with both opportunities and drawbacks. One drawback is that you are limited to Open Source LLMs. Using an API call to OpenAI, Anthropic or Cohere means that data will leave your secure environment. This would rule out these providers and their state-of-the-art language models.
Moreover, you are limited by your infrastructure: LLMs are expensive to run, requiring a lot of (GPU) processing power and memory. Your infrastructure may simply not be set up to deal with e.g. the 40-billion parameter Falcon model, and you may want to use the less powerful 7-billion parameter model instead.
Conversely, investing in a local LLM-powered infrastructure opens up new ways to handle text data. For example, one client wants to compare their internal ESG requirements to companies’ ESG statements. Without LLM capabilities, this would require several weeks’ worth of painstaking analyst work to complete. With LLMs, this can be set up as an auditable, automated process that can also be productized for other use cases. For example, the same company could use this to audit internal reporting, by checking if internal reports comply with the standards outlined in one document.
How do I get a private GPT?
Now that I have convinced you of the need for a private GPT, you want to know: how do I get one? Let’s dive right into it.
A private GPT will have a number of components from the diagram below, courtesy of Andreessen Horowitz. If you’re interested in LLMs more generally, the article is well worth a read.
To build our own, locally-hosted private GPT, we will only require a few components for a bare-bones solution:
- A Large Language Model, such as falcon-7b, FastChat, or Llama 2. For local deployment, with data staying within your own network, I believe downloading a model from Hugging Face is the only option at the moment.
- Some form of App Hosting or a front-end. In my own case, I have used Streamlit to deploy locally.
- A vector or document database to store your documents in. The a16z diagram lists Pinecone, Weaviate, Chroma, and pgvector as options. These are all good options, as my article on Weaviate shows. However, a vector database is not required for a private GPT. You could run it on a PostgreSQL database, or Elasticsearch, as long as you find an effective way to serve the LLM with relevant text data. In my case, I used Elasticsearch as it came out of the box quite easily with Haystack.
- Some type of orchestration component. Here, the options listed are Python/DIY, Langchain, LlamaIndex, and ChatGPT. Another option would be Haystack. I also know a developer who prefers to go to the source and write everything in Transformers. Here, I will use Haystack, for two reasons. Firstly, I’m familiar with the package. Secondly, Haystack provides quite an easy API to upload .txt and .pdf documents, saving me the time to set up a workflow to ingest documents. A bare-bones sketch of how these components fit together follows below.
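To make this concrete, here is a rough sketch of how these components could be wired together with Haystack. Treat it as a sketch only: it assumes a Haystack 1.x install with Elasticsearch running locally, the model name and connection details are illustrative, and exact class names can differ between Haystack versions.

from haystack import Pipeline
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import BM25Retriever, PromptNode

# Document database holding the preprocessed document chunks
document_store = ElasticsearchDocumentStore(host="localhost", port=9200, index="promptbox")

# Retriever that finds the most relevant chunks for a user request
retriever = BM25Retriever(document_store=document_store)

# A locally downloaded LLM from Hugging Face
prompt_node = PromptNode(model_name_or_path="tiiuae/falcon-7b-instruct", max_length=300)

# Query pipeline: user request -> retriever -> LLM
pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])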
This leads to an architecture that looks like this. I will call this setup Promptbox for now.
Architecture for a private GPT with Haystack
Let’s go through this setup one by one. I have numbered the different components to guide you through it.
1. Document upload and preprocessing
The first is document upload and preprocessing. Document upload is straightforward: just browse through your folders, and select the documents you want to work on.
Preprocessing is an undervalued step in LLM app building, because Large Language Models still face many limitations, in particular when it comes to context windows. In theory, it would be nice if we could upload, say, the entirety of the Lord of the Rings and direct the LLM to convert it into a film script. In reality, the LLM cannot ingest such a huge pile of information at once, only a few hundred words at a time. We therefore need to select small chunks for it to work on, which means our documents must be broken up.
Here is where using Haystack is quite convenient, as it comes with several tools to preprocess documents already. The preprocessed documents are then stored in ElasticSearch.
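As a rough sketch of what that looks like in Haystack 1.x (the file name and split settings here are illustrative, not the exact values Promptbox uses):

from haystack.nodes import PDFToTextConverter, PreProcessor

# Convert an uploaded PDF into a Haystack Document (requires the pdftotext utility)
converter = PDFToTextConverter(remove_numeric_tables=True)
docs = converter.convert(file_path="hsbc_energy_policy.pdf", meta={"name": "hsbc_energy_policy"})

# Break the document into small, overlapping chunks that fit the LLM's context window
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    split_by="word",
    split_length=200,
    split_overlap=20,
    split_respect_sentence_boundary=True,
)
chunks = preprocessor.process(docs)

# Store the chunks in Elasticsearch (the document_store from the earlier sketch)
document_store.write_documents(chunks)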
2. User requests and 5. Output
The second component is handling user requests. Because users rely on a predictable output, this will also cover the output users can expect. The way I have set up Promptbox, different types of user requests are possible. One allows users to ask multiple questions about one document using different models.
Here, we have selected the energy policy published by global bank HSBC as our document to analyze. We run multiple queries on the document, which are served in a table:
Alternatively, the user can ask a chain of questions about all available documents. Here, we’ve lifted some documents from the UK’s Financial Services and Markets Tribunal, which investigates if providers of financial services have been guilty of misconduct. One key question is whether a defendant was deemed to be of ‘fit and proper’ character.
Here, Promptbox chains together a number of questions and their answers. This results in a table-like structure, where the third question only gets asked of documents that result in the output ‘no’ on the first question:
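Under the hood, the chaining logic is simple. Below is a minimal sketch, built on the query pipeline from the earlier sketch; the ask() helper, the follow-up question, and the all_document_names list are hypothetical and purely illustrative:

def ask(question: str, document_name: str) -> str:
    # Retrieve snippets from one document and run them through the LLM.
    # The "results" output key is what PromptNode pipelines return in Haystack 1.x.
    result = pipe.run(
        query=question,
        params={"Retriever": {"top_k": 3, "filters": {"name": [document_name]}}},
    )
    return result["results"][0]

first_question = "Was the applicant deemed 'fit and proper' or not?"
follow_up = "On what grounds was the applicant deemed not fit and proper?"  # illustrative

rows = []
for doc_name in all_document_names:  # hypothetical list of uploaded documents
    first_answer = ask(first_question, doc_name)
    row = {"document": doc_name, first_question: first_answer}
    # Follow-up questions are only asked when the first answer is 'no'
    if first_answer.strip().lower().startswith("no"):
        row[follow_up] = ask(follow_up, doc_name)
    rows.append(row)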
3. Text retrieval
User requests, of course, need the document source material to work with. Because, as explained above, language models have limited context windows, we need to retrieve snippets of text to give to the language model. For this, we need a retriever that finds the most relevant passages for us.
For a successful LLM architecture, this step is arguably more important than prompt engineering or how you phrase the question. The LLM is only as good as the information it has been fed.
In the above example using the UK’s Financial Services and Markets Tribunal, we ask if an applicant was deemed ‘fit and proper’ or not. Some of the cases concern appeals: an applicant was initially not deemed fit and proper, but on appeal this was overturned. Within the text of the document, this means there will be many paragraphs explaining why, at first instance, the applicant was deemed not fit and proper. If the LLM only gets fed the background of the case, it might conclude from that information that the applicant was not deemed fit and proper, when in reality this judgement was rescinded.
Since we cannot feed a language model the full text to base its answer on, proper retrieval becomes an incredibly important component, striking a balance between too much and too little information.
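In Haystack, this step is handled by the retriever, which is queried for the top-scoring chunks before anything reaches the LLM. A minimal sketch, using BM25 against the document store from the earlier sketch; top_k is the main knob for balancing too much against too little context:

from haystack.nodes import BM25Retriever

retriever = BM25Retriever(document_store=document_store)

# Fetch the most relevant chunks: too few risks missing the appeal outcome,
# too many overflows the model's context window.
snippets = retriever.retrieve(
    query="Was the applicant deemed 'fit and proper' or not?",
    top_k=3,
)
for doc in snippets:
    print(doc.meta.get("name"), doc.content[:100])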
4. Running it through the LLM
Alright, so now we’ve received snippets of text that we feed to the LLM. Then what?
Haystack works with a component called the PromptNode, which has two essential ingredients:
- An LLM that it uses, for example falcon-7b
- A prompt template that specifies what it should do with the incoming query (user request) and text snippets. In Promptbox, we use the following standard Haystack template (which, by the way, you as a user can edit!):
Synthesize a comprehensive answer from the following given question and
relevant paragraphs.
Provide a clear and concise response that summarizes the key points and
information presented in the paragraphs.
Your answer should be in your own words and be no longer than necessary.
Question: {query}
Paragraphs: {join(documents)}
Answer:
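In code, pairing this template with a model looks roughly as follows. Note that the PromptTemplate constructor has changed between Haystack 1.x releases (older versions use name and prompt_text arguments), so treat this as a sketch and check your version’s documentation:

from haystack.nodes import PromptNode, PromptTemplate

qa_template = PromptTemplate(
    prompt="Synthesize a comprehensive answer from the following given question and "
           "relevant paragraphs. Provide a clear and concise response that summarizes "
           "the key points and information presented in the paragraphs. Your answer "
           "should be in your own words and be no longer than necessary.\n"
           "Question: {query}\n"
           "Paragraphs: {join(documents)}\n"
           "Answer:"
)

prompt_node = PromptNode(
    model_name_or_path="tiiuae/falcon-7b-instruct",  # any model you can download locally
    default_prompt_template=qa_template,
    max_length=300,
)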
This means that if we ask a model “Was the applicant deemed ‘fit and proper’ or not?” and provide relevant snippets, the input the model receives is:
Synthesize a comprehensive answer from the following given question and
relevant paragraphs.
Provide a clear and concise response that summarizes the key points and
information presented in the paragraphs.
Your answer should be in your own words and be no longer than necessary.
Question: Was the applicant deemed 'fit and proper' or not?
Paragraphs: 6. Section 56 of the 2000 Act provides that the Authority may
make a prohibition order prohibiting an individual from performing specified
functions, or any function, if it is satisfied that the individual is not a fit
and proper person. The issue 7. The Decision Notice stated that the Authority
had decided to take the following action: (1) to impose a penalty on Mr
(...)
interviews and if he had done so he would not have made his statement in
the way he had.
Answer:
In order to give the user full freedom and audit possibilities, we actually make it possible in Promptbox to see the input the model receives in the ‘Detailed output’ tab:
With this information, the LLM performs its stochastic magic and we receive the output, as seen above.
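Tying it all together, a query against the pipeline built earlier looks roughly like this. The debug flag on Haystack 1.x pipelines is what makes it possible to retain the intermediate, filled-in prompt for auditing; exact output keys can vary per version:

result = pipe.run(
    query="Was the applicant deemed 'fit and proper' or not?",
    params={"Retriever": {"top_k": 3}},
    debug=True,
)

print(result["results"][0])  # the generated answer
print(result["_debug"])      # per-node inputs and outputs, useful for an audit view like 'Detailed output'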
Conclusions and next steps
In this guide, we went through how to set up a private GPT and showcased what it unlocks when working with your documents. The field is still young, however! There are so many ways to apply LLMs to your documents, which makes me excited about what the future will bring.
The next installment will be far more technical, as we’ll dive into the code that’s used to run this little tool, and run you through how to set it up on your machine!