Building an LLM with up-to-date GCP documentation knowledge

Sam Gallagher
Sep 1, 2023


What if…

What if an LLM had always-updating knowledge of the Google Cloud Platform documentation?

What if this LLM cited the sources of documentation it used to generate a response?

What if users could pick which sets of documentation to search, how many documents to load as context, and which underlying LLM to use to generate responses?

What if all of this happened “under-the-hood” for end users and they just used a UI to ask these questions?

These were the questions I chased a couple of weeks back, after I struggled to get relevant answers about Google Cloud from tools like OpenAI’s ChatGPT. Instead of solid answers, I got responses like:

Difference in responses. Vanilla ChatGPT on the left, my custom LLM system on the right.

Me: Is Gateway API for GKE in beta?

ChatGPT: I apologize, but I don’t have real-time information about the current status of specific features or products beyond my last update in September 2021.

Note: the Gateway API for GKE reached General Availability on November 10th, 2022.

So I built a system to answer those “what ifs”, and now get answers like:

Me: Is Gateway API for GKE in beta?

Chat Bot: No, the Gateway API for GKE is not in beta. It is generally available (GA).

Sources:

https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways

https://cloud.google.com/kubernetes-engine/docs/how-to/enabling-multi-cluster-gateways

https://cloud.google.com/release-notes

Let’s dig into how this worked.

Design

I don’t know much about how ML/AI works (besides some basic math), but I do understand software, systems, and cloud architecture and design. With these concepts in mind, I focused on building the following three components:

  1. Data pipelines that can fetch the latest version of the GCP documentation, detect changes, and push the latest version to a database.
  2. A database holding the latest version of the GCP documentation that other systems can natively read from.
  3. An API layer that can take a request from a user, find contextually useful documentation in the above database, and feed it to the LLM as part of the prompt (sketched below).
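To make the third component concrete, here is a minimal sketch of the retrieval-augmented flow it describes. The helper functions (`embed`, `vector_search`, `call_llm`) are hypothetical stand-ins for the embedding model, vector database, and LLM that the rest of this article wires up:

```python
# Minimal sketch of the query flow in component 3. The three helpers are
# hypothetical placeholders, not part of the actual implementation.

def embed(text: str) -> list[float]:
    """Turn text into an embedding vector (e.g. via the OpenAI API)."""
    raise NotImplementedError

def vector_search(vector: list[float], top_k: int) -> list[dict]:
    """Return the top_k most similar doc chunks, each with text and url."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Send the assembled prompt to the LLM and return its answer."""
    raise NotImplementedError

def answer(question: str, k: int = 4) -> str:
    # Find contextually useful documentation for the question...
    docs = vector_search(embed(question), top_k=k)
    context = "\n\n".join(d["text"] for d in docs)
    sources = "\n".join(d["url"] for d in docs)
    # ...and feed it to the LLM as part of the prompt, citing sources.
    prompt = (
        "Answer the question using only the GCP documentation below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return f"{call_llm(prompt)}\n\nSources:\n{sources}"
```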

Let’s start with a view of the forest.

Conceptual Architecture

High level conceptual architecture

The above image describes the conceptual architecture for the system, but I want to point out a few things.

  • The data pipelines need to run on a recurring schedule to catch the latest documentation updates.
  • I added a caching layer to the data pipelines to save on cost. I am using the OpenAI embedding model to create the embeddings for the database, and I only want to push changed or new content (a minimal sketch follows this list).
  • The UI / Proxy API Layer denotes any service that is able to interact with the Query API. In my case this was a custom web app, but it could just as easily be an integration for Slack, Teams, text, etc.
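To show what that caching layer buys, here is a minimal sketch of hash-based change detection against Firestore. The `doc_cache` collection name and field layout are my own illustration, not necessarily what the real pipeline uses:

```python
import hashlib

from google.cloud import firestore  # pip install google-cloud-firestore

db = firestore.Client()
cache = db.collection("doc_cache")  # hypothetical collection name

def needs_embedding(url: str, page_text: str) -> bool:
    """Return True only if this page is new or its content changed."""
    digest = hashlib.sha256(page_text.encode("utf-8")).hexdigest()
    # Hash the URL to get a Firestore-safe document ID (no slashes).
    doc_id = hashlib.md5(url.encode("utf-8")).hexdigest()
    snapshot = cache.document(doc_id).get()
    if snapshot.exists and snapshot.to_dict().get("sha256") == digest:
        return False  # unchanged content: skip the paid embedding call
    cache.document(doc_id).set({"url": url, "sha256": digest})
    return True
```

Only pages where `needs_embedding` returns `True` get sent to the OpenAI embedding model and pushed to the vector store.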

The tools for each layer are also defined in the above architecture, but let me call them out explicitly as well:

  • Python is used for the data pipeline. This is a mix of BeautifulSoup, Requests, and Argo Workflows running a dynamic DAG (a minimal fetch sketch follows this list).
  • Firestore is used as the caching layer. I started with MongoDB but moved to a GCP-managed service once I started a true deployment.
  • ChromaDB is used as the vector store, and I will have a separate article diving into its deployment architecture.
  • LangChain is used to tie the LLM (the OpenAI API in this case), database, and user query together. This keeps the system modular and allows it to change as things evolve (see the query sketch after this list).
  • Flask provides the basic Query API that serves the LangChain requests.
  • (Bonus) — I used Streamlit as my POC web UI, but eventually moved to a custom Vue web app.
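For the fetch step, a minimal sketch with Requests and BeautifulSoup looks something like the following. The `<article>` selector is an assumption about the GCP docs markup and may need adjusting:

```python
import requests
from bs4 import BeautifulSoup

def fetch_page_text(url: str) -> str:
    """Fetch one documentation page and return its visible text."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Grab the main article body; fall back to the whole page body.
    body = soup.find("article") or soup.body
    return body.get_text(" ", strip=True)

text = fetch_page_text(
    "https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways"
)
```

The real pipeline runs many such fetches in parallel as a dynamic DAG in Argo Workflows.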
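And here is a minimal sketch of the query path: LangChain’s `RetrievalQA` chain over a Chroma vector store, served by Flask. The collection name, model choice, `k` value, and `/query` route are illustrative assumptions, not the article’s exact wiring:

```python
from flask import Flask, jsonify, request
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

app = Flask(__name__)

# Assumed names: the "gcp-docs" collection and model are illustrative.
store = Chroma(
    collection_name="gcp-docs",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma",
)
chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=store.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,  # this is what powers the "Sources:" list
)

@app.route("/query", methods=["POST"])
def query():
    result = chain({"query": request.json["question"]})
    return jsonify(
        answer=result["result"],
        # Assumes each chunk was stored with a "source" URL in its metadata.
        sources=[d.metadata.get("source") for d in result["source_documents"]],
    )

if __name__ == "__main__":
    app.run(port=8080)
```

Swapping the retriever’s `k`, the collection, or the underlying model is a one-line change here, which is what makes the user-facing knobs from the “what ifs” cheap to expose.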

Final Words

That’s the conceptual overview of how the system was designed. Future articles will cover:

  • The GCP architecture and deployment of the system, seeing it in action.
  • How to cluster a ChromaDB database, splitting writers from readers.
  • Connecting LangChain to pull data from the vector store, and how to fine-tune it.

I will also be releasing a public demo of this system shortly.
