VertexAI: How to Make an AI Assistant?

Mine Kaya
Published in BosphorusISS · Dec 28, 2023 · 7 min read

I imagine Eve as my assistant 💙

Hi! In case you missed Part 1, or if you are coming from there: we have been building apps with generative-AI tools. In Part 1, we created a chatbot, or let's say a question-answering model, for our data. But how do you make this model actually assist you and help you find what you are looking for in a large context?

The project is to create an AI assistant for our Handbook, to help new joiners during their onboarding to the company. The Handbook was already created for that purpose, but wouldn't it be better to find what you are looking for with the help of AI? Everything becomes smoother: the AI finds the related docs, summarizes everything for you, and gives you your answer.

In Part 1, we used OpenAI's GPT-4 as the LLM, but we will switch to Vertex AI due to data privacy concerns. Besides the LLM, nothing else has changed; Langchain is still with us. I want to start with a brief overview of Vertex AI and GCP's gen-AI approach. I will also talk about the differences between OpenAI's, Google's, Amazon's and Microsoft's approaches to this topic. Let's start from here.

Interaction with LLMs

OpenAI, as you can guess, allows direct use, but AWS and Google follow a different approach: they serve LLMs via an API or gateway as a service.

Google uses its own models, which you can find in the Model Garden; AWS builds its own Titan models and also offers Hugging Face and Anthropic models; Microsoft uses OpenAI's models as a service. I don't know if there are any differences between the models offered by OpenAI and by the Azure OpenAI Service.

Generative AI on VertexAI

Google's Vertex AI service offers access to a variety of foundation models for testing, tuning, and deploying them in AI-powered applications. Generative AI Studio, a user interface within Vertex AI, enables these tasks without any coding. You can test prompts and switch from one model to another to compare them.

Foundation models are accessible via APIs, as I mentioned earlier:

  • Gemini API: Advanced reasoning, multiturn chat, code generation, and multimodal prompts.
  • PaLM API: Natural language tasks, text embeddings, and multiturn chat.
  • Codey APIs: Code generation, code completion, and code chat.
  • Imagen API: Image generation, image editing, and visual captioning.
  • MedLM: Medical question answering and summarization. (private)

You can store your dataset in a bucket and tune your model using supervised tuning or RLHF. After the tuning pipeline completes, evaluation tools help assess the tuned model's performance. Once it's ready for production, you can deploy it to an endpoint and monitor its performance, much like you would in DevOps; this is called MLOps.

Vertex AI Search and Conversation lets you simplify the RAG approach for your data: it grounds responses in your own data to reduce hallucinations. It is still in preview, but you can give it a shot. Docs are here.

Using the PaLM API with Langchain

To use Vertex AI PaLM, you must have the google-cloud-aiplatform Python package installed and have credentials configured for your environment, for example by pointing the GOOGLE_APPLICATION_CREDENTIALS environment variable to a service-account key file. Setup instructions through the Google Cloud CLI are provided here.
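
As a quick sanity check that everything is wired up, something like the snippet below should return a response. This is just a sketch under my own assumptions: the service-account path is a placeholder, and I am calling the text-bison@001 model through Langchain's VertexAI wrapper.

import os

# Placeholder path: point this at your own service-account key,
# or skip this line if you already ran `gcloud auth application-default login`.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

from langchain.llms import VertexAI

llm = VertexAI(model_name="text-bison@001")
print(llm("Say hello to the handbook assistant."))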

Let’s continue with the project.

I started searching for a way to build this AI assistant, and RAG was the perfect architecture for it. I want to start with a small briefing about RAG; before that, here are Langchain's Python docs for RAG.

Retrieval-Augmented Generation (RAG)

RAG is a technique for augmenting LLM knowledge with additional data. It is the perfect solution if you have private or real-time data.

When I first started to learn about this, I was experimenting with OpenAI's GPT-3.5 and the text-embedding-ada-002 embedding model. After we switched to Google, I started to use text-bison-001 as the model and textembedding-gecko for embeddings. I have to say the switch was quite smooth, thanks to Langchain. But I hadn't experimented enough with ChatGPT to make a fair comparison between OpenAI and Google.

RAG Architecture

There are two main components of RAG: the first is indexing, and the second is retrieval and generation.

Indexing: a pipeline for ingesting data from a source and indexing it. The flow goes like this: load the data -> split it -> store it.

Retrieval and generation: the actual RAG chain. It takes the user query, retrieves the relevant data from the index (source), and then passes that to the model to generate an answer.

Steps

  1. Load: First, we need to load our data. Our data source is a website, so we'll use RecursiveUrlLoader. You can use BeautifulSoup to extract the data in a structured way. After that, we will have our Documents.
  2. Split: Splitting means breaking large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model. We will use RecursiveCharacterTextSplitter.
  3. Store: We need somewhere to store and index our split documents so that they can later be searched over. This is often done using a vector store and an embedding model. We will use Vertex AI's embedding model textembedding-gecko and store the embeddings in Postgres via pgvector (see the sketch after this list). In this post, though, I will use Chroma as the vector DB.
  4. Retrieve: Given a user input, relevant splits are retrieved from storage using a retriever. Since we want to keep the conversation in memory, we will use ConversationalRetrievalChain.
  5. Generate: A LLM generates an answer using a prompt that includes the question and the retrieved data.
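
Since the real project stores the embeddings in Postgres with pgvector while this post sticks with Chroma, here is a rough sketch of what the PGVector variant could look like with Langchain. The connection string and collection name are made-up placeholders, and splits is the list of chunks produced in Step 2 of the code further down.

from langchain.vectorstores.pgvector import PGVector
from langchain.embeddings import VertexAIEmbeddings

# Placeholder connection string for a Postgres instance with the pgvector extension.
CONNECTION_STRING = "postgresql+psycopg2://user:password@localhost:5432/handbook"

vectorstore = PGVector.from_documents(
    documents=splits,  # the chunks from Step 2 below
    embedding=VertexAIEmbeddings(),
    collection_name="handbook_chunks",  # made-up name
    connection_string=CONNECTION_STRING,
)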

What is a Vector Database? And before that, what is a Vector?

A vector is basically an array of numbers that represents a more complex object, like a word, sentence, image or audio file, as an embedding. In an embedding space, similar objects group together. To store these vectors, we need a vector database.
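
To make this concrete, a sentence really does become just a list of floats. A tiny sketch, assuming the Vertex AI credentials from earlier (the example sentence is made up; textembedding-gecko produces 768-dimensional vectors):

from langchain.embeddings import VertexAIEmbeddings

embeddings = VertexAIEmbeddings()  # textembedding-gecko under the hood
vector = embeddings.embed_query("Where are the coffee mugs in the office?")
print(len(vector))  # 768 dimensions for textembedding-gecko
print(vector[:5])   # just plain numbers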

In a vector database, you have arrays of numbers clustered together based on similarity, and they can be queried for similarity really fast. For AI-powered applications, vector stores fit perfectly for searching over data. We will embed the document splits and store the resulting embeddings inside a vector store. At query time, we will embed the query and retrieve the embedding vectors that are 'most similar' to the embedded query.
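
And a small, self-contained sketch of the 'most similar' idea, using an in-memory Chroma store over three made-up sentences. With a real embedding model, the onboarding sentence should come back first:

from langchain.vectorstores import Chroma
from langchain.embeddings import VertexAIEmbeddings

store = Chroma.from_texts(
    [
        "Home office rules and how to register your address",
        "Where to find the coffee mugs in the office",
        "Onboarding checklist for your first day",
    ],
    embedding=VertexAIEmbeddings(),
)
hits = store.similarity_search("What should I do on my first day?", k=1)
print(hits[0].page_content)  # expected: the onboarding checklist sentence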

Code

Please read the comments to follow along with the RAG steps.

# Step 1: load

from langchain.document_loaders.recursive_url_loader import RecursiveUrlLoader
from bs4 import BeautifulSoup as Soup

url = "https://handbook.bosphorusiss.com/"

# use an extractor to pull text out of the pages, e.g. with BeautifulSoup or goose3
loader = RecursiveUrlLoader(
    url=url, max_depth=5, extractor=lambda x: Soup(x, "html.parser").text
)
docs = loader.load()

# Step 2: split documents

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(docs)

# Step 3: embed and store splits

from langchain.vectorstores import Chroma
from langchain.embeddings import VertexAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=splits, embedding=VertexAIEmbeddings()
)

# Step 4: retrieve

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationSummaryMemory
from langchain.llms import VertexAI

retriever = vectorstore.as_retriever()

llm = VertexAI(model_name="text-bison@001")

# keep a running summary of the conversation so follow-up questions have context
memory = ConversationSummaryMemory(
    llm=llm, memory_key="chat_history", return_messages=True
)

# Step 5: generate
assistant = ConversationalRetrievalChain.from_llm(
    llm=llm, retriever=retriever, memory=memory
)

question = "What can I do on my first day at BISS?"
result = assistant(question)
result["answer"]
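
Because of the ConversationSummaryMemory, follow-up questions can build on earlier turns. A quick illustration (the follow-up question here is just an example):

followup = assistant("And who should I talk to about it?")
print(followup["answer"])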

Later on, we created an API for this and played with the prompts.
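
We won't go into our actual implementation here, but just to illustrate the idea, a minimal wrapper could look something like the sketch below. I am using FastAPI only as an example; the /ask route and the Question model are made-up names.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    question: str

@app.post("/ask")
def ask(body: Question):
    # 'assistant' is the ConversationalRetrievalChain built above
    result = assistant(body.question)
    return {"answer": result["answer"]}

Here are some of the answers from our assistant: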

Human: What is the purpose of this AI?
AI: The purpose of this AI is to help people find out what to do or how to do things at BISS.

Human: What kind of information can I find in the handbook?
AI: You can find general information, guidelines, changelog template and books in the handbook.

Human: Can you provide me with an overview of the company's policies and procedures?
AI: The company's policies and procedures are outlined in the BISS Handbook. The Handbook is a living document that grows and evolves based on how we spend our days in company. It contains information on everything from how to git to where the coffee mugs in the office are. If you have any questions about something related to the company, you can find the answer in the Handbook.

Human: What are the home office rules?
AI: The home office rules are as follows:
- You must have a registered/confirmed home address on BISS ERP.
- Click the My Home Office Form button on the Home Office page on the working day before the day you work from home, select your approved address to make your home office and click the create button. That is all!
- BISS expects you to be reachable by your team between 10:00 and 16:00, you can use flexible working hours for all remaining hours.
- Do not forget that if you need to leave your registered address for a while during working hours, you must inform your teammates and BISS.

We progressed quickly through this project thanks to the groundwork we had laid in Part 1.

Here are some really useful docs:

For those who prefer not to use Langchain, here are the PaLM API docs.

After this, we will try to train our assistant to answer exactly the way we want it to. Yes, you guessed it: fine-tuning is coming in the next post. See you there :)
