Competitor Analytics with LangChain Agents and the Vertex AI PaLM API
“The best thing about being me… there are so many ‘me’s’” is one of Agent Smith’s popular lines from the Matrix trilogy.
Modern-day agents do not necessarily need so many “me’s”. A single agent can access data from multiple sources and still deliver the impact :)
By combining multiple data sources, AI agents can give wide-ranging inputs to the end user. Comparing multiple products and features is one potential use case.
LangChain agents can combine multiple sources by drawing on a separate vector store for each one. This gives us a single master agent that uses the LLM to pick the right data source and return the relevant response.
Agents use an LLM to determine a sequence of actions to take: the LLM reasons about which actions are needed and in which order. While there are multiple agent types, we will explore the ReAct type, which synergizes reasoning and acting in large language models.
We will see how a single agent can answer questions by accessing multiple sources, using three publicly available sources of BigQuery information:
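To make the reason-then-act cycle concrete, here is a minimal sketch of a ReAct-style loop in plain Python. It is not LangChain's implementation: `make_scripted_llm`, `react_loop`, the toy tool and its canned replies are all hypothetical stand-ins, and a real agent would call an actual LLM instead of replaying fixed text.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM: replays canned ReAct-style completions in order.
def make_scripted_llm(replies: List[str]) -> Callable[[str], str]:
    remaining = iter(replies)
    return lambda prompt: next(remaining)

# Matches the "Action: <tool>" / "Action Input: <input>" pair in a reply.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)

def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(scratchpad)            # the model reasons over everything so far
        scratchpad += reply + "\n"
        if "Final Answer:" in reply:       # the model decided it is done
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            raise ValueError(f"Could not parse an action from: {reply!r}")
        tool_name = match.group(1).strip()
        tool_input = match.group(2).strip()
        observation = tools[tool_name](tool_input)   # act, then feed the result back
        scratchpad += f"Observation: {observation}\n"
    raise RuntimeError("No final answer within max_steps")

# Toy tool and scripted replies, purely illustrative.
toy_tools = {"pricing": lambda query: "stub pricing answer"}
scripted_llm = make_scripted_llm([
    "Thought: I need pricing details.\nAction: pricing\nAction Input: on-demand pricing",
    "Thought: I have what I need.\nFinal Answer: stub pricing answer",
])
answer = react_loop("What is on-demand pricing?", scripted_llm, toy_tools)
```

Each pass appends the model's thought, the chosen action and the tool's observation to a running scratchpad, so the next reasoning step can build on what was already retrieved.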
Web source: the BigQuery pricing sheet, https://cloud.google.com/bigquery/pricing
Document source: a whitepaper on BigQuery, https://github.com/tpn/pdfs/blob/master/BigQuery%20Technical%20Whitepaper%20-%20Google.pdf
Social media: a video on the latest announcements in BigQuery, https://www.youtube.com/watch?v=mgXTtO5loYY&t=566s
Let us load all our libraries,
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import WebBaseLoader
from langchain.agents import initialize_agent, Tool
from langchain.agents import load_tools
from langchain.agents import AgentType
from langchain.tools import BaseTool
from langchain import LLMMathChain, SerpAPIWrapper
Import the Vertex AI text-bison model,
import vertexai
from vertexai.language_models import TextGenerationModel
from langchain.llms import VertexAI
llm = VertexAI(
    model_name='text-bison@001',
    max_output_tokens=256,
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    verbose=True,
)
Load the internal PDF data into a Chroma DB vector store,
import pypdf
import requests
import json
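from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import VertexAIEmbeddings
# Embedding model shared by all three vector stores.
# VertexAIEmbeddings is an assumption; any LangChain embeddings class should work.
embeddings = VertexAIEmbeddings()
# Assumes the whitepaper PDF has already been downloaded to this path
loader = PyPDFLoader("/content/BQ_tech.pdf")
documents = loader.load()
# Split into chunks of about 1000 characters with a 100-character overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
docsearch = Chroma.from_documents(texts, embeddings, collection_name="bq-tech")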
bqtech = RetrievalQA.from_chain_type(
llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)
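To see what chunk_size and chunk_overlap control, here is a simplified sketch of overlapping chunks using a plain sliding window. This is an illustration only: LangChain's splitters split on separators first and then merge pieces, so their output will differ. `sliding_chunks` is a hypothetical helper, not a library function.

```python
from typing import List

def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):   # this window already reached the end
            break
        start += step
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=2)
# The last 2 characters of each chunk reappear at the start of the next one.
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.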
Load the external web URL into a Chroma DB vector store,
loader = WebBaseLoader("https://cloud.google.com/bigquery/pricing")
docs = loader.load()
bqpricing_texts = text_splitter.split_documents(docs)
bq_pricing_db = Chroma.from_documents(bqpricing_texts, embeddings, collection_name="bq-pricing")
bq_pricing = RetrievalQA.from_chain_type(
llm=llm, chain_type="stuff", retriever=bq_pricing_db.as_retriever()
)
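With chain_type="stuff", the chain retrieves the chunks most similar to the question and places them all into a single prompt for the LLM. The sketch below imitates that flow with a toy word-count "embedding" and cosine similarity. The helper names, the prompt wording and the sample chunks are all hypothetical; a real setup relies on an embedding model and Chroma instead.

```python
import math
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real pipeline calls an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Rank chunks by similarity to the query and keep the top k.
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]

def stuff_prompt(query: str, chunks: List[str]) -> str:
    # "Stuff" every retrieved chunk into one context block.
    context = "\n\n".join(chunks)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

toy_chunks = [
    "pricing page chunk about on-demand queries",
    "pricing page chunk about storage",
    "unrelated chunk about something else",
]
top = retrieve("on-demand pricing", toy_chunks, k=1)
prompt = stuff_prompt("on-demand pricing", top)
```

Stuffing is the simplest strategy and works as long as the retrieved chunks fit within the model's context window.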
Load the YouTube video transcript into a Chroma DB vector store,
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=mgXTtO5loYY&t=566s", add_video_info=True)
result = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
bq_latest_docs = text_splitter.split_documents(result)
youtube_db = Chroma.from_documents(bq_latest_docs, embeddings, collection_name="bq_latest")
bq_latest = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=youtube_db.as_retriever()
)
We will now combine all three sources via tools,
tools = [
    Tool(
        name="BigQuery tech",
        func=bqtech.run,
        description="useful for when you need to answer technical questions about BigQuery.",
    ),
    Tool(
        name="Bigquery pricing system",
        func=bq_pricing.run,
        description="useful for when you need to answer questions about BigQuery pricing.",
    ),
    Tool(
        name="Bigquery latest",
        func=bq_latest.run,
        description="useful for when you need to answer questions about the latest announcements in BigQuery.",
    ),
]
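The descriptions matter because a zero-shot ReAct agent chooses a tool from its name and description alone, both of which end up in the prompt. Here is a rough sketch of how such a prompt section could be assembled. `ToolSpec` and `render_tool_section` are hypothetical names, and the exact template LangChain uses may differ.

```python
from typing import List, NamedTuple

class ToolSpec(NamedTuple):
    # Hypothetical stand-in for the name/description pair of a LangChain Tool.
    name: str
    description: str

def render_tool_section(tools: List[ToolSpec]) -> str:
    # One "name: description" line per tool, then the list of allowed action names.
    listing = "\n".join(f"{tool.name}: {tool.description}" for tool in tools)
    names = ", ".join(tool.name for tool in tools)
    return (
        "You have access to the following tools:\n\n"
        f"{listing}\n\n"
        f"Action: the action to take, should be one of [{names}]"
    )

specs = [
    ToolSpec("BigQuery tech",
             "useful for when you need to answer technical questions about BigQuery."),
    ToolSpec("Bigquery pricing system",
             "useful for when you need to answer questions about BigQuery pricing."),
]
section = render_tool_section(specs)
```

Since the LLM only sees this text, vague or overlapping descriptions make it more likely to route a question to the wrong source.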
Let us initialize the agent with the ZERO_SHOT_REACT_DESCRIPTION type,
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
We are all set. Let us now ask our agent multiple questions in a single pass,
agent.run("What is BigQuery? What is on-demand pricing in BigQuery? Who are the latest customers of BigQuery?")
The agent breaks down each question, reasons about which source fits it, and retrieves the answer from that source automatically.
Agents that combine data from multiple sources with other tools, such as serpapi and llm-math, can be deployed to give multi-dimensional inputs on a given product, heralding Competitor Analytics X.0 :)