LangChain Tooling Using Groq

Plaban Nayak
Published in The AI Forum · 6 min read · May 19, 2024

Introduction

An increasing number of LLM providers are offering APIs for dependable tool usage. The new tool-calling feature of LangChain establishes a standardized interface for working with tool invocations across providers. It is fully backward compatible and available on all models with native tool-calling support.

What is Tool Calling?

Tool calling enables a model to generate responses to a prompt according to a specific user-defined format or schema.

Despite the term suggesting that the model executes an action, it does not actually perform any operations. Instead, the model produces the necessary parameters for a tool, leaving the decision to run the tool (or not) up to the user.

For instance, if you need to extract structured information from unstructured text, you could provide the model with an “extraction” tool that requires parameters fitting the desired schema. The output generated by the model based on this schema can then be used as the final result.
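To make this concrete, here is a minimal sketch of that extraction pattern. The `Person` schema and the input text are hypothetical examples of our own, not taken from any particular library's documentation; the model is assumed to be the same Groq-hosted Mixtral used later in this post.

# A minimal sketch of schema-based extraction via tool calling.
# The `Person` schema and example sentence are illustrative assumptions.
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_groq import ChatGroq

class Person(BaseModel):
    """Information about a person mentioned in the text."""
    name: str = Field(description="The person's full name")
    age: int = Field(description="The person's age in years")

llm = ChatGroq(temperature=0, model_name="mixtral-8x7b-32768")
extractor = llm.bind_tools([Person])

msg = extractor.invoke("John Doe is a 42-year-old engineer from Berlin.")
# The model does not run anything; it only emits arguments matching the schema,
# e.g. [{'name': 'Person', 'args': {'name': 'John Doe', 'age': 42}, 'id': '...'}]
print(msg.tool_calls)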

List of Chat Models Supporting Tool Calling

Many providers of large language models (LLMs), such as Anthropic, Cohere, Google, Mistral, OpenAI, and others, offer a version of a tool-calling feature. It typically lets requests sent to the LLM include details about available tools and their schemas, and lets responses from the LLM include calls to those tools.

For example, with a search engine tool, an LLM might process a query by first making a call to the search engine. The system that interacts with the LLM can then execute this tool call and return the results to the LLM to enhance its response.

LangChain offers a variety of built-in tools and supports various ways to define custom tools. Tool-calling is highly beneficial for creating tool-using chains and agents, and for obtaining structured outputs from models in general.

Tool Calling Agent

Tool calling enables a model to recognize when one or more tools should be invoked and generate the necessary input for those tools. When using an API call, you can define tools and let the model intelligently choose to produce a structured object, like JSON, with the required arguments for these tools. The aim of tool APIs is to deliver valid and useful tool calls more reliably than using a standard text completion or chat API.

By leveraging this structured output and the capability to link multiple tools to a tool-calling chat model, the model can decide which tool to use. This setup allows the creation of an agent that continuously calls tools and processes their results until the query is resolved.

This approach generalizes the OpenAI tools agent, originally designed for OpenAI’s specific tool-calling method. It employs LangChain’s ToolCall interface to support a broader range of provider implementations, including Anthropic, Google Gemini, and Mistral, in addition to OpenAI.

LangChain implements standard interfaces for defining tools, passing them to LLMs, and representing tool calls. The standard interface consists of:

  • ChatModel.bind_tools(): a method for attaching tool definitions to model calls.
  • AIMessage.tool_calls: an attribute on the AIMessage returned from the model for easily accessing the tool calls the model decided to make.
  • create_tool_calling_agent(): an agent constructor that works with ANY model that implements bind_tools and returns tool_calls.
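To see the first two pieces of this interface in action, here is a minimal, self-contained sketch. The toy `multiply` tool is our own example (not from the article): we bind it to a Groq model, read back the tool calls the model decided to make, execute them ourselves, and pass the results back for a final answer.

# A minimal sketch of bind_tools() and AIMessage.tool_calls,
# using a hypothetical `multiply` tool.
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_core.tools import tool
from langchain_groq import ChatGroq

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

llm = ChatGroq(temperature=0, model_name="mixtral-8x7b-32768")
llm_with_tools = llm.bind_tools([multiply])  # attach the tool definition

messages = [HumanMessage("What is 12 multiplied by 7?")]
ai_msg = llm_with_tools.invoke(messages)
messages.append(ai_msg)

# The model only produced arguments; executing the tool is up to us.
for tool_call in ai_msg.tool_calls:
    result = multiply.invoke(tool_call["args"])
    messages.append(ToolMessage(str(result), tool_call_id=tool_call["id"]))

# Return the tool results so the model can compose its final response.
print(llm_with_tools.invoke(messages).content)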

Code Implementation

Install required dependencies

!pip install -qU langgraph langchain langchain_openai langchain_experimental langchain-groq sentence-transformers langchain-core langchain-mistralai
!pip install -qU --disable-pip-version-check qdrant-client pymupdf tiktoken

Set up required API Keys

import os
from google.colab import userdata

os.environ["TAVILY_API_KEY"] = userdata.get("TAVILY_API_KEY")
os.environ["GROQ_API_KEY"] = userdata.get("GROQ_API_KEY")

Instantiate Embedding Model

from langchain_community.embeddings import HuggingFaceEmbeddings

EMBEDDING_MODEL_NAME = "thenlper/gte-small"
embedding_model = HuggingFaceEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    multi_process=True,
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},  # set `True` for cosine similarity
)

Instantiate the LLM

from langchain_groq import ChatGroq
llm = ChatGroq(temperature=0, model_name="mixtral-8x7b-32768")

Download and chunk the data

from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = PyMuPDFLoader("https://arxiv.org/pdf/2404.19553").load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=0,
    length_function=len,
)

split_chunks = text_splitter.split_documents(docs)

Instantiate the Vectorstore

from langchain_community.vectorstores import Qdrant

qdrant_vectorstore = Qdrant.from_documents(
split_chunks,
embedding_model,
location=":memory:",
collection_name="extending_context_window_llama_3",
)

Set up the retriever

qdrant_retriever = qdrant_vectorstore.as_retriever()
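As a quick sanity check (our addition, not part of the original walkthrough), the retriever can be queried directly to inspect the chunks it returns:

# Fetch the top-matching chunks for a sample query.
retrieved_docs = qdrant_retriever.invoke("What is long context?")
for doc in retrieved_docs:
    print(doc.page_content[:100])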

Build a simple LCEL RAG chain

from operator import itemgetter

from langchain_core.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

RAG_PROMPT = """
CONTEXT:
{context}

QUERY:
{question}

You are a helpful assistant. Use the available context to answer the question. If you can't answer the question, say you don't know.
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)

# The retriever fills in the context; the question passes through unchanged.
rag_chain = (
    {"context": itemgetter("question") | qdrant_retriever, "question": itemgetter("question")}
    | rag_prompt
    | llm  # the ChatGroq model instantiated above
    | StrOutputParser()
)

rag_chain.invoke({"question": "What does the 'context' in 'long context' refer to?"})

######################Response##########################
The 'context' in 'long context' refers to a large amount of continuous text that needs to be processed or understood. In the given documents, it can refer to a coherent text such as a book or a long paper. It can also refer to the input data given to a language model for it to generate responses.

Defining tool schemas: LangChain Tool

from typing import Annotated

from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.tools import tool

tavily_tool = TavilySearchResults(max_results=5)

@tool
def retrieve_information(
    query: Annotated[str, "query to ask the retrieve information tool"],
):
    """Use Retrieval Augmented Generation to retrieve information about the 'Extending Llama-3’s Context Ten-Fold Overnight' paper."""
    return rag_chain.invoke({"question": query})
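Because `@tool` wraps the function into a LangChain tool, it can also be invoked on its own, which is a convenient way to test it before handing it to an agent (this check is our addition, with a sample query of our choosing):

# Sanity-check the RAG tool directly before wiring it into an agent.
print(retrieve_information.invoke({"query": "How was the context window of Llama-3 extended?"}))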

Create Search Agent

from langchain.agents import AgentExecutor, create_tool_calling_agent

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful Search Assistant"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

tools = [tavily_tool]
llm = ChatGroq(temperature=0, model_name="mixtral-8x7b-32768")

search_agent = create_tool_calling_agent(llm, tools, prompt)

Run the Search Agent

search_agent_executor = AgentExecutor(agent=search_agent, tools=tools)
search_agent_executor.invoke({"input":"What are the main takeaways from the paper `Extending Llama-3's Context Ten-Fold Overnight'? Please use Search and PaperInformationRetriever!"})

###### Response #################
{'input': "What are the main takeaways from the paper `Extending Llama-3's Context Ten-Fold Overnight'? Please use Search and PaperInformationRetriever!",
'output': 'Based on the information provided by the tool, the main takeaways from the paper "Extending Llama-3\'s Context Ten-Fold Overnight" are:\n\n1. The context length of Llama-3-8B-Instruct has been extended from 8K to 80K using QLoRA fine-tuning.\n2. The entire training cycle is efficient, taking only 8 hours on one 8xA800 (80G) GPU machine.\n3. The extended model exhibits superior performance across various evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding.\n\nThese improvements demonstrate a significant advancement in handling long contexts for large language models.'}

Create the Research Agent

prompt1 = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful Research Assistant who can provide specific information on the provided paper."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

tools1 = [retrieve_information]

research_agent = create_tool_calling_agent(llm, tools1, prompt1)
research_agent_executor = AgentExecutor(agent=research_agent, tools=tools1)

research_agent_executor.invoke({"input":"What are the main takeaways from the paper `Extending Llama-3's Context Ten-Fold Overnight'? Please use Search and PaperInformationRetriever!"})

#### RESPONSE ##############################
{'input': "What are the main takeaways from the paper `Extending Llama-3's Context Ten-Fold Overnight'? Please use Search and PaperInformationRetriever!",
'output': 'Based on the information provided by the tool, the main takeaways from the paper "Extending Llama-3\'s Context Ten-Fold Overnight" are as follows:\n\n1. The authors have developed a method to extend the context length of the Llama-3 model from 8K to 80K, which they call Llama-3-8B-Instruct-80K-QLoRA.\n2. An efficient solution for entitling the long-context capabilities for Large Language Models (LLMs) was proposed, enabling the extension.\n3. The new model\'s performance is still competitive, but it was observed that context extension may compromise the model’s short-context capability, which is consistent with previous research findings.\n4. Users can apply the model for even longer contexts via extrapolation.\n5. The model is named Llama-3-8B-Instruct-80K-QLoRA based on its max context length during fine-tuning.\n\nThese takeaways provide a summary of the main contributions and observations from the paper. However, for a more comprehensive understanding, it is recommended to read the full paper.'}

Conclusion

Here we have built tool-calling agents using LangChain and Groq. This standardized tool-calling interface can save LangChain users time and effort and lets them switch between different LLM providers more easily.

