Integrating AI with LangChain and MistralAI API

Harish R · Published in CodeX · Apr 4, 2024

In the rapidly changing landscape of AI-powered customer support, combining advanced language models with document processing capabilities is critical for providing prompt, context-aware assistance. In this article, we walk through a Python script that uses the LangChain library and the MistralAI API to generate AI scripts based on user inquiries and support documents.

Importing Necessary Libraries

import os

import ipywidgets as widgets
from ipywidgets import Layout
from IPython.display import display, clear_output, HTML

from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.schema import Document
from langchain.vectorstores import Chroma
from langchain_core.messages import HumanMessage
from langchain_mistralai import MistralAIEmbeddings
from langchain_mistralai.chat_models import ChatMistralAI

This section imports the modules the script needs. The LangChain components manage messages, store conversation data, and interface with language models. IPyWidgets and IPython.display create interactive user-interface elements in Jupyter notebooks, and os provides access to environment variables. LangChain's Chroma integration handles vector storage, which is required for the embedding and similarity-search functionality.

Setting Up Configuration and Constants

# Define constants
CONTEXT = "context"
HUMAN_INPUT = "human_input"
CHAT_HISTORY_INDICATOR = "chat_history_indicator"
MISTRAL_CHAT_MODEL = "mistral-small"
TOP_DOC_NUM = 3
SUPPORT_DOC_PATH = None

MISTRAL_API_KEY = os.environ["MISTRAL_API_KEY"] # Update with your MISTRAL_API_KEY

# Use the home directory as the base path for your writable directory
home_directory = "/home/ec2-user/<path>/customer_support_bot" # Replace the path with your files

# Specify a subdirectory within the home directory
SUPPORT_DOC_FOLDER_PATH = os.path.join(home_directory, 'support_docs')
script_chain = None

The configuration section defines constants and environment settings for the script's operation, such as the API key, model selection, and document locations. This step sets up the environment and pins down operational details such as where the support documents live.
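
Before running the rest of the notebook, it can help to confirm that the key and the document folder are actually in place. A minimal sanity-check sketch (not part of the original script; the assertion messages are my own):

# Illustrative sanity checks; not part of the original script
assert "MISTRAL_API_KEY" in os.environ, "Set the MISTRAL_API_KEY environment variable first"
assert os.path.isdir(SUPPORT_DOC_FOLDER_PATH), f"Support folder not found: {SUPPORT_DOC_FOLDER_PATH}"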

Generating AI Scripts with LangChain and MistralAI

def return_generate_ai_script(template_text: str, user_query: str) -> str:
    """
    Generate an AI script and append it to a template.

    Args:
        template_text (str): Template for the LLM prompt.
        user_query (str): The user's question.

    Returns:
        str: AI output string.
    """
    global script_chain

    # Load an embedding model from MistralAI
    embeddings = MistralAIEmbeddings(mistral_api_key=MISTRAL_API_KEY)

    all_docs = list(process_doc_files(SUPPORT_DOC_FOLDER_PATH).values())

    # Transform the docs into vectors and store them in a Chroma database
    # for managing and querying the embeddings
    docsearch = Chroma.from_texts(all_docs, embeddings)

    # Use cosine similarity to find documents similar to the user query
    similar_docs = docsearch.similarity_search(user_query, k=1)

    # Initialize script_chain if it doesn't exist
    if script_chain is None:
        # Input for the prompt
        prompt = PromptTemplate(
            input_variables=[CHAT_HISTORY_INDICATOR, HUMAN_INPUT, CONTEXT],
            template=template_text,
        )

        # Input for the Memory class
        memory = ConversationBufferMemory(memory_key=CHAT_HISTORY_INDICATOR, input_key=HUMAN_INPUT)

        # Load the LLM
        llm = ChatMistralAI(model_name=MISTRAL_CHAT_MODEL, temperature=0, mistral_api_key=MISTRAL_API_KEY)

        # Feed the LLM, memory object, and prompt to the Q&A chain
        script_chain = load_qa_chain(llm=llm, chain_type="stuff", memory=memory, prompt=prompt)

    gen_ai_output = script_chain(
        {"input_documents": similar_docs, HUMAN_INPUT: user_query},
        return_only_outputs=True,
    )

    print("Chain memory: ", script_chain.memory.buffer)

    return gen_ai_output["output_text"]

The return_generate_ai_script function builds an AI-generated script from a template and a user query, using language-model embeddings and a document similarity search.

The function’s goal is to dynamically generate text appropriate to a user’s query using pre-existing documents as resources and a language model.

Loading Embeddings: MistralAIEmbeddings is created using an API key to load an embedding model. This model can translate text into numerical vectors that represent semantic meanings.
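
As a quick illustration of what the embedding model produces, you can embed a single string with LangChain's standard embed_query method (the sample question is made up):

embeddings = MistralAIEmbeddings(mistral_api_key=MISTRAL_API_KEY)
# embed_query returns a list of floats representing the text
vector = embeddings.embed_query("How do I reset my password?")
print(len(vector))  # dimensionality of the embedding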

Document Processing: The process_doc_files function processes all support documents in SUPPORT_DOC_FOLDER_PATH, returning a dictionary of document contents.

Vectorization and Document Search: The function Chroma.from_texts converts these document contents into vector representations using the previously loaded embeddings. It then uses these vectors to conduct a similarity search for documents that are most similar to the user’s query (similar_docs), with the goal of locating the top document (k = 1) that matches the query.
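
To see the retrieval step in isolation, here is a minimal sketch with a toy corpus in place of the real support documents (both snippets are invented for illustration):

toy_docs = [
    "To reset your password, open Settings and choose 'Reset password'.",
    "Refunds are processed within 5 to 7 business days.",
]
docsearch = Chroma.from_texts(toy_docs, embeddings)
similar = docsearch.similarity_search("I forgot my password", k=1)
print(similar[0].page_content)  # should surface the password-reset snippet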

Script Chain Initialization: If script_chain is not already initialized, the function creates a PromptTemplate with the supplied input variables and template content. A ConversationBufferMemory object is created to manage the conversation context and memory.
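
To see what the memory object does on its own, here is a minimal sketch (the exchange is made up):

memory = ConversationBufferMemory(memory_key=CHAT_HISTORY_INDICATOR, input_key=HUMAN_INPUT)
# save_context records one human/AI exchange into the buffer
memory.save_context(
    {HUMAN_INPUT: "Hi, my order hasn't arrived."},
    {"output": "Sorry to hear that! Could you share your order number?"},
)
print(memory.buffer)  # the accumulated chat history as one string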

The ChatMistralAI model is configured with parameters such as the model name and temperature for response generation. These components are then linked into a question-and-answer (QA) chain that generates responses based on the language model, conversation memory, and input prompt.
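
The article never shows a concrete template_text, but whatever you pass in must expose the three placeholders listed in input_variables. An illustrative template (the wording is my own, not from the original):

template_text = """You are a customer support assistant. Answer using the context below.

{chat_history_indicator}

Context:
{context}

Human: {human_input}
Assistant:"""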

Generating AI output: The function then passes the retrieved documents and the user's query to script_chain, which produces the AI output. This output is tailored to be contextually appropriate to the user's inquiry, drawing on the most similar support document(s).

The return_only_outputs=True argument specifies that only the generated text (output) should be returned, ignoring any further metadata or intermediate outcomes.

Finally, the function returns the generated output text: the model's response, shaped by the supplied template and customized for the user's query.
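
Putting it together, a call might look like this (the question is hypothetical, and template_text is assumed to be defined as sketched above):

answer = return_generate_ai_script(template_text, "How do I reset my password?")
print(answer)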

def process_doc_files(folder_path: str) -> dict:
    """
    Process text files in a folder and return a dictionary with file names
    as keys and content as values.

    :param folder_path: The path to the folder containing the text files.
    :return: A dictionary with file names as keys and content as values.
    """
    # Initialize an empty dictionary to store the results
    doc_dict = {}

    # Check if the folder exists
    if not os.path.exists(folder_path):
        return doc_dict  # Return an empty dictionary if the folder does not exist

    # List all files in the folder
    file_list = os.listdir(folder_path)

    # Iterate through the files
    for filename in file_list:
        # Check if the file has a .txt extension
        if filename.endswith(".txt"):
            # Create the full path to the file
            file_path = os.path.join(folder_path, filename)

            # Open the file and read its content
            with open(file_path, 'r', encoding='utf-8') as file:
                file_content = file.read()

            # Store the content in the dictionary with the filename as the key
            doc_dict[filename] = file_content

    return doc_dict

The process_doc_files function is intended to process and read text files from a specified directory, resulting in a dictionary with keys representing the file names and values representing their contents. This function is especially useful in applications that need to dynamically load and use text data from files, such as document processing systems, content management systems, or AI models that require access to a text corpus.

Create an empty dictionary: doc_dict = {} creates an empty dictionary for storing file processing results. Each entry in this dictionary will eventually associate a filename with its content.

Check Folder Existence: The statement if not os.path.exists(folder_path) determines whether the supplied folder path exists. If it does not, the function returns an empty dictionary, handling the case where the specified path is invalid or the folder is absent.

List all the files in the folder: file_list = os.listdir(folder_path) returns a list of all files in the provided directory. This list is used to iterate over every file in the directory.

Iterate through files: The function uses a for loop to cycle over each file in file_list. It then determines whether the file ends with a .txt extension.

Read File Content: For each text file, the function builds a full path (file_path) and opens it in read mode. The contents of the file are read and saved in file_content.

Store content in the dictionary: In the doc_dict dictionary, the filename serves as the key and the file’s content as the value. This efficiently associates each text file’s name with its content.

Return the dictionary: After processing all text files in the directory, the function returns the doc_dict dictionary, which maps filenames to their contents.
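
A quick way to inspect what the loader picked up, assuming the support folder from the configuration section:

docs = process_doc_files(SUPPORT_DOC_FOLDER_PATH)
for name, content in docs.items():
    print(f"{name}: {len(content)} characters")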
