Analyze your code with a local GPT model using Langchain & Chroma

Create a local GPT assistant to analyze your Python code

Nevio Gomez
6 min read · Jun 10, 2023
My local assistant Eunomia answering queries about a newly created Django project

In this article, I’ll show you how to set up your own GPT assistant with access to your Python code so you can make queries about it. The assistant will be able to list classes and subclasses, explain class hierarchies, recommend code, and so on. The quality of the responses depends on how powerful the selected model is, how the data is processed, and how specific your query is, so you will need to tune a few parameters. Don’t worry: I’ll tell you which parameters you can edit to get better responses from the model, and I’ll also list some resources at the end so you can adapt the program to your own needs.

Setting up the environment

First, we need to create a new Python virtual environment. I’ll create it using the virtualenv package (you can use any tool you like). Create a folder for the project and run these commands inside it (the second line activates the environment on Windows):

virtualenv AssistantEnv
AssistantEnv/Scripts/activate

In case you are using Linux, activate the virtual environment by running:

source AssistantEnv/bin/activate

Next, we need to install the required libraries. Download the requirements.txt file from my repository and run:

pip install -r /path/to/requirements.txt

This process will take some time since it will install various libraries, so sit back and relax.
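If you’re curious about what ends up installed, the dependencies boil down to roughly these packages (this is an unpinned sketch from my own setup, not the authoritative list; use the repository’s requirements.txt for the exact versions, and note that the GPT4All bindings package in particular can differ between LangChain versions):

langchain
chromadb
gpt4all
sentence-transformers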

Once the process is done, you’ll need to download one of the models available through GPT4All and save it in a folder called LLM inside the program’s root directory. I’ll use groovy as an example, but you can use any model you like. Here’s a table with three models I have tested and their respective backends (you’ll need the correct backend to load each model).

+-----------------------------------+---------+-------+
| LLM | Backend | Size |
+-----------------------------------+---------+-------+
| 🦙 ggml-gpt4all-l13b-snoozy.bin | llama | 8.0GB |
| 🖼️ ggml-nous-gpt4-vicuna-13b.bin | llama | 8.0GB |
| 🤖 ggml-gpt4all-j-v1.3-groovy.bin | gptj | 3.7GB |
+-----------------------------------+---------+-------+

IMPORTANT: The GPT model is loaded into memory when used, so make sure you have enough memory available before loading one of the heavier models. For example, vicuna weighs 8 GB, so roughly 8 GB of RAM will be used while the model is generating a response.
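Just so we’re on the same page, the assistant’s own folder should end up looking roughly like this by the end of the article (the names other than the LLM folder and the model file are up to you; the db vectorstore folder is created later, inside whichever project you run ingest.py from):

AssistantProject/
├── AssistantEnv/      <- the virtual environment
├── LLM/
│   └── ggml-gpt4all-j-v1.3-groovy.bin
├── requirements.txt
├── ingest.py          <- created in the next section
└── assistant.py       <- created after that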

Creating the Vectorstore

Now that we have the environment set up and our model downloaded, we can start developing the program. Create a file called ingest.py in your root directory. We will use this file to ingest the files and build the vectorstore that our LLM will use as context for queries. Here’s the code:

import os

from chromadb.config import Settings

from langchain.vectorstores import Chroma
from langchain.document_loaders import PythonLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language

root_dir = os.getcwd()

docs = []
for dirpath, dirnames, filenames in os.walk(root_dir):
    for file in filenames:
        if file.endswith(".py") and "AssistantEnv" not in dirpath:
            try:
                loader = PythonLoader(os.path.join(dirpath, file))
                docs.extend(loader.load_and_split())
            except Exception:
                pass

print(f"{len(docs)} documents loaded.")
print("Creating vectorstore.")

text_splitter = RecursiveCharacterTextSplitter.from_language(
    chunk_size=60,
    chunk_overlap=2,
    language=Language.PYTHON,
)
texts = text_splitter.split_documents(docs)

Here are the steps of this code:

  • First, we get the current working directory, where the code you want to analyze is located.
  • Then we search for any file that ends with .py and is not inside the AssistantEnv folder.
  • Each file is loaded with langchain’s PythonLoader and added to the docs list.
  • We create a text_splitter object using the RecursiveCharacterTextSplitter.from_language classmethod, which splits the files into chunks in a format that is easier for the LLM to process.
  • Lastly, we use the split_documents method to process the docs and create a list with all the documents processed and ready to be added to the vectorstore.

Note: Here are the first two parameters you can edit to optimize the LLM’s runtime and responses (test and adjust them as needed; there’s an example right after this list). They are:

  • chunk_size: The maximum size of your chunks.
  • chunk_overlap: The maximum overlap between chunks. It can be nice to have some overlap to maintain continuity between chunks.
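For example, on a larger codebase you might try bigger chunks with a bit more overlap. The numbers below are only a starting point for experimentation, not recommended values:

text_splitter = RecursiveCharacterTextSplitter.from_language(
    chunk_size=1000,    # bigger chunks keep whole functions/classes together
    chunk_overlap=100,  # some overlap preserves context across chunk boundaries
    language=Language.PYTHON,
)
texts = text_splitter.split_documents(docs)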

Now let’s create the vectorstore using Chroma. Here’s the code:

CHROMA_SETTINGS = Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory="db",
    anonymized_telemetry=False,
)

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

db = Chroma.from_documents(
    texts,
    embeddings,
    persist_directory="db",
    client_settings=CHROMA_SETTINGS,
)

db.persist()
db = None

print(f"Vectorstore created at {root_dir}\\db")

Save the file. Now let’s create the file that will load the LLM and let you interact with it. Note: the first time you ingest the files, HuggingFaceEmbeddings will download the embeddings model; on later runs it will reuse the same model and nothing new will be downloaded. Here’s a list with all the embeddings models you can use.
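Swapping the embeddings model is a one-line change if you want to experiment. For instance, all-MiniLM-L6-v2 could be replaced with all-mpnet-base-v2, a larger sentence-transformers model that is usually higher quality but slower (this is just an illustrative choice, not a recommendation). If you change it, re-run ingest.py so the vectorstore is rebuilt with the new embeddings, and use the same model name in assistant.py:

embeddings = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")  # example alternative model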

Loading and Interacting with the LLM

Create a new file called assistant.py. In this file we will handle loading the model, loading the vectorstore, and passing queries to the LLM. Here’s the code:

import os

from chromadb.config import Settings
from langchain.llms import GPT4All
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain.callbacks import StdOutCallbackHandler

CHROMA_SETTINGS = Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory="db",
    anonymized_telemetry=False,
)

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

db_dir = os.getcwd() + "/db"

chroma = Chroma(
    persist_directory=db_dir,
    embedding_function=embeddings,
    client_settings=CHROMA_SETTINGS,
)

retriever = chroma.as_retriever(
    search_kwargs={
        "k": 6,
        "fetch_k": 20,
        "maximal_marginal_relevance": True,
    }
)

llm = GPT4All(
    model=r"assistant\absolute\path\LLM\ggml-gpt4all-j-v1.3-groovy.bin",  # absolute path to your model
    n_ctx=1000,
    backend="gptj",
    verbose=True,
    callbacks=[StdOutCallbackHandler()],
    n_threads=8,  # change this according to your CPU threads
    temp=0.5,
)

qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever)
chat_history = []

while True:
    query = input("\nEnter a query: ")
    if query in ["quit", "q"]:
        break

    response = qa({"question": query, "chat_history": chat_history})
    chat_history.append((query, response["answer"]))

Let’s break down the code:

  • First we load the embeddings and initialize Chroma. With this we create the retriever that will handle the retrieval of relevant information from your files.
  • Then we load the LLM with the following parameters:
    model: The absolute path to your model.
    n_ctx: Token context window.
    backend: The backend used for loading your model.
    verbose: Whether to print out response text.
    callbacks: Allows you to hook into the various stages of your LLM application.
    n_threads: Number of threads to use.
    temp: How “creative” you want the model’s responses to be. The lower the temp, the more conservative the response.
  • Create a qa object using the ConversationalRetrievalChain. This chain will make the model generate an answer based on the query and the context provided. We also create a chat_history list that holds, in memory, tuples of each query and the response generated for it.
  • Finally, we create a while loop to interact with the model by passing the query and the chat_history. To break out of the while loop, just type q or quit.
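One small note: the loop above never prints the answer itself, so depending on your verbose and callbacks settings the output may or may not be echoed to the terminal. If you want to print the final answer explicitly, a minor variation of the loop looks like this:

while True:
    query = input("\nEnter a query: ")
    if query in ["quit", "q"]:
        break

    response = qa({"question": query, "chat_history": chat_history})
    print(response["answer"])  # print the final answer explicitly
    chat_history.append((query, response["answer"]))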

Here are some parameters you can edit to optimize the program for your needs: n_ctx, backend, and n_threads. It’s important to look up the right backend for the model you’re going to use, because different models require different backends. n_threads depends on your CPU; you can remove it and the number of threads will be set automatically.
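For example, if you’d rather use one of the llama-backend models from the table above, only the model path and backend change; the other values below are just the same illustrative defaults as before, and the path is a placeholder for your own:

llm = GPT4All(
    model=r"absolute\path\to\LLM\ggml-gpt4all-l13b-snoozy.bin",  # placeholder path
    n_ctx=1000,
    backend="llama",  # snoozy and vicuna use the llama backend
    verbose=True,
    callbacks=[StdOutCallbackHandler()],
    # n_threads omitted: it will be set automatically
    temp=0.5,
)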

Now save the file and let’s test our program.

Testing the program:

First, move to the folder containing the code you want to analyze and ingest the files by running python path/to/ingest.py. If everything went correctly, you should see a message saying the vectorstore was created. Then run assistant.py in the same directory and the model should load with the context provided. Here are some screenshots of what you should see for both processes.

Running the assistant with a newly created Django project
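In case it helps, the whole test run from the command line looks roughly like this (every path here is a placeholder for your own setup, and the virtual environment should already be activated as shown earlier):

cd path/to/project/you/want/to/analyze
python path/to/ingest.py
python path/to/assistant.py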

Try asking the model some questions about the code: the class hierarchy, which classes depend on class X, which technologies and languages are used, and so on. The model may take a few seconds to come up with an answer, depending on your machine; if it feels too slow, try a smaller model and adjust the model parameters. As you can see, some answers are not quite what I asked about, but the model also inferred that my project was a Django application even though I never mentioned that in my query. The answer the LLM gives you depends on how well you phrase the question and how much information it has for answering it, so experiment and try to find the best way to phrase your queries.

Hopefully this article helped you set up your own code assistant. Please visit Langchain’s documentation if you want to try other chains, support more languages, or use the ChatGPT model instead of a local one.

Some resources:

Here are some resources you can read to customize your assistant further:

