Build RAG Application Using a LLM Running on Local Computer with GPT4All and Langchain

Privacy-preserving LLM without GPU

(λx.x)eranga
Effectz.AI
21 min read · Mar 10, 2024


RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) enhances the performance of Large Language Models (LLMs) by incorporating external, authoritative knowledge bases into the response generation process. LLMs, known for their vast training datasets and billions of parameters, excel in tasks such as question answering, language translation, and sentence completion. RAG extends these capabilities into specific domains or an organization’s internal knowledge, enabling the model to access up-to-date and relevant information without the need for retraining. This method represents a cost-effective strategy to enhance the relevance, accuracy, and applicability of LLM outputs across various contexts. The popularity of RAG has surged alongside the advancement of LLMs, particularly with models like OpenAI’s GPT series. The following diagram provides a high-level overview of the RAG process.

Retrieval - Upon receiving a query or prompt, RAG initiates a search across a vast text corpus, like documented materials or domain-specific datasets, to locate pertinent documents or passages. This phase employs a retrieval mechanism, often grounded in vector similarity search, where the prompt and documents are transformed into vectors in a high-dimensional space for comparison and relevance assessment.

Augmentation - Next, the RAG model augments the user input (or prompt) by adding the relevant retrieved data in context. This step uses prompt engineering techniques to communicate effectively with the LLM. The augmented prompt allows the large language model to generate an accurate answer to the user's query. This enriched context (user query + retrieved information) is then sent to the LLM.

Generation - The final step involves generating a response based on the combined input of the original query and the information from the retrieved documents. This is typically done using a Transformer-based model (LLM), which can consider the vast amount of information from both the query and the documents to construct its response.

RAGs with Commercial LLMs

Numerous LLMs are available for creating RAG applications, with OpenAI’s GPT series being among the most prominent. In previous discussions, I’ve explored how to develop RAG applications using OpenAI’s GPT models in conjunction with Langchain.

  1. Creating Custom ChatGPT with Your Own Dataset using OpenAI GPT-3.5 Model, LlamaIndex, and Langchain.
  2. Session-based Custom ChatGPT Model for Website Content Utilizing OpenAI GPT-4 LLM, Langchain ConversationalRetrievalChain, and MongoDB Conversational Memory.
  3. Session-based Custom ChatGPT Model for Website Content Utilizing OpenAI GPT-4 LLM, Langchain LLMChain, and MongoDB Conversational Memory.

However, deploying these commercial LLMs for RAG presents several challenges. A primary concern is privacy; utilizing services like OpenAI’s GPT requires submitting our data (e.g., vector search results) for processing, potentially exposing sensitive information. This privacy issue has made many individuals and organizations hesitant to adopt commercial LLMs. Additionally, there is the cost factor, as these services typically charge based on the length of the queries and responses, which can add up quickly.

To mitigate these concerns, open-source LLMs such as Meta’s Llama-2 have been developed. Despite their advantages, running these models typically demands significant GPU resources, which may not be feasible for all users or organizations.

GPT4All offers a solution to these dilemmas by enabling the local or on-premises deployment of LLMs without the need for GPU computing power. This approach not only addresses privacy and cost concerns but also makes it easier for a broader range of users to leverage the power of LLMs in their RAG applications, ensuring more secure and accessible AI capabilities.

GPT4ALL

GPT4All, built by Nomic AI, is an innovative ecosystem designed to run customized LLMs on consumer-grade CPUs and GPUs. Traditionally, LLMs are substantial in size, requiring powerful GPUs for operation. However, GPT4All leverages neural network quantization, a technique that significantly reduces the hardware requirements, enabling LLMs to run efficiently on everyday computers without needing an internet connection.

Quantization addresses the challenge posed by the massive size of LLMs. By lowering the precision of the models’ weights, it’s possible to conserve memory and accelerate inference times while maintaining the majority of the model’s accuracy. Recent advancements in 8-bit and 4-bit quantization have further enhanced the feasibility of operating LLMs on standard consumer hardware.
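To see why quantization matters in practice, consider a rough back-of-the-envelope calculation of the weight-only memory footprint of a 13-billion-parameter model at different precisions (this sketch ignores activations, KV cache, and runtime overhead):

# approximate weight-only memory footprint of a 13B-parameter model
# at different precisions (ignores activations, KV cache and runtime overhead)
params = 13_000_000_000

for precision, bytes_per_weight in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{precision}: ~{params * bytes_per_weight / 1e9:.1f} GB")

# fp16: ~26 GB, int8: ~13 GB, int4: ~6.5 GB

This is why a 4-bit quantized 13B model fits within the 3GB to 8GB model file sizes mentioned below and can be loaded on an ordinary laptop.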

There are different model quantization techniques, such as NF4, GPTQ, and GGML. GPT4All utilizes models that have been quantized using the GGML technique. Users can download GPT4All model files, ranging from 3GB to 8GB, and integrate them into the GPT4All open-source ecosystem software. Nomic AI maintains this ecosystem, ensuring quality and security while making it simpler for anyone or any organization to train and deploy their own on-edge LLMs.

GPT4All supports multiple model architectures that have been quantized with GGML, including GPT-J, Llama, MPT, Replit, Falcon, and StarCoder. A significant aspect of these models is their licensing conditions. For instance, Llama-based models are under a non-commercial license, while GPT-J and MPT models permit commercial use. Notably, Llama-2 models are also available for commercial licensing. While Llama models were initially viewed as superior in performance, the landscape is rapidly evolving. New models are being released frequently, with some GPT-J and MPT models matching or even surpassing Llama in performance and quality. Additionally, MPT models introduce innovative architectural improvements that could lead to further enhancements in performance and quality.

RAG Application

In this post, I will explore how to develop a RAG application by running an LLM locally on your machine using GPT4All. The integration of these LLMs is facilitated through Langchain.

This RAG application incorporates a custom-made dataset, which is dynamically scraped from an online website. Users can interact with the website's data through an API (e.g., a REST API). For demonstration purposes, I've selected the Open5GS documentation website (Open5GS is a C-language implementation of the 5G Core). The data from the Open5GS documentation is scraped, split, and then stored in the Chroma vector database as vector embeddings. Consequently, users can seamlessly interact with the content of the Open5GS documentation via the API.

For the LLM component of this RAG application, I've opted for the nous-hermes-llama2-13b.Q4_0.gguf model, which is available through GPT4All. There is a range of GPT4All-based LLMs suitable for this application, all of which can be found on the GPT4All website. The nous-hermes-llama2-13b.Q4_0.gguf model is based on Meta's Llama-2 and is quantized for optimal performance on consumer-grade hardware, such as CPUs. In this RAG application, the nous-hermes-llama2-13b.Q4_0.gguf LLM provides answers to user questions based on the content of the Open5GS documentation.

Following are the main functionalities of the RAG application. A comprehensive functional architecture, encompassing these various components, is detailed in the figure below.

1. Scrape Web Data

Langchain provides different types of document loaders to load data from different sources as Documents. RecursiveUrlLoader is one such document loader that can be used to load the data at a web URL into documents. This step employs Langchain's RecursiveUrlLoader to scrape data from the web as documents. RecursiveUrlLoader scrapes the given URL recursively up to the given max_depth and reads the data on the web. This data is then used to create vector embeddings and answer the user's questions.

2. Split Documents

When handling lengthy pieces of text, it’s essential to divide the text into smaller segments. Although this task seems straightforward, it can encompass considerable complexity. The goal is to ensure that semantically related segments of text remain together. The Langchain text splitter accomplishes this task effectively. Essentially, it divides the text into small, semantically meaningful units (often sentences). These smaller segments are then combined to form larger chunks until they reach a certain size, determined by a specific function. Upon reaching this size, the chunk is designated as an individual piece of text, and the process begins anew with some overlap. For this particular scenario, I have employed the RecursiveCharacterTextSplitter to split the scraped documents into manageable chunks.
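The following is a minimal sketch of this splitting behaviour in isolation; the chunk_size and chunk_overlap values here are purely illustrative (the actual values used by the application appear in the implementation section below):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# split a sample string into overlapping chunks; the values are illustrative only
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = splitter.split_text("Open5GS is a C-language implementation of 5G Core and EPC. " * 10)
print(len(chunks))
print(chunks[0])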

3. Create Vector Embedding

Once the data is collected and split, the next step involves converting this textual information into vector embeddings, which are created from the split data. Text embeddings are crucial to the functioning of LLM operations. While it's technically feasible to work with language models using natural language, storing and retrieving such data is highly inefficient. To enhance efficiency, it's necessary to transform text data into vector form. There are dedicated machine learning models specifically designed for creating embeddings from text. In this case, I have utilized the open-source HuggingFaceEmbeddings model all-MiniLM-L6-v2 to generate vector embeddings. The text is thereby converted into multidimensional vectors, which are essentially high-dimensional numerical representations capturing semantic meanings and contextual nuances. Once embedded, these data can be grouped, sorted, searched, and more. We can calculate the distance between two sentences to determine their degree of relatedness. Importantly, these operations transcend traditional database searches that rely on keywords, capturing instead the semantic closeness between sentences.
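For example, the relatedness of two sentences can be estimated by embedding both and computing their cosine similarity. The following is a minimal sketch using the same all-MiniLM-L6-v2 model; the two sentences are illustrative:

from langchain.embeddings import HuggingFaceEmbeddings
import numpy as np

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# embed two illustrative sentences and compare them with cosine similarity
v1 = np.array(embeddings.embed_query("Open5GS implements the 5G core network"))
v2 = np.array(embeddings.embed_query("The 5G core is implemented by Open5GS"))
similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"cosine similarity: {similarity:.3f}")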

4. Store Vector Embedding in Chroma

The generated vector embeddings are then stored in the Chroma vector database. Chroma (commonly referred to as ChromaDB) is an open-source embedding database that makes it easy to build LLM apps by storing and retrieving embeddings and their metadata, as well as documents and queries. Chroma efficiently handles these embeddings, allowing for quick retrieval and comparison of text-based data. Traditional databases work well for exact queries but fall short when it comes to understanding the nuances of human language. Enter vector databases, a game-changer in handling semantic search. Unlike traditional text matching, which relies on exact words or phrases, vector databases such as Chroma (or Postgres with pgvector) process information semantically. This database is a cornerstone of the system's ability to match user queries with the most relevant information from the scraped content, enabling fast and accurate responses.
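In isolation, a semantic query against the persisted Chroma index looks like the following sketch (it assumes the index has already been created in the persist directory used by the implementation below):

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectordb = Chroma(persist_directory="./data/chromadb", embedding_function=embeddings)

# return the top-3 chunks that are semantically closest to the query
for doc in vectordb.similarity_search("How does Open5GS work?", k=3):
    print(doc.metadata.get("source"), doc.page_content[:80])

In the application itself this search is not invoked directly; the ConversationalRetrievalChain drives it through the retriever, as described in the following steps.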

5. User Ask Question

The system provides an API through which users can submit their questions. In this use case, users can ask any question related to the content of the Open5GS documentation. This API serves as the primary interface for interactions between the user and the chatbot. The API takes a parameter, user_id, which is used to identify different user sessions. This user_id is used for demonstration purposes. In real-world scenarios, it could be managed with an Authorization header (e.g., JWT Bearer token) in the HTTP request. The API is designed to be intuitive and accessible, enabling users to easily input their queries and receive responses.

6. Create Vector Embedding of Question

When a user submits a question through the API, the system converts this question into a vector embedding. The generation of the embedding is automatically handled by the ConversationalRetrievalChain. This facilitates the semantic search of documents related to the question within the vector database.

7. Semantic Search Vector Database

Once the vector embedding for the question is created, the system employs semantic search to scan through the vector database, identifying content most relevant to the user’s query. By comparing the vector embedding of the question with those of the stored data, the system can accurately pinpoint information that is contextually similar or related to the query. In this scenario, I have utilized the ConversationalRetrievalChain, which automatically handles semantic searches based on the input query. The results of the semantic search are then identified as context for the LLM.

8. Generate Prompt

Next, the ConversationalRetrievalChain generates a custom prompt with the user’s question and the semantic search result (context). A prompt for a language model is a set of instructions or input provided by the user to guide the model’s response. This helps the model understand the context and generate relevant and coherent language-based outputs, such as answering questions, completing sentences, or engaging in a conversation.
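The exact prompt the chain produced is visible in the application logs shown later. As a minimal, self-contained sketch of how such a prompt is assembled, the template below mirrors the structure of the chain's default QA prompt, and the context/question values are illustrative:

from langchain.prompts import PromptTemplate

# illustrative template mirroring the structure of the chain's default QA prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# the chain fills {context} with the semantic search results and {question} with the user query
print(prompt.format(context="Open5GS is a C-language implementation of 5G Core and EPC.",
                    question="How open5gs work?"))

If needed, a custom PromptTemplate like this can be passed to ConversationalRetrievalChain.from_llm via the combine_docs_chain_kwargs argument; by default, this application relies on the chain's built-in prompt.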

9. Post Prompt to LLM

After generating the prompt, it is posted to the LLM (in our case, the GPT4All nous-hermes-llama2-13b.Q4_0.gguf) through Langchain's GPT4All wrapper (Langchain officially supports GPT4All within langchain.llms). The LLM then finds the answer to the question based on the provided context. The ConversationalRetrievalChain handles this function of posting the query to the LLM (behind the scenes, it submits the prompt to the locally loaded GPT4All model).

10. LLM Generate Answer

The LLM, utilizing the advanced capabilities of Meta’s Llama-2, processes the question within the context of the provided content. It then generates a response and sends it back.

11. Save Query and Response in MongoDB Chat History

Langchain provides a variety of components for managing conversational memory. In this chatbot, MongoDB has been employed for the management of conversational memory. At this stage, both the user’s question and the chatbot’s response are recorded in MongoDB storage as part of the chat history. This approach ensures that all user chat histories are persistently stored in MongoDB, thus enabling the retrieval of previous interactions. The data is stored in MongoDB on a per-user-session basis. To distinguish between user sessions, the API utilizes the user_id parameter, as previously mentioned. This historical data is pivotal in shaping future interactions. When the same user poses subsequent questions, the chat history, along with the new semantic search results (context), is relayed to the LLM. This process guarantees that the chatbot can maintain context throughout a conversation, resulting in more precise and tailored responses.
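The implementation below leaves this persistence step as a TODO in the chat function; a minimal sketch of how it could be wired up with Langchain's MongoDBChatMessageHistory is shown here (the connection string and the database/collection names are assumptions for illustration):

from langchain.memory import MongoDBChatMessageHistory

# persist one question/answer turn for a given user session;
# the connection string and database/collection names are illustrative
history = MongoDBChatMessageHistory(
    connection_string="mongodb://testuser:testpass@localhost:27017",
    session_id="zio",
    database_name="chatbot",
    collection_name="chat_histories",
)
history.add_user_message("How open5gs work?")
history.add_ai_message("Open5GS is a C-language implementation of 5G Core and EPC ...")

# history.messages can later be converted into (question, answer) pairs
# and passed to the chain as chat_history for the same user_id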

12. Send Answer Back to User

Finally, the answer received from the LLM is forwarded to the user via the HTTP API. Users can continue to ask different questions in subsequent requests by providing the same user_id. The system then recognizes the user’s chat history and includes it in the information sent to the LLM, along with the new semantic search results. This process ensures a seamless and contextually aware conversation, enriching the user experience with each interaction.

Implementation

The complete implementation of this ChatBot is detailed below. The full source code of the ChatBot agent is available for access and review on GitLab.

1. Configurations

In the config.py file, I have defined various configurations used in the ChatBot. These configurations are read through environment variables in adherence to the principles of 12-factor apps.

import os

# define init index
INIT_INDEX = os.getenv('INIT_INDEX', 'false').lower() == 'true'

# vector index persist directory
INDEX_PERSIST_DIRECTORY = os.getenv('INDEX_PERSIST_DIRECTORY', "./data/chromadb")

# target url to scrape
TARGET_URL = os.getenv('TARGET_URL', "https://open5gs.org/open5gs/docs/")

# http api port
HTTP_PORT = int(os.getenv('HTTP_PORT', 7654))

# mongodb config host, username, password
MONGO_HOST = os.getenv('MONGO_HOST', 'localhost')
MONGO_PORT = int(os.getenv('MONGO_PORT', 27017))
MONGO_USER = os.getenv('MONGO_USER', 'testuser')
MONGO_PASS = os.getenv('MONGO_PASS', 'testpass')

2. HTTP API

The HTTP API implementation is carried out in api.py. This API includes an HTTP POST endpoint api/question, which accepts a JSON object containing a question and user_id. The user_id is utilized for demonstration purposes. In a real-world application, this could be managed with an Authorization header (e.g., JWT Bearer token) in the HTTP request. When a question request is received from the user, it is forwarded to the chat function in the ChatBot model.

from flask import Flask
from flask import jsonify
from flask import request
from flask_cors import CORS
import logging
import sys
from model import init_index
from model import init_conversation
from model import chat
from config import *

app = Flask(__name__)
CORS(app)

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

@app.route('/api/question', methods=['POST'])
def post_question():
    json = request.get_json(silent=True)
    question = json['question']
    user_id = json['user_id']
    logging.info("post question `%s` for user `%s`", question, user_id)

    resp = chat(question, user_id)
    data = {'answer': resp}

    return jsonify(data), 200

if __name__ == '__main__':
    init_index()
    init_conversation()
    app.run(host='0.0.0.0', port=HTTP_PORT, debug=True)

3. Model

Below is the implementation of the Model. It includes a function, init_index, which scrapes data from a given web URL and creates the vector store. An environment variable, INIT_INDEX, is used to determine whether to create the index. The init_conversation function initializes the ConversationalRetrievalChain with GPT4All's nous-hermes-llama2-13b.Q4_0.gguf LLM. The chat function is responsible for posting questions to the LLM.

from langchain.llms import GPT4All
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.recursive_url_loader import RecursiveUrlLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import OpenAI
from langchain.embeddings import HuggingFaceEmbeddings
from bs4 import BeautifulSoup as Soup
from langchain.utils.html import (PREFIXES_TO_IGNORE_REGEX,
SUFFIXES_TO_IGNORE_REGEX)

from config import *
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


global conversation
conversation = None


def init_index():
    if not INIT_INDEX:
        logging.info("continue without initializing index")
        return

    # scrape data from web
    documents = RecursiveUrlLoader(
        TARGET_URL,
        max_depth=4,
        extractor=lambda x: Soup(x, "html.parser").text,
        prevent_outside=True,
        use_async=True,
        timeout=600,
        check_response_status=True,
        # drop trailing / to avoid duplicate pages.
        link_regex=(
            f"href=[\"']{PREFIXES_TO_IGNORE_REGEX}((?:{SUFFIXES_TO_IGNORE_REGEX}.)*?)"
            r"(?:[\#'\"]|\/[\#'\"])"
        ),
    ).load()

logging.info("index creating with `%d` documents", len(documents))

# split text
# this chunk_size and chunk_overlap effects to the prompt size
# execeed promt size causes error `prompt size exceeds the context window size and cannot be processed`
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(documents)

# create embeddings with huggingface embedding model `all-MiniLM-L6-v2`
# then persist the vector index on vector db
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectordb = Chroma.from_documents(
documents=documents,
embedding=embeddings,
persist_directory=INDEX_PERSIST_DIRECTORY
)
vectordb.persist()


def init_conversation():
    global conversation

    # load index
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    vectordb = Chroma(persist_directory=INDEX_PERSIST_DIRECTORY, embedding_function=embeddings)

    # create conversation
    llm = GPT4All(
        model="nous-hermes-llama2-13b.Q4_0.gguf",
        verbose=True,
    )
    conversation = ConversationalRetrievalChain.from_llm(
        llm,
        retriever=vectordb.as_retriever(),
        return_source_documents=True,
        verbose=True,
    )


def chat(question, user_id):
    global conversation

    chat_history = []
    response = conversation({"question": question, "chat_history": chat_history})
    answer = response['answer']

    logging.info("got response from llm - %s", answer)

    # TODO save history

    return answer

Run Application

Below are the main steps to operate the ChatBot application and interact with it. Questions can be submitted using the HTTP API, and responses will be received accordingly.

1. Install Dependencies

In this application, I have utilized a number of Python packages that need to be installed using Python’s pip package manager before running the application. The requirements.txt file lists all the necessary packages.

gpt4all

huggingface-hub
sentence-transformers

Flask==2.0.1
Werkzeug==2.2.2
flask-cors

langchain==0.0.352
chromadb==0.3.29
tiktoken
unstructured
unstructured[local-pdf]
unstructured[local-inference]

I have used a Python virtual environment to set up these dependencies. These packages can be easily installed by executing the command pip install -r requirements.txt.

# create virtual environment in `gpt4all` source directory
❯❯ cd gpt4all
❯❯ python -m venv .venv

# enable virtual environment
❯❯ source .venv/bin/activate

# install dependencies
❯❯ pip install -r requirements.txt

2. Run RAG Application

The RAG application can be initiated through api.py as outlined below. Prior to running it, it's necessary to set a few configurations via environment variables. Once api.py is executed, it will start the HTTP API, enabling users to post their questions.

# enable virtual environment in `gpt4all` source directory 
❯❯ cd gpt4all
❯❯ source .venv/bin/activate

# set env variable INIT_INDEX which determines whether the index needs to be created
❯❯ export INIT_INDEX=true

# run application
❯❯ python api.py
2024-03-10 16:54:54,147 - INFO - continue without initializing index
2024-03-10 16:54:55,052 - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2024-03-10 16:54:57,969 - INFO - Use pytorch device_name: mps
2024-03-10 16:54:58,268 - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2024-03-10 16:54:58,364 - INFO - loaded in 1134 embeddings
2024-03-10 16:54:58,364 - INFO - loaded in 1 collections
2024-03-10 16:54:58,365 - INFO - collection with name langchain already exists, returning existing collection
* Serving Flask app 'api' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
2024-03-10 16:54:58,799 - INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:7654
* Running on http://192.168.0.110:7654
2024-03-10 16:54:58,799 - INFO - Press CTRL+C to quit
2024-03-10 16:54:58,799 - INFO - * Restarting with stat
2024-03-10 16:54:59,569 - INFO - continue without initializing index
2024-03-10 16:55:00,481 - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2024-03-10 16:55:02,268 - INFO - Use pytorch device_name: mps
2024-03-10 16:55:02,560 - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2024-03-10 16:55:02,653 - INFO - loaded in 1134 embeddings
2024-03-10 16:55:02,653 - INFO - loaded in 1 collections
2024-03-10 16:55:02,654 - INFO - collection with name langchain already exists, returning existing collection
2024-03-10 16:55:03,112 - WARNING - * Debugger is active!
2024-03-10 16:55:03,166 - INFO - * Debugger PIN: 868-181-916

3. Post Question

Once the RAG application is running, I can submit questions related to the Open5GS documentation via the HTTP API.

# post question
❯❯ curl -i -XPOST "http://localhost:7654/api/question" \
--header "Content-Type: application/json" \
--data '
{
"question": "How open5gs work?",
"user_id": "zio"
}
'


# ConversationalRetrievalChain generates the following prompt with the question and the semantic search result, and sends it to the llm
> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Open5GS Sukchan Lee acetcom@gmail.com GitHub open5gs Open5GS is a C-language implementation of 5G Core and EPC, i.e. the core network of NR/LTE network (Release-17)

Open5GS Sukchan Lee acetcom@gmail.com GitHub open5gs Open5GS is a C-language implementation of 5G Core and EPC, i.e. the core network of NR/LTE network (Release-17)

Quickstart | Open5GS Open5GS Documentation Features Support CLA OSS Notice GitHub Quickstart 2024-03-08 21:04 1. Introduction to Open5GS Welcome! If you want to set up your first Open5GS core you have come to the right place. Before we get started, we’ll spend a moment to understand the basic architecture of the software. TL;DR: Open5GS contains a series of software components and network functions that implement the 4G/ 5G NSA and 5G SA core functions. If you know what each of these do already and how they interface with each other, skip to section 2. [Higher quality PDF diagram available HERE] 4G/ 5G NSA Core The Open5GS 4G/ 5G NSA Core contains the following components: MME - Mobility Management Entity HSS - Home Subscriber Server PCRF - Policy and Charging Rules Function SGWC - Serving Gateway Control Plane SGWU - Serving Gateway User Plane PGWC/SMF - Packet Gateway Control Plane / (component contained in Open5GS SMF) PGWU/UPF

Question: How open5gs work?
Helpful Answer:
2024-03-10 16:56:18,623 - INFO - LLModel.prompt_model -- prompt:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Open5GS Sukchan Lee acetcom@gmail.com GitHub open5gs Open5GS is a C-language implementation of 5G Core and EPC, i.e. the core network of NR/LTE network (Release-17)

Open5GS Sukchan Lee acetcom@gmail.com GitHub open5gs Open5GS is a C-language implementation of 5G Core and EPC, i.e. the core network of NR/LTE network (Release-17)

Quickstart | Open5GS Open5GS Documentation Features Support CLA OSS Notice GitHub Quickstart 2024-03-08 21:04 1. Introduction to Open5GS Welcome! If you want to set up your first Open5GS core you have come to the right place. Before we get started, we’ll spend a moment to understand the basic architecture of the software. TL;DR: Open5GS contains a series of software components and network functions that implement the 4G/ 5G NSA and 5G SA core functions. If you know what each of these do already and how they interface with each other, skip to section 2. [Higher quality PDF diagram available HERE] 4G/ 5G NSA Core The Open5GS 4G/ 5G NSA Core contains the following components: MME - Mobility Management Entity HSS - Home Subscriber Server PCRF - Policy and Charging Rules Function SGWC - Serving Gateway Control Plane SGWU - Serving Gateway User Plane PGWC/SMF - Packet Gateway Control Plane / (component contained in Open5GS SMF) PGWU/UPF

Question: How open5gs work?
Helpful Answer:
===/LLModel.prompt_model -- prompt/===


# HTTP response status
HTTP/1.1 200 OK
Server: Werkzeug/2.2.2 Python/3.11.2
Date: Sun, 10 Mar 2024 20:56:33 GMT
Content-Type: application/json
Content-Length: 838
Access-Control-Allow-Origin: *
Connection: close


# response
{
"answer": " Open5GS is a C-language implementation of 5G Core and EPC, i.e. the core network of NR/LTE network (Release-17). It contains several components such as MME, HSS, PCRF, SGWC, PGWC/SMF, UPF, etc., which work together to provide core network functions for 4G and 5G networks. The architecture of Open5GS is designed to be modular and scalable, allowing it to support a wide range of use cases and deployment scenarios. It also includes features such as dynamic S1AP bearer setup, traffic steering between different access technologies (e.g., LTE/NR), and integration with external systems like OSS/BSS platforms. Overall, Open5GS provides an open-source alternative to proprietary 4G and 5G core network solutions, enabling operators and vendors to customize and optimize their networks according to their specific needs."
}



---



# ask next question
❯❯ curl -i -XPOST "http://localhost:7654/api/question" \
--header "Content-Type: application/json" \
--data '
{
"question": "what are the enodebs and gnodebs tested in open5gs",
"user_id": "bassa"
}
'


# ConversationalRetrievalChain generates the following prompt with the question and the semantic search result, and sends it to the llm
> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

eNodeBs / gNodeBs tested on Open5GS | Open5GS Open5GS Documentation Features Support CLA OSS Notice GitHub eNodeBs / gNodeBs tested on Open5GS 2024-03-08 21:04 This page lists Radio hardware that has been tested by members of the Open5GS community, If you have tested radio hardware from a vendor not listed with Open5GS, please add it to this page by creating a PR on GitHub. Commercial 5G Airfill S5G AFBU-SL14CN (DU + CU) + AFRU-352-I Indoor Radio (n77 and n78) Airspan 5G OpenRange vCU + Airspan 5G OpenRange vDU + Airspan 5G OpenRANGE06 AirVelocity 2700 RU Airspan AirSpeed 2900 Airspan AirStrand 2200 ASKEY SCE2200 5G SUB-6 SMALL CELL BTI Wireless nCELL-F2240 5G NR Femtocell (n78) CableFree Small Cell Outdoor radios (5G n77, n78 and other bands) CableFree Small Cell Indoor radios (5G n77, n78 and other bands) CableFree Macro (BBU+RRH) radios (4G and 5G, various bands) Ericsson Baseband 6630 (21.Q3 Software) + Radio 2217, Radio 2219

eNodeBs / gNodeBs tested on Open5GS | Open5GS Open5GS Documentation Features Support CLA OSS Notice GitHub eNodeBs / gNodeBs tested on Open5GS 2024-03-08 21:04 This page lists Radio hardware that has been tested by members of the Open5GS community, If you have tested radio hardware from a vendor not listed with Open5GS, please add it to this page by creating a PR on GitHub. Commercial 5G Airfill S5G AFBU-SL14CN (DU + CU) + AFRU-352-I Indoor Radio (n77 and n78) Airspan 5G OpenRange vCU + Airspan 5G OpenRange vDU + Airspan 5G OpenRANGE06 AirVelocity 2700 RU Airspan AirSpeed 2900 Airspan AirStrand 2200 ASKEY SCE2200 5G SUB-6 SMALL CELL BTI Wireless nCELL-F2240 5G NR Femtocell (n78) CableFree Small Cell Outdoor radios (5G n77, n78 and other bands) CableFree Small Cell Indoor radios (5G n77, n78 and other bands) CableFree Macro (BBU+RRH) radios (4G and 5G, various bands) Ericsson Baseband 6630 (21.Q3 Software) + Radio 2217, Radio 2219

Question: what are the enodebs and gnodebs tested in open5gs
Helpful Answer:
2024-03-10 17:00:02,043 - INFO - LLModel.prompt_model -- prompt:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

eNodeBs / gNodeBs tested on Open5GS | Open5GS Open5GS Documentation Features Support CLA OSS Notice GitHub eNodeBs / gNodeBs tested on Open5GS 2024-03-08 21:04 This page lists Radio hardware that has been tested by members of the Open5GS community, If you have tested radio hardware from a vendor not listed with Open5GS, please add it to this page by creating a PR on GitHub. Commercial 5G Airfill S5G AFBU-SL14CN (DU + CU) + AFRU-352-I Indoor Radio (n77 and n78) Airspan 5G OpenRange vCU + Airspan 5G OpenRange vDU + Airspan 5G OpenRANGE06 AirVelocity 2700 RU Airspan AirSpeed 2900 Airspan AirStrand 2200 ASKEY SCE2200 5G SUB-6 SMALL CELL BTI Wireless nCELL-F2240 5G NR Femtocell (n78) CableFree Small Cell Outdoor radios (5G n77, n78 and other bands) CableFree Small Cell Indoor radios (5G n77, n78 and other bands) CableFree Macro (BBU+RRH) radios (4G and 5G, various bands) Ericsson Baseband 6630 (21.Q3 Software) + Radio 2217, Radio 2219

eNodeBs / gNodeBs tested on Open5GS | Open5GS Open5GS Documentation Features Support CLA OSS Notice GitHub eNodeBs / gNodeBs tested on Open5GS 2024-03-08 21:04 This page lists Radio hardware that has been tested by members of the Open5GS community, If you have tested radio hardware from a vendor not listed with Open5GS, please add it to this page by creating a PR on GitHub. Commercial 5G Airfill S5G AFBU-SL14CN (DU + CU) + AFRU-352-I Indoor Radio (n77 and n78) Airspan 5G OpenRange vCU + Airspan 5G OpenRange vDU + Airspan 5G OpenRANGE06 AirVelocity 2700 RU Airspan AirSpeed 2900 Airspan AirStrand 2200 ASKEY SCE2200 5G SUB-6 SMALL CELL BTI Wireless nCELL-F2240 5G NR Femtocell (n78) CableFree Small Cell Outdoor radios (5G n77, n78 and other bands) CableFree Small Cell Indoor radios (5G n77, n78 and other bands) CableFree Macro (BBU+RRH) radios (4G and 5G, various bands) Ericsson Baseband 6630 (21.Q3 Software) + Radio 2217, Radio 2219

Question: what are the enodebs and gnodebs tested in open5gs
Helpful Answer:
===/LLModel.prompt_model -- prompt/===


# HTTP response
HTTP/1.1 200 OK
Server: Werkzeug/2.2.2 Python/3.11.2
Date: Sun, 10 Mar 2024 21:00:25 GMT
Content-Type: application/json
Content-Length: 725
Access-Control-Allow-Origin: *
Connection: close


# response
{
"answer": " eNodeBs / gNodeBs tested on Open5GS | Open5GS Open5GS Documentation Features Support CLA OSS Notice GitHub eNodeBs / gNodeBs tested on Open5GS 2024-03-08 21:04 This page lists Radio hardware that has been tested by members of the Open5GS community, If you have tested radio hardware from a vendor not listed with Open5GS, please add it to this page by creating a PR on GitHub. Commercial 5G Airfill S5G AFBU-SL14CN (DU + CU) + AFRU-352-I Indoor Radio (n77 and n78) Airspan 5G OpenRange vCU + Airspan 5G OpenRange vDU + Airspan 5G OpenRANGE06 AirVelocity 2700 RU Airspan AirSpeed 2900 Airspan AirStrand 2200 ASKEY SCE2200 5G SUB-6 SMALL CELL BTI Wireless nCELL-F2240"
}

What’s Next

In the next post, I discuss building the same RAG application using a different tool called Ollama, which is a lightweight and flexible framework designed for the local deployment of LLMs on personal computers.
Build RAG Application Using a LLM Running on Local Computer with Ollama and Langchain.

