ASIMOV: Leveraging RAG Models for Enhanced Efficiency in the Telecommunications Engineering Domain

Dialog Axiata
11 min read · May 21, 2024


Telecommunications engineers and technicians routinely grapple with highly specific queries in their daily operations. These queries often necessitate specialized knowledge drawn from a multitude of technical documentation libraries. Traditional Large Language Models (LLMs) struggle to meet these demands due to the requirement for up-to-date and highly specific responses.

This article provides an in-depth look at the creation of ASIMOV, a solution powered by Retrieval-Augmented Generation (RAG). ASIMOV was initially deployed to offer precise, domain-specific assistance to the technology staff at Dialog Axiata PLC, the leading mobile telecommunications service provider in Sri Lanka.

Why RAG?

LLMs are trained on massive amounts of text data, allowing them to generate text, translate languages, and answer questions. However, their knowledge is limited to what they have been trained on, which can be outdated or lack specific domain expertise. RAG tackles this challenge by combining an information retrieval component with a text generation model. The retrieval component finds relevant information in an external source such as a knowledge base, in our case one specific to the telecommunications domain. The text generation model then uses the retrieved information to create a more comprehensive and accurate response. This approach ensures that RAG responses are grounded in retrieved information and can draw on constantly updated knowledge bases without retraining the entire LLM, making it a cost-effective way to improve LLM outputs [1].

By integrating a dynamic knowledge base tailored to the telecommunications industry, Retrieval-Augmented Generation (RAG) models can equip telecommunications staff in companies like ours with the most recent information on troubleshooting methods, equipment specifications, and industry-wide best practices instantly. This not only minimizes downtime and streamlines the configurations and troubleshooting processes for more efficient engineering operations, but also serves as an invaluable training resource for budding engineers. This is what we hope to achieve with ASIMOV.

Next, let’s go through the multiple components of ASIMOV elaborating on decisions made at each step.

Vector Storage

A vector storage was required to store domain-specific knowledge to be retrieved by RAG. In short, a vector database stores data represented in the form of high-dimensional vectors. Three vector database options were considered: Milvus, Chroma DB, and Pinecone.

Based on our research, Milvus was selected as the vector database of choice. Milvus is an open-source vector database designed specifically for similarity search on massive datasets of high-dimensional vectors [2].

Here’s why we made the above decision.

Why was Milvus selected over Chroma DB?

  • High performance when conducting vector searches on massive datasets.
  • A developer-first community that offers multi-language support and a rich toolchain.
  • Cloud scalability and high reliability, even in the event of a disruption.
  • Hybrid search, achieved by pairing scalar filtering with vector similarity search.

Why was Milvus selected over Pinecone?

Pinecone operates as a cloud-based service. Due to the proprietary nature of the information stored, we opted to go for a local storage option.

Therefore, Milvus was selected for its superior performance, developer-friendly features, and on-premises storage.

from langchain_community.vectorstores import Milvus

# Build a Milvus boolean filter expression from the vendors selected by the user,
# e.g. ["Huawei"] becomes "vendor in ['Huawei']".
vector_db_filter = "vendor in ['" + "', '".join(filter_vendors) + "']"

vector_db = Milvus(
    embedding_model,
    connection_args={
        "host": "milvus_host",
        "port": "milvus_host_port",
        "user": "username",
        "password": "password",
    },
    collection_name="telco_data_collection",
    vector_field="embedding",
    text_field="document",
)
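As an aside, the hybrid search mentioned above can be exercised directly against this store. Below is a minimal sketch using the vector_db and vector_db_filter defined above; the query string is purely illustrative.

# Scalar filtering (the `expr` argument) narrows the candidate set by metadata,
# while the vector similarity search ranks the remaining chunks semantically.
results = vector_db.similarity_search(
    "paging capacity optimization parameters",
    k=5,
    expr=vector_db_filter,
)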

After configuring Milvus, we had to extract content from the raw documents (PDF, HTML, and Excel). For this purpose, we used document loaders from the agent framework.

Agent Framework

What is an agent?

The agent is a key part of the RAG system. It acts as an orchestrator, specifically designed to handle multi-step information retrieval and response generation. Here, we have used the LangChain framework as the agent.

What is LangChain?

LangChain is the chosen agent for this project. It’s an open-source framework designed to simplify building applications powered by LLMs. LangChain acts as a central hub, allowing for the seamless integration of various components and the construction of complex workflows for RAG models. In our case, LangChain facilitated key aspects of the RAG creation process, including data extraction from different document formats, text processing, and data storage in vector databases with efficient retrieval through embeddings [3].

Document Loaders

To extract information from our knowledge base to be stored in our vector database we utilized LangChain’s document loading capabilities from the langchain_community.document_loaders class. This class provides a suite of tools specifically designed to handle various document formats.

· For HTML files, we employed UnstructuredHTMLLoader. This loader efficiently parses the HTML structure and extracts the relevant text content.

from langchain_community.document_loaders import UnstructuredHTMLLoader

html_loader = UnstructuredHTMLLoader(html_file_path)

· For PDF documents, we leveraged the PyPDFLoader. This loader extracts text from the PDF files.

from langchain_community.document_loaders import PyPDFLoader

pdf_loader = PyPDFLoader(pdf_file_path)

· Table data in PDF files was captured using the tabula-py library (a sketch of how the extracted tables can be folded back into the text appears after this list).

import tabula

tables = tabula.read_pdf(file_path, pages=page_number + 1, multiple_tables=True)

· Finally, the UnstructuredExcelLoader handles the extraction of data from Excel spreadsheets, allowing us to incorporate structured information.

from langchain_community.document_loaders import UnstructuredExcelLoader

excel_loader = UnstructuredExcelLoader(excel_file_path)
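The loaders above return page text, while tabula-py returns a list of pandas DataFrames. Below is a minimal sketch of how those tables can be folded back into the text stream before chunking; the page_text variable is illustrative, and the pipe-delimited serialization is one choice among many.

# `tables` is the list of DataFrames returned by tabula.read_pdf above.
table_texts = [table_df.to_csv(sep="|", index=False) for table_df in tables]

# Append the serialized tables to the page text so they are chunked and embedded
# alongside the surrounding prose.
page_text = page_text + "\n\n" + "\n\n".join(table_texts)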

Chunking

The subsequent step involved segmenting the text content into chunks. For this task, we employed the CharacterTextSplitter function in LangChain.

There are several advantages to choosing a character splitter: it is cost-effective because it does not require an ML model, it handles a variety of separators, and it is language-flexible, which matters in our case because of the large amount of technical terminology and vendor-specific terms in our knowledge base [4].

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    length_function=len,
)

We opted for a chunk_size of 1024 with a chunk_overlap of 256. The rationale behind choosing a chunk size of 1024 was to balance accuracy and cost. A larger chunk size might complicate the accurate identification of relevant documents. Conversely, a smaller chunk size could potentially increase the number of relevant documents, escalating the overall token cost.
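In use, the splitter is applied to the Documents produced by the loaders. A short sketch, assuming pages holds the output of a loader's load() call:

# Split the loaded pages into overlapping chunks; each chunk keeps its page
# metadata, which is later stored in Milvus for filtering and citation.
text_chunks = text_splitter.split_documents(pages)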

Embedding

The next step was to embed the chunks as vector representations. Two embedding models were considered.

· The first model, all-MiniLM-L6-v2, excels at capturing the semantic meaning of sentences and paragraphs. It efficiently converts these text units into 384-dimensional vectors, essentially encoding their meaning in a numerical space. This allows the RAG system to compare the query’s vector with those of candidate passages and identify the ones with the closest semantic representation.

from langchain.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

· The second model, BAAI/bge-large-en-v1.5, specifically addresses the challenge of similarity distribution in retrieval tasks. It’s designed to enhance the system’s ability to retrieve the most relevant passages, even when the semantic relationships between the query and passages might be subtle.

from langchain.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")

The prompt responses obtained with both of the above models were similar. However, all-MiniLM-L6-v2 showed better performance, taking less time to embed, so we selected it as our embedding model. The snippet below shows how a document is loaded, chunked, embedded, and inserted into the Milvus collection along with its metadata.

import os

from pymilvus import connections, Collection

# Connect directly with pymilvus for bulk inserts into the collection.
connections.connect(host="localhost", port="19530", token="username:password")
vector_db_collection = Collection(collection_name)

# Load the PDF and split its pages into overlapping chunks.
loader = PyPDFLoader(file_path)
pages = loader.load()
text_chunks = text_splitter.split_documents(pages)

# Derive the source file name stored alongside each chunk.
file_name = os.path.basename(file_path)

data_rows = []
for chunk_id, text_chunk in enumerate(text_chunks):
    page_no = text_chunk.metadata["page"]

    # Embed the chunk text with the selected embedding model.
    embedding = embedding_model.embed_query(text_chunk.page_content)
    vendor = "Huawei"
    document_type = "PDF"

    data_row = {
        "embedding": embedding,
        "document": text_chunk.page_content,
        "file_name": file_name,
        "page_no": page_no,
        "chunk_no": chunk_id + 1,
        "vendor": vendor,
        "document_type": document_type,
    }
    data_rows.append(data_row)

vector_db_collection.insert(data=data_rows)

LLM and Information Security Considerations

While we used OpenAI's GPT-4 in the initial testing phase, we had to reconsider given the sensitive nature of the information in our knowledge base. To meet our information security requirements, we opted to use GPT-4 through Microsoft's Azure OpenAI Service. Microsoft makes the following assurances [5]:

Your prompts (inputs) and completions (outputs), your embeddings, and your training data:

- are NOT available to other customers.

- are NOT available to OpenAI.

- are NOT used to improve OpenAI models.

- are NOT used to improve any Microsoft or 3rd party products or services.

- are NOT used for automatically improving Azure OpenAI models for your use in your resource (The models are stateless, unless you explicitly fine-tune models with your training data).

- Your fine-tuned Azure OpenAI models are available exclusively for your use.

The Azure OpenAI Service is fully controlled by Microsoft; Microsoft hosts the OpenAI models in Microsoft’s Azure environment and the Service does NOT interact with any services operated by OpenAI (e.g. ChatGPT, or the OpenAI API).

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    api_key="azure_access_key",
    azure_endpoint="azure_deployment_endpoint",
    openai_api_version="2023-05-15",
    azure_deployment="azure_deployment_name",
    temperature=0,
    model_name="gpt-4",
    streaming=False,
)

Workflow

ASIMOV Workflow

We can now look into how all of the above components work in tandem to provide ASIMOV with its functionality.

The user provides a question to the RAG system, which is then passed to the agent. LangChain provides a class called ConversationalRetrievalChain, which establishes the connection between the agent and the vector database. It enables the retriever to conduct a similarity search between the question and the stored chunks, returning the most relevant chunks from the vector database.

from langchain.chains import ConversationalRetrievalChain

conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vector_db.as_retriever(
        search_kwargs={
            "expr": vector_db_filter,
            "k": 10,
        },
    ),
    memory=memory_buffer,
    return_source_documents=True,
    combine_docs_chain_kwargs={"prompt": prompt_template},
    verbose=True,
)

To optimize the similarity search in our RAG system, we reorganized the metadata by separating its attributes into distinct columns. Previously, these attributes were combined into a single JSON string that was difficult to search (requiring regular expressions). With separate columns, searches within the RAG system can be filtered, allowing users to narrow their prompting based on specific metadata criteria (e.g. searching a specific document, or a set of documents from a particular vendor).

Metadata separated into distinct columns (displayed in the Milvus Attu interface)
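For illustration, a collection with per-column metadata along these lines could be defined with pymilvus as follows. This is a minimal sketch: the field lengths and the auto-generated primary key are assumptions, while the field names match those used in the insertion code above.

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

# Each metadata attribute is a scalar field of its own, so it can be used directly
# in boolean filter expressions such as "vendor in ['Huawei']".
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384),  # all-MiniLM-L6-v2 output size
    FieldSchema(name="document", dtype=DataType.VARCHAR, max_length=8192),
    FieldSchema(name="file_name", dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name="page_no", dtype=DataType.INT64),
    FieldSchema(name="chunk_no", dtype=DataType.INT64),
    FieldSchema(name="vendor", dtype=DataType.VARCHAR, max_length=128),
    FieldSchema(name="document_type", dtype=DataType.VARCHAR, max_length=32),
]

schema = CollectionSchema(fields, description="Telco document chunks with per-column metadata")
telco_collection = Collection(name="telco_data_collection", schema=schema)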

Subsequently, the agent passes the question, relevant chunks, chat history, and prompt template to the selected LLM. Let's take a look at the Chat History and Prompt Template components.

Chat History

LangChain offers a class named ConversationBufferMemory, which assists in establishing a memory buffer named “chat_history” for storing conversation data. This buffer employs a key-value structure, where questions are stored with the key “question” and corresponding answers with the key “answer”. Such a design enhances context-aware responses and dialogue generation by preserving a record of the conversation history. When accessing the buffer, the parameter return_messages=True is set. This ensures that individual messages (questions and answers) are returned, facilitating the creation and analysis of conversation history.

from langchain.memory import ConversationBufferMemory

memory_buffer = ConversationBufferMemory(
    memory_key="chat_history",
    input_key="question",
    output_key="answer",
    return_messages=True,
)

Prompt Template

The prompt template we utilized is as follows:

from langchain.prompts import PromptTemplate

prompt_template_text = """You are a technical assistant tasked with finding the user-requested details from provided document chunks. Return in JSON format with the key "answer". If you cannot find an exact answer you can return 'None' as the value in "answer" key.
Request from user: {question}
Chat history: {chat_history}
Provided document chunks: {context}"""

prompt_template = PromptTemplate(
    input_variables=["question", "chat_history", "context"],
    template=prompt_template_text,
)

The prompt template guides the LLM's response and instructs it to reply in JSON format to suit our requirements.

Response

After the question, relevant chunks, prompt template, and chat history are provided to the LLM via the conversation_chain, the LLM generates an answer, which the agent then delivers to the user. A response is obtained as follows.

response = conversation_chain({"question": user_prompt})

ASIMOV Sequence Diagram
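Since return_source_documents=True, the chain returns the retrieved chunks alongside the generated answer. Below is a minimal sketch of unpacking the result, under the assumption that the LLM honours the JSON format requested in the prompt template:

import json

response = conversation_chain({"question": user_prompt})

# The prompt template asks the LLM for a JSON object with an "answer" key.
answer = json.loads(response["answer"])["answer"]

# Metadata of the chunks used to ground the answer (file name and page number).
sources = [
    (doc.metadata.get("file_name"), doc.metadata.get("page_no"))
    for doc in response["source_documents"]
]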

In comparison to a conventional LLM like ChatGPT, RAG allows us to obtain extremely specific answers to our prompts. For example, consider the following question:

What are the possible parameter changes that can be performed to improve QCI level latency?

ChatGPT (GPT-4) provides a more generic answer as follows:

1. Paging Group Size: The size of the paging group can be optimized to reduce congestion. A smaller paging group size can reduce the number of pages sent at once, thus reducing the chance of congestion.
2. Paging Cycle: The paging cycle can be adjusted to reduce congestion. A shorter paging cycle means that pages are sent more frequently, which can reduce the chance of congestion.
3. Paging Retransmission: The number of retransmissions can be optimized to reduce congestion. If a page is not acknowledged, it can be retransmitted a certain number of times before it is considered lost. Reducing the number of retransmissions can reduce congestion.
4. Paging Capacity: The capacity of the paging channel can be increased to reduce congestion. This can be done by adding more paging channels or by increasing the bandwidth of the existing channels.
5. Paging Load Balancing: Load balancing can be used to distribute the paging load evenly across all available channels. This can reduce congestion by ensuring that no single channel is overloaded.
6. Paging Priority: The priority of different types of pages can be adjusted to reduce congestion. For example, high-priority pages can be sent first to ensure that they are not delayed by congestion.
7. Paging Scheduling: The scheduling of pages can be optimized to reduce congestion. This can involve sending pages at off-peak times or scheduling pages in a way that minimizes the chance of congestion.
8. Paging Buffer Size: The size of the paging buffer can be optimized to reduce congestion. A larger buffer can store more pages, reducing the chance of congestion.
9. Paging Algorithm: The algorithm used for paging can be optimized to reduce congestion. This can involve using a more efficient algorithm or tweaking the parameters of the existing algorithm.

Whereas ASIMOV, using RAG, provides more specific answers that a telco engineer or technician can use in day-to-day tasks (vendor-specific information redacted):

- Setting the parameter to ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ or ■■■■■■■■■■■■■■■■■■■■■■■■ to schedule paging messages based on coverage levels to maximize the paging capacity and reduce paging loss.
- Adding cells or splitting existing cells.
- Optimizing RF performance to reduce the interference to ■■■■■■ from neighboring cells.
- Setting the parameter to ■■■■■■■■■■■■■■■■■■■■ to preferentially transmit paging messages that arrive earlier than others on a paging occasion.
- Selecting the ■■■■■■■■■■■■■■■■■■■■■■ option to increase the paging success rate.
- Configuring the selection probability for each ■■■■■■ resource on the anchor carrier.
- Setting the minimum serving cell ■■■■■■ threshold applicable to the coverage-based paging carrier group.
- Indicating which DL carriers a UE supporting mixed operation mode monitors for paging.
- Setting the weight of the non-anchor paging carrier for uneven paging load distribution across the carrier

Frontend

The user interface for ASIMOV was designed using Streamlit, an open-source app framework that turns Python scripts into deployable web apps.

ASIMOV user interface designed using Streamlit
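A minimal sketch of how such a chat-style interface can be wired up in Streamlit is shown below. It is illustrative only, not the production UI, and it assumes the conversation_chain constructed earlier is available.

import streamlit as st

st.title("ASIMOV")

# Keep the conversation visible across Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if user_prompt := st.chat_input("Ask a question about the technical documentation"):
    with st.chat_message("user"):
        st.markdown(user_prompt)

    # Pass the question through the RAG chain defined earlier.
    response = conversation_chain({"question": user_prompt})
    answer = response["answer"]

    with st.chat_message("assistant"):
        st.markdown(answer)

    st.session_state.messages.append({"role": "user", "content": user_prompt})
    st.session_state.messages.append({"role": "assistant", "content": answer})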

Future Directions

Looking ahead, we envision a fusion of our existing Retrieval-Augmented Generation (RAG) model with a fine-tuned LLM. The refined LLM will be trained on prompts and responses generated by our current model. Crucially, this training process will be guided and evaluated using a user feedback system. This innovative approach aims to enhance the model’s performance by leveraging real-world user interactions and feedback. This could potentially lead to more accurate, context-aware, and user-centric responses.

Furthermore, while ASIMOV currently excels at interpreting text and tables, we hope to expand its functionality to include the interpretation of diagrams and figures. This advancement will allow the application to process and understand a wider range of data formats, increasing its utility.

References

[1] “What is RAG,” Amazon Web Services, [Online]. Available: https://aws.amazon.com/what-is/retrieval-augmented-generation/.

[2] “What is Milvus,” Milvus, 26 March 2024. [Online]. Available: https://milvus.io/docs/overview.md.

[3] “Introduction,” LangChain, 2024. [Online]. Available: https://python.langchain.com/docs/get_started/introduction/.

[4] P. Bhavsar, “Mastering RAG: Advanced Chunking Techniques for LLM Applications,” Galileo Labs, 23 February 2024. [Online]. Available: https://www.rungalileo.io/blog/mastering-rag-advanced-chunking-techniques-for-llm-applications. [Accessed 16 April 2024].

[5] “Data, privacy, and security for Azure OpenAI Service,” Microsoft, 2 February 2023. [Online]. Available: https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy.

Project Team

The following members of Group Technology Analytics and Automation, Dialog Axiata PLC constituted the project team for ASIMOV:

Nethmi Nagodawithane, Intern — Group Technology (University of Colombo).

Aruni Gunasekara, Intern — Group Technology (Sri Lanka Institute of Information Technology).

Kavindi Perera, Intern — Group Technology (Sri Lanka Institute of Information Technology).

Dumindu Ranasinghearachchi, Lead Engineer.

Nandula Karunasingha, Senior Data Scientist.

Aditha Iddamalgoda, Senior Executive.
