Build your hybrid-Graph for RAG & GraphRAG applications using the power of NLP

18 min read · Mar 5, 2025

1. What is GraphRAG for you?

What is GraphRAG? What does GraphRAG mean from your perspective? And what if you could have a standard RAG and a GraphRAG as a combined package, switchable with a single query flag?

The fact is, there is no concrete, universally accepted definition of GraphRAG, at least not yet. Based on my experience, my monitoring of the literature, and conversations with many people, I would estimate (apologies to Steven D. Levitt, I know this is not the proper way to present statistics) that:

  • 90% of people associate GraphRAG with Microsoft’s approach to building a graph (or variations of it) and enabling search on it.
  • 8% define GraphRAG as querying an LPG (Labeled Property Graph) or RDF (Resource Description Framework) graph using LLM-generated Cypher queries or text-to-any_graph_language (e.g., Cypher or SPARQL).
  • The remaining 2% are either unsure or exploring different possibilities.

Personally, I’m not entirely convinced by either of the first two definitions, and I’d like to explain why.

First of all, I have to say that I find Microsoft's GraphRAG a very cool idea. In about five years, it will likely be widely adopted or may even become the prevailing GraphRAG approach.

However, today it remains too expensive and impractical for large-scale industrial use. The reality is that most companies lack the time, budget, and confidence to adopt this method. Instead, they are more likely to choose a standard 'vanilla' vector database, which is more feasible given current constraints. Confidence, because there are in fact not thousands of GraphRAG examples in production out there (probably for the reasons mentioned above).

The Text-to-Cypher or Text-to-SPARQL technique, in my opinion, is a great alternative to Microsoft GraphRAG (although they can also be used together), and I have seen some excellent examples of its application. However, there are downsides. First, it requires a significant number of costly LLM calls to generate queries. Second, there is always a layer of uncertainty between you and your knowledge base: you rely on how well you craft your prompt and how reliably the chosen model builds the Cypher or SPARQL query. Additionally, the extra processing steps increase response time, and the higher implementation complexity adds to the challenges. In summary, this technique is highly promising and powerful for certain applications, but its suitability depends on the specific use case.

2. Efficiency Optimization Dilemma

As a consultant and GenAI solutions developer, my goal is to serve GraphRAG at any scale — from small implementations to large-scale, enterprise-level solutions.

Scaling up often comes with trade-offs, particularly in accuracy or efficiency. However, if a lower-complexity and cost-effective solution still delivers satisfactory results, then it is worth keeping in the toolbox, right?

With that in mind, the proposed approach is to leverage the power of graphs for RAG (Retrieval-Augmented Generation) without incurring high costs for graph creation itself. The challenge is to build and maintain a useful graph with minimal dependence on LLMs — or, ideally, by using small, on-premises LLMs instead of costly API calls to large cloud models.

3. Fixed Entity Architecture

Some time ago, I published two Medium articles introducing a new method for building graphs for RAG, called Fixed Entity Architecture [1–2].

The core idea is to construct a layered graph:

  • Layer 1: Ontology Layer — defines the domain ontology. Since ontologies are usually limited in scope, this layer remains fixed or nearly fixed in size.
  • Layer 2: Document Layer — consists of document chunks, similar to what you would find in any vector database. Applying a vector index to this layer and querying it directly would yield a standard vector database search.
  • Layer 3 (Optional): Entity Layer — This layer consists of extracted entities (e.g., using spaCy) from each document chunk. Since these entities often repeat across documents, they serve as a “gluing” layer, enhancing search results.

In both cases, I demonstrated a method to create a graph without relying on LLMs. However, a major challenge with this approach is constructing the ontology layer. Consider the following facts:

  • Not all datasets belong to a well-defined domain.
  • Subject Matter Experts (SMEs) are not always available to assist in building the ontology.

Because of these limitations, I started exploring ways to eliminate the need for a fixed ontology layer.

Why Use Layered Graphs?

Neo4j allows a vector index to be defined on a single label. If nodes have different labels, you would need to build a separate index for each, which is not always practical when performing a vector search.

Certainly, in some cases having a larger number of node types makes sense, e.g., when strict ontology differentiation or filtering is required. However, in my case this has not been necessary so far, and two to three layers is usually a reasonable number to choose. Hence, the workaround for the label indexing limitation is to assign the same internal label to all nodes within a layer while storing the actual labels, names, and metadata as node properties.
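As an illustration (my own sketch, not code from the original articles), the shared-label pattern looks like this:

# Illustrative sketch of the shared-label pattern: every node in the token
# layer carries the internal label :Token, while its fine-grained type
# ('token', 'bigram', 'trigram', ...) lives in the `label` property.
# run_query is the thin Cypher helper used throughout this article.
query = """
MERGE (t:Token {label: $node_type, name: $name})
"""
run_query(query, {'node_type': 'bigram', 'name': 'graphics software'})

# A single vector index on :Token now covers the entire layer, and the
# `label` property remains available for filtering inside Cypher queries.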

4. The power of NLP

How can you extract information from text without relying on your own brain or a trillion-parameter LLM? This is where classical NLP (Natural Language Processing) can become a valuable tool.

It is worth mentioning that when I started searching for the best NLP libraries and models, from both before and after the GPT-3.5 era, I was shocked: many of them, if not all (please correct me if I'm wrong and share any good links or ideas!), are no longer supported, updated, or maintained. It's as if they've been abandoned and nearly forgotten, which is a real shame because they hold immense potential.

Nevertheless, driven by real-world industry needs and practical constraints, I decided to take on the challenge of exploring an NLP-powered approach. My goal was to build a graph that would enhance the performance of a standard vector database.

A quick note: Now that this technique has been explored to some extent, I strongly encourage readers to experiment further. What I have done so far only scratches the surface of the full potential of NLP-driven graph structures.

5. GraphRAGs and their potential applications

Before diving into the implementation of an NLP-driven Graph for RAG and discussing the results, I want to first provide my view on different GraphRAG types and their applications.

When I refer to Microsoft GraphRAG, I include not only the original approach published in [3] by Microsoft Research but also various lighter adaptations that have emerged since then, e.g. [4–5].

These approaches typically involve:

  • Extracting entities and relationships from large text corpora using LLMs
  • Summarizing the extracted information using LLMs
  • Allowing users to query summaries and/or community-based summaries

While different implementations exist, the fundamental principle remains the same: using LLMs to construct a knowledge graph from text.

The infographic below (Figure 1) represents my industry-driven perspective on when and why to use different types of graph-based vector search for RAG systems.

First of all, if you are faced with the decision of whether to use a graph or a standard vector database, it is worth looking around — there are some guidelines on when to choose one over the other [6–7].

This infographic applies once you have decided to go for a GraphRAG solution. Here, I highlight key considerations you need to take into account before building your graph.

  1. Data Volume — How much data exists in your knowledge base?
  2. Budget Constraints — How limited is your budget for building the graph?
  3. Ontology Availability:
  • Do you have a clear, structured ontology?
  • Does your knowledge base belong to a fixed domain where a robust ontology layer can be built?
  • Or is your data diverse, distributed, and lacking well-defined domain knowledge?

These factors heavily influence the design, feasibility, and efficiency of your GraphRAG solution.

Figure 1. Decision tree for choosing the suitable graph architecture for a RAG solution.

Once you have answered the three key questions — data volume, budget constraints, and ontology availability — you can determine the suitable GraphRAG approach for your use case.

It’s important to note that Figure 1 does not cover every possible scenario. Some hybrid approaches are also feasible, and the boundaries between techniques are not strictly fixed.

Nevertheless, I observe the following trends: the more data you have, the more carefully you need to evaluate your investment. If you have sufficient budget and require very high accuracy, Microsoft’s solution is a strong choice.

However, if budget constraints are a concern (which is almost always the case), you may need to compromise on accuracy and opt for nearly LLM-free solutions. The best approach in this scenario would be to establish an ontology layer and build a Fixed Entity Architecture Graph.

If you struggle with defining an ontology, lack a deep understanding of your data, or face high data complexity, I recommend constructing an NLP-powered graph. In the following sections, I will demonstrate how you can achieve this.

6. Unlock the power of NLP

Now, let’s roll up our sleeves and build a graph for the cost of a chocolate bar (considering the electricity cost involved).

Technical Setup

For this project, I used:

  • A business laptop with 32GB RAM and a 6GB built-in GPU.
  • Neo4j Community Edition running as a Docker container on WSL (Ubuntu).
  • A dataset of 660 PDF files and a data pre-processing pipeline, with some modifications, taken from the NVIDIA RAG Blueprint [8].

6.1 NLP-Powered Graph Approach

As mentioned in the introduction, the NLP-powered Graph is derived from the Fixed Entity Architecture with one key difference — I have dropped the ontology layer.

This means the graph will consist of:

  1. Document Layer — Containing document chunks, similar to a standard vector database
  2. Tokens Layer — Extracted tokens that act as additional connective nodes, improving search performance

By leveraging NLP instead of LLM-heavy processing, this method significantly reduces costs.

6.2 Data pre-processing pipeline

The data preprocessing pipeline follows these key steps:

  1. Chunking — I used pre-written functions from the NVIDIA RAG Blueprint to split documents into smaller segments.
  2. Embedding — Instead of the default NVIDIA approach, I used the Hugging Face model ‘intfloat/e5-base-v2’ to embed the chunks. This is the only modification to the Blueprint pre-processing pipeline mentioned earlier (see the sketch after this list).
  3. Graph Construction — Once the data was processed, I built the first graph layer in Neo4j, where all chunk nodes were labeled Document.
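Since the snippets below call methods like embed_documents and embed_query on an embedding object that is never defined in this article, here is a minimal sketch of one possible setup. The wrapper class is my assumption; only the model name comes from the text above.

# Minimal sketch (assumption): a numpy-returning wrapper around
# sentence-transformers, so downstream snippets can call .tolist() on the
# vectors. e5-family models were trained with "passage: "/"query: " prefixes.
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("intfloat/e5-base-v2")

class E5Embeddings:
    def embed_documents(self, texts):
        # One 768-dimensional vector per input text
        return _model.encode([f"passage: {t}" for t in texts])

    def embed_query(self, text):
        return _model.encode(f"query: {text}")

# `embeddings` and `emb` are used interchangeably in the following snippets
embeddings = emb = E5Embeddings()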

Below you will find a code example to populate the Neo4j database with the document layer.

def add_chunks_to_db(chunks, doc_name):
    prev_node_id = None
    for i, chunk in enumerate(chunks):
        # Create the chunk node; passing values as query parameters avoids
        # manual escaping of quotes in the chunk text
        query = '''
        MERGE (d:Document {
            chunkID: $chunk_id,
            docID: $doc_id,
            full_text: $full_text,
            embeddings: $embeddings
        })
        RETURN elementId(d) AS id
        '''
        params = {
            'chunk_id': f'chunk_{i}',
            'doc_id': doc_name,
            'full_text': chunk,
            'embeddings': embeddings.embed_documents([chunk])[0].tolist(),
        }
        result = run_query(query, params)
        chunk_node_id = result[0]['id']

        # If this is not the first chunk, link it to the previous one
        # in both directions (NEXT / PREV)
        if prev_node_id is not None:
            query = '''
            MATCH (c1:Document), (c2:Document)
            WHERE elementId(c1) = $prev_node_id AND elementId(c2) = $chunk_node_id
            MERGE (c1)-[:NEXT]->(c2)
            MERGE (c2)-[:PREV]->(c1)
            '''
            run_query(query, {'prev_node_id': prev_node_id,
                              'chunk_node_id': chunk_node_id})
        prev_node_id = chunk_node_id
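A note on the run_query helper used throughout: the article does not show it, so here is a minimal sketch of one possible implementation using the official neo4j Python driver (the connection details are placeholders for a local Docker instance):

# Minimal sketch of the run_query helper assumed by the snippets above
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def run_query(query, params=None):
    # Execute a Cypher statement and return the records as dictionaries
    with driver.session() as session:
        return [record.data() for record in session.run(query, params or {})]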

Notice that I am building chains of documents here. I am adding the chunks of each document, connected by edges in both directions: one called NEXT, pointing to the next chunk, and the other called PREV, pointing to the previous chunk. As a result, I have a graph that looks like this (see Figure 2):

Figure 2. An example of a Document layer. In this picture, you can see 4 documents.

Here you can recognize 4 PDFs out of the 660 I have added to the graph. Each chain starts with chunk_0 and ends with chunk_n.

With this first layer, you can easily apply your first vector and text indexes on it, such as:

query = '''
CREATE VECTOR INDEX vector_index_document IF NOT EXISTS
FOR (d:Document)
ON (d.embeddings)
OPTIONS {indexConfig: {
    `vector.dimensions`: 768,
    `vector.similarity_function`: 'cosine'
}}
'''

And for the text index:

query = '''
CREATE FULLTEXT INDEX text_index_document FOR (n:Document) ON EACH [n.full_text]
'''
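Once created, the full-text index can be queried with Neo4j's built-in procedure; a small sketch (the keyword string is just an example):

# Minimal sketch: keyword search over the full-text index created above
query = """
CALL db.index.fulltext.queryNodes('text_index_document', $keywords)
YIELD node, score
RETURN node.docID, node.full_text, score
ORDER BY score DESC
LIMIT 10
"""
results = run_query(query, {'keywords': 'omniverse graphics'})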

Now one could just use this graph as a standard vector database. You would just do the following:

def pure_rag(my_query):
    my_query_emb = emb.embed_query(my_query)
    query = """
    CALL db.index.vector.queryNodes('vector_index_document', 10, $user_query_emb)
    YIELD node AS vectorNode, score AS vectorScore
    WITH vectorNode, vectorScore
    ORDER BY vectorScore DESC
    RETURN elementId(vectorNode), vectorNode.docID, vectorNode.full_text AS document_text, vectorScore
    LIMIT 10
    """
    params = {'user_query_emb': my_query_emb.tolist()}
    results = run_query(query, params)
    return pd.DataFrame(data=results)

That’s it! Let’s try it on the NVIDIA dataset and query something based on the data it contains.

Figure 3. RAG test using retrieval from a Document layer.

As the LLM, I am using the NVIDIA NIM model "meta/llama-3.3-70b-instruct" from the Try NVIDIA NIM APIs catalog. Notice that I am not writing any sophisticated prompt, just passing the user question and the top 10 retrieved passages.
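For completeness, here is a hedged sketch of how such a call could look. NVIDIA's NIM endpoints are OpenAI-API-compatible; the prompt and key placeholder are my own illustration, and the question is the one used later in Figure 8.

from openai import OpenAI

# NIM endpoints speak the OpenAI API; replace the key placeholder with yours
client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                api_key="nvapi-...")

user_question = "Which companies are mentioned in the system?"
passages = pure_rag(user_question)["document_text"].tolist()
prompt = ("Answer the question using the passages below.\n\n"
          f"Question: {user_question}\n\nPassages:\n" + "\n---\n".join(passages))

completion = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)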

However, we have built our graph not just for pure standard vector database functionality, right? Let us get more out of it!

6.3. Unlock the power of Graph

A graph adds semantic reasoning to the data. Even without classical RDF-world semantic reasoning, a graph, by connecting entities, contributes to a deeper understanding of the data. Additionally, I have hypothesized in my previous articles that there is always an element of search asymmetry that can play a certain role. This search asymmetry is also referred to as magnitude sensitivity: the dot product is influenced by the magnitude of the vectors, which means it may not reliably represent similarity if the vectors being compared have significantly different magnitudes [9].
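To make the magnitude-sensitivity point concrete, here is a tiny self-contained illustration (not tied to the article's data):

import numpy as np

a = np.array([1.0, 0.0])
b = np.array([10.0, 0.0])   # same direction as a, 10x the magnitude
c = np.array([0.9, 0.45])   # different direction, small magnitude

cosine = lambda x, y: np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# The dot product rewards sheer length: b "wins" although its direction
# matches a no better than a matches itself
print(np.dot(a, b), np.dot(a, c))   # 10.0  0.9
# Cosine similarity normalizes the magnitudes away
print(cosine(a, b), cosine(a, c))   # 1.0   ~0.894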
After creating the graph with the document layer, we need a way to create the “glue” for the text chunks. We don’t have an ontology, and our assumptions are rather naïve, though unfortunately realistic: we have vast amounts of data, and we do not exactly know what this data is all about, but we want to extract the most value from it. We aim to build a lexical graph that leverages all the advantages of GraphRAG, without spending too much money in the process.

I propose leveraging NLP techniques for that. First, let us extract tokens, bigrams, and trigrams from each of the text chunks. I used the Spark NLP library [10], which allows you to harness the power of a local GPU to process large numbers of documents. Below is the code snippet I used for token extraction.

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, Finisher
from sparknlp.annotator import Tokenizer, NGramGenerator
import sparknlp

# Initialize Spark session with Spark NLP
spark = sparknlp.start()

# Create a DataFrame from the list of chunk texts
data = spark.createDataFrame(
    [(i, doc) for i, doc in enumerate(documents)], ["id", "text"]
)

# Document Assembler: wraps raw text into Spark NLP's document annotation
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Tokenizer
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# NGram generator for bigrams
bigram_generator = NGramGenerator() \
    .setInputCols(["token"]) \
    .setOutputCol("bigrams") \
    .setN(2)

# NGram generator for trigrams
trigram_generator = NGramGenerator() \
    .setInputCols(["token"]) \
    .setOutputCol("trigrams") \
    .setN(3)

# Finisher: converts annotations back to plain string arrays
finisher = Finisher() \
    .setInputCols(["bigrams", "trigrams"]) \
    .setOutputCols(["finished_bigrams", "finished_trigrams"]) \
    .setCleanAnnotations(False)

# Assemble and run the pipeline
pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    bigram_generator,
    trigram_generator,
    finisher,
])
model = pipeline.fit(data)
result = model.transform(data)

# Collect the results and stop the Spark session
pandas_df = result.select("text", "finished_bigrams", "finished_trigrams").toPandas()
spark.stop()

After creating the token entities, you can add them to the graph, establishing connections to the chunks from which they were extracted. The approach is simple and robust, and you can again apply the two indexes to this layer, as demonstrated earlier. With that, we have created a second layer with all nodes labeled "Token." I stored the fine-grained type ("token," "bigram," or "trigram") in the label property, along with the token itself as the name property and the associated embeddings. The following examples show the Cypher query for creating token nodes, as well as the query for building the respective vector index:

# Create a token node
query = """MERGE (t:Token {label: "Token",
                            name: $token,
                            embeddings: $token_embeddings})
RETURN elementId(t) AS token_node_id"""

Do the same for bigrams and trigrams; a hypothetical loading sketch is shown below.
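Here is what such a loading loop could look like. This is my own sketch: it assumes the pandas_df from the Spark pipeline above also carries chunkID/docID columns, and it uses the CONTAINS edge direction that appears in the retrieval queries later in the article.

# Hypothetical loader: one node per unique n-gram, linked to its source chunk
token_query = """
MERGE (t:Token {label: $label, name: $name})
ON CREATE SET t.embeddings = $embs
RETURN elementId(t) AS token_node_id
"""
link_query = """
MATCH (t:Token {name: $name}), (d:Document {chunkID: $chunk_id, docID: $doc_id})
MERGE (t)-[:CONTAINS]->(d)
"""

for _, row in pandas_df.iterrows():
    for label, terms in (("bigram", row["finished_bigrams"]),
                         ("trigram", row["finished_trigrams"])):
        for term in set(terms):
            name = term.lower()
            run_query(token_query, {"label": label, "name": name,
                                    "embs": embeddings.embed_query(name).tolist()})
            run_query(link_query, {"name": name,
                                   "chunk_id": row["chunkID"],
                                   "doc_id": row["docID"]})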

Next, create the index:

# Create a vector index on the token embeddings
query = '''
CREATE VECTOR INDEX vector_index_token IF NOT EXISTS
FOR (n:Token)
ON (n.embeddings)
OPTIONS {indexConfig: {
    `vector.dimensions`: 768,
    `vector.similarity_function`: 'cosine'
}}
'''

An example of a created bigram node is shown in Figure 4. Note that the entire layer containing tokens, bigrams, and trigrams has an internal label “Token,” allowing the vector index to be applied to all the nodes at once.

Figure 4. Bigram node example
Figure 5. Example of two-layered graph: Documents (blue) and Tokens (orange)

All good so far: we have tokens that are partly shared across different documents, which makes everything interconnected to some extent. However, unfortunately but not surprisingly, the first RAG attempts did not give better results than just doing a pure RAG.

What we need to unlock the full potential of the graph is to connect the entities with each other using context, logic, and semantics. Here's the challenge: we do not want to rely on GPT or other massive models with trillions of parameters. We already have over 262 thousand nodes in our graph, and using such large models would be overkill for our "chocolate bar" budget.

6.4 Triplets

There are many good open-source models available. However, triplet extraction can be a challenging task. The best approach is to use a smaller transformer model fine-tuned for this specific task. Even better would be to fine-tune it yourself, but for this presentation I used a pre-trained model from Hugging Face: bew/t5_sentence_to_triplet_xl [11], a fine-tuned version of FLAN-T5-XL. This model is about 600 times smaller than GPT-4, so it fits on my computer without any issues. The model was specifically tuned to extract triplets from text. According to the model's owner, Brian Williams, it is not yet perfect, and indeed the results are not always as accurate as I would like. But we are not aiming for the highest accuracy; a very good one at minimal cost is sufficient.
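Here is a hedged sketch of loading and calling the checkpoint with the transformers library. The exact input format the model expects is my assumption and may need adjusting to the model card.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "bew/t5_sentence_to_triplet_xl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id,
                                              torch_dtype=torch.float16,
                                              device_map="auto")

def extract_triplets(chunk_text):
    # Generate a raw triplet string for one text chunk; parsing the output
    # into (subject, predicate, object) tuples is left to the caller
    inputs = tokenizer(chunk_text, return_tensors="pt",
                       truncation=True).to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)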

I took the extracted text chunks and passed them to the model. The model created a number of triplets, which were subsequently mapped to the Token nodes, resulting in more than 650 thousand edges in the graph in total.

Figure 6. A subset of Token nodes with the triplet relationships. A predicate is an edge connecting the subject node (here: ‘omniverse’) and the object node (here: ‘graphics software’).

Here is a small code snippet of triplets mapping:

def process_triplet(triplet):
    subject, predicate, object_ = triplet
    subject_emb = embed_query_on_gpu(subject)
    predicate_emb = embed_query_on_gpu(predicate)
    object_emb = embed_query_on_gpu(object_)
    params = {'subject_emb': subject_emb.tolist(),
              'predicate_emb': predicate_emb.tolist(),
              'object_emb': object_emb.tolist(),
              'subject': subject,
              'predicate': predicate,
              'object': object_}

    similarSubjects_query = """
    CALL () {
        // Search for the subject duplicates
        CALL db.index.vector.queryNodes('vector_index_token', 10, $subject_emb)
        YIELD node AS vectorNode, score AS vectorScore
        WITH vectorNode, vectorScore
        WHERE vectorScore >= 0.96
        RETURN collect(vectorNode) AS similarSubjects
    }
    WITH similarSubjects
    OPTIONAL MATCH (n:Token {name: toLower($subject)})
    WITH similarSubjects + CASE WHEN n IS NULL THEN [] ELSE [n] END AS allSubjects
    UNWIND allSubjects AS subject
    RETURN collect(subject) AS similarSubjects
    """
    similarSubjects = run_query(similarSubjects_query, params)[0]['similarSubjects']

    similarPredicates_query = """
    CALL () {
        // Search for the predicate duplicates
        CALL db.index.vector.queryNodes('vector_index_token', 10, $predicate_emb)
        YIELD node AS vectorNode, score AS vectorScore
        WITH vectorNode, vectorScore
        WHERE vectorScore >= 0.96
        RETURN collect(vectorNode) AS similarPredicates
    }
    WITH similarPredicates
    OPTIONAL MATCH (n:Token {name: toLower($predicate)})
    WITH similarPredicates + CASE WHEN n IS NULL THEN [] ELSE [n] END AS allPredicates
    UNWIND allPredicates AS predicate
    RETURN collect(predicate) AS similarPredicates
    """
    similarPredicates = run_query(similarPredicates_query, params)[0]['similarPredicates']

    similarObjects_query = """
    CALL () {
        // Search for the object duplicates
        CALL db.index.vector.queryNodes('vector_index_token', 10, $object_emb)
        YIELD node AS vectorNode, score AS vectorScore
        WITH vectorNode, vectorScore
        WHERE vectorScore >= 0.96
        RETURN collect(vectorNode) AS similarObjects
    }
    WITH similarObjects
    OPTIONAL MATCH (n:Token {name: toLower($object)})
    WITH similarObjects + CASE WHEN n IS NULL THEN [] ELSE [n] END AS allObjects
    UNWIND allObjects AS object
    RETURN collect(object) AS similarObjects
    """
    similarObjects = run_query(similarObjects_query, params)[0]['similarObjects']

    # Connect every matched subject/object pair with a predicate edge; the
    # predicate itself is stored on the relationship, not as a separate node
    query = """
    UNWIND $similarSubjects AS subject
    UNWIND $similarPredicates AS predicate
    UNWIND $similarObjects AS object
    WITH subject.name AS subjectName, predicate.name AS predicateName,
         object.name AS objectName, subject, predicate, object
    MERGE (subjectNode:Token {name: toLower(subjectName)})
    ON CREATE SET subjectNode.embeddings = $subject_emb, subjectNode.triplet_part = 'subject'
    ON MATCH SET subjectNode.triplet_part = 'subject'
    //MERGE (predicateNode:Token {name: toLower(predicateName)})
    //ON CREATE SET predicateNode.embeddings = $predicate_emb, predicateNode.triplet_part = 'predicate'
    //ON MATCH SET predicateNode.triplet_part = 'predicate'
    MERGE (objectNode:Token {name: toLower(objectName)})
    ON CREATE SET objectNode.embeddings = $object_emb, objectNode.triplet_part = 'object'
    ON MATCH SET objectNode.triplet_part = 'object'
    MERGE (subjectNode)-[r:predicate {name: toLower(predicateName)}]->(objectNode)
    ON CREATE SET r.label = 'triplet', r.embeddings = $predicate_emb
    ON MATCH SET r.label = 'triplet'
    RETURN subjectName AS subject, predicateName AS predicate, objectName AS object
    """
    final_params = {
        'similarSubjects': similarSubjects,
        'similarPredicates': similarPredicates,
        'similarObjects': similarObjects,
        'subject_emb': subject_emb.tolist(),
        'predicate_emb': predicate_emb.tolist(),
        'object_emb': object_emb.tolist()
    }
    results = run_query(query, final_params)

    print(f"Processed triplet: {triplet}")
    return results

7. NLP-Powered GraphRAG

Figure 7 presents the result of the hybrid RAG/GraphRAG approach on the same question asked using only the Document layer retrieval, representing a pure RAG (Figure 3). The answer is more comprehensive, providing deeper insights into the data.

Note that I did not perform any entity resolution or entity linking, which would definitely be the next step and would most likely improve performance. Additionally, for both retrieval tests, I passed exactly 10 retrieved text passages. The GraphRAG query takes almost twice as long as the pure RAG query: we lose some speed but gain answer accuracy.

Figure 7. The same question used for the simple RAG test in Figure 3 was also used to test the effectiveness of GraphRAG.

The retrieval function using the triplets’ relationships is given below.

def triplets_driven_retrieval(my_query):
    my_query_emb = emb.embed_query(my_query)

    query = """
    CALL db.index.vector.queryNodes('vector_index_token', 300, $user_query_emb)
    YIELD node AS token, score AS tokenScore
    CALL (token, tokenScore) {
        // Tokens mapped from triplets: follow the triplet path to documents
        MATCH (token)
        WHERE token.triplet_part IS NOT NULL
        OPTIONAL MATCH (token)-[:predicate]->(object)
        OPTIONAL MATCH (object)-[:predicate]->(subject)
        OPTIONAL MATCH (subject)-[:CONTAINS]->(doc:Document)
        RETURN DISTINCT doc, tokenScore AS score, 1 AS isTripletPath
        ORDER BY tokenScore DESC
        LIMIT 200

        UNION

        // Plain tokens: jump straight to the chunks that contain them
        MATCH (token)
        WHERE token.triplet_part IS NULL
        MATCH (token)-[:CONTAINS]-(doc:Document)
        RETURN DISTINCT doc, tokenScore AS score, 2 AS isTripletPath
        ORDER BY tokenScore DESC
        LIMIT 200
    }
    RETURN DISTINCT doc.full_text AS document_text, score, isTripletPath
    ORDER BY score DESC
    LIMIT 100

    UNION

    // Standard vector search over the Document layer
    CALL () {
        CALL db.index.vector.queryNodes('vector_index_document', 10, $user_query_emb)
        YIELD node AS doc, score AS vectorScore
        WITH doc, vectorScore
        ORDER BY vectorScore DESC
        RETURN DISTINCT doc, vectorScore AS score, 3 AS isTripletPath
        ORDER BY vectorScore DESC
        LIMIT 10
    }
    RETURN DISTINCT doc.full_text AS document_text, score, isTripletPath
    ORDER BY score DESC
    LIMIT 10
    """

    params = {'user_query_emb': my_query_emb.tolist()}
    results = run_query(query, params)
    return pd.DataFrame(data=results)

You can really play with the query logic to traverse your graph in the best way. But let's take a look at what the GraphRAG Cypher query presented above is doing. The query is built in several steps. First, we match the user query against the Token nodes using the vector index. We then check whether each token has a triplet_part property (set on tokens that were mapped from generated triplets). If it does, we traverse the triplet relationships to the connected subject and object nodes and pick all the document chunks attached to them, ordering and limiting the results. If the token has no triplet part, we simply traverse to its chunk. In the second part of the query, we perform a standard RAG search and pick the documents using the vector index on the Document layer.

I am sure the query can be further optimized. As a side note, I also used spaCy's named entity extraction, extracting token classification labels like ORG, DATE, etc. (see the red nodes in the header image). However, the results were not very good, so I stuck with the two-layer architecture.

It is interesting to see what the subgraph of a Cypher query looks like for the simple user question ‘Which companies are mentioned in the system?’ (Figure 8).

Figure 8. A representation of a sub-graph of the retrieved nodes for the user question ‘Which companies are mentioned in the system?’.

The result consists of two distinct parts: the retrieved, mostly disjoint text chunks from the standard RAG part, and a set of nodes around the main “subject” triplet node, in this case “companies.” This representation can help significantly in the optimization stage of the retrieval query.

8. Discussion

The presented approach makes it possible to create standard RAG functionality on a graph and enhance it with the “power of the graph”, i.e., to use the semantics of its content, traverse entity relationships, and retrieve information in a wide variety of ways that you define yourself. The literature shows that each technique, RAG and GraphRAG, performs better for certain tasks [6–7]. This application was primarily designed to showcase a hybrid GraphRAG, combining classical RAG with GraphRAG, but it can also be used as a standard RAG approach.

As an idea, dedicated questions asking for specific facts could be filtered out by agents later on, and a classical RAG query could be performed, bypassing the first Cypher part shown above. Questions requiring multi-hop reasoning or some overarching context could be redirected to the GraphRAG world. All of this is possible but not mandatory; you can always choose one or the other. A naive routing sketch follows below.
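As a rough illustration (my own sketch, not part of the pipeline above), such routing could start as simply as this; in practice, an agent or a small LLM classifier would make the decision:

# Naive illustrative router: fact-style questions go to pure RAG, multi-hop
# or overview-style questions go to the graph retrieval
def route_query(my_query):
    multi_hop_markers = ("why", "how", "compare", "relation", "overall", "across")
    if any(marker in my_query.lower() for marker in multi_hop_markers):
        return triplets_driven_retrieval(my_query)
    return pure_rag(my_query)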

The bottom line is that the NLP-driven architecture presented above gives you the flexibility to choose your RAG approach and opens new horizons for RAG solutions.

9. Conclusion

In summary, this article presents an NLP-driven approach to building a knowledge graph that enables hybrid RAG/GraphRAG retrieval without heavy reliance on LLMs. The approach uses a layered graph without the need for a fixed ontology layer.

The initial results show that questions answered using this hybrid retrieval yield more comprehensive and insightful answers, opening up the possibilities for further exploration and potential implementation in large-scale GenAI projects.

PS! Thank you for making it this far, and stay tuned for a sequel, where we will dive into performance optimization and more.

References

1. RAG on Graph using Fixed Entity Architecture: make you retrieval work for you | by Irina Adamchic | Medium

2. Three-Layer Fixed Entity Architecture for Efficient RAG on Graphs | by Irina Adamchic | Medium

3. [2404.16130] From Local to Global: A Graph RAG Approach to Query-Focused Summarization

4. LightRAG: Simple and Fast Alternative to GraphRAG

5. LazyGraphRAG: Setting a new standard for quality and cost — Microsoft Research

6. [2502.11371] RAG vs. GraphRAG: A Systematic Evaluation and Key Insights

7. Vector RAG vs Graph RAG vs LightRAG | TDG | Technology Development Group

8. https://github.com/NVIDIA-AI-Blueprints/rag/tree/v1.0.0

9. Vector Search For AI — Part 1 — Vector Similarity Search Algorithms | by Serkan Özal | Medium

10. Spark NLP — Models Hub

11. bew/t5_sentence_to_triplet_xl · Hugging Face
