GraphRAG using LangChain
Code explained with an example, Generative AI
GraphRAG has been the talk of the town since Microsoft released its GraphRAG Git repo, which became an instant hit on GitHub.
GraphRAG is an advanced version of standard RAG that retrieves from a Knowledge Graph built over the external documents, instead of relying on vector similarity and vector DBs, making the retrieval more comprehensive.
In this post, I will run through a basic example of how to set up GraphRAG using LangChain and use it to improve your RAG systems (with any LLM or API).
We will be discussing two approaches:
1. LLMGraphTransformer
1. First, pip install a few essential libraries
pip install --upgrade --quiet json-repair networkx langchain langchain-core langchain-google-vertexai langchain-experimental langchain-community
#versions used
langchain==0.2.8
langchain-community==0.2.7
langchain-core==0.2.19
langchain-experimental==0.0.62
langchain-google-vertexai==1.0.3
Note: You can skip Google VertexAI and use any other LLM as well
2. Import the required functions, then initialize your LLM object and reference text. Use a SOTA LLM for best results, as Knowledge Graph creation is a complicated task.
import os
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_google_vertexai import VertexAI
import networkx as nx
from langchain.chains import GraphQAChain
from langchain_core.documents import Document
from langchain_community.graphs.networkx_graph import NetworkxEntityGraph
llm = VertexAI(max_output_tokens=4000,model_name='text-bison-32k')
text = """
Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
3. Next, we need to wrap this text in a Document, create an LLMGraphTransformer object using the LLM, and convert the document into GraphDocuments
documents = [Document(page_content=text)]
llm_transformer = LLMGraphTransformer(llm=llm)
graph_documents = llm_transformer.convert_to_graph_documents(documents)
4. It's time to create the Knowledge Graph. For this, it is better to provide a list of the entities and relationships you wish to extract; otherwise, the LLM might identify everything as an entity or relationship
llm_transformer_filtered = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Person", "Country", "Organization"],
    allowed_relationships=["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"],
)
graph_documents_filtered = llm_transformer_filtered.convert_to_graph_documents(
    documents
)
As you must have guessed, the above snippet restricts extraction to
Nodes = [“Person”, “Country”, “Organization”]
Relations = [“NATIONALITY”, “LOCATED_IN”, “WORKED_AT”, “SPOUSE”]
Note: Any other potential node or relation will be discarded. If you aren’t sure which types to allow, you can pass just the LLM object and let the LLM decide
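To make the filtering concrete, here is a minimal sketch of what restricting node and relationship types amounts to. It is only an illustration, not how LLMGraphTransformer works internally: the `Triple` class, the `keep_allowed` helper, and the candidate triples are all hypothetical stand-ins for what an LLM might propose.

```python
from dataclasses import dataclass


# Hypothetical stand-in for an extracted relationship; not a LangChain class.
@dataclass(frozen=True)
class Triple:
    source: str
    source_type: str
    relation: str
    target: str
    target_type: str


ALLOWED_NODES = {"Person", "Country", "Organization"}
ALLOWED_RELATIONSHIPS = {"NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"}


def keep_allowed(triples):
    """Keep triples whose relation and both endpoint types are allowed."""
    return [
        t for t in triples
        if t.relation in ALLOWED_RELATIONSHIPS
        and t.source_type in ALLOWED_NODES
        and t.target_type in ALLOWED_NODES
    ]


# Made-up candidates of the kind an LLM might propose for the sample text.
candidates = [
    Triple("Marie Curie", "Person", "SPOUSE", "Pierre Curie", "Person"),
    Triple("Marie Curie", "Person", "WORKED_AT", "University of Paris", "Organization"),
    Triple("Marie Curie", "Person", "WON", "Nobel Prize", "Award"),
]

for t in keep_allowed(candidates):
    print(t.source, t.relation, t.target)
```

The "WON" triple is dropped because neither its relation nor its target type is on the allowed lists, which is why facts about the Nobel Prize would not show up in a graph built with these settings.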
5. We now need to create a Networkx graph and add the above-identified nodes and edges to this graph
graph = NetworkxEntityGraph()
# Add nodes to the graph
for node in graph_documents_filtered[0].nodes:
    graph.add_node(node.id)
# Add edges to the graph
for edge in graph_documents_filtered[0].relationships:
    graph._graph.add_edge(
        edge.source.id,
        edge.target.id,
        relation=edge.type,
    )
6. Let’s create a GraphQAChain now that will help us to interact with the Knowledge Base
chain = GraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True
)
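Roughly speaking, a graph QA chain like this extracts the entities mentioned in the question, looks up the triples attached to each entity, and hands those triples to the LLM as context. The lookup step can be sketched with plain networkx. The `entity_knowledge` function and the hand-written edges below are assumptions for illustration, not output from the LLM or the chain's exact code.

```python
import networkx as nx


def entity_knowledge(graph, entity, depth=1):
    """Return 'source relation target' strings reachable from an entity."""
    if not graph.has_node(entity):
        return []
    facts = []
    for src, dst in nx.dfs_edges(graph, entity, depth_limit=depth):
        facts.append(f"{src} {graph[src][dst]['relation']} {dst}")
    return facts


# Hand-written toy graph (assumed edges, for illustration only).
toy = nx.DiGraph()
toy.add_edge("Marie Curie", "Pierre Curie", relation="SPOUSE")
toy.add_edge("Marie Curie", "University of Paris", relation="WORKED_AT")

print(entity_knowledge(toy, "Marie Curie"))
print(entity_knowledge(toy, "Albert Einstein"))  # unknown entity, so an empty list
```

Note that an entity missing from the graph yields no context at all, so the answer quality depends heavily on which nodes and relations were extracted in step 4.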
7. Call the chain object with your query
question = """Who is Marie Curie?"""
chain.run(question)
Output
We can even use GraphIndexCreator for implementing GraphRAG
2. GraphIndexCreator
Another approach is to use GraphIndexCreator in LangChain, which is very similar to the above approach
from langchain.indexes import GraphIndexCreator
from langchain.chains import GraphQAChain
index_creator = GraphIndexCreator(llm=llm)
with open("/home/cdsw/sample.txt") as f:
    all_text = f.read()
text = "\n".join(all_text.split("\n\n"))
graph = index_creator.from_text(text)
chain = GraphQAChain.from_llm(llm, graph=graph, verbose=True)
chain.run("What did Pierre Curie win?")
As you must have understood, the above snippet
First creates a GraphIndexCreator using an LLM
Reads text from a .txt file
Creates a graph using the index creator
Runs the GraphQAChain on the graph, similar to the above approach
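For intuition on the "creates a graph" step: the LLM is asked to return knowledge triples as text, which are then parsed and added as edges. Below is a minimal sketch of such parsing, assuming a `(subject, predicate, object)` format separated by a `<|>` delimiter. The delimiter, the `split_into_triples` helper, and the sample response are assumptions for this illustration, not the library's guaranteed format.

```python
import re

# Assumed delimiter and triple format, chosen for this illustration.
DELIMITER = "<|>"


def split_into_triples(raw):
    """Parse '(subject, predicate, object)' chunks separated by DELIMITER."""
    triples = []
    for chunk in raw.split(DELIMITER):
        match = re.fullmatch(r"\((.+?),\s*(.+?),\s*(.+)\)", chunk.strip())
        if match:  # skip chunks that do not look like a triple
            triples.append(tuple(part.strip() for part in match.groups()))
    return triples


# Made-up model response, not real LLM output.
raw_response = (
    "(Pierre Curie, was a co-winner of, Marie Curie's first Nobel Prize)<|>NONE"
)
print(split_into_triples(raw_response))
```

Since the relation here is free text rather than a fixed label, the resulting graph tends to be looser than one built with allowed node and relationship types.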
Output
> Entering new GraphQAChain chain...
Entities Extracted:
Pierre Curie
Full Context:
Pierre Curie was a co-winner of Marie Curie's first Nobel Prize
> Finished chain.
' Pierre Curie won the Nobel Prize in Physics in 1903, together with his wife Marie Curie and Henri Becquerel, for their research on radioactivity.'
In my experiments, the LLMGraphTransformer approach gave better responses than GraphIndexCreator, but both are quite easy to implement. Do remember that these examples use very small datasets. If you are using a big dataset with paid APIs, be cautious, as Knowledge Graph creation can lead to a large number of API calls, costing you dearly.
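A rough way to size the bill before running graph extraction on a large corpus is to count the chunks you would send. The sketch below assumes one LLM call per chunk and uses a crude characters-per-token ratio and a placeholder price. All of these numbers, and the `estimate_extraction_cost` helper itself, are assumptions to replace with your own.

```python
import math


def estimate_extraction_cost(total_chars, chunk_chars=2000,
                             chars_per_token=4, usd_per_1k_tokens=0.001):
    """Estimate LLM calls and input-token cost for graph extraction.

    Assumes one call per chunk; output tokens and prompt overhead are ignored.
    """
    calls = math.ceil(total_chars / chunk_chars)
    input_tokens = total_chars / chars_per_token
    cost = input_tokens / 1000 * usd_per_1k_tokens
    return calls, cost


calls, cost = estimate_extraction_cost(total_chars=10_000_000)
print(f"{calls} calls, roughly ${cost:.2f} in input tokens")
```

Treat the result as a lower bound, since the extraction prompt and the generated graph output add tokens on top of the raw text.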
With this, it’s a wrap. See you soon!