Microsoft GraphRAG with an RDF Knowledge Graph — Part 3
Using SPARQL and the Knowledge Graph for RAG
Introduction
If you’ve been following this series, then at this point you should have a populated Knowledge Graph in an RDF Store (I’ve been using GraphDB). The first two parts have been:
- Part 1 — Using a local LLM & Encoder to do Microsoft’s GraphRAG
- Part 2 — Uploading the output from Microsoft’s GraphRAG into an RDF Store
In this final part we’re going to do the following:
- Encode our question into an embedding vector and search for the 10 nearest Entity records to that vector.
- Find the Chunk records that are associated with these Entity records and return the top 3.
- Find the Community records associated with these Entity records and return the top 3.
- Find the inside and outside relationships of these Entity records.
- Fetch the description of each Entity.
- Having got all that information, feed it as context to our LLM together with our question.
Encoding Our Question
To encode our question we will use the embedding endpoint provided by LM Studio’s local server. This is an OpenAI-compatible endpoint, and LM Studio helpfully provides code to show you how to connect to it:
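LM Studio’s own sample uses the openai Python client; the sketch below is a dependency-free, standard-library-only equivalent. The default port (1234) and the commented question are assumptions — adjust them to your setup:

```python
import json
import urllib.request

# LM Studio serves an OpenAI-compatible API, by default on port 1234 (assumption).
EMBEDDINGS_URL = "http://localhost:1234/v1/embeddings"

def embedding_payload(text: str, model: str = "bge-large-en-v1.5-gguf") -> bytes:
    """Build the JSON body for an OpenAI-style /v1/embeddings request."""
    # Embedding models generally behave better on single-line input.
    return json.dumps({"input": text.replace("\n", " "), "model": model}).encode()

def get_embedding(text: str) -> list[float]:
    """Send the text to the local server and return its embedding vector."""
    req = urllib.request.Request(
        EMBEDDINGS_URL,
        data=embedding_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

# question = "..."  # your question text
# embedding_vector = get_embedding(question)
```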
embedding_vector will now be a vector of floating-point values with the same dimensionality as your encoding model’s output. The encoding model I chose, bge-large-en-v1.5-gguf, creates a vector of size 1024. It is vitally important that you use the same encoding model for this step as you did in Part 1, otherwise your searches in vector space aren’t going to work.
To search for our 10 nearest matching Entity records we are going to use the same Elasticsearch index we created and used in Part 2. Assuming you have Elasticsearch up and running on your machine, you would do this as follows:
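A minimal sketch of the kNN search body: the index name, the `embedding` field name, and the `id` source field are assumptions — match them to the mapping you created in Part 2:

```python
def build_knn_search(query_vector, k=10, field="embedding", num_candidates=50):
    """Build an Elasticsearch kNN search body for the entity index from Part 2.
    The field name 'embedding' is an assumption -- use whatever your mapping defines."""
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["id"],
    }

# With the official elasticsearch client you would then run something like:
# es = Elasticsearch("http://localhost:9200")
# resp = es.search(index="entity", body=build_knn_search(embedding_vector))
# entity_list = [hit["_source"]["id"] for hit in resp["hits"]["hits"]]
```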
Now our entity_list contains the ids of the top 10 Entity records.
Get the top 3 Chunk records
Using this entity_list we are going to query our Knowledge Graph and get the top 3 Chunk records that are linked to these Entity records. For each identified Chunk we count the number of Entity records from our list that are linked to it, and then order the Chunk records by that count in descending order:
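A sketch of that query against GraphDB’s SPARQL endpoint. The repository name (`graphrag`), the `gr:` prefix, and the predicate names are all assumptions — substitute the vocabulary you used when uploading in Part 2:

```python
import json
import urllib.parse
import urllib.request

# GraphDB exposes a SPARQL endpoint per repository; 'graphrag' is a hypothetical name.
SPARQL_ENDPOINT = "http://localhost:7200/repositories/graphrag"

def run_sparql(query: str) -> dict:
    """POST a SPARQL query to GraphDB and return the JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        SPARQL_ENDPOINT,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def top_chunks_query(entity_iris: list[str]) -> str:
    """Top 3 chunks ordered by how many of our entities they are linked to.
    The gr: prefix and predicate names are assumptions -- match your Part 2 schema."""
    values = " ".join(f"<{iri}>" for iri in entity_iris)
    return f"""
PREFIX gr: <http://example.org/graphrag#>
SELECT ?chunk ?text (COUNT(?entity) AS ?entityCount)
WHERE {{
  ?chunk a gr:Chunk ;
         gr:has_entity ?entity ;
         gr:text ?text .
  VALUES ?entity {{ {values} }}
}}
GROUP BY ?chunk ?text
ORDER BY DESC(?entityCount)
LIMIT 3
"""
```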
Get the Top 3 Community Records
Again using the entity_list, we query our Knowledge Graph and get the corresponding Community records that are linked to these Entity records. We sort the Community records by rank and weight and take the top 3:
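A sketch of the corresponding query; again the `gr:` prefix and predicate names are assumptions to be matched against your Part 2 schema:

```python
def top_communities_query(entity_iris: list[str]) -> str:
    """Top 3 communities linked to our entities, sorted by rank then weight.
    Prefix and predicate names are assumptions -- match your Part 2 schema."""
    values = " ".join(f"<{iri}>" for iri in entity_iris)
    return f"""
PREFIX gr: <http://example.org/graphrag#>
SELECT DISTINCT ?community ?summary ?rank ?weight
WHERE {{
  ?entity gr:in_community ?community .
  ?community gr:summary ?summary ;
             gr:rank ?rank ;
             gr:weight ?weight .
  VALUES ?entity {{ {values} }}
}}
ORDER BY DESC(?rank) DESC(?weight)
LIMIT 3
"""
```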
Get the Inside and Outside Relationships
We query our Knowledge Graph using the entity_list and the related_to relationship to find our inside and outside relationships:
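A sketch of the outside-relationships query: edges that start at one of our entities and end at an entity *not* in our list. The `gr:related_to` predicate name is an assumption:

```python
def outside_relationships_query(entity_iris: list[str]) -> str:
    """Relationships from one of our entities to an entity outside the list.
    The gr:related_to predicate name is an assumption from this series' schema."""
    values = " ".join(f"<{iri}>" for iri in entity_iris)
    iri_csv = ", ".join(f"<{iri}>" for iri in entity_iris)
    return f"""
PREFIX gr: <http://example.org/graphrag#>
SELECT ?entity ?other
WHERE {{
  ?entity gr:related_to ?other .
  VALUES ?entity {{ {values} }}
  FILTER(?other NOT IN ({iri_csv}))
}}
"""
```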
And for the inside relationships:
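For the inside relationships, both ends of the edge must be in our list, so the filter becomes a second VALUES clause (same hypothetical `gr:related_to` predicate as above):

```python
def inside_relationships_query(entity_iris: list[str]) -> str:
    """Relationships where both ends are in our entity list.
    The gr:related_to predicate name is an assumption from this series' schema."""
    values = " ".join(f"<{iri}>" for iri in entity_iris)
    return f"""
PREFIX gr: <http://example.org/graphrag#>
SELECT ?entity ?other
WHERE {{
  ?entity gr:related_to ?other .
  VALUES ?entity {{ {values} }}
  VALUES ?other {{ {values} }}
}}
"""
```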
Fetch the Entity Descriptions
We also fetch the Entity descriptions:
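A sketch of the description lookup, with the same caveat that the predicate names are assumptions:

```python
def entity_descriptions_query(entity_iris: list[str]) -> str:
    """Fetch the name and description of each entity in our list.
    Predicate names are assumptions -- match your Part 2 schema."""
    values = " ".join(f"<{iri}>" for iri in entity_iris)
    return f"""
PREFIX gr: <http://example.org/graphrag#>
SELECT ?entity ?name ?description
WHERE {{
  ?entity gr:name ?name ;
          gr:description ?description .
  VALUES ?entity {{ {values} }}
}}
"""
```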
Feeding Context to the LLM
Having got all these DataFrames, we want to convert each one into a format that we can then feed into our LLM. Here’s an example of that conversion with the entity_df that we obtained before:
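A minimal sketch of that flattening step: one titled plain-text block per DataFrame, one line per row. It works for any of the result DataFrames, not just entity_df:

```python
import pandas as pd

def df_to_context(df: pd.DataFrame, title: str) -> str:
    """Flatten a DataFrame into a titled plain-text block for the LLM context."""
    lines = [f"{title}:"]
    for _, row in df.iterrows():
        # One "column: value" pair per column, joined on a single line per row.
        lines.append("; ".join(f"{col}: {row[col]}" for col in df.columns))
    return "\n".join(lines)

# entity_text = df_to_context(entity_df, "Entities")
# ...and likewise for the chunk, community and relationship DataFrames.
```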
Create LangChain Response
Everything is now ready for us to set up an LLM chat with our local LM Studio instance. LM Studio once again provides some helpful example code to connect to the server:
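LM Studio’s sample uses the openai client; here is a standard-library-only sketch of the same chat call, again assuming the default port 1234 and a placeholder model name:

```python
import json
import urllib.request

CHAT_URL = "http://localhost:1234/v1/chat/completions"  # LM Studio's default port

def chat_payload(messages, model="local-model", temperature=0.0) -> bytes:
    """Build the JSON body for an OpenAI-style chat completion request.
    'local-model' is a placeholder -- LM Studio serves whichever model is loaded."""
    return json.dumps(
        {"model": model, "messages": messages, "temperature": temperature}
    ).encode()

def chat(messages) -> str:
    """Send the messages to the local server and return the reply text."""
    req = urllib.request.Request(
        CHAT_URL,
        data=chat_payload(messages),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```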
We’re actually going to be using LangChain to chain together our chat model and response:
First let’s prompt with no context from our Knowledge Graph:
Here’s the text I got back from my model:
I think there may be some confusion here!
In Charles Dickens’ classic novel “A Christmas Carol”, Bob Cratchit’s wife is actually named Emily, not Belinda.
Bob Cratchit is a kind and hardworking clerk who works for Ebenezer Scrooge. He is the father of six children: Peter, Belle (not Belinda), Tiny Tim, and three other unnamed children.
Interesting! I deliberately chose a question about a full name that I knew was not in the text, but that could be worked out by looking at the relationships. It’s clearly failing here.
Now let’s use the information we’ve obtained from our Knowledge Graph and put that into the context and ask the same question:
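Assembling the context is just a matter of joining the flattened text blocks; the four argument names below are hypothetical labels for the flattened query results:

```python
def build_context(entity_text, relationship_text, community_text, chunk_text) -> str:
    """Combine the flattened Knowledge Graph texts into one context string.
    The four arguments are hypothetical names for the flattened query results."""
    return "\n\n".join([entity_text, relationship_text, community_text, chunk_text])

# context = build_context(entity_text, relationship_text, community_text, chunk_text)
# response = chain.invoke({"context": context, "question": question})
```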
Here’s the text I got back this time:
According to the data provided, Belinda Cratchit is a daughter of Bob Cratchit. They are related as parent and child. Additionally, it is mentioned that Mrs. Cratchit (Bob’s wife) assists Belinda with household tasks such as managing the cloth, indicating a close family relationship between them.
Nice! It clearly knows the book and has been able to work out the relationships from the information we provided from our Knowledge Graph.
Summary
I’ve enjoyed going on this journey of discovery with Microsoft’s GraphRAG. I was able to run it on my local PC together with an LLM, Vector Embedding, Vector Index and RDF Store. I think the output responses I was able to get with my Chat show how much better this approach is than just simple RAG.
Next time I’m planning on blogging about Microsoft’s GraphRAG Accelerator project which I was able to get up and running in the cloud and which also produces good results. Watch this space :-)
Resources
- All the code related to this series is on my GitHub: https://github.com/ianormy/msft_graphrag_blog
- Elasticsearch: https://www.elastic.co/downloads/elasticsearch
- LM Studio: https://lmstudio.ai/
- LangChain: https://www.langchain.com/
- GraphDB: https://graphdb.ontotext.com/
- Microsoft GraphRAG: https://microsoft.github.io/graphrag/
- Tomaz Bratanic has written some blogs and created some notebooks about Microsoft GraphRAG that are extremely good and informative. Although they’re geared towards Neo4j and use Cypher, they have inspired me and I have used a lot of his ideas. You can learn a lot from them — https://github.com/tomasonjo/blogs/tree/master/msft_graphrag