The Practical Benefits to Grounding an LLM in a Knowledge Graph

Daniel Bukowski
Sep 18, 2023 · 8 min read

Note: This article and the underlying LLM application were developed with Alexander Gilmore, Associate Consulting Engineer at Neo4j. Follow me on LinkedIn for daily posts.

Summary

Knowledge Graphs combined with Graph Data Science (GDS) algorithms offer unique benefits for grounding applications built on large language models (LLMs). The insights they make possible would be difficult to obtain otherwise and highlight the impact graphs and GDS can have on understanding and improving LLM performance, specifically:

  • Understanding and evaluating grounding context documents
  • Logging conversations in the same database as the context documents
  • Understanding LLM performance by visualizing conversation chains as graphs
  • Evaluating context documents and conversations with GDS algorithms

To learn more about how Knowledge Graphs and GDS can improve your grounded LLM application, please join us for our virtual Nodes 2023 Workshop: Graph Data Science with Generative AI on October 5, 2023.

Grounded LLM Applications Today

Retrieval Augmented Generation (RAG) is an emerging approach to improving the performance of LLMs. RAG aims to improve LLM outputs by enriching the user's question with information that is either non-public or otherwise outside the LLM's training data.

Today, LLM applications are often grounded using vector databases. When a user submits a question to the LLM, a vector embedding of that question is used to retrieve relevant contextual information intended to help the LLM provide a more accurate response. However, a simple vector search can return duplicated information or miss highly relevant information, particularly in more complex use cases. Tomaz Bratanic detailed several limits of vector search in his blog about multi-hop question answering.
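To make the retrieval step concrete, here is a minimal sketch of vector-based context retrieval with toy embeddings. The arrays, dimensions, and the `top_k_context` helper are illustrative, not from our application. Note how two near-duplicate chunks crowd out other context, which is exactly the duplication problem described above:

```python
import numpy as np

def top_k_context(question_vec, doc_vecs, k=3):
    """Return indices of the k document chunks most similar to the question.

    Uses cosine similarity, the typical metric for embedding retrieval.
    """
    q = question_vec / np.linalg.norm(question_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity of each chunk
    return np.argsort(-sims)[:k]      # indices of the top-k chunks

# Toy example with four three-dimensional "embeddings"
docs = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],     # near-duplicate of the first chunk
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
question = np.array([1.0, 0.05, 0.0])
print(top_k_context(question, docs, k=2))  # the two near-duplicate chunks win
```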

Enter Graph Databases

You might not think of graph databases as a way to ground an LLM, but as we will demonstrate, they can replicate, and likely surpass, the grounding capabilities of vector databases.

To start, vector search can now be performed in Neo4j knowledge graphs, providing similar retrieval to vector databases in addition to several other benefits. But as noted above, there are many additional potential benefits to using a graph database and GDS algorithms.

To understand and demonstrate these, my colleague Alex and I developed an internal chat agent to help our colleagues answer questions about Neo4j and the GDS library. Our initial grounding documentation comprised approximately 15,000 text chunks from public Neo4j documentation and the Neo4j Developer Blog. While developing and testing this application, we were able to practically demonstrate several additional benefits that a knowledge graph, combined with GDS algorithms, can provide to a grounded LLM.

Visualizing Similarity Relationships

Graphs offer a unique ability to visualize similarity relationships among data points. Here, we applied K-Nearest Neighbors (KNN) to the text embeddings of each document chunk and then persisted the results as SIMILARITY relationships in the graph.

Visualizing similarity relationships between document chunks (teal) and their sources (orange). Image created by authors.
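A simplified analogue of the KNN step, assuming plain NumPy arrays of chunk embeddings. In our application this is done by the GDS KNN procedure; the function below only illustrates the idea of linking each chunk to its most similar neighbors:

```python
import numpy as np

def knn_similarity_edges(embeddings, k=2, cutoff=0.0):
    """Build (source, target, score) edges linking each chunk to its k most
    similar neighbours -- a simplified stand-in for the GDS KNN algorithm we
    ran before persisting SIMILARITY relationships."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T              # pairwise cosine similarity
    np.fill_diagonal(sims, -1.0)          # never link a chunk to itself
    edges = []
    for i, row in enumerate(sims):
        for j in np.argsort(-row)[:k]:    # k nearest neighbours of chunk i
            if row[j] > cutoff:
                edges.append((i, int(j), float(row[j])))
    return edges
```

The `cutoff` parameter mirrors the similarity threshold you would typically set so that only meaningfully similar chunks get a relationship.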

In the above image, document chunks (teal) originating from two distinct source documents (orange URL nodes) are visualized along with their similarity relationships. In this one visualization we can see:

  • Document chunk relationships to sources, in this case URLs
  • Document similarity relationships to each other based upon text embeddings

Such analysis can be important when evaluating the entire corpus of context used to ground an LLM. Document chunks that are highly similar may be candidates to be removed or combined, minimizing the potential context overlap when asking a question to the LLM.

Graph-Based Clustering

Once we generate similarity relationships among the document nodes, we can apply additional GDS algorithms to understand natural groupings of these documents. In this case we used Label Propagation (LPA), a graph community detection algorithm that works well on highly connected graphs. Additionally, we applied PageRank, weighted by similarity score, as a proxy for each document's importance within the overall graph.
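The ideas behind the two algorithms can be sketched in plain Python. These are simplified stand-ins for the GDS implementations, which are far more scalable and configurable:

```python
import random
from collections import Counter, defaultdict

def weighted_pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank over weighted, undirected similarity edges.
    edges: list of (u, v, weight) tuples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for u in nodes:
            total = sum(w for _, w in adj[u])
            for v, w in adj[u]:
                # distribute rank proportionally to similarity weight
                nxt[v] += damping * rank[u] * (w / total)
        rank = nxt
    return rank

def label_propagation(edges, iters=10, seed=0):
    """Each node repeatedly adopts the most common label among its
    neighbours -- the core idea behind Label Propagation (LPA)."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    labels = {n: n for n in adj}          # start with unique labels
    nodes = list(adj)
    for _ in range(iters):
        rng.shuffle(nodes)                # visit nodes in random order
        for n in nodes:
            counts = Counter(labels[m] for m in adj[n])
            labels[n] = counts.most_common(1)[0][0]
    return labels
```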

When paired with traditional NLP measures like text length, word count, and word length, the LPA communities and PageRank scores begin to give us an understanding of the different document communities.

Table of 10 Largest Document communities based upon count, along with median Text Length, Word Count, Average Word Length, and PageRank Score for each community. Image created by authors.

When we sort the table to identify the communities with the highest median PageRank score, we see much smaller and markedly different communities.

Document communities with the highest median PageRank score. Image created by authors.

In the above table we can see that community 14015 has the highest median PageRank score, along with a rather high median average word length (med_avgWordLen). When we visualize community 14015 in the graph, we can see that it is a densely connected community, indicating that its text chunks are highly similar to each other.

Visualization of the Document community with the highest median PageRank score of any community in the graph. Image created by authors.

When we view the text chunks that the nodes represent, we can also see that they are similar examples of code from the Awesome Procedures on Cypher (APOC) library. This and similar communities in our graph could be candidates for a different chunking strategy to better differentiate them from each other and provide better context to the LLM.

Visualizing the Document Similarity Graph with Node Embeddings

In addition to zooming in on specific clusters, we also wanted to see the overall distribution of context documents across the similarity graph. To accomplish this we used the FastRP node embedding algorithm to generate embeddings of the similarity graph itself, incorporating the similarity scores on each relationship. We then plotted the results, shown below:

Two-dimensional plot of FastRP node embeddings generated from the context Document similarity graph. Image created by authors.
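The idea behind FastRP, random projections propagated over weighted relationships, can be sketched roughly as follows. The `iteration_weights` and dimensions are illustrative, and the GDS implementation differs in detail:

```python
import numpy as np

def fastrp_like_embeddings(edges, n_nodes, dim=8,
                           iteration_weights=(0.0, 1.0, 1.0), seed=42):
    """FastRP-style sketch: start from random node vectors, then take
    weighted sums of neighbour averages at increasing hop depths.
    edges: list of (u, v, similarity_score) tuples."""
    rng = np.random.default_rng(seed)
    # dense adjacency with similarity scores as weights (fine for a toy graph)
    A = np.zeros((n_nodes, n_nodes))
    for u, v, w in edges:
        A[u, v] = w
        A[v, u] = w
    # row-normalize so each propagation step averages over neighbours
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    P = A / deg
    X = rng.normal(size=(n_nodes, dim))   # random projection base vectors
    emb = iteration_weights[0] * X
    H = X
    for w in iteration_weights[1:]:
        H = P @ H                         # one more hop of propagation
        emb = emb + w * H
    return emb
```

Plotting a 2-D reduction of these vectors (as we did with the real FastRP output) makes densely connected communities appear as tight clusters.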

Each point on the plot represents a Document node in our knowledge graph, colored by the community that node belongs to. Note: our context document graph contains 160 communities, but the community labels are integers, which affects the color scale on the plot. Each node's size in the plot represents its PageRank score.

Examining the plot above, we see well-separated clusters, indicating that the communities derived from our document similarity graph are genuinely distinct. Where there is overlap, we can further explore those documents to determine whether they should be eliminated, consolidated, or chunked differently.

Logging Conversations in the Knowledge Graph

While analyzing context documents is beneficial, logging user interactions with the LLM in the same knowledge graph allows for rich understanding and analysis of model performance.

Graph data model of how conversations are logged in the knowledge graph. Image created by authors.

The graph data model above depicts how our application logs conversations in the same graph database that we used for the context documents:

  • Session: Session nodes indicate when a user starts a session with our application by navigating to the web interface.
  • Conversation: A Session will have one or more Conversations, which are defined when the user makes a change such as switching LLMs (our application currently supports four foundation models) or clearing the conversation memory.
  • Message: Each conversation will have a FIRST message, which includes the initial question or statement the user makes to the LLM. From there, conversations “chain” via NEXT relationships, incorporating prior knowledge and context.
  • Document: Each user message will be associated with zero to ten pieces of context, based upon user preference, identified in the graph as Document nodes.
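As a sketch of how such a write might look, the helper below returns an illustrative Cypher statement. The Conversation/Message/Document labels and the FIRST/NEXT relationships follow the data model above; the HAS_CONTEXT relationship type and the property names are placeholders, not our production schema:

```python
def log_message_cypher() -> str:
    """Return an illustrative Cypher statement (as a string) for appending
    a message to a conversation chain and linking its retrieved context."""
    return """
    MATCH (c:Conversation {id: $conversationId})
    CREATE (m:Message {text: $text, role: $role, created: datetime()})
    // Attach as the FIRST message if the conversation is empty,
    // otherwise append to the end of the NEXT chain.
    WITH c, m
    OPTIONAL MATCH (c)-[:FIRST]->(:Message)-[:NEXT*0..]->(last:Message)
    WHERE NOT (last)-[:NEXT]->()
    FOREACH (_ IN CASE WHEN last IS NULL THEN [1] ELSE [] END |
      MERGE (c)-[:FIRST]->(m))
    FOREACH (_ IN CASE WHEN last IS NOT NULL THEN [1] ELSE [] END |
      MERGE (last)-[:NEXT]->(m))
    WITH m
    UNWIND $contextIds AS docId
    MATCH (d:Document {id: docId})
    MERGE (m)-[:HAS_CONTEXT]->(d)
    """
```

Because the context Documents already live in the same database, linking a message to its retrieved context is a single MATCH + MERGE rather than a cross-system join.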

Visualizing Conversation Chains

The image below shows an actual conversation chain in our application.

An actual user session logged in the knowledge graph. Image created by authors.

The above image demonstrates how sessions and conversations are logged, moving left-to-right:

  • The first node (green) represents the user session.
  • The next node (orange) indicates that the user session contains a single conversation with the GPT-4 8K model.
  • The next four inline nodes (purple) represent the user questions and assistant responses.
  • The surrounding nodes (teal) represent 10 unique pieces of context associated with each user question to the LLM. As the graph also shows, five common pieces of context were retrieved for the two questions. The Document labels (e.g., 13065) also show that all but one piece of context in this conversation originates from the same community of documents.

Logging the conversation chains in graph format, and in the same database as the context, provides numerous benefits for analyzing, understanding, and evaluating LLM performance:

  • Explicitly identifying relationships among database elements. Relationships are often implicit in other databases, but graph databases treat relationships as first-class data alongside everything else, enabling richer analysis.
  • Visualizing conversations, which can substantially improve understanding of model performance. This will likely be important as LLM-based agents are deployed within highly-regulated industries.
  • Analyzing context use across conversations, which can reveal which pieces of context are used most frequently, which are never used, and which contribute to positively or negatively rated responses. This allows us to optimize the knowledge graph for the LLM by condensing similar documents or eliminating junk ones, and provides a richer overall understanding of LLM behavior.

Context Document Frequency

A natural benefit of combining context documents with conversation logging in the same graph database is the ability to analyze which context documents are most frequently provided to LLMs. In the image below, the message nodes (purple) are shown in relation to the most-frequently used context documents (teal).

User messages associated with the most frequently used context documents. Image created by authors.

Querying for the five most-frequently used pieces of context, we can see the text being provided to the LLMs with the user questions.

Most frequently-used context documents. Image created by authors.
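Counting context usage can be sketched as a simple aggregation over (message, context) pairs pulled from the graph. The input shape and helper name here are assumptions for illustration:

```python
from collections import Counter

def context_frequency(message_context_pairs, top_n=5):
    """Count how often each context Document is attached to a user message.

    message_context_pairs: iterable of (message_id, [document_id, ...])
    pairs, e.g. as returned by a query over the conversation graph.
    """
    counts = Counter()
    for _msg, doc_ids in message_context_pairs:
        counts.update(set(doc_ids))   # count each document once per message
    return counts.most_common(top_n)
```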

As the number of conversations with our model grows, this understanding of how context is used will enable us to curate it so the LLMs answer user questions more efficiently and effectively.

Improving Model Performance

As we used our application, we started to notice certain behaviors, one of which was a gradual increase in response time as a conversation extended. Analyzing response times logged in our knowledge graph, we saw that response time doubled roughly every six responses. Because the context lives in the same graph as our conversation logs, we can also evaluate the impact of larger or smaller context documents on response time.

LLM responses in the same conversation are linked (blue) and the average response time for each message number in a conversation (red)
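The aggregation behind the red series can be sketched as follows. The input format is an assumption; in our application the times come from the logged Message nodes:

```python
from collections import defaultdict

def avg_response_time_by_position(conversations):
    """Average LLM response time at each message position across
    conversations.

    conversations: list of lists of response times (seconds), one inner
    list per conversation, ordered by message position in the chain.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for times in conversations:
        for pos, t in enumerate(times, start=1):
            totals[pos] += t
            counts[pos] += 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(totals)}
```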

Conclusion

Graph databases provide unique, high-impact benefits when used to ground LLM-based applications. Combining context and conversations, visualizing LLM behavior, and applying GDS algorithms to generate insights are benefits that would be difficult to obtain in other ways.

In our day-to-day work at Neo4j, we see every day how graph databases assist customers with some of their most challenging and impactful work. By building an LLM-based application grounded in a graph database, we were able to experience firsthand the benefits graphs can provide to this emerging technology.

To learn more about how Knowledge Graphs and Graph Data Science can improve your grounded LLM application, please join us for our virtual Nodes 2023 Workshop: Graph Data Science with Generative AI on October 5, 2023 where we will demonstrate these and more best practices we learned firsthand while building our own application.


Daniel Bukowski

Graph Data Science Scientist at Neo4j. I write about the intersection of graphs, graph data science, and GenAI. https://www.linkedin.com/in/danieljbukowski/