Graphs Are Everywhere (in GenAI)

Daniel Bukowski
6 min read · Nov 26, 2023


Follow me on LinkedIn for daily posts.

Graphs Are Everywhere

In my role at Neo4j I often help customers and prospects see their data as a graph. For many use cases, such as financial transactions, logistics, and customer journeys, graphs represent the real-world nature of the data better than tabular or other forms. Neo4j’s founder and CEO Emil Eifrém puts it more concisely when he says, “graphs are everywhere.”

Graphs are Everywhere: Image generated by author using DALL-E

Based upon my experience, “graphs are everywhere” now includes GenAI, and I have written about the practical applications of grounding an LLM with a graph. That article focuses on Retrieval Augmented Generation (RAG), while this article takes a broader look at why graph databases are uniquely suited to a variety of GenAI applications. And as GenAI applications become more sophisticated, the benefits of pairing them with graph databases will only grow stronger.

(Almost) Every Database is Now a Vector Database

As GenAI and Large Language Models (LLMs) went mainstream during 2023, model fine-tuning and RAG emerged as the top approaches for applying the technology to individual or organizational use cases. Of these, RAG has shown the most practical utility because it augments the LLM’s text generation capabilities with information the model would not otherwise have access to. I advise customers to consider fine-tuning only after they have maximized a RAG implementation.

Prevalence of Vectors: Image generated by author using DALL-E

At the same time, vector databases emerged as the go-to option for storing the text vectors used in RAG. In short order, nearly every database company added vector features to its offering along with a version of nearest-neighbor matching, making almost every database a vector database. Going forward, it may be an anomaly for a database not to have a vector store or vector index feature. GenAI is a transformative technology, and vector support gives a database a seat at the table.

Effective GenAI is More Than Just Matching Vectors

Vector databases took off with GenAI in part because they are quick to set up and are effective for simple RAG applications. If all you want to do is match a prompt vector with context vectors, a vector database can do that. But building a production-ready application requires far more than just simple vector matching.
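The simple matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the chunk IDs, and the tiny three-dimensional vectors are hypothetical, cosine similarity is one common choice of metric, and a real vector database would use an approximate nearest-neighbor index rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_n_matches(prompt_vector, context_vectors, n=2):
    """Return the n (chunk_id, score) pairs most similar to the prompt."""
    scored = [
        (chunk_id, cosine_similarity(prompt_vector, vector))
        for chunk_id, vector in context_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical context embeddings (real ones have hundreds of dimensions).
context_vectors = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}

matches = top_n_matches([1.0, 0.0, 0.0], context_vectors, n=2)
```

Everything a production application needs beyond this step (curation, logging, explainability) is outside what the matching itself provides.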

Vector Matching: Image generated by author using DALL-E

Curating High-Quality Context Data

I have written about the importance of having high-quality context data for your RAG application. In a follow-up article, I also described how Neo4j and the Graph Data Science library can help analyze and clean your grounding dataset at scale. Removing low-quality data and ensuring the underlying database is efficient becomes even more critical as the grounding data grows and you are working with large vectors of 700+ or 1,500+ dimensions.
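One way to picture graph-style cleanup of grounding data is to link chunks whose embeddings are nearly identical and then treat each connected group as a set of likely duplicates. The sketch below is a plain-Python stand-in, not the Graph Data Science library's workflow; the function names, the 0.95 threshold, and the sample embeddings are all assumptions chosen for illustration, and the pairwise comparison would not scale to a large dataset.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_duplicate_groups(embeddings, threshold=0.95):
    """Group chunk IDs whose embeddings are linked by high similarity."""
    # Build an undirected similarity graph: an edge joins two chunks
    # whose cosine similarity meets the threshold.
    neighbors = {chunk_id: set() for chunk_id in embeddings}
    for left, right in combinations(embeddings, 2):
        if cosine_similarity(embeddings[left], embeddings[right]) >= threshold:
            neighbors[left].add(right)
            neighbors[right].add(left)

    # Find connected components with a depth-first traversal.
    seen = set()
    groups = []
    for start in embeddings:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) > 1:  # singletons are not duplicates
            groups.append(sorted(component))
    return groups


# Hypothetical embeddings: "a" and "b" are near-identical, "c" is distinct.
embeddings = {"a": [1.0, 0.0], "b": [0.99, 0.01], "c": [0.0, 1.0]}
duplicate_groups = near_duplicate_groups(embeddings)
```

Each returned group is a candidate for review, for example by keeping one representative chunk and removing the rest.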

Logging Conversations and Visualizing LLM Behavior

Graph databases also provide unmatched visualization capabilities. These are particularly powerful when visualizing the inner workings of an LLM-based RAG application.

Example of Logging a RAG LLM Conversation in Neo4j with Grounding Data: Image created by author

In the above image, a multi-turn LLM conversation is logged in the same database as the grounding data. We can visualize it to see which pieces of grounding text are used for the initial question and the follow-up question. Such understanding and explainability will become even more important as RAG applications become more complex and enter production, especially in highly regulated industries.
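The logging pattern described here can be sketched with a small in-memory graph. This is not Neo4j code and not necessarily the data model behind the image: the class, the node labels (Turn, Chunk), and the relationship types (USED_CONTEXT, NEXT_TURN) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationGraph:
    """Toy property graph holding grounding chunks and conversation turns."""

    nodes: dict = field(default_factory=dict)  # node_id -> properties
    edges: list = field(default_factory=list)  # (source, rel_type, target)

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = {"label": label, **properties}

    def add_edge(self, source, rel_type, target):
        self.edges.append((source, rel_type, target))

    def log_turn(self, turn_id, question, answer, chunk_ids, previous_turn=None):
        """Record one question/answer turn and the chunks that grounded it."""
        self.add_node(turn_id, "Turn", question=question, answer=answer)
        for chunk_id in chunk_ids:
            self.add_edge(turn_id, "USED_CONTEXT", chunk_id)
        if previous_turn is not None:
            self.add_edge(previous_turn, "NEXT_TURN", turn_id)

    def chunks_used(self, turn_id):
        """Return the chunk IDs a given turn was grounded on, in log order."""
        return [
            target
            for source, rel_type, target in self.edges
            if source == turn_id and rel_type == "USED_CONTEXT"
        ]


graph = ConversationGraph()
graph.add_node("chunk-1", "Chunk", text="Grounding text about graph databases.")
graph.add_node("chunk-2", "Chunk", text="Grounding text about vector indexes.")

graph.log_turn("turn-1", "What is a graph database?", "(model answer)", ["chunk-1"])
graph.log_turn(
    "turn-2",
    "How does it relate to vectors?",
    "(model answer)",
    ["chunk-1", "chunk-2"],
    previous_turn="turn-1",
)

# Chunks that grounded both the initial and the follow-up question.
shared = set(graph.chunks_used("turn-1")) & set(graph.chunks_used("turn-2"))
```

Because turns and grounding chunks live in the same structure, asking which text informed which answer becomes a traversal rather than a join across separate systems.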

Implementing Higher-Quality Retrieval

In November 2023 several researchers, including graph expert Dean Allemang, published A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model’s Accuracy for Question Answering on Enterprise SQL Databases. This paper demonstrated, with academic rigor, that incorporating knowledge graphs into a RAG architecture could dramatically increase its accuracy. While this is an initial benchmark, it is a significant step towards demonstrating the power of graphs in RAG.

Ready for What’s Next

GenAI is one of the fastest-evolving technologies ever developed. Shortly after its launch, ChatGPT became the fastest-growing consumer application in history. My colleagues and I have seen this rapid growth in our conversations with customers and prospects as well. Initially, conversations started with “how are you thinking about GenAI?” and then quickly shifted to “what can you help us implement with GenAI today?” Now, many of the conversations focus on “what is coming next [in GenAI] and how can you help us implement it?”

The “what is next?” conversation is critical because, given the speed at which GenAI is advancing, “what is next?” may as well be today. Organizations that have been slow to adopt GenAI are already seeing that they are behind and need to catch up with competitors. I suspect “catching up with GenAI” will be a theme as 2024 operating plans and budgets are finalized.

Advanced Graph-Based Retrieval: Image generated by author using DALL-E

From a database perspective, the critical question is not “will this database work with my GenAI app today?” As noted above, simple vector databases are adequate for current GenAI applications. The question organizations should be asking is “will this database be able to handle my GenAI application in three or six months?” It will be challenging enough for organizations to catch up with adopting GenAI. The last thing they will want is to migrate databases when they realize their application has more advanced requirements.

Retrieval Augmented Generation (RAG) is a perfect example of this. Vector databases are adequate for storing and matching the top-N vectors for a query, but they are limited when it comes to logging conversations and providing visibility into the workings of the LLM. Additionally, implementing advanced RAG strategies will add complexity to both queries and logging, requiring a more sophisticated and flexible database (such as a graph).

The Agent and the Graph: Image generated by author using DALL-E

Agents are also emerging as an application of GenAI that will introduce substantial benefits, but also complexity. Bill Gates recently wrote:

Agents are not only going to change how everyone interacts with computers. They’re also going to upend the software industry, bringing about the biggest revolution in computing since we went from typing commands to tapping on icons.

Logging the behavior of Agents as they perform tasks will be critical for visibility, understanding, and accountability as Agents enter production and begin making decisions with real-world impact.

Can Your Database Keep Up?

Graphs are everywhere, including in GenAI. Vector databases are having a moment for simple GenAI applications, but this may just be a moment. Almost every database is adding vector storage and search capabilities, and GenAI implementations are rapidly becoming more sophisticated. Organizations need to plan for “what’s next in GenAI” as if it were today, which means leveraging a database with far more capabilities than just vector storage and search. Graph databases like Neo4j can handle vector search today, but they will truly shine with what’s next in GenAI.


Daniel Bukowski

Graph Data Science Scientist at Neo4j. I write about the intersection of graphs, graph data science, and GenAI. https://www.linkedin.com/in/danieljbukowski/