Graphs Are Everywhere (in GenAI)
Graphs Are Everywhere
In my role at Neo4j I often help customers and prospects see their data as a graph. For many use cases, such as financial transactions, logistics, and customer journeys, graphs represent the real-world nature of the data better than tabular or other forms. Neo4j’s founder and CEO Emil Eifrém puts it more concisely when he says, “graphs are everywhere.”
Based upon my experience, “graphs are everywhere” now includes GenAI, and I have written about the Practical Applications of Grounding an LLM in with a Graph. That article focuses on Retrieval Augmented Generation (RAG), while this article will take a broader approach to why graph databases are uniquely suited for a variety of GenAI applications. And as GenAI applications become more sophisticated, the benefits of pairing them with graph databases will only grow stronger.
(Almost) Every Database is Now a Vector Database
As GenAI and Large Language Models (LLMs) went mainstream during 2023, model fine-tuning and RAG emerged as the top approaches for applying the technology to individual or organizational use cases. Of these, RAG has shown the most practical utility because it augments the LLM’s text generation capabilities with information the model would not otherwise have access to. I advise customers to consider fine-tuning only after they have maximized a RAG implementation.
At the same time, vector databases emerged as the go-to format for storing text vectors used in RAG. In short order nearly every database company has added vector features to their offering along with a version of nearest neighbors matching, making almost every database a vector database. Going forward it may be an anomaly for a database to not have a vector store or vector index feature. GenAI is a transformative technology and vector support gives a database a seat at the table.
Effective GenAI is More Than Just Matching Vectors
Vector databases took off with GenAI in part because they are quick to set up and are effective for simple RAG applications. If all you want to do is match a prompt vector with context vectors, a vector database can do that. But building a production-ready application requires far more than just simple vector matching.
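To make "matching a prompt vector with context vectors" concrete, here is a minimal sketch in plain Python. It assumes the vectors have already been produced by some embedding model. The chunk IDs, the three-dimensional toy vectors, and the function names are hypothetical and chosen only for illustration. A real vector index would use an approximate nearest neighbors search rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_matches(prompt_vector, context_vectors, k=2):
    """Return the k (chunk_id, score) pairs most similar to the prompt."""
    scored = [
        (chunk_id, cosine_similarity(prompt_vector, vector))
        for chunk_id, vector in context_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: three context chunks with 3-dimensional vectors.
context_vectors = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}

matches = top_k_matches([1.0, 0.0, 0.0], context_vectors, k=2)
for chunk_id, score in matches:
    print(chunk_id, round(score, 3))
```

Everything this sketch does is score and rank. It keeps no record of which chunks were returned for which question, which is where the requirements in the following sections come in.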
Curating High-Quality Context Data
I have written about the importance of having high-quality context data for your RAG application. In a follow-up article, I also described how Neo4j and the Graph Data Science library can help analyze and clean your grounding dataset at scale. Removing low-quality data and ensuring the underlying database is efficient become even more critical as the grounding data grows and you are working with large vectors of 700+ or 1,500+ dimensions.
Logging Conversations and Visualizing LLM Behavior
Graph databases also provide unmatched visualization capabilities. These are particularly powerful when visualizing the inner workings of an LLM-based RAG application.
In the above image, a multi-turn LLM conversation is logged in the same database as the grounding data. We can visualize it to see which pieces of grounding text are used for the initial question and the follow-up question. Such understanding and explainability will become even more important as RAG applications become more complex and enter production, especially in highly regulated industries.
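The logging pattern described above can be sketched as a tiny in-memory graph in plain Python. This is an illustration only, not Neo4j's API or the data model behind the image. The class, the node IDs, and the relationship names USED_CONTEXT and FOLLOWED_BY are all hypothetical. In a graph database the same structure would be stored as nodes and relationships and queried directly.

```python
from collections import defaultdict


class ConversationGraph:
    """Toy graph holding grounding chunks and conversation turns together."""

    def __init__(self):
        self.nodes = {}
        # (source_id, relationship_type) -> list of target node ids
        self.edges = defaultdict(list)

    def add_node(self, node_id, label, text):
        self.nodes[node_id] = {"label": label, "text": text}

    def add_edge(self, source_id, rel_type, target_id):
        self.edges[(source_id, rel_type)].append(target_id)

    def neighbors(self, source_id, rel_type):
        return list(self.edges.get((source_id, rel_type), []))

    def context_by_turn(self, first_question_id):
        """Walk the conversation and list the chunks used at each turn."""
        result = {}
        current = first_question_id
        # Stop at the end of the chain, or if a turn repeats (guards cycles).
        while current is not None and current not in result:
            result[current] = self.neighbors(current, "USED_CONTEXT")
            next_turns = self.neighbors(current, "FOLLOWED_BY")
            current = next_turns[0] if next_turns else None
        return result


graph = ConversationGraph()
for chunk_id in ("c1", "c2", "c3"):
    graph.add_node(chunk_id, "Chunk", f"grounding text for {chunk_id}")
graph.add_node("q1", "Question", "initial question")
graph.add_node("q2", "Question", "follow-up question")

# Log which grounding chunks were retrieved for each turn.
graph.add_edge("q1", "USED_CONTEXT", "c1")
graph.add_edge("q1", "USED_CONTEXT", "c2")
graph.add_edge("q2", "USED_CONTEXT", "c2")
graph.add_edge("q2", "USED_CONTEXT", "c3")
graph.add_edge("q1", "FOLLOWED_BY", "q2")

trace = graph.context_by_turn("q1")
shared = set(trace["q1"]) & set(trace["q2"])
print(trace)
print("Chunks used in both turns:", sorted(shared))
```

Because the conversation and the grounding data live in one structure, a question such as "which chunks informed both the initial question and the follow-up?" becomes a short traversal rather than a join across separate logging and vector stores.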
Implementing Higher-Quality Retrieval
In November 2023 several researchers, including graph expert Dean Allemang, published A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model’s Accuracy for Question Answering on Enterprise SQL Databases. This paper demonstrated, with academic rigor, that incorporating knowledge graphs into a RAG architecture could dramatically increase its accuracy. While this is an initial benchmark, it is a significant step towards demonstrating the power of graphs in RAG.
Ready for What’s Next
GenAI is one of the fastest-evolving technologies ever developed. Shortly after its launch, ChatGPT became the fastest-growing consumer application in history. My colleagues and I have seen this rapid growth in our conversations with customers and prospects as well. Initially conversations started with “how are you thinking about GenAI?” and then quickly shifted to “what can you help us implement with GenAI today?” Now, many of the conversations focus on “what is coming next [in GenAI] and how can you help us implement it?”
The “what is next?” conversation is critical because, given the speed at which GenAI is advancing, “what is next?” may as well be today. Organizations that have been slow to adopt GenAI are already seeing that they are behind and need to catch up with competitors. I suspect “catching up with GenAI” will be a theme as 2024 operating plans and budgets are finalized.
From a database perspective, the critical question is not “will this database work with my GenAI app today?” As noted above, simple vector databases are adequate for current GenAI applications. The question organizations should be asking is “will this database be able to handle my GenAI application in three or six months?” It will be challenging enough for organizations to catch up with adopting GenAI. The last thing they will want is to migrate databases when they realize their application has more advanced requirements.
Retrieval Augmented Generation (RAG) is a perfect example of this. Vector databases are adequate for storing and matching the top-N vectors for a query, but they are limited when it comes to logging conversations and providing visibility into the workings of the LLM. Additionally, implementing advanced RAG strategies will add complexity to both queries and logging, requiring a more sophisticated and flexible database (such as a graph).
Agents are also emerging as an application of GenAI that will introduce substantial benefits, but also complexity. Bill Gates recently wrote:
Agents are not only going to change how everyone interacts with computers. They’re also going to upend the software industry, bringing about the biggest revolution in computing since we went from typing commands to tapping on icons.
Logging the behavior of Agents as they perform tasks will be critical for visibility, understanding, and accountability as Agents enter production and begin making decisions with real-world impact.
Can Your Database Keep Up?
Graphs are everywhere, including in GenAI. Vector databases are having a moment for simple GenAI applications, but this may just be a moment. Almost every database is adding vector storage and search capabilities, and GenAI implementations are rapidly becoming more sophisticated. Organizations need to plan for “what’s next in GenAI” as if it were today, which means leveraging a database with far more capabilities than just vector storage and search. Graph databases like Neo4j can handle vector search today, but they will truly shine with what’s next in GenAI.