Powering our chatbot with a knowledge graph

Martin Arroyo
Published in 99P Labs
6 min read · Apr 17, 2024
The full 99GPT Knowledge Graph

Previous article and the chatbot project

In an earlier post, we introduced the 99GPT Project, our embedded chatbot designed to answer questions about our blog, and discussed our process of building it end to end. As a recap, we built the chatbot over our own blog posts as part of our research into creating AI-powered software and exploring practical applications of generative AI. We used LlamaIndex as our development framework, OpenAI's GPT-3.5 Turbo and Ada-002 models for generation and embedding, respectively, and Weaviate to store our vector embeddings.

Much of our focus was on data ingestion and pre-processing, since it is a critical step in the pipeline. We explored different methods of breaking our documents up into text chunks that could then be embedded as vectors. Chunking is particularly important when using a vector index because it affects the quality of answers returned for different question types. After careful evaluation, we chose the embedding model and parameters that yielded the best overall performance in our initial benchmarks.
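As a rough illustration of the kind of chunking experiments we ran, here is a minimal sketch using LlamaIndex's SentenceSplitter. The directory path and chunk sizes are placeholders, not the exact parameters we settled on.

```python
# Minimal chunking sketch with LlamaIndex (path and sizes are illustrative)
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("./blog_posts").load_data()  # placeholder path

for chunk_size in (256, 512, 1024):
    splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=50)
    nodes = splitter.get_nodes_from_documents(documents)
    print(f"chunk_size={chunk_size}: {len(nodes)} chunks")
```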

Issues with responses on previous project

Our vector-based chatbot does well on some types of questions, such as basic summaries and specific questions directly related to content in the blog. However, it does poorly on multi-hop questions, which involve synthesizing information from multiple sources or require multiple steps of reasoning to arrive at an answer. We want our chatbot to be able to answer questions that may require information from more than one blog post, author, or concept. Our current iteration is hit-or-miss when it comes to answering these types of questions.

Additionally, we sometimes encounter issues with answering simple factual questions such as, “Are there any blogs written by <author>?” This issue occurs even after adding metadata from each blog to each chunk to identify authors and blog titles.
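For context, attaching that kind of metadata might look roughly like the sketch below, where author and title are set on each document so they carry over onto its chunks. The field names and values here are illustrative, not our exact schema.

```python
# Sketch of attaching blog metadata to a document (field names are illustrative)
from llama_index.core import Document

doc = Document(
    text="Full text of the blog post...",
    metadata={
        "author": "Martin Arroyo",
        "title": "Powering our chatbot with a knowledge graph",
    },
)
# The metadata is propagated to every chunk (node) derived from this document
# and can be included in the text seen by the embedding model and the LLM.
```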

A possible solution — knowledge graphs

In searching for ways to resolve these issues and improve our chatbot, we thought it would be worth exploring a knowledge graph index as an alternative to a vector store. Based on our research, a knowledge graph created from our blogs could help us establish relationships between entities across multiple documents. There has also been a lot of buzz in the space lately around knowledge graphs, so we figured it would be a worthy pursuit to see if we could leverage them in our project.

Knowledge graphs are a way of organizing and representing information that emphasizes the connections between different pieces of data. You can think of them as an intricate web, where each node (or dot) represents a piece of information, such as a person, place, or concept, and the lines (edges) connecting these nodes represent the relationships between them. For example, a knowledge graph might link an author to their published works, those works to their themes, and those themes to related topics.

This structured format not only makes it easier to visualize how different pieces of information are related, but it also allows systems, like our chatbot, to navigate and utilize this network of data more effectively. By understanding the relationships and connections between various entities, a knowledge graph should enhance the chatbot’s ability to synthesize information from multiple sources and provide more accurate and relevant responses.

How are they created?

Here is a basic overview of how a knowledge graph can be generated:

Creating a knowledge graph generally involves several key steps: data collection, entity recognition, relationship extraction, and graph construction. Once you have collected your data, the next step involves entity recognition, where specific information units like names, dates, and locations are identified within the text. These entities serve as the nodes of the graph. Following this, relationship extraction occurs, which involves analyzing how these entities are connected based on the context provided in the data. This step is crucial as it defines the edges of the graph, representing various types of relationships such as “written by,” “located in,” or “created by.” Finally, in the graph construction phase, the identified entities and their relationships are structured into a graph format.
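To make the last two steps concrete, here is a minimal sketch of how extracted (subject, relation, object) triples can be assembled into a graph. The triples and the library choice (networkx) are illustrative, not how our pipeline is actually implemented.

```python
# Turn extracted (subject, relation, object) triples into nodes and edges.
# The triples below are made up for illustration.
import networkx as nx

triples = [
    ("Martin Arroyo", "wrote", "99GPT Project post"),
    ("99GPT Project post", "discusses", "retrieval-augmented generation"),
    ("retrieval-augmented generation", "uses", "vector embeddings"),
]

graph = nx.DiGraph()
for subject, relation, obj in triples:
    # add_edge creates the nodes (entities) and a labeled edge (relationship)
    graph.add_edge(subject, obj, relation=relation)

print(graph.edges(data=True))
```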

Knowledge graphs can be created manually using tools like the Cypher query language, or automatically generated using large language models and frameworks like LlamaIndex by harnessing the models’ ability to understand and extract relationships from large volumes of text. Large language models are used to identify and categorize entities like people, places, and concepts as well as the relationships between them. When integrated with a framework like LlamaIndex, these models can be directed to process specific texts — such as a collection of blog posts or corporate documents — to automatically detect and construct the nodes and edges that make up a knowledge graph.
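For the manual route, nodes and relationships can be written directly in Cypher, for example through the official Neo4j Python driver. The connection details, labels, and relationship type below are placeholders.

```python
# Manually creating part of a knowledge graph with Cypher (placeholder details)
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run(
        """
        MERGE (a:Author {name: $author})
        MERGE (p:Post {title: $title})
        MERGE (a)-[:WROTE]->(p)
        """,
        author="Martin Arroyo",
        title="Powering our chatbot with a knowledge graph",
    )

driver.close()
```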

This process begins with the model reading and interpreting the text to pinpoint key entities and discern their interactions and associations. LlamaIndex then helps organize these entities and relationships into a structured graph format that is both searchable and analyzable. The result is a dynamic knowledge graph that evolves as new information is processed. This automated generation is efficient and scalable, making it possible to continually update and refine the knowledge graph with minimal manual intervention.

When are they best used?

Knowledge graphs are particularly useful in scenarios where relationships between data points play a crucial role in understanding and generating insights. They can be especially useful in environments where the ability to cross-reference interconnected information can enhance decision-making, search accuracy, or recommendations.

For example, in industries like healthcare, where patient histories, symptoms, treatments, and outcomes are interconnected, knowledge graphs can provide comprehensive insights that improve diagnoses and treatments. If your data is naturally interlinked in some way, then using a knowledge graph could be an ideal solution.

How did we implement it?

We initially kept the same tech stack that was used in the first iteration of the chatbot: LlamaIndex for orchestration, OpenAI for our generative models, and Weaviate for storage. While Weaviate does mention support for knowledge graph creation and storage, it wasn't entirely clear how to do this, and LlamaIndex currently does not support it. At that point we pivoted on storage and evaluated Neo4j, which we ultimately adopted.

LlamaIndex has several abstractions available that make generating a knowledge graph index straightforward, particularly the KnowledgeGraphIndex and KnowledgeGraphRAGRetriever APIs, which are used to generate and query the graph, respectively. We used GPT-3.5 Turbo to identify the entities, extract the relationships, and ultimately generate the Cypher queries to create the graph.
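In broad strokes, the build-and-query flow looks something like the sketch below. The imports match recent (0.10-era) LlamaIndex releases and may differ by version; the Neo4j connection details, document path, and parameters are placeholders rather than our actual configuration.

```python
# Sketch: building a knowledge graph index in Neo4j and querying it with LlamaIndex
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader, Settings, StorageContext
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever
from llama_index.graph_stores.neo4j import Neo4jGraphStore
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-3.5-turbo")  # model used for triplet extraction and answers

graph_store = Neo4jGraphStore(
    username="neo4j",
    password="password",            # placeholder credentials
    url="bolt://localhost:7687",
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# Build the graph: the LLM extracts (subject, relation, object) triplets from each chunk
documents = SimpleDirectoryReader("./blog_posts").load_data()
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=5,
)

# Query the graph via the knowledge-graph RAG retriever
retriever = KnowledgeGraphRAGRetriever(storage_context=storage_context, verbose=True)
query_engine = RetrieverQueryEngine.from_args(retriever)
print(query_engine.query("Which blog posts discuss data ingestion?"))
```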

A portion of the 99GPT Knowledge Graph that illustrates the nodes and relationships identified in our blogs

Did knowledge graphs improve performance?

We dealt with some development snafus along the way, such as a change to LlamaIndex that temporarily broke our application, and we had to navigate sparse documentation (generating simple knowledge graphs is well documented; integrating them into a chat application is not). Once we were able to test the vector-based chatbot against the knowledge-graph-based chatbot, we found the results to be disappointing.

The knowledge-graph-based chatbot did not seem to pick up on the relationships between blogs as we expected it to. It wasn't much better at answering questions and did not do well in multi-hop scenarios. In these cases, our vector-based chatbot actually performed better.

We would like to caveat this by saying that our experience working with knowledge graphs is still new, so there may be steps we can take to remedy these issues that we are simply unaware of yet (we welcome any feedback or suggestions on what we can try). That said, the knowledge graph we generated using LlamaIndex and GPT-3.5 Turbo did not perform as well as we hoped it would.

Next Steps

Now that we have explored more indexing options, we feel that understanding how to better synthesize the responses sent back from the index may help yield better results in the future. We have found that a single index can be limiting, especially when you want your application to answer complex questions. Therefore, we will likely be exploring concepts such as index composition (querying multiple indexes to generate an answer) and multi-document agents.
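As a rough sketch of what index composition could look like, the example below routes a question to either the existing vector index or the knowledge graph index using LlamaIndex's router query engine. It assumes `vector_index` and `kg_index` have already been built; the tool descriptions and the question are illustrative, and this is a direction we are considering rather than something we have shipped.

```python
# Sketch: routing a question to the vector index or the knowledge graph index
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),   # assumes an existing vector index
    description="Useful for specific questions about a single blog post.",
)
kg_tool = QueryEngineTool.from_defaults(
    query_engine=kg_index.as_query_engine(),       # assumes the knowledge graph index above
    description="Useful for questions about relationships across blogs, authors, and concepts.",
)

router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, kg_tool],
)
print(router.query("Which authors have written about generative AI?"))
```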

Martin Arroyo
Applied AI Research Engineer @ 99P Labs | Data Analytics Instructor @ COOP Careers