Understanding the type of Knowledge Graph you need — Fixed vs Dynamic Schema/Data

Chia Jeng Yang
Published in WhyHow.AI
5 min read · May 21, 2024

The type of graph you need depends on a few things, including the use-case and the general set-up. The two questions to think about are:

  • Is your data dynamic?
  • Is your schema dynamic?

At one end of the spectrum, we can have data and schemas that are fixed and therefore straightforward to build and maintain. At the other end, we can have dynamic data and schemas that require tooling and workflows to keep graphs in sync with a changing, growing corpus of data. Different graph structures suit different data structures and different use-cases.
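As a quick orientation, the two questions above can be treated as two booleans that select one of the four graph patterns covered in this article. The sketch below is only a lookup table; the names `GraphProfile`, `PROFILES` and `classify` are made up for illustration, while the pattern names and effort levels are taken from the section headings of this article.

```python
from typing import NamedTuple


class GraphProfile(NamedTuple):
    pattern: str  # name of the graph pattern, as used in this article
    effort: str   # "One-Off" or "Continuous"


# Keys are (data_dynamic, schema_dynamic).
PROFILES = {
    (False, False): GraphProfile("Document hierarchies", "One-Off"),
    (True, False): GraphProfile("Schema-defined Knowledge Graph Creation", "Continuous"),
    (False, True): GraphProfile("Concept dictionaries / Recursive Retrieval", "Continuous"),
    (True, True): GraphProfile("Combination of Understand & Stream", "Continuous"),
}


def classify(data_dynamic: bool, schema_dynamic: bool) -> GraphProfile:
    """Return the graph pattern that matches the two answers."""
    return PROFILES[(data_dynamic, schema_dynamic)]


if __name__ == "__main__":
    print(classify(data_dynamic=True, schema_dynamic=False))
```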

At first glance, it may seem that one should default to a system that can account for dynamic data and a dynamic schema, as it is the most flexible and therefore the most powerful option (i.e. the top left). However, this is not necessarily true. In some cases the underlying data contains a range of irrelevant information that you do not want represented in the graph, because it creates unnecessary clutter and makes graph extraction (i.e. the process of extracting data from a graph) more complicated and error-prone. WhyHow.AI’s philosophy is to put in the work to represent only the information that is relevant to the use-case, which is why we focus on small, minimum viable graphs.

There are many scenarios that call for more granular tools to define a very specific schema. An LLM-generated schema, while a decent representation of knowledge, may not reflect precisely what you care about in the data or how you want the data structured. These are typically scenarios where the developer has a clear and specific idea of how the data should be represented and ultimately retrieved.

Fixed Data / Fixed Schema — Document hierarchies:

  • Effort: One-Off

In this scenario, your data does not change, and the schema does not change. An example of this type of graph is Document Hierarchies. In document hierarchies, you simply want to create a graph that represents the fixed semantic structure of your document so that you can perform deterministic retrieval of raw text. This helps you guarantee that you’re retrieving data from the correct section of a document, something that is difficult to do with semantic similarity searches alone.

Thus, by creating graphs of your documents and their unique structure, you can retrieve chunks from specific documents or from specific sections and subsections of a document. In the example of the SAFE document hierarchy, we can use one document schema to create a hierarchy of many SAFE agreements, and we can reliably retrieve from this graph without needing to update or maintain it, because the underlying data and structure of the documents do not change. The graph created here is a one-off effort per document that is then used to augment information retrieval.
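A minimal sketch of deterministic retrieval from a document hierarchy, assuming the document has already been split into named sections. The `Section` class, the `retrieve` helper, and the section titles and texts are all illustrative placeholders, not WhyHow.AI's implementation or the actual wording of any SAFE agreement.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One node in the document hierarchy (hypothetical structure)."""
    title: str
    text: str = ""
    children: dict[str, "Section"] = field(default_factory=dict)

    def add(self, child: "Section") -> "Section":
        self.children[child.title] = child
        return child


def retrieve(root: Section, path: list[str]) -> str:
    """Walk an exact section path and return the raw text stored there.

    Raises KeyError if the path does not exist, so a wrong section is
    never returned silently.
    """
    node = root
    for title in path:
        node = node.children[title]
    return node.text


# Placeholder hierarchy for a single agreement.
safe = Section("SAFE Agreement A")
events = safe.add(Section("Events"))
events.add(Section("Equity Financing", "placeholder text of the equity financing clause"))
events.add(Section("Liquidity Event", "placeholder text of the liquidity event clause"))
safe.add(Section("Definitions", "placeholder text of the definitions"))

if __name__ == "__main__":
    print(retrieve(safe, ["Events", "Liquidity Event"]))
```

Because retrieval follows an explicit path instead of a similarity score, the same query always returns the same section. The same `Section` schema could be reused for every agreement with the same layout.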

Dynamic Data / Fixed Schema — Schema-defined Knowledge Graph Creation:

  • Effort: Continuous

In this scenario, the data is continuously changing and being streamed in, but the schema remains fixed. An example of this type of graph is where you have a fixed set of things that you want to extract and store consistently from a range of data (or even multiple types of the same report over time). With a discrete and fixed schema, you can use the schema to collect and store a specific subset of data from a range of unstructured text. Let’s say that you want to extract all the blood types across a hundred patient records. You can set the schema (“blood type”), describe what the schema should look like, and then immediately extract all the blood types, and the patients they belong to, into a Neo4j knowledge graph.

This technique is relevant for processes that demand memory or personalization, something that we have written about here.

Thus, the main task to be performed here is setting the schema, pointing the Knowledge Graph creation tool to the right data sources / data extraction models that are to be plugged in, and letting the data stream into the graph over documents, and over time.
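The blood-type example can be sketched as follows. To stay self-contained, this uses a regular expression as a stand-in for the extraction model and an in-memory set of triples as a stand-in for Neo4j. The relation name `HAS_BLOOD_TYPE`, the `TripleStore` class, and the sample records are assumptions made for illustration.

```python
import re

# Fixed schema: a single relation. The regex is a simplistic stand-in for
# whatever extraction model is plugged in (hypothetical pattern).
BLOOD_TYPE = re.compile(r"blood type\s*(?:is|:)?\s*(AB|A|B|O)([+-])", re.IGNORECASE)


def extract(patient_id: str, text: str) -> list[tuple[str, str, str]]:
    """Return (patient, relation, blood type) triples found in one record."""
    return [
        (patient_id, "HAS_BLOOD_TYPE", m.group(1).upper() + m.group(2))
        for m in BLOOD_TYPE.finditer(text)
    ]


class TripleStore:
    """In-memory stand-in for a graph database."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()

    def ingest(self, patient_id: str, text: str) -> int:
        """Add triples from a newly streamed record; return how many were new."""
        new = set(extract(patient_id, text)) - self.triples
        self.triples |= new
        return len(new)


if __name__ == "__main__":
    store = TripleStore()
    # Made-up records; anything outside the schema is ignored.
    store.ingest("patient-1", "Presents with fatigue. Blood type: O+. Allergic to penicillin.")
    store.ingest("patient-2", "Routine check-up. Blood type is AB-.")
    print(sorted(store.triples))
```

The schema never changes here; only the stream of records does. Information outside the schema (symptoms, allergies) never reaches the graph.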

Fixed Data / Dynamic Schema — Concept dictionaries / Recursive Retrieval:

  • Effort: Continuous

In this scenario, the data does not change, but the schema is dynamic. An example of this type of graph is where you have a specific set of unstructured text and you want to use LLMs to map out the possible relationships and entities within the text. You may have some idea of the potential relationships and entities to capture, but you do not have an exhaustive list.

You can use questions as the basis for iteratively creating schemas to build a graph against a knowledge base. With Question-defined Knowledge Graph creation, the main task is to take in questions asked against the knowledge base, turn each question into a list of potential relationships and entities to capture, and run that list against the unstructured text. As such, the schema and graph may grow and expand over time, even if the underlying text remains static. An example of what this looks like is this demo against the text of Harry Potter, where multiple questions are thrown against it and the graph is created iteratively over time.
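The loop described above can be sketched in a few lines. In practice an LLM would turn each question into candidate relationships and run the extraction. Here a hypothetical keyword table (`QUESTION_HINTS`) and a regular expression stand in for both steps, and the text is a made-up placeholder, so that the example runs on its own.

```python
import re

# Static corpus (placeholder sentences).
TEXT = "Alice mentors Bob. Bob works at Acme. Carol works at Acme. Carol mentors Alice."

# Hypothetical stand-in for an LLM: question keyword -> (relation name, verb phrase).
QUESTION_HINTS = {
    "mentor": ("MENTORS", "mentors"),
    "work": ("WORKS_AT", "works at"),
}


def relations_for(question: str) -> list[tuple[str, str]]:
    """Turn a question into the relations it suggests capturing."""
    q = question.lower()
    return [rel for key, rel in QUESTION_HINTS.items() if key in q]


def extract(text: str, verb: str) -> list[tuple[str, str]]:
    """Find every '<subject> <verb> <object>' occurrence in the text."""
    return re.findall(rf"(\w+) {re.escape(verb)} (\w+)", text)


class QuestionDefinedGraph:
    def __init__(self, text: str) -> None:
        self.text = text
        self.schema: set[str] = set()
        self.triples: set[tuple[str, str, str]] = set()

    def ask(self, question: str) -> None:
        """Extend the schema from the question, then extract against the fixed text."""
        for name, verb in relations_for(question):
            if name in self.schema:
                continue  # this relation was already captured by an earlier question
            self.schema.add(name)
            for subject, obj in extract(self.text, verb):
                self.triples.add((subject, name, obj))


if __name__ == "__main__":
    graph = QuestionDefinedGraph(TEXT)
    graph.ask("Who mentors Bob?")
    graph.ask("Where does Carol work?")
    print(sorted(graph.schema))
    print(sorted(graph.triples))
```

The text never changes, yet each new question can add a relation type to the schema and new triples to the graph.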

Dynamic Data / Dynamic Schema — (Combination of Understand & Stream):

  • Effort: Continuous

In this scenario, both the data and the schema may change. This typically takes the form of an orchestration of several different types of small graphs. For some graphs the schema may be fixed; for others it may be changing. Managing and orchestrating different types of graphs that pull in and structure different types of information for different purposes requires graph management systems that are tied to multi-agent RAG systems.

Graph management and graph orchestration tools are needed to help with optimizing complex RAG systems.
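One way to picture such orchestration is a registry of small graphs, each of which declares whether its schema is fixed or allowed to grow. The following is a rough sketch under that assumption. `SmallGraph`, `Orchestrator`, and the graph and relation names are hypothetical and do not describe an existing tool.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class SmallGraph:
    name: str
    schema: set[str]
    schema_fixed: bool
    triples: set[Triple] = field(default_factory=set)

    def add(self, triple: Triple) -> bool:
        """Store a triple; grow the schema only if this graph allows it."""
        relation = triple[1]
        if relation not in self.schema:
            if self.schema_fixed:
                return False  # outside a fixed schema: reject
            self.schema.add(relation)  # dynamic schema: learn the new relation
        self.triples.add(triple)
        return True


class Orchestrator:
    """Keeps track of several small graphs and routes incoming triples to them."""

    def __init__(self) -> None:
        self.graphs: dict[str, SmallGraph] = {}

    def register(self, graph: SmallGraph) -> None:
        self.graphs[graph.name] = graph

    def route(self, graph_name: str, triple: Triple) -> bool:
        return self.graphs[graph_name].add(triple)


if __name__ == "__main__":
    orchestrator = Orchestrator()
    orchestrator.register(SmallGraph("records", {"HAS_BLOOD_TYPE"}, schema_fixed=True))
    orchestrator.register(SmallGraph("concepts", set(), schema_fixed=False))
    print(orchestrator.route("records", ("patient-1", "HAS_ALLERGY", "penicillin")))
    print(orchestrator.route("concepts", ("Alice", "MENTORS", "Bob")))
```

Keeping each graph small and explicit about its schema policy lets the fixed-schema graphs stay uncluttered while the dynamic ones keep expanding.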

WhyHow.AI is building tools to help developers bring more determinism and control to their RAG pipelines using graph structures. If you’re thinking about incorporating knowledge graphs in RAG for accuracy, memory and determinism, are in the process of doing so, or have already done so, we’d love to chat at team@whyhow.ai. You can also follow our newsletter at WhyHow.AI and join our discussions about rules, determinism and knowledge graphs in RAG on our Discord.
