The role of Small vs Big Knowledge Graphs

Chia Jeng Yang
Published in WhyHow.AI
5 min read · Apr 12, 2024

Before LLMs, knowledge graphs were largely used like data dictionaries: a means to enforce a consistent semantic structure for terms across different data silos, and to aggregate datasets in order to unveil hidden relationships (relationship mining). Knowledge graphs were used to capture concepts and relationships that would be interesting to track, especially against large unstructured data stores. An example of this would be collecting a large number of bio-pharma academic papers and putting all the information into a knowledge graph to understand hidden relationships between concepts across different papers, especially relationships that can only be found by connecting statements across multiple documents (e.g. Paper 1 says X = Y and Paper 2 says Y = Z, so a knowledge graph can reveal that X = Z).
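
As a minimal sketch of that multi-document example, the snippet below stores two hypothetical triples (one per paper) and walks the graph to recover the X-to-Z connection along with the papers that support each hop. The triple format, the `equivalent_to` relation name, and the helper functions are illustrative assumptions, not part of any specific tool.

```python
from collections import defaultdict, deque

# Hypothetical triples extracted from two papers:
# (subject, relation, object, source document)
triples = [
    ("X", "equivalent_to", "Y", "Paper 1"),
    ("Y", "equivalent_to", "Z", "Paper 2"),
]

def build_graph(triples):
    """Index triples as an adjacency list, remembering the source of each edge."""
    graph = defaultdict(list)
    for subj, _rel, obj, source in triples:
        graph[subj].append((obj, source))
        graph[obj].append((subj, source))  # equivalence is treated as symmetric
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns the (node, source) hops from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor, source in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(neighbor, source)]))
    return None

graph = build_graph(triples)
path = find_path(graph, "X", "Z")
print(path)  # [('Y', 'Paper 1'), ('Z', 'Paper 2')]
```

Neither paper states the X-to-Z link on its own; it only appears once both statements sit in the same graph.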

With the advanced development of LLMs, we should view knowledge graphs as tools for sharpening semantic focus, not just as aggregators of data. The resulting small graphs do not need to be fully complete because LLMs bring their own understanding of semantics. This means that in many cases, it is not necessary to create a perfectly described version of the world. The LLM is able to take different pieces of structured data and add its own understanding on top of them. Thus, structuring data becomes essential only at the points where LLMs fail, or simply to provide guardrails against hallucinations. The ability to structure and introduce such logic is a form of context injection, particularly for RAG, but also useful in the future for agent orchestration.
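
One way to picture context injection is as prompt assembly: a handful of triples from a small graph is rendered as plain-text facts and placed next to the retrieved passages. The sketch below assumes a simple (subject, relation, object) triple format and made-up example data; the function names and prompt wording are hypothetical, and no particular LLM API is implied.

```python
def triples_to_lines(triples):
    """Render (subject, relation, object) triples as plain-text facts."""
    return [f"{s} {r.replace('_', ' ')} {o}" for s, r, o in triples]

def build_prompt(question, triples, chunks):
    """Combine structured facts and retrieved passages into a single prompt string."""
    facts = "\n".join(f"- {line}" for line in triples_to_lines(triples))
    passages = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer using the facts and passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Known facts:\n{facts}\n\n"
        f"Retrieved passages:\n{passages}\n\n"
        f"Question: {question}"
    )

# Hypothetical small graph and retrieved chunk, for illustration only.
small_graph = [
    ("Drug A", "inhibits", "Protein B"),
    ("Protein B", "is_associated_with", "Disease C"),
]
retrieved_chunks = ["Drug A was evaluated in a phase II trial."]

prompt = build_prompt(
    "Could Drug A be relevant to Disease C?", small_graph, retrieved_chunks
)
print(prompt)
```

The graph here is deliberately incomplete. It only supplies the two relationships the LLM would otherwise be likely to miss or invent.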

Given these advancements, we categorize graph development into three types: large, small, and hybrid (a combination of large and small) knowledge graphs.

Large Knowledge Graphs

  • Large KGs serve to store comprehensive information on a topic, enabling the understanding of all its interconnected details.
  • Large KGs are helpful for uncovering hidden insights and relationships in your unstructured documents that were not otherwise known.

Small Knowledge Graphs

  • Small KGs become important for structuring specific content areas that demand more detailed organization. For example, it is easy to imagine a standard RAG process where the LLM is able to accurately retrieve information for most questions, but requires specific intervention for more complex questions. Creating a small KG to supplement that retrieval process helps bring more accuracy to the system. We think of document hierarchies, document structures, and KGs specific to a question as different types of mini-KGs.
  • Small KGs can be generated on a per-question basis and then combined to create an iterative knowledge base that reflects not just the knowledge base you have, but the knowledge base your user cares about. Some of our design partners are interested in gaining control over how their knowledge base grows over time between partners and stakeholders.
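
To make the per-question idea concrete, here is a rough sketch of a knowledge base that grows by merging one small graph per question and remembers which questions needed each fact. The `IterativeKnowledgeBase` class, its method names, and the example triples are all hypothetical; in practice the triples would come from an extraction step that is not shown here.

```python
from collections import defaultdict

class IterativeKnowledgeBase:
    """Accumulates per-question small graphs and tracks which questions used each triple."""

    def __init__(self):
        self._questions_by_triple = defaultdict(set)

    def add_small_graph(self, question, triples):
        """Merge the small graph built for one question into the knowledge base."""
        for triple in triples:
            self._questions_by_triple[triple].add(question)

    def triples(self):
        """Return every distinct triple collected so far."""
        return set(self._questions_by_triple)

    def most_requested(self, n=3):
        """Rank triples by how many distinct questions needed them."""
        ranked = sorted(
            self._questions_by_triple.items(),
            key=lambda item: (-len(item[1]), item[0]),
        )
        return [(triple, len(questions)) for triple, questions in ranked[:n]]

kb = IterativeKnowledgeBase()
kb.add_small_graph(
    "Who approves travel expenses?",
    [("Travel expense", "approved_by", "Line manager")],
)
kb.add_small_graph(
    "What is the travel expense limit?",
    [
        ("Travel expense", "approved_by", "Line manager"),
        ("Travel expense", "has_limit", "Policy section 4"),
    ],
)
print(kb.most_requested(1))
# [(('Travel expense', 'approved_by', 'Line manager'), 2)]
```

Because each triple records the questions behind it, the resulting graph reflects what users actually ask about, not everything the documents contain.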

Hybrid Large-Small Graphs

  • In hybrid graph models, multiple information sources flow into a range of small graphs before ultimately contributing back to, and being aligned with, a large graph. The main use-case is where there are multiple data streams contributing to a single large source of context.
  • Multiple data streams may take the form of multiple data contributors (e.g. different departments), different data sources (public vs. private), different document sources, or different gatekeepers of truth (e.g. different people).
  • Before the information in a specific small graph / data source is adopted as a knowledge base, it can remain in the ‘small-graph’ form for further transformation or evaluation. This can be seen as a QA process, an ECL process, or a version-control process. Some of our design partners are interested in gaining control over how their knowledge base grows over time from data sources provided by different internal and external stakeholders.
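
The staging flow described above might look roughly like the sketch below: each source gets its own small graph, a review step decides whether it is adopted, and only approved graphs are aligned and merged into the large graph. Everything here is an illustrative assumption, including the `SmallGraph` and `LargeGraph` classes, the relation whitelist used as the QA rule, and the alias map used for alignment.

```python
from dataclasses import dataclass, field

# Hypothetical schema and alias map, used only for this illustration.
ALLOWED_RELATIONS = {"reports_to", "owns"}
ALIASES = {"Eng": "Engineering"}

@dataclass
class SmallGraph:
    source: str            # who contributed this graph (department, feed, person)
    triples: list          # (subject, relation, object) tuples
    approved: bool = False

def review(small):
    """QA step: approve a small graph only if every relation is in the schema."""
    small.approved = all(rel in ALLOWED_RELATIONS for _s, rel, _o in small.triples)
    return small.approved

@dataclass
class LargeGraph:
    sources_by_triple: dict = field(default_factory=dict)

    def merge(self, small):
        """Align entity names and merge an approved small graph, keeping provenance."""
        if not small.approved:
            raise ValueError(f"small graph from {small.source!r} has not been approved")
        for subj, rel, obj in small.triples:
            aligned = (ALIASES.get(subj, subj), rel, ALIASES.get(obj, obj))
            self.sources_by_triple.setdefault(aligned, set()).add(small.source)

hr = SmallGraph("HR", [("Eng", "reports_to", "CTO")])
vendor = SmallGraph("Vendor feed", [("Eng", "likes", "Coffee")])

large = LargeGraph()
for small in (hr, vendor):
    if review(small):
        large.merge(small)

print(large.sources_by_triple)
# {('Engineering', 'reports_to', 'CTO'): {'HR'}}
```

The vendor graph fails review and stays in its small-graph form, where it can be corrected or discarded without touching the shared knowledge base.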

How does this insight change what I am doing?

Frequently, we find people take a singular approach to incorporating a knowledge graph into their process, when alternative approaches may be a better fit.

This singular approach is typically to get an LLM to attempt to describe their entire knowledge base, structure every node and relationship, and then query against it.

To that end, we ask if that is absolutely necessary. Graph structures may only be needed where the lack of structure begins to overwhelm the LLM. Small or Hybrid KG structures may end up being useful, especially in the beginning, to prove the value of data structures.

This raises a question: graph structures can organize your data, but what is the minimum viable graph for your specific use-case? For example, the following can be captured in graph structures:

  • Pages
  • Document structures
  • Document summaries
  • Concept nodes linked to vector chunks

[Figure: An example of concept nodes linked to vector chunks]
[Figure: An example of a document hierarchy ‘small graph’]
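
As a rough illustration of two of these structures together, the sketch below links concept nodes to chunk IDs and stores each chunk's position in a document hierarchy. The chunk contents, concept names, and helper functions are made up for the example; in a real pipeline the chunks would also live in a vector store, and concept matching would likely be more robust than a substring check.

```python
# Hypothetical chunk store: chunk id -> position in the document hierarchy + text.
chunks = {
    "c1": {"path": ("Handbook", "Leave", "Parental leave"),
           "text": "Parental leave requests go to HR."},
    "c2": {"path": ("Handbook", "Expenses", "Travel"),
           "text": "Travel expenses need manager approval."},
    "c3": {"path": ("Handbook", "Expenses", "Equipment"),
           "text": "Equipment purchases need a purchase order."},
}

# Concept nodes linked to the chunks that discuss them.
concept_to_chunks = {
    "parental leave": ["c1"],
    "expenses": ["c2", "c3"],
    "travel": ["c2"],
}

def chunks_for_question(question):
    """Return chunk ids linked to any concept mentioned in the question, in first-seen order."""
    lowered = question.lower()
    found = []
    for concept, chunk_ids in concept_to_chunks.items():
        if concept in lowered:
            for chunk_id in chunk_ids:
                if chunk_id not in found:
                    found.append(chunk_id)
    return found

def with_hierarchy(chunk_id):
    """Prefix a chunk's text with its document-hierarchy path for extra context."""
    chunk = chunks[chunk_id]
    return " > ".join(chunk["path"]) + ": " + chunk["text"]

ids = chunks_for_question("Who signs off on travel expenses?")
print(ids)  # ['c2', 'c3']
print(with_hierarchy(ids[0]))
# Handbook > Expenses > Travel: Travel expenses need manager approval.
```

The concept lookup narrows the candidate chunks before or alongside vector search, and the hierarchy path tells the LLM where in the document each chunk came from.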

Capturing higher-level representations of your data may be sufficient for your use-case. For example, a structured representation of the data may not be necessary at all for querying against a FAQ Q&A set, while a more granular representation (e.g. a graph of the underlying concepts) may be needed for more complex questions.

Capturing every concept and relationship raises the question of whether you needed to do so to answer the question at hand. In some cases it is necessary, but when we dive deeper with design partners, that is frequently not the case.

Check out a few tools we are building for small Knowledge Graphs here:

WhyHow.AI has released an SDK for mini-KG creation, as well as tooling for deterministic document structures-based chunk extraction. If you are thinking about, in the process of, or have already incorporated knowledge graphs in RAG, we’d love to chat at team@whyhow.ai, or follow our newsletter at WhyHow.AI. Join our discussions about rules, determinism and knowledge graphs in RAG on our newly-created Discord.
