Modular vs Monolithic: Small Graphs as Micro-services

Chia Jeng Yang
WhyHow.AI
Jul 17, 2024

One key distinction of WhyHow.AI’s approach to knowledge graph creation is its focus on small graphs that are narrowly scoped and use-case specific. We write about different types of small graphs created through scoped-in schemas in financial use-cases, healthcare use-cases, and with a multi-industry schema library here.

There are a few reasons why a single monolithic large graph can be sub-optimal:

  • Ease of Construction & Scaling
  • Ease of Orchestration & Retrieval
  • Ease of Maintenance & Debugging

What is the difference between a large and a small graph?

The difference is not one of file size, but of how completely the underlying unstructured knowledge base is represented in the graph.

The small-graph vs. large-graph distinction is not a technological one, but an architectural and operational one. It contrasts with large-graph approaches or philosophies that, from the very beginning, aim to ingest every single document at the point of graph construction, or to ingest every single relationship and entity within each document, without regard to the specific use-case in question.

The basic premise is that a smaller structured representation of your data, as opposed to a complete structured representation of all your data, is more efficient, more accurate, and faster to set up and scale. It also reflects the nature of building RAG systems: we want to iteratively understand the questions we need to answer and the data we need to retrieve. This is in contrast to monolithic black-box architectures that attempt to ingest all data and answer all questions at once, sometimes irrespective of domain or department. Although seemingly more attractive on the surface, this creates issues when trying to scale, debug, and improve such systems. The problem becomes more evident when we think of schemas as a way to structure our unstructured data around the things we care about, for specific use-cases. Viewed this way, a large graph becomes unwieldy as it attempts to balance representing the entire unstructured text against granularly representing the information you actually care about.

Each small graph created is a structure of knowledge that represents an independent set of features of your data, which do not initially overlap. There are two ways to think about the comprehensiveness of the data entering the graph (illustrated in the sketch after this list):

  • The total percentage of documents that are represented in the graph
  • The total percentage of information within a single document that is represented in the graph
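As a rough illustration (hypothetical counts, not a prescribed metric), both measures are simple ratios that a small-graph approach deliberately keeps well below 100%:

```python
# Two hypothetical coverage ratios for a graph built over a corpus.
# A small graph intentionally keeps both low: it only represents the
# documents, and the facts within them, that the use-case needs.

def document_coverage(docs_in_graph: int, docs_in_corpus: int) -> float:
    """Share of corpus documents that contributed anything to the graph."""
    return docs_in_graph / docs_in_corpus

def intra_document_coverage(captured_facts: int, total_facts: int) -> float:
    """Share of one document's facts that the schema chose to capture."""
    return captured_facts / total_facts

print(document_coverage(40, 10_000))      # 0.004 -- only relevant documents
print(intra_document_coverage(25, 600))   # ~0.042 -- only schema-relevant facts
```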

Ease of Construction & Scaling

An example of a small-graph process is defining a schema that represents only the specific entities and relationships you care about within the unstructured text. We write more about how and why granular control of the schema is important in this article. In short, in practice many texts contain a great deal of information that is fundamentally irrelevant to the workflows you are looking to build. You only care about specific types of information in the broader text, which you can classify within a schema to capture. These specific types of information reflect the bounded range of questions you would typically ask of the text itself. In many cases, attempting to capture all the underlying information within a knowledge graph raises the question of why you want to, and whether the trouble of structuring the long tail of ancillary information in the text is worth it. While there may be certain use-cases where large, unbounded schemas are necessary (something we will be releasing an article on), in many instances the desire for one reflects a lack of clear focus on the information and questions that are relevant to specific RAG workflows.
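To make the idea concrete, here is a minimal sketch, with a hypothetical contracts schema and entity names (not the WhyHow.AI API), of an extraction filter that admits only triples matching a scoped-in schema:

```python
from dataclasses import dataclass

# A hypothetical scoped-in schema: it names only the entity and relation
# types one workflow cares about, so extraction ignores everything else.
SCHEMA = {
    "entities": ["Party", "Obligation", "Deadline"],
    "relations": [
        ("Party", "OWES", "Obligation"),
        ("Obligation", "DUE_BY", "Deadline"),
    ],
}

@dataclass
class Triple:
    head: str
    relation: str
    tail: str

def keep(triple: Triple, entity_types: dict[str, str]) -> bool:
    """Admit a candidate triple only if it matches the scoped schema."""
    allowed = {(h, r, t) for h, r, t in SCHEMA["relations"]}
    return (
        entity_types.get(triple.head),
        triple.relation,
        entity_types.get(triple.tail),
    ) in allowed

# Candidate triples from an extractor; only schema-relevant ones survive.
types = {"Acme Corp": "Party", "pay $10k": "Obligation",
         "2024-09-01": "Deadline", "Jane Doe": "Person"}  # "Person" is out of scope
candidates = [
    Triple("Acme Corp", "OWES", "pay $10k"),
    Triple("pay $10k", "DUE_BY", "2024-09-01"),
    Triple("Jane Doe", "SIGNED", "Acme Corp"),  # dropped: not in schema
]
graph = [t for t in candidates if keep(t, types)]
```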

Another example of a small-graph process is focusing on single documents for graph construction. Similarly, we can imagine that the RAG system of the marketing team and the RAG system of the finance team would rely on separate knowledge bases, with very little practical overlap. Attempting to create schemas and graphs that perfectly represent both domains in a single graph is difficult, potentially linguistically impossible, and of little benefit.

In the context of enterprise RAG systems, small graphs also enable quick wins. Small graphs for specific questions or specific documents allow POCs to be built quickly to demonstrate the value of KG RAG systems.

Creating a graph can require oversight by a domain expert to confirm the relevance of the relationships being recorded. This frequently means that large, complex graphs require larger teams and multidisciplinary involvement. In contrast, with smaller, scoped-in graphs, each small graph can be worked on in parallel, given that no complex relationships need to be mapped between data points across graphs. This allows small-graph systems to scale quickly.

As previously discussed, building graphs with scoped-in schemas, or graphs of specific documents, enables much faster and more accurate granular graphs for KG RAG systems.

Ease of Orchestration & Querying/Retrieval

A large graph becomes programmatically difficult to manipulate within a RAG system.

Many of our design partners have experimented with large graphs and found that the complexity of graph querying rises exponentially with the size of the graph, given the larger surface area of complexity (i.e., the rising number of possible combinations of entities and relationships that the LLM may misinterpret as relevant). Extracting precisely and only what you intend from a graph that contains many potentially unrelated connections and relationships is naturally cumbersome.

This is a reverse-ETL issue that is solvable (and is how existing large graphs can be repurposed to feed into a small-graph, multi-agent architecture), but the infrastructure is currently missing here (stay tuned!). In the meantime, small graphs can be used akin to microservices: structured knowledge bases that are called upon at different points in the RAG process for different use-cases.
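A minimal sketch of that pattern, with hypothetical graph names and a stubbed retrieval call: each query is dispatched to the one small graph scoped to its use-case, much the way a gateway routes requests to a microservice.

```python
# Hypothetical router: each small graph is registered like a microservice
# and queries are dispatched to the one scoped to their use-case.

def query_graph(graph_name: str, question: str) -> str:
    """Stub for a per-graph retrieval call (e.g. a Cypher query or graph API)."""
    return f"[{graph_name}] results for: {question}"

SMALL_GRAPHS = {
    "contract": "legal-obligations-graph",
    "campaign": "marketing-campaigns-graph",
    "revenue": "finance-reporting-graph",
}

def route(question: str) -> str:
    # In practice an intent classifier or LLM picks the graph; keyword
    # matching stands in for that step here.
    for keyword, graph in SMALL_GRAPHS.items():
        if keyword in question.lower():
            return query_graph(graph, question)
    return query_graph(SMALL_GRAPHS["revenue"], question)  # default route

print(route("Which contracts have obligations due this quarter?"))
```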

Ease of Maintenance & Debugging

As each graph is constructed differently and acts independently of the others, small-graph systems are easier to maintain and debug. Within a KG RAG system, an answer may be inaccurate for a range of reasons: the information is not in the graph, the schema is inaccurate, the graph query engine is not functioning as intended, and so on.

Just as a monolithic single large prompt is difficult to debug when the output is not as desired, a single large graph makes it incredibly difficult to amend, and to inject your own opinion about the type of information that should be retrieved. In contrast, modular KG RAG systems employing multi-agent designs, with small graphs pulled in where necessary to inject context programmatically, will be the future. A failure in a specific graph can be easily identified and debugged.
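Because each small graph stands alone, those failure modes can be checked graph by graph. A rough sketch, with stubbed, hypothetical checks:

```python
# Hypothetical per-graph diagnostics: because each small graph is
# independent, each failure mode can be tested in isolation.

def facts_exist_for(graph_name: str, question: str) -> bool:
    # Stub: in practice, scan the graph's source documents or triples.
    return True

def schema_covers(graph_name: str, question: str) -> bool:
    # Stub: in practice, check whether the question maps onto schema types.
    return True

def diagnose(graph_name: str, question: str, answer: str | None) -> str:
    """Attribute a bad answer to one of the failure modes listed above."""
    if answer is None:
        if not facts_exist_for(graph_name, question):
            return "information never entered the graph (ingestion gap)"
        if not schema_covers(graph_name, question):
            return "schema too narrow for this question (schema gap)"
        return "query engine failed to translate the question (query gap)"
    return "answer produced; verify it against the source documents"

print(diagnose("patient-records-graph", "Which ward is P-107 in?", None))
```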

Ontology Coreferencing & Scaling Small Graphs

One question we get about small graphs is how to think about merging them over time, and the issue of ontology coreferencing: how to coherently merge the different ontologies in different graphs to ensure that information is properly represented.

To a large extent, we believe that if a large graph is needed, ontology coreferencing, just like graph creation, can be largely automated with LLMs. Historical issues with ontology coreferencing included propagating changes across a graph; ensuring that a large ontology defines relationships across multiple domains well enough for analytics against the graph to be effective; and non-technical issues like the politics of aligning schemas between business units and prioritizing a single source of truth for the whole company. Many of these issues are less pertinent for KG RAG, given the difference between a monolithic data-fabric layer built for graph analytics and a structured representation built for information retrieval.

Furthermore, the historical problem with large graphs and large ontologies is that, because of the general nature of the ontology, an abstract, high-level schema is used that fails to capture much of the specific information you originally wanted. By starting with small graphs and then choosing to merge them into a larger graph, you can begin with a granular representation and work iteratively towards a large graph that represents your information. In the meantime, you do not sacrifice performance, since the small graphs can adequately answer most questions thrown at them, and they can be supported agentically through a multi-agent system.
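If a merge is eventually needed, one hedged sketch of LLM-assisted coreferencing is below: ask the model to propose mappings between two small-graph ontologies, then apply only the mappings a reviewer approves. The `llm` argument is a placeholder callable, not a specific provider's API.

```python
import json

def propose_mappings(llm, ontology_a: list[str], ontology_b: list[str]) -> dict:
    """Ask an LLM to align entity types across two small-graph ontologies."""
    prompt = (
        "Map each type in ontology A to its closest equivalent in ontology B, "
        "or null if none exists. Reply as a JSON object.\n"
        f"A: {ontology_a}\nB: {ontology_b}"
    )
    return json.loads(llm(prompt))  # any LLM client that returns JSON text

def merge_types(graph_types: list[str], mapping: dict,
                approved: set[str]) -> list[str]:
    """Rename only the types a human reviewer approved; keep the rest granular."""
    return [mapping[t] if t in approved and mapping.get(t) else t
            for t in graph_types]
```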

In the realm of KG RAG, the main reason people believe they need a large graph instead of querying a set of smaller graphs is multi-hop retrieval: certain questions can only be answered by traversing multiple nodes to get the full context for an answer.

In practice, we have found that multi-hop retrieval is much less common than people expect, since the vast majority of questions require only single hops. Regardless, LLMs and agentic systems have opened up a range of new ways to do multi-hop retrieval without resorting to a large graph. Such methods include (the first is sketched after this list):

  • Query deconstruction and multi-agent/multi-graph information retrieval
  • Intent classification leading to hard-coded/few-shot Cypher/multi-hop query
  • Temporary graph creation from underlying triples
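The first of these can be sketched as follows, with a hard-coded decomposition and stubbed graph calls standing in for the LLM steps:

```python
# Hypothetical sketch of query deconstruction across small graphs:
# a multi-hop question is split into single-hop sub-questions, each
# routed to one small graph, and the partial answers are composed.

def decompose(question: str) -> list[tuple[str, str]]:
    # In practice an LLM performs this split; hard-coded for illustration.
    return [
        ("patient-records-graph", "Which ward is patient P-107 assigned to?"),
        ("doctor-availability-graph", "Which doctors cover that ward today?"),
    ]

def ask_graph(graph_name: str, sub_question: str) -> str:
    return f"answer from {graph_name}"  # stub for a per-graph query call

def multi_hop_answer(question: str) -> str:
    partials = [ask_graph(g, q) for g, q in decompose(question)]
    # A final LLM call would normally synthesize; join stands in for it.
    return " -> ".join(partials)

print(multi_hop_answer("Which doctors can see patient P-107 today?"))
```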

Small Graphs as Microservices for Structured Context

With smaller sub-graphs that are independently managed, each tied to a single agent, we gain the ability to manipulate knowledge on a per-concept, or per-sub-graph, level. Such a framework provides more deterministic and accurate retrieval, acting as a microservice layer for contextual structured knowledge.

The image below shows a multi-agent system, with each agent representing a specific step in the information-processing stages of a hospital onboarding process. By breaking down the process agentically, and keeping small scoped-in graphs for patient records, doctor type & availability, and available beds, each agent does not need to perform a complicated data extraction from a large, all-encompassing ‘hospital graph’. Since each graph is scoped in, the surface area for hallucinations is smaller, and the task of dynamically retrieving relevant information is easier. Just as we use multi-agent systems over single-agent systems for complex processes, we will increasingly use multi-graph systems over single-graph systems for complex information retrieval.
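As a hedged sketch of that architecture (hypothetical agent and graph names), each onboarding step is an agent bound to exactly one small graph:

```python
# Hypothetical sketch of the hospital-onboarding pipeline: one agent per
# step, each bound to exactly one small scoped-in graph.

class GraphAgent:
    def __init__(self, name: str, graph: str):
        self.name, self.graph = name, graph

    def run(self, context: dict) -> dict:
        # Stub: query only this agent's small graph, then pass context on.
        context[self.name] = f"result from {self.graph}"
        return context

PIPELINE = [
    GraphAgent("records", "patient-records-graph"),
    GraphAgent("doctors", "doctor-availability-graph"),
    GraphAgent("beds", "available-beds-graph"),
]

def onboard(patient_id: str) -> dict:
    context = {"patient_id": patient_id}
    for agent in PIPELINE:
        context = agent.run(context)  # no all-encompassing hospital graph needed
    return context

print(onboard("P-107"))
```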

WhyHow.AI is building tools to help developers bring more determinism and control to their RAG pipelines using graph structures. If you’re thinking about, in the process of, or have already incorporated knowledge graphs in RAG for accuracy, memory and determinism, we’d love to chat at team@whyhow.ai, or follow our newsletter at WhyHow.AI. Join our discussions about rules, determinism and knowledge graphs in RAG on our Discord.
