Knowledge Table — Multi-Document RAG (Extraction & Memory)

Chia Jeng Yang
WhyHow.AI

Multi-document extraction and retrieval is hard. Mapping information into a structured form for retrieval is trickier than it sounds, and building RAG systems over many documents is harder still, especially if you want the LLM to return granular answers rather than just an overall summary. That is why we are open-sourcing an internal table-based multi-document extraction and graph creation tool.

With this post, we are releasing the following:

  1. Open Source Repo of Knowledge Table
  2. A limited demo of Knowledge Table extended on top of WhyHow’s platform for additional querying and memory functionalities (Regard this as our Product Roadmap for the hosted version of the Knowledge Table)

You should use Knowledge Table if, as a business user or a RAG developer, you are interested in efficiently extracting, storing, and querying information across a large set of documents. For developers, we have found that inserting a tabular intermediary step for graph construction in your backend RAG system dramatically improves the accuracy of the graphs created. Between Knowledge Table & the WhyHow Platform, we provide:

  1. Multi-Document Accuracy Uplift: 2.5x accuracy over ChatGPT 4o (in web browser) for multi-document retrieval, outperforming Text2Cypher by 2x, and beating GraphRAG
  2. Rule-Based Extraction Guardrails: Granular control of an open-source multi-document extraction process through Extraction Rules & Types
  3. Ontology-Based Query Engine: An intuitive query engine that allows the user to call on both specific tools and columns directly when querying, allowing a seamless combination of both structured and unstructured retrieval

Tables are valuable in a few ways:

  1. For business users, as an easy way to do structured extraction of information across a large collection of documents
  2. For developers, as an intermediary step in a Knowledge Graph RAG system, to parse out a range of rules & ontology-controlled values for Entity Types across documents that can be turned into a graph.

The idea of combining agents and tables is not new and has been around for at least a year, and most of us have certainly experimented with combining LLMs and tables. What is new, however, is using table-based extraction for multi-document RAG processes, not just as a front-end experience, but also as part of a backend process to help create structured representations that are easily queryable within a RAG system.

There should be a direct relationship between the words you use in a query and the way your data is organized. The ontology that the table provides is also directly interactive in the query interface: see how the ‘average’, ‘dosage’, and ‘diseases’ fields in the query are explicitly called out and invokable by the user.

Extraction Rules

Control over what gets extracted is important. You know certain things about the information in your document and you want to be able to give as much context as possible to improve the extraction process.

The rules we have so far, and will continue to add to, include the following (a sketch of how such rules might be applied follows the list):

Must Return:

  • This is where the answers returned must fit a list of values the user has provided.
  • For example, for a healthcare use-case, it may be that you already have a list of rare disease names that you want to be able to load in and extract. The list given is the exhaustive list, and results returned should reflect what is in the list, and only in the list.

May Return:

  • This is where you want to give some examples to help few-shot the LLM as it is performing extraction. The list given is not the exhaustive list, and the LLM may find examples outside the list.

Allowed # of Responses:

  • This is where you know the number of responses that you expect. This could be, for example, the number of IDs assigned to a person. Limiting the number of responses helps reduce potential hallucinations.
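To make these rules concrete, here is a minimal sketch of how they could be represented and enforced in plain Python. The class, field, and function names are illustrative assumptions, not the Knowledge Table API.

```python
from dataclasses import dataclass


@dataclass
class ExtractionRule:
    """Hypothetical rule object; field names are illustrative, not the Knowledge Table API."""
    must_return: list[str] | None = None   # exhaustive allow-list of valid answers
    may_return: list[str] | None = None    # few-shot examples; answers may fall outside this list
    max_responses: int | None = None       # cap on the number of values returned per cell


def apply_rules(raw_answers: list[str], rule: ExtractionRule) -> list[str]:
    """Post-process LLM output so it respects the configured guardrails."""
    answers = raw_answers
    if rule.must_return is not None:
        allowed = {a.lower() for a in rule.must_return}
        answers = [a for a in answers if a.lower() in allowed]
    if rule.max_responses is not None:
        answers = answers[: rule.max_responses]
    return answers


# Example: a healthcare column restricted to a known list of rare diseases.
rare_disease_rule = ExtractionRule(
    must_return=["Fabry disease", "Gaucher disease", "Pompe disease"],
    max_responses=2,
)
print(apply_rules(["Fabry disease", "diabetes", "Pompe disease"], rare_disease_rule))
# -> ['Fabry disease', 'Pompe disease']
```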

As the package is open-sourced, you can contribute other types of extraction rules to the repo, or, when running it in your own system, tweak the extraction process to best suit your data and workflows.

Chained Extraction

You can choose to chain the extraction process, such that the values produced in previous columns determine the extraction process for subsequent columns. For example, first extract all the diseases mentioned in a document, and then, in a subsequent column, map the medication that is needed. To invoke another column in the question field, you simply tag the word to match the referenced column name, i.e. @diseases

If you select ‘list of text’ or ‘list of numbers’, you can then split the cell of multiple values into multiple rows of single values, allowing you to do chained extraction far more easily.
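As a rough illustration of what chaining looks like conceptually, the snippet below resolves an @column reference in a question and fans a list-valued cell out into one row per value. The data structures and function names here are hypothetical, not the actual Knowledge Table implementation.

```python
import re

# A toy table: one row per document, with a list-valued "diseases" column
# already filled in by an earlier extraction pass.
rows = [
    {"document": "paper_01.pdf", "diseases": ["Fabry disease", "Pompe disease"]},
    {"document": "paper_02.pdf", "diseases": ["Gaucher disease"]},
]

def split_list_column(rows, column):
    """Expand a 'list of text' cell into multiple rows of single values."""
    out = []
    for row in rows:
        for value in row[column]:
            out.append({**row, column: value})
    return out

def resolve_question(template, row):
    """Replace @column references in a question with the value from that row."""
    return re.sub(r"@(\w+)", lambda m: str(row[m.group(1)]), template)

for row in split_list_column(rows, "diseases"):
    question = resolve_question("What medication is used to treat @diseases?", row)
    # Each resolved question would then drive the extraction for the next column.
    print(row["document"], "->", question)
```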

Auditability and Source References

In RAG processes, being able to trace extracted information back to the initial vector chunk it came from is table stakes. For each cell, the chunks from which the answers came can be viewed. This provides auditability and a quick reference to help trust but verify the LLM output where necessary.

This is also a crucial step in RAG systems, where access to the underlying chunk is required to help construct the answer that is ultimately returned to the user.

Vector Chunks are tied to the answers generated, ensuring full traceability
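A minimal sketch of this linkage, with hypothetical structures: each cell simply carries references to the chunks that produced its answer, so auditing a value is a direct lookup.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    document: str
    text: str

@dataclass
class Cell:
    answer: str
    source_chunks: list[Chunk]  # the chunks the answer was extracted from

cell = Cell(
    answer="Fabry disease",
    source_chunks=[
        Chunk("c-103", "paper_01.pdf", "patients diagnosed with Fabry disease were..."),
    ],
)

# Auditing a cell is just a lookup of the chunks behind the answer.
for chunk in cell.source_chunks:
    print(f"{cell.answer} <- {chunk.document} ({chunk.chunk_id}): {chunk.text}")
```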

Document Metadata Generation

Each extracted answer can also be put into memory as a further metadata filter to be used when selecting documents in the future. For example, imagine one of the questions in a column is ‘Does this document mention my name?’, with the Entity Type ‘Name Mention’. ‘Name Mention’ can now be used as a filter for documents.

In this way, the Knowledge Table process is not only for answering questions, but also for generating additional metadata for your documents, since you can now link the extracted data to the vector chunk or document. This metadata helps with filtering and selecting the right documents to answer a query.
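A rough sketch of the idea, assuming a simple in-memory document store; the field names and filtering logic are illustrative only.

```python
# Hypothetical per-document metadata produced by Knowledge Table columns.
documents = [
    {"doc_id": "paper_01.pdf", "metadata": {"name_mention": True,  "diseases": ["Fabry disease"]}},
    {"doc_id": "paper_02.pdf", "metadata": {"name_mention": False, "diseases": ["Gaucher disease"]}},
]

def filter_documents(documents, **criteria):
    """Select documents whose extracted metadata matches every given criterion."""
    return [
        d for d in documents
        if all(d["metadata"].get(k) == v for k, v in criteria.items())
    ]

# Only documents that mention the name are passed on to retrieval.
print(filter_documents(documents, name_mention=True))
```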

Memory & CSV/Triples Download

You can extract your information both as a CSV or in the form of Graph Triples (with their chunks linked to them). Graph Triples are a bit like structured summaries of facts. This allows you to combine triples and query them.

The advantage of exporting in Graph Triples versus CSV is the ability to automatically combine and query your information.

To make the Knowledge Table queryable, we turn the information in the tables into structured triples. This lets us run natural language queries against the table that are not just translations of SQL queries: we can perform vector search over the information on a per-cell basis while preserving the semantic relationship of each cell with its neighbors.
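Conceptually, each filled cell can be read as a (subject, column, value) fact with its source chunks attached. Here is a minimal sketch of that conversion; the structures are assumptions for illustration and do not reflect the WhyHow export format.

```python
def table_to_triples(rows, subject_column, chunk_lookup):
    """Turn table rows into (head, relation, tail) triples with linked chunks."""
    triples = []
    for row in rows:
        subject = row[subject_column]
        for column, value in row.items():
            if column == subject_column:
                continue
            triples.append({
                "head": subject,
                "relation": column,          # the column name acts as the predicate
                "tail": value,
                "chunks": chunk_lookup.get((subject, column), []),
            })
    return triples

rows = [{"disease": "Fabry disease", "medication": "migalastat", "dosage": "123 mg"}]
chunk_lookup = {("Fabry disease", "medication"): ["c-103"]}
for triple in table_to_triples(rows, "disease", chunk_lookup):
    print(triple)
```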

You may export the CSV and Graph Triples to your own platform for processing, or you may integrate it seamlessly with WhyHow’s Platform. For our Beta users who have access to our platform, check out the documentation that allows you to seamlessly import and query triples from Knowledge Table to the WhyHow platform.

See this link for the documentation that turns the exported triples from Knowledge Table into a graph in the WhyHow platform.

Integration with WhyHow’s memory and query platform — Accuracy Uplift

We have discovered that running parallel extraction against multiple documents in separate LLM calls dramatically increases multi-document extraction accuracy. When the extraction is run in a tabular format and then saved in memory as a graph, the accuracy of the output improves further. To demonstrate this, we benchmarked the following processes to show that extraction and querying get better with a Knowledge Table. In this benchmarking exercise, we uploaded ~20 healthcare academic papers and asked a few extraction-based questions against them:

  • What types of diseases are being targeted with X-derived therapies?
  • What proteins are being extracted for X-derived therapies?
  • Did X company fund the research studies?

We ran the following processes and evaluated them with RAGAS:

  • WhyHow Modular Graphs & Knowledge Table
  • Microsoft GraphRAG
  • Vector RAG Retrieval
  • ChatGPT 4o (in browser)
  • Langchain LLMGraphTransformer with WhyHow’s query engine
  • Langchain LLMGraphTransformer with Text2Cypher

Overall, we found a dramatic difference in performance. Surprisingly, ChatGPT 4o through the web browser was not able to handle that much context despite having a multi-agent framework embedded in its system. It gave a detailed reply based on only some of the documents, indicating that it likely retrieved chunks from only a subset of them. Vector Retrieval similarly struggled to return precise answers, and tended to answer in broad strokes.

Langchain’s Graph Creation Package, LLMGraphTransformer, performed better since graph creation steps necessarily focus on entity extraction as an intermediary step. As a result, the relevant diseases and proteins were more prominently featured in the search space for retrieval. However, the graph construction created a large graph that was unfocused and hard to retrieve from. As a result, although we saw an uplift compared to pure vector search, it still fell short.

One interesting comparison from this data: we took the same Langchain graph, put it into WhyHow, and queried it with WhyHow’s query engine versus Text2Cypher. WhyHow’s query engine outperformed on retrieval despite using the same graph, probably because of the structural issues that arise when translating natural language queries into a structured query language. Text2SQL has had similar struggles with accuracy (with this post claiming Text2SQL achieves only a 20% accuracy rate out of the box), so this is not a Cypher-only issue.

The reason behind this accuracy uplift was that we were able to parallelize the extraction process through the Knowledge Table. With the LLM focused on each document individually, we were able to ensure a more accurate and exhaustive extraction process.
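A bare-bones sketch of this parallelization pattern, assuming an async extract_from_document call you would implement against your LLM of choice; nothing here is the Knowledge Table’s actual code.

```python
import asyncio

async def extract_from_document(doc_id: str, question: str) -> dict:
    """Placeholder for a single-document LLM extraction call."""
    await asyncio.sleep(0)  # stands in for the real LLM request
    return {"doc_id": doc_id, "question": question, "answers": []}

async def extract_across_documents(doc_ids: list[str], question: str) -> list[dict]:
    """Run one focused extraction per document concurrently, then merge the results."""
    tasks = [extract_from_document(d, question) for d in doc_ids]
    return await asyncio.gather(*tasks)

results = asyncio.run(
    extract_across_documents(["paper_01.pdf", "paper_02.pdf"], "What diseases are mentioned?")
)
print(results)
```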

Ontology-Based Querying

When querying within the WhyHow platform, if you intend to reference a particular Entity Type, the system will help you autocomplete and highlight the terms that map to the system’s internal ontology, as defined by the Entity Types in your Knowledge Table or Graph.

This allows the user to improve the accuracy of their query, maintaining consistency and relevancy during the extraction process. For example, typing an autocompleted ‘disease’ in the query maps it to an Entity Type/Column called ‘Disease’ and instructs the LLM accordingly. “@” can be used to call upon a specific Entity Type/Column, while “#” can be used to call upon a specific tool that helps with the creation of the answer, such as calculators and other tools. We have a number of structured comparison and matching tools on our roadmap that we intend to include over time, such as “COMPARE” or “CALCULATE RELATIONSHIP”.

Specific questions that this combination can answer, but that could not easily be answered with systems like Text2SQL, include the following (a sketch of how the @ and # references might be parsed follows the examples):

  • “What is the document that contains the @diseases that rhymes with Bout?”
  • “What is the #average @dosage for the @disease that rhymes with Petformen?”
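To illustrate how such a query might be decomposed, here is a small sketch that pulls out @column and #tool references from a query string. The parsing rules are an assumption about how this could work, not WhyHow’s implementation.

```python
import re

def parse_query(query: str):
    """Split a query into referenced columns (@), tools (#), and the remaining free text."""
    columns = re.findall(r"@(\w+)", query)
    tools = re.findall(r"#(\w+)", query)
    free_text = re.sub(r"\s+", " ", re.sub(r"[@#]\w+", "", query)).strip()
    return {"columns": columns, "tools": tools, "free_text": free_text}

print(parse_query("What is the #average @dosage for the @disease that rhymes with Petformen?"))
# -> {'columns': ['dosage', 'disease'], 'tools': ['average'],
#     'free_text': 'What is the for the that rhymes with Petformen?'}
```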

Querying and memory are not part of the open-source package; the video above is an example of how they can fit into RAG workflows in combination with WhyHow’s Memory and Retrieval platform.

WhyHow.AI’s Knowledge Graph Studio Platform (currently in Beta) is the easiest way to build modular, agentic Knowledge Graphs, combining workflows from LLMs, developers and non-technical domain experts.

If you’re thinking about, in the process of, or have already incorporated knowledge graphs in RAG for accuracy, memory and determinism, we’d love to chat at team@whyhow.ai, or follow our newsletter at WhyHow.AI. Join our discussions about rules, determinism and knowledge graphs in RAG on our Discord.
