WhyHow.AI KG SDK Upgrade: Linking Vector Chunks to Graph Nodes — Explainability & Accuracy

Published in WhyHow.AI · 6 min read · May 4, 2024

Tired of just returning single-word triples from your knowledge graph? WhyHow.AI’s latest upgrade, vector chunk linking, lets you use a graph structure to determine which raw vector chunks to return to the context window, combining the best of knowledge graphs and vector search. This enables graph-based information navigation and chunk-based context injection, returning the raw chunks instead of just triples.

We build workflow tools for data orchestration and graph creation, and we work on top of any data extraction model you want to bring. In this case, we work on top of OpenAI, Neo4j, and Pinecone, and we will be supporting the most popular data extraction models, LLMs, and graph and vector databases.

The Challenge of Balancing Detail with Relevance

When building knowledge graphs based solely on triples (i.e. Entity->Relation->Entity), there’s a risk of omitting detailed data that is needed for comprehensive answers. This comes from the difficulty of using graphs to perfectly represent all the complexity of unstructured prose. Conversely, relying only on raw data chunks retrieved through semantic similarity can introduce erroneous or irrelevant information into the context window, potentially leading to errors in your RAG system. The challenge lies in capturing sufficiently relevant details without overwhelming the context window.

As part of our automated knowledge graph creation SDK, users can now link relevant raw data chunks to a graph structure both automatically and manually. This bridges the gap between raw data and conceptual representations.

“While the triples in a Knowledge Graph are useful in providing specific information that semantic similarity was unable to retrieve, we wanted to also allow leeway in the information represented and retrieved from the graph, to include the surrounding words and retrieving the relevant raw vector chunk tied to that graph node as well. By tying vector chunks to a knowledge graph, we get the advantages that lie in both vector and graph search.” - WhyHow.AI Design Partner

Simply put, tying knowledge graph nodes to vector chunks means that there is a lower burden to use the graph itself to represent all the underlying knowledge in the knowledge base.

This is in line with WhyHow.AI’s philosophy of small graphs: focus on the minimum viable graph for your needs instead of creating large graphs for an unclear ROI. With chunk linking, you can use a higher-level representation of your data that captures the themes that questions asked against the knowledge base touch on, and then return the broader vector chunks to populate the context window. Manual chunk linking can also be used to explicitly relate chunks to certain themes in ways that the LLM is not able to recognize, allowing developers to easily inject their own knowledge of certain relationships into the system.

Querying the Enhanced Knowledge Graph

Consider a medical example. Let’s say we are building a knowledge graph of symptoms, causes, and treatments of headaches for a medical professional. To build this graph, we upload patient case reports and medical textbooks to extract relevant entities and relationships and construct a graph. With chunk-linking, in addition to extracting concepts, we’re also keeping track of relevant chunks from which entities were extracted. This way, we can link the raw text to the concept. Here’s what’s happening under the hood:

Data Upload and Processing: When a user uploads a document, it is converted into text, split, and cleaned.
Vector Indexing and Entity Extraction: The text is indexed, and relevant entities and relationships are identified according to predefined user criteria.
Graph Construction: Entities and relationships are converted into triples, and triples are combined to build a graph.
Chunk Linking: Chunks of raw text are automatically linked to the most relevant graph nodes, which are derived from the context of these chunks.
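
To make the steps above concrete, here is a minimal sketch in plain Python. It is not the WhyHow.AI implementation: the helper names (split_into_chunks, link_chunks_to_nodes) are hypothetical, the triples are hand-written rather than extracted by a model, and linking is done by naive substring matching instead of anything the SDK actually uses. It only illustrates the idea of keeping a map from graph nodes to the chunks that mention them.

```python
from collections import defaultdict


def split_into_chunks(text: str) -> list[str]:
    """Naive sentence splitter standing in for the real split-and-clean step."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def link_chunks_to_nodes(chunks: list[str], triples: list[tuple[str, str, str]]) -> dict[str, list[int]]:
    """Map each graph node to the indices of the chunks that mention it.

    Assumption: a chunk is 'relevant' to a node if it contains the node's
    name as a substring. A real system would use a stronger signal.
    """
    nodes = {t[0] for t in triples} | {t[2] for t in triples}
    links = defaultdict(list)
    for i, chunk in enumerate(chunks):
        lowered = chunk.lower()
        for node in nodes:
            if node.lower() in lowered:
                links[node].append(i)
    return dict(links)


text = (
    "Primary headache disorders include migraine. "
    "Hypertension can cause a morning headache."
)
# Hand-written (subject, predicate, object) triples for illustration only.
triples = [
    ("primary headache disorders", "include", "migraine"),
    ("morning headache", "caused by", "hypertension"),
]

chunks = split_into_chunks(text)
links = link_chunks_to_nodes(chunks, triples)

# Look up the raw chunks linked to a node found by a graph query.
for index in links.get("hypertension", []):
    print(chunks[index])
```

Once such a map exists, a graph query only has to find the relevant nodes; the linked chunks supply the surrounding detail.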

When exploring the causes of headaches in a 45-year-old male, the knowledge graph may identify several possible causes: primary disorders like migraines, secondary causes like hypertension, or various other factors including medication misuse or sleep disorders. If the query specifically asks about the potential causes of a chronic morning headache in this patient demographic, a knowledge graph with chunk linking would not only return a set of relevant triples, but also a set of associated chunks that provide rich descriptions.

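# whyhow_client is assumed to be an already-initialized WhyHow.AI SDK client.
response = whyhow_client.query_graph(
    query="What are the causes of a chronic morning headache in a 45-year-old male?",
    namespace="headaches",
    include_chunks=True,  # return the linked raw chunks along with the triples
)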

In response, we get a list of relevant triples as well as associated chunks which users can pass to their LLM to generate an explainable, well-constructed response:

{
    "namespace": "headaches",
    "triples": [
        {
            "subject": "chronic morning headache",
            "predicate": "caused by",
            "object": "primary headache disorders"
        },
        {
            "subject": "chronic morning headache",
            "predicate": "caused by",
            "object": "secondary headache disorders"
        },
        {
            "subject": "primary headache disorders",
            "predicate": "include",
            "object": "tension-type headache"
        },
        {
            "subject": "primary headache disorders",
            "predicate": "include",
            "object": "migraine"
        },
        {
            "subject": "secondary headache disorders",
            "predicate": "include",
            "object": "sinusitis"
        },
        {
            "subject": "secondary headache disorders",
            "predicate": "include",
            "object": "hypertension"
        },
        ...
    ],
    "chunks": [
        {
            "text": "The differential diagnosis of a chronic morning headache in a 45-year-old male patient includes primary headache disorders (e.g., tension-type headache, migraine), secondary headache disorders (e.g., sinusitis, hypertension), and other conditions such as medication overuse and sleep disorders.",
            "source": "headache_disorders_classification_diagnosis.pdf"
        },
        {
            "text": "Primary headache disorders, such as tension-type headache and migraine, are common causes of chronic headaches. Tension-type headaches are often described as a tight, band-like pressure around the head, while migraine headaches are typically unilateral, pulsating, and may be accompanied by nausea, vomiting, and sensitivity to light and sound.",
            "source": "headache_disorders_classification_diagnosis.pdf"
        },
        ...
        {
            "text": "A 45-year-old male patient presented with a history of daily morning headaches for the past 3 months. He reported using over-the-counter ibuprofen almost daily to manage his headaches.",
            "source": "headache_case_report.pdf"
        },
        {
            "text": "Physical examination was unremarkable, and neuroimaging studies were normal. The patient was diagnosed with medication overuse headache and advised to gradually withdraw from the overused medication.",
            "source": "headache_case_report.pdf"
        },
        ...
    ]
}
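
As a rough illustration of how a response shaped like the one above could be turned into LLM input, the sketch below formats the triples and chunks into a single prompt string. The build_prompt helper and the prompt wording are assumptions made for this example, not part of the SDK, and the sample response is a trimmed-down copy of the one shown above. The resulting string can be sent to whichever LLM you use.

```python
def build_prompt(question: str, response: dict) -> str:
    """Format triples and linked chunks into one grounded prompt (hypothetical helper)."""
    triple_lines = [
        f"- {t['subject']} {t['predicate']} {t['object']}"
        for t in response.get("triples", [])
    ]
    # Number each chunk and keep its source so the answer can cite evidence.
    chunk_lines = [
        f"[{i}] ({c['source']}) {c['text']}"
        for i, c in enumerate(response.get("chunks", []), start=1)
    ]
    return "\n".join(
        [
            "Answer the question using only the facts and evidence below.",
            "",
            "Facts (triples):",
            *triple_lines,
            "",
            "Evidence (chunks):",
            *chunk_lines,
            "",
            f"Question: {question}",
        ]
    )


# Trimmed-down sample with the same shape as the response shown above.
sample_response = {
    "namespace": "headaches",
    "triples": [
        {
            "subject": "chronic morning headache",
            "predicate": "caused by",
            "object": "primary headache disorders",
        },
    ],
    "chunks": [
        {
            "text": "He reported using over-the-counter ibuprofen almost daily to manage his headaches.",
            "source": "headache_case_report.pdf",
        },
    ],
}

prompt = build_prompt(
    "What are the causes of a chronic morning headache in a 45-year-old male?",
    sample_response,
)
print(prompt)
```

Keeping the source file name next to each chunk is what lets the generated answer point back to its evidence.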

We can then pass this response to an LLM to generate a complete, well-formulated answer, using the triples as ground truth and the chunks as supporting evidence:

The potential causes of a chronic morning headache in a 45-year-old male patient include:

Primary headache disorders:
   Tension-type headache: Tension-type headaches are often described as a tight,    
   band-like pressure around the head and may be associated with tenderness of the 
   pericranial muscles. Stress, poor posture, and jaw clenching are common triggers.
  
   Migraine: Migraine headaches are typically unilateral, pulsating, and of moderate 
   to severe intensity, often accompanied by nausea, vomiting, and sensitivity to 
   light and sound. Genetic factors, hormonal changes, and environmental triggers 
   play a role in the pathogenesis of migraine.

Secondary headache disorders:
   Sinusitis: Sinusitis-related headaches are often worse in the morning due to the 
   accumulation of mucus overnight. The headache is usually localized to the 
   affected sinus area and may be accompanied by nasal discharge, facial pain, and 
   fever. Treatment involves managing the underlying sinus infection with 
   antibiotics, nasal decongestants, and anti-inflammatory medications.

   Hypertension: Hypertension-related headaches are often described as a dull, 
   bilateral pain that is worse upon awakening and improves throughout the day. 
   Effective management of hypertension with lifestyle modifications and 
   antihypertensive medications can help alleviate these headaches.

Other conditions:
   Medication overuse: Medication overuse headaches can occur when over-the-counter 
   pain relievers are used too frequently. A case report of a 45-year-old male 
   patient with medication overuse headache revealed that he had been using 
   ibuprofen almost daily for 3 months. The patient was advised to gradually 
   withdraw from the overused medication and was provided with education on proper 
   headache management and preventive strategies.

   Sleep disorders: Sleep disorders, such as obstructive sleep apnea (OSA), can 
   cause morning headaches due to the effects of hypoxia and sleep fragmentation on 
   the brain. OSA is characterized by repeated episodes of upper airway obstruction 
   during sleep, leading to daytime sleepiness, snoring, and morning headaches.

For comparison, here is the answer generated from a simple graph query that returned only triples, without the raw vector chunks:

The potential causes of a chronic morning headache 
in a 45-year-old male patient include:

1. Primary headache disorders:
- Tension-type headache
- Migraine

2. Secondary headache disorders:
- Sinusitis
- Hypertension

3. Other conditions:
- Medication overuse
- Sleep disorders

In our experience, linking raw text chunks to graph nodes leads to more complete, explainable, evidence-based answers. This is useful in scenarios where questions are more open-ended, such as questions about legal documents, instruction manuals, fictional texts, academic papers, etc.

WhyHow.AI’s Knowledge Graph creation SDK and deterministic, document-structure-based chunk extraction are currently in private beta. If you are thinking about, in the process of, or have already incorporated knowledge graphs in RAG, we’d love to chat at team@whyhow.ai, or follow our newsletter at WhyHow.AI. Join our discussions about rules, determinism and knowledge graphs in RAG on our newly-created Discord.

Written by Chia Jeng Yang