WhyHow’s KG Studio Platform Beta for RAG Native Graphs

Chia Jeng Yang
Published in WhyHow.AI
10 min read · Jul 24, 2024

We are excited to announce the Beta launch of the WhyHow.AI Knowledge Graph RAG Platform, the easiest and most intuitive way to build RAG-Native Graphs. We are focused on building tooling to enable developers and non-technical domain experts to build RAG-Native Graphs, Agentic Memory and other forms of structured knowledge representations.

Our KG RAG Platform is an API-first experience complemented by a SaaS interface for multiplayer graph creation, which lets developers work alongside non-technical (or slightly technical) domain experts. All the features you see in the UI/UX are available in the SDK, which will be released in the coming weeks. Our documentation can be found here.

Here are some of the key problems we are solving for in this early version of the WhyHow.AI Platform:

  • Personalized, Human-in-the-Loop Rule-Based Entity Extraction and Resolution
  • Multiplayer Graph Creation
  • Flexible Data Ingestion & Framework integration
  • Cost of Graph Creation
  • Native, Modular Small Graph Creation
  • Granular Schema Manipulation / Creation
  • Vector Chunks as First Class Citizens in Graphs
  • Database Agnostic Graph Creation
  • Modular Graph Querying

Personalized, Human-in-the-Loop Rule-Based Entity Extraction and Resolution

Our workflow tool works on top of the best entity extraction and resolution models and partners out there. However, entity resolution is not just industry-specific; it may also be specific to a company or use-case.

As part of our system, we introduce a rules-based and active learning system to perform entity extraction and resolution. The node labeling and merges that you perform are learnt by the system over time, automating entity extraction and resolution in a personalized way. The key distinction in our workflows is that we believe entity extraction and resolution are use-case specific and highly personalized. In some scenarios, it makes sense for certain entities to be resolved together in line with one specific schema and use-case. In other scenarios, those same entities should stay distinct under a different schema and use-case.

While semantic similarity is helpful, relying on it alone to resolve entities and nodes runs into the same issue as Vector RAG: semantic similarity is insufficient to tell apart terms whose differences clearly matter. Take a retail use-case, where the product names ‘XYZ-5623’ and ‘XYY-4623’ may be regarded as semantically similar but should not be resolved into the same entity. We want to ensure that users of our platform can write rules that keep such terms distinct, instead of simply relying on semantic similarity for entity resolution.
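To make the product-code example concrete, here is a minimal sketch of a rule that overrides a similarity score. It is not WhyHow's implementation: the `PRODUCT_CODE` pattern, the `should_merge` function, and the 0.7 threshold are all assumptions chosen for illustration, and the character-level `difflib` ratio is only a stand-in for an embedding-based similarity.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rule: names shaped like product codes (e.g. "XYZ-5623")
# must match exactly to count as the same entity.
PRODUCT_CODE = re.compile(r"^[A-Z]{3}-\d{4}$")


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for embedding similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def should_merge(a: str, b: str, threshold: float = 0.7) -> bool:
    """Apply explicit rules first, then fall back to the similarity score."""
    if PRODUCT_CODE.match(a) and PRODUCT_CODE.match(b):
        return a == b  # rule: product codes never merge on similarity alone
    return similarity(a, b) >= threshold


# Similarity alone would clear the threshold, but the rule keeps the codes apart.
score = similarity("XYZ-5623", "XYY-4623")
merged = should_merge("XYZ-5623", "XYY-4623")
```

The point of the design is ordering: explicit rules run before any similarity check, so a high score can never override a distinction the user has declared important.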

Extracting and resolving entities at the specific level of granularity your use-case needs is crucial. We want to make it easy for you to inject your opinion and context, with a system that automatically learns your preferences and with transparent rules that can be ‘unlearnt’ if the use-case changes.

In the example below, once you have merged the entities ‘Meta (Facebook)’ and ‘Meta’, the system will remember this action and automatically perform such entity resolution steps across the board.
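One way to picture a remembered merge is as an explicit alias rule that can be inspected and removed later. The sketch below is an assumption about how such a memory could be modelled, not a description of the platform's internals; `MergeMemory`, `learn`, `unlearn`, and `resolve` are hypothetical names.

```python
class MergeMemory:
    """Remembers user-confirmed merges as explicit, reversible alias rules."""

    def __init__(self) -> None:
        self._alias_to_canonical: dict[str, str] = {}

    def learn(self, alias: str, canonical: str) -> None:
        """Record that `alias` should always resolve to `canonical`."""
        self._alias_to_canonical[alias] = canonical

    def unlearn(self, alias: str) -> None:
        """Drop a rule, for example when the use-case changes."""
        self._alias_to_canonical.pop(alias, None)

    def resolve(self, name: str) -> str:
        """Follow alias chains (A -> B -> C), guarding against cycles."""
        seen = set()
        while name in self._alias_to_canonical and name not in seen:
            seen.add(name)
            name = self._alias_to_canonical[name]
        return name


memory = MergeMemory()
memory.learn("Meta (Facebook)", "Meta")

# Illustrative triples; the learnt rule is applied to every head and tail.
triples = [("Meta (Facebook)", "acquired", "Instagram"), ("Meta", "owns", "WhatsApp")]
resolved = [(memory.resolve(h), r, memory.resolve(t)) for h, r, t in triples]
```

Because each rule is stored as plain data, calling `memory.unlearn("Meta (Facebook)")` restores the original distinction without touching anything else.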

We believe our workflows will work on top of a range of internal and third-party entity resolution tools and models, and they are not necessarily meant to be the be-all approach to entity resolution.

Multiplayer Graph Creation

RAG systems have traditionally been led by a developer or data scientist, often working with non-technical or slightly technical domain experts who provide context about the underlying information being represented and retrieved.

When it comes to knowledge graphs, where the need for precise knowledge representation is higher, the role of the domain expert is much more involved. KG RAG processes therefore demand a multiplayer graph creation workflow experience where non-technical domain experts can view and interact with graphs that others have begun to build.

To start with, in this early version of the Platform we are introducing the Graph Sharing feature, which lets you share links to the graphs you have built so that others can explore them. The UI/UX for no-code Entity Extraction and Resolution is also designed to be straightforward enough for a non-technical user.

Now, you can easily share and collaborate with other users on the graph creation and curation process.

Flexible Data Ingestion & Framework integration

We believe that:

  1. There are multiple types of file formats we need to accommodate, and there are multiple types of data preprocessing pipelines that people are already building. While context-preserving data extraction is still a difficult problem, many developers are comfortable building their own data extraction pipelines, and WhyHow wants to accommodate them. In line with this, our RAG-Native KG Platform currently accepts PDF, CSV, JSON & TXT file formats, with an SDK coming soon for uploading your own pre-processed data through APIs. We will continue to add more file formats and data pipelines on demand.
  2. With the rise of a range of open source libraries for graph creation, some teams want to use their own processes for triple creation and use WhyHow’s platform for graph orchestration and management. Our platform is intended to be framework agnostic and to accommodate all types of triple/graph creation processes through our endpoints. As we release documentation for our SDK over the next few weeks, we are looking to build easy, clean integrations into our platform.
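As a rough illustration of the second point, a team that produces its own triples might package them like this before handing them to a graph management layer. The SDK has not been released at the time of writing, so everything here is hypothetical: the `Triple` fields, the `build_payload` helper, and the JSON layout are assumptions, not WhyHow's actual request format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Triple:
    """One (head)-[relation]->(tail) fact produced by your own pipeline."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


def build_payload(graph_name: str, triples: list[Triple]) -> str:
    """Serialize triples into a JSON body; the field names are illustrative only."""
    unique = list(dict.fromkeys(triples))  # drop exact duplicates, keep order
    body = {"graph": graph_name, "triples": [asdict(t) for t in unique]}
    return json.dumps(body, indent=2)


fact = Triple("Meta", "Company", "OWNS", "WhatsApp", "Product")
payload = build_payload("demo-graph", [fact, fact])
```

Keeping the triple as a small, framework-neutral record means any extraction library can produce it, which is the sense in which the platform aims to be framework agnostic.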

Granular Schema Manipulation / Creation

Granular control of the schema is crucial to allow people to create the graphs that reflect their specific view. We believe that schemas for KG RAG are very specific to a use-case and domain, as opposed to generic entities and relationships like Person, Organization, Places.

Our schema also allows you to create and manipulate Properties within Entity Nodes. It can be built through the UI/UX in a way that is friendly to non-technical users, or supplied through a JSON file upload.

An example of the schema as expressed and editable in JSON
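As a rough illustration of what a use-case-specific schema with entity properties could look like when expressed as JSON, consider the sketch below. The keys (`entities`, `relations`, `patterns`, `properties`) and the `validate_schema` helper are assumptions made for this example and may not match the platform's actual schema format.

```python
import json

# Hypothetical schema for a retail use-case.
schema = {
    "entities": [
        {"name": "Company", "properties": ["ticker", "headquarters"]},
        {"name": "Product", "properties": ["sku"]},
    ],
    "relations": [{"name": "SELLS"}],
    "patterns": [{"head": "Company", "relation": "SELLS", "tail": "Product"}],
}


def validate_schema(s: dict) -> list[str]:
    """Return a list of problems; an empty list means every pattern is well-formed."""
    entity_names = {e["name"] for e in s["entities"]}
    relation_names = {r["name"] for r in s["relations"]}
    problems = []
    for p in s["patterns"]:
        for role in ("head", "tail"):
            if p[role] not in entity_names:
                problems.append(f"unknown entity {p[role]!r} in pattern {role}")
        if p["relation"] not in relation_names:
            problems.append(f"unknown relation {p['relation']!r}")
    return problems


schema_json = json.dumps(schema, indent=2)  # the form you would edit or upload
problems = validate_schema(schema)
```

Checking patterns against the declared entities and relations before upload catches typos early, which matters when domain experts edit the JSON by hand.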

As part of the small, modular graph approach, the KG RAG process may produce multiple schemas that can be used and reused across one or more documents. A schema therefore remains a primitive that can be manipulated and attached to documents at will.

Each small graph can therefore be created through any mixture of documents and schema used.

Our workspace function allows you to switch between different workspaces, which is ideal for consultants or teams working with a range of clients. Workspaces let you create separate environments for different teams, projects, or domains and control access and permissions for each one.

Cost of Graph Creation

We are proud to offer one of the lowest costs of graph creation at the level of accuracy that our design partners in regulated industries need.

This benefit comes from the use of granular schemas in a way that only WhyHow provides. Through granular schema manipulation, you can scope in on the entities and relationships you care about, as opposed to arbitrarily collecting every piece of information, relationship and entity that could be in the underlying text. This level of granular focus enables you to save costs accordingly.

Native Modular Small Graph Creation

At WhyHow.AI, we believe that the future of RAG systems, and multi-agent systems, will demand the creation and orchestration of many small graphs that are scoped in. We talk more about small graphs here, here, and here.

In line with the small graph philosophy, we built workflows and infrastructure that natively support small graph creation. Each small graph is a self-contained, manipulatable graph that can be called upon agentically and has its own logic, rule-sets, data pipelines, and more.

We make it easy to spin up small graphs so that they can be easily explored, debugged, and called upon within a RAG process.

What a range of small graphs look like, each individually manipulatable

Database Agnostic Graph Creation

As a workflow tool, we want to meet developers where they are and support the ecosystem of graph and vector databases out there.

We believe graph structures are a great complement to your existing primary data stores, and so must respect your existing data and where it currently lives. As interesting as I find graphs to be, graph computations are a subset of all the computations people want to do on their data. At WhyHow.AI, we want to make it easy to create, query, and offer performant graph structures over your data, wherever it sits. We are also building native connectors with vector databases so that users can enable Graph RAG capabilities from pre-existing vector chunks through an API, while still using their existing vector databases as the storage layer and source of truth for those chunks. Vector databases that are interested in building with us should reach out!

If you decide you don’t want your data within WhyHow and want to export it to the graph database of your choice after you have built graphs with our Graph Studio / Creation tool, you have the power of interoperability. With features like “Export in Cypher”, any Cypher-compatible graph database can take in the graphs you created with us. Graph databases that are interested in building native connectors with WhyHow.AI can also reach out to us!
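To give a feel for what a Cypher export involves, here is a minimal sketch that turns one triple into a `MERGE` statement. It is not the output format of the “Export in Cypher” feature; `triple_to_cypher` and its helpers are hypothetical, and the choice to store each entity's name in a `name` property is an assumption.

```python
import re


def _identifier(name: str) -> str:
    """Reduce a label or relationship type to characters that are safe unquoted."""
    cleaned = re.sub(r"\W+", "_", name).strip("_")
    if not cleaned or cleaned[0].isdigit():
        raise ValueError(f"cannot use {name!r} as a Cypher identifier")
    return cleaned


def _string(value: str) -> str:
    """Quote a value as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def triple_to_cypher(head: str, head_type: str, relation: str,
                     tail: str, tail_type: str) -> str:
    """Build an idempotent statement: MERGE creates nodes and edges only if missing."""
    return (
        f"MERGE (h:{_identifier(head_type)} {{name: {_string(head)}}}) "
        f"MERGE (t:{_identifier(tail_type)} {{name: {_string(tail)}}}) "
        f"MERGE (h)-[:{_identifier(relation).upper()}]->(t);"
    )


statement = triple_to_cypher("Meta", "Company", "owns", "WhatsApp", "Product")
```

Using `MERGE` rather than `CREATE` means the same export can be replayed against a database without duplicating nodes. When running against a live database, passing values as query parameters is generally preferable to building string literals by hand.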

We will be building additional features to allow you to natively import your vector chunks and other types of unstructured and structured data from external databases and data lakes, allowing you to build graph structures on top of your existing vector databases, without having to move your vector chunks anywhere.

Vector Chunks as First Class Citizens in Graphs

Vector chunks are the fundamental atomic unit for unstructured information retrieval.

Even when we perform information retrieval from a Knowledge Graph, retrieving just the triples misses out on a lot of contextual information, and underutilizes the power of LLMs to construct answers from raw text. As we have noted, Knowledge Graphs should be seen as a means for structured retrieval of vector chunks.
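The idea of a graph as a means for structured retrieval of chunks can be sketched with a small data structure in which every triple keeps a pointer to the chunk it came from. This is an illustrative model only; `ChunkGraph` and its methods are hypothetical names, and the sample text is made up.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkGraph:
    """Triples that each point back to the chunk of raw text supporting them."""
    chunks: dict[str, str] = field(default_factory=dict)  # chunk_id -> raw text
    triples: list[tuple[str, str, str, str]] = field(default_factory=list)

    def add_chunk(self, chunk_id: str, text: str) -> None:
        self.chunks[chunk_id] = text

    def add_triple(self, head: str, relation: str, tail: str, chunk_id: str) -> None:
        if chunk_id not in self.chunks:
            raise KeyError(f"unknown chunk {chunk_id!r}")
        self.triples.append((head, relation, tail, chunk_id))

    def chunks_for(self, entity: str) -> list[str]:
        """Return the raw text of every chunk linked to a triple mentioning `entity`."""
        ids = dict.fromkeys(
            cid for head, _, tail, cid in self.triples if entity in (head, tail)
        )
        return [self.chunks[cid] for cid in ids]


graph = ChunkGraph()
graph.add_chunk("c1", "Acme opened a new warehouse in Berlin last spring.")
graph.add_triple("Acme", "OPENED_WAREHOUSE_IN", "Berlin", "c1")
context = graph.chunks_for("Acme")
```

The graph decides which chunks are relevant, while the LLM still receives the full raw text in `context` rather than a bare triple.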

As such, we believe that RAG-native graphs need to respect the way that vector chunks are manipulated, visualized and interacted with. Our platform makes it easy to interact with vector chunks, to audit how vector chunks relate to graph structures, to treat vector chunks as elements for graph retrieval, and to create entity nodes directly by selecting and highlighting text within a vector chunk in the Chunk Panel on the bottom right of the graph visualization page.

We also provide a Chunk Dashboard to help you navigate and visualize chunks, add chunks to specific graphs, and remove chunks from them.

When you upload a document, the raw text is split into chunks, which are consumed by the graph creation process, shown in the display, and optionally included in the querying process. However, the knowledge you want to represent might not be fully captured in the documents you upload. For example, you may have financial documents that cover some of the domain, but you may also have other company information or context that you would like to include.

Further, you may want to include only specific sections of a document and leave out extraneous context, or you may have a separate text extraction and cleaning process that you prefer or have tailored to your information. In these cases, you can add that processed and chunked text directly and build a graph from it.
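The two routes described above, letting a document be split into chunks and supplying your own pre-processed chunks, can be sketched as follows. The splitting strategy shown (packing whole paragraphs up to a word budget) is an assumption for illustration and is not necessarily how the platform chunks documents; `split_into_chunks` is a hypothetical helper and the sample text is invented.

```python
def split_into_chunks(text: str, max_words: int = 50) -> list[str]:
    """Pack whole paragraphs into chunks of at most `max_words` words.

    A single paragraph longer than the budget becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        words = len(paragraph.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


raw_text = "Revenue grew in Q2.\n\nCosts were flat.\n\nHeadcount rose slightly in Europe."
document_chunks = split_into_chunks(raw_text, max_words=8)

# Context that is not in any uploaded document can be added as its own chunk.
extra_chunks = ["Internal note: the Europe team moved offices in May."]
all_chunks = document_chunks + extra_chunks
```

Treating uploaded and hand-supplied text as the same kind of object keeps the downstream graph creation step identical for both.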

Modular Graph Querying

In our platform, we focus on exposing the process and logic by which nodes and entities are retrieved from the graph.

The process of turning a question into the right entities and nodes to retrieve can be conceptually simple or extremely complicated, depending on the nature and purpose of the graph. In a straightforward implementation of KG & RAG, we identify the relevant entities and relationships that the question mentions and retrieve accordingly. In a more complex implementation, which may require multi-hop retrieval, some level of reasoning about the relevant nodes and relationships may be needed. Agentic systems may also be required if the graph represents specific types of data, such as SOPs or reasoning traversal pathways.
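The difference between the straightforward case and multi-hop retrieval can be sketched in a few lines. This is a toy illustration rather than the platform's query engine: `match_entities` uses naive substring matching where a real system would use an entity linker or an LLM, and `retrieve`, `max_hops`, and the sample triples are all assumptions.

```python
from collections import deque

Triple = tuple[str, str, str]  # (head, relation, tail)


def match_entities(question: str, triples: list[Triple]) -> set[str]:
    """Naive entity linking: an entity matches if its name appears in the question."""
    names = {t[0] for t in triples} | {t[2] for t in triples}
    lowered = question.lower()
    return {name for name in names if name.lower() in lowered}


def retrieve(question: str, triples: list[Triple], max_hops: int = 1) -> list[Triple]:
    """Return triples reachable within `max_hops` of the entities named in the question."""
    frontier = deque((entity, 0) for entity in match_entities(question, triples))
    visited = {entity for entity, _ in frontier}
    selected: list[Triple] = []
    while frontier:
        entity, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for triple in triples:
            head, _, tail = triple
            if entity not in (head, tail):
                continue
            if triple not in selected:
                selected.append(triple)
            neighbor = tail if entity == head else head
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return selected


facts = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Acme", "LOCATED_IN", "Berlin"),
    ("Berlin", "PART_OF", "Germany"),
]
one_hop = retrieve("Where does Alice work?", facts, max_hops=1)
two_hop = retrieve("Which city does Alice work in?", facts, max_hops=2)
```

With `max_hops=1` only the triple that mentions Alice is returned; raising it to 2 also pulls in where Acme is located, which is the extra step a question about Alice's city needs.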

As such, although we are building and providing a graph query engine, we are also building systems that expose the endpoints so that folks can experiment with and implement their own graph query logic.

WhyHow.AI is building tools like this Knowledge Graph Studio Platform to help developers and non-technical domain experts build Agentic & RAG-Native knowledge graphs.

If you are interested in helping us shake out bugs in the Platform Beta, find some time to chat with us and get additional limited Access Codes through our Calendly. To follow the conversation, join our Discord Group here.

For our Medium subscribers — One-Time Use Access Codes (If they are not working, they have been claimed already):

7aac73e7-29dd-461a-87fb-cc352ddc056a
c549c41c-1d45-40cb-9b4d-c556c9f1e680
c56d390b-7eab-45e2-aa62-fc03f2dfe1e0
013a84bc-42db-49a8-8de7-27fe6d10b37c
24c1662b-eb69-402f-aa6c-77cdaa5fb39d
