Introducing WhyHow.AI Open-Source Knowledge Graph Schema Library — Start Experimenting Faster

Chia Jeng Yang
May 31, 2024

We are excited to announce WhyHow.AI’s Open-Source Knowledge Graph Schema Library. In it, you will find schemas created for a variety of use-cases across different domains, including Finance, Healthcare, Manufacturing, Meeting Transcripts, and many others.

With this schema library, you can get started experimenting much faster. We see most of its value in brainstorming and ideation: exploring the types of schemas that could be relevant for your exact use-case and industry.

The vast majority of the schemas that we or our design partners have built are not available here. If you are a current user of our SDK and are willing to share your schemas, please contribute to the repo.

It is exciting to see more and more people realize that knowledge graphs are going to be an increasingly important component of GenAI solutions. As part of the graph creation process, it is increasingly clear that more granular control and workflow tooling for schema creation are crucial for creating a knowledge graph that works for your use-case and your data. This is in contrast to using a general prompt to capture ‘everything in your knowledge base’.

Some of the specific questions that lead our design partners to realize that opinionated workflow tools are needed include:

  • What relationships should I capture?
  • Do I really want to capture all the entities or are there specific things I care about?
  • Do I want to be very specific or remain exploratory about the graph that is created?
  • Is the graph a property graph, or a policy graph, or a knowledge graph?
  • Etc…

WhyHow.AI’s schema process

At WhyHow.AI, we focus on a range of tools to help with the different workflows that a developer may encounter. We currently support questions-defined, schema-defined, and CSV-defined graph creation.

With these, you have a range of tools for graph creation, from exploratory to more deterministic.

With our questions-defined graph creation feature, the focus is on using questions as a means for controlling the nature of the data extracted to create the graph.

With our schema-defined graph creation feature, the focus is on describing in natural language the type of schema that would restrict and control the graph created, ensuring the graph adheres to a strict set of entities, relations, and patterns defined by the user.
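To make the idea of a schema-constrained graph concrete, here is a minimal Python sketch. It is not the WhyHow.AI SDK: the `Schema` class, its field names, and the sample triples are hypothetical, chosen only to illustrate how a set of entities, relations, and patterns can restrict which triples end up in a graph.

```python
from dataclasses import dataclass

# Hypothetical schema shape: these names are illustrative, not the WhyHow.AI SDK.
@dataclass(frozen=True)
class Schema:
    entities: frozenset   # allowed entity types, e.g. "Company"
    relations: frozenset  # allowed relation names, e.g. "acquired"
    patterns: frozenset   # allowed (head_type, relation, tail_type) combinations


def conforms(triple, schema):
    """Return True if a typed triple matches one of the schema's patterns."""
    (head_type, _head), relation, (tail_type, _tail) = triple
    return (
        head_type in schema.entities
        and tail_type in schema.entities
        and relation in schema.relations
        and (head_type, relation, tail_type) in schema.patterns
    )


schema = Schema(
    entities=frozenset({"Company", "Person"}),
    relations=frozenset({"acquired", "employs"}),
    patterns=frozenset({
        ("Company", "acquired", "Company"),
        ("Company", "employs", "Person"),
    }),
)

# Candidate triples, as an extraction step might propose them (made-up data).
candidates = [
    (("Company", "Acme"), "acquired", ("Company", "Globex")),
    (("Person", "Ada"), "acquired", ("Company", "Globex")),   # pattern not allowed
    (("Company", "Acme"), "located_in", ("City", "Paris")),   # unknown relation and type
]

kept = [t for t in candidates if conforms(t, schema)]
print(kept)
```

Only the first candidate survives, because it is the only one whose entity types and relation match a declared pattern. A general "capture everything" prompt would have kept all three.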

With our CSV-defined graph creation feature, the focus is on replicating specific structured data in a graph format.
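As a rough illustration of what replicating structured data in a graph format can mean, the sketch below turns CSV rows into (head, relation, tail) triples using only the Python standard library. The column names, the relation names, and the `rows_to_triples` helper are all assumptions made for this example and do not reflect how the WhyHow.AI SDK handles CSV input.

```python
import csv
import io

# Made-up sample data; in practice this would be a file on disk.
CSV_TEXT = """supplier,part,plant
Acme,Gear,Detroit
Globex,Valve,Austin
"""


def rows_to_triples(csv_text, column_relations):
    """Turn each CSV row into (head, relation, tail) triples.

    column_relations maps a (head_column, tail_column) pair to a relation name.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for (head_col, tail_col), relation in column_relations.items():
            triples.append((row[head_col], relation, row[tail_col]))
    return triples


triples = rows_to_triples(
    CSV_TEXT,
    {("supplier", "part"): "supplies", ("part", "plant"): "used_at"},
)
for t in triples:
    print(t)
```

Because the mapping from columns to relations is written down explicitly, the resulting graph is fully determined by the input data, which is what places this approach at the deterministic end of the spectrum.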

Across this spectrum, one can imagine questions-defined graph creation as more relevant when the developer is not necessarily an expert in all of the underlying data, but has an idea of the types of questions they might want to ask of the knowledge base.

Now, we aren’t just throwing a schema at an LLM and telling it to build us a graph. While that approach is a relevant (and admittedly fun) way to explore your data through LLM-reasoned patterns, it lacks the precise control and domain specificity needed to reliably perform entity and relationship extraction and build graphs in a repeatable way. Entity resolution and coreference resolution also remain difficult, especially when building with a corpus of larger documents. Our schema-based extraction solution takes a multi-agent approach and employs state-of-the-art ML models across multiple stages (such as entity definition and extraction, relevancy checks, relationship detection, and pattern alignment) to deliver more reliable and comprehensive extraction. Our modular approach also enables us to inject meaningful context and swap in domain-specific models as needed, for more customized, reliable extraction that maps to your unique use case and view of the world.
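The value of a staged, modular design can be sketched in a few lines of Python. This is a toy, not a description of how WhyHow.AI's system is implemented: every function below is a hypothetical stand-in that uses simple string rules where a real system would call a model, and the stage names merely echo the stages listed above.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Stage = Callable[[State], State]

ALLOWED_RELATIONS = {"acquired", "employs"}  # hypothetical schema relations


def extract_entities(state: State) -> State:
    # Stand-in for an extraction model: capitalized tokens count as entities.
    state["mentions"] = [
        [tok for tok in sentence.rstrip(".").split() if tok[0].isupper()]
        for sentence in state["sentences"]
    ]
    return state


def relevancy_check(state: State) -> State:
    # Keep only sentences that mention at least two entities.
    keep = [i for i, m in enumerate(state["mentions"]) if len(m) >= 2]
    state["sentences"] = [state["sentences"][i] for i in keep]
    state["mentions"] = [state["mentions"][i] for i in keep]
    return state


def detect_relations(state: State) -> State:
    # Stand-in for a relation model: words between two mentions form the relation.
    triples = []
    for sentence, (head, tail, *_rest) in zip(state["sentences"], state["mentions"]):
        tokens = sentence.rstrip(".").split()
        between = tokens[tokens.index(head) + 1 : tokens.index(tail)]
        triples.append((head, "_".join(between), tail))
    state["triples"] = triples
    return state


def align_patterns(state: State) -> State:
    # Drop triples whose relation the schema does not allow.
    state["triples"] = [t for t in state["triples"] if t[1] in ALLOWED_RELATIONS]
    return state


def run_pipeline(sentences: List[str], stages: List[Stage]) -> State:
    state: State = {"sentences": list(sentences)}
    for stage in stages:  # stages are plain callables, so any one can be swapped out
        state = stage(state)
    return state


result = run_pipeline(
    ["Acme acquired Globex.", "It rained all day.", "Acme visited Initech."],
    [extract_entities, relevancy_check, detect_relations, align_patterns],
)
print(result["triples"])
```

Because each stage is just a function from state to state, a domain-specific replacement for any single stage can be dropped into the list without touching the others.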

Schema Development Process

As part of the schema-defined graph creation feature, our Schema Library lets you get started experimenting faster. In general, we see the following process when design partners come to us specifically for schema generation help:

  • Read our Medium articles, case studies, and now the Schema Library, and come with an understanding of the specific use-case and data set you want graphs for
  • Use the Question-Defined Graph Creation feature to generate a list of potential schemas that would be a fit
  • Use an LLM to generate the schema based on a few high-level schemas and objectives
  • Edit the final output JSON schema with our schema-defined graph creation feature and create the graph immediately in WhyHow.AI’s SDK.
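When editing an LLM-generated JSON schema by hand, a quick consistency check can catch patterns that refer to entities or relations the schema never declares. The sketch below assumes a JSON layout with `entities`, `relations`, and `patterns` keys; that layout and the `find_schema_problems` helper are hypothetical, so check the Schema Library itself for the actual file format.

```python
import json

# Hypothetical JSON layout with entities, relations, and patterns;
# the real Schema Library files may be structured differently.
SCHEMA_JSON = """
{
  "entities": [{"name": "Company"}, {"name": "Person"}],
  "relations": [{"name": "employs"}],
  "patterns": [
    {"head": "Company", "relation": "employs", "tail": "Person"},
    {"head": "Company", "relation": "acquired", "tail": "Company"}
  ]
}
"""


def find_schema_problems(schema):
    """Return human-readable problems; an empty list means the schema is consistent."""
    entities = {e["name"] for e in schema["entities"]}
    relations = {r["name"] for r in schema["relations"]}
    problems = []
    for i, p in enumerate(schema["patterns"]):
        for role in ("head", "tail"):
            if p[role] not in entities:
                problems.append(f"pattern {i}: unknown entity {p[role]!r}")
        if p["relation"] not in relations:
            problems.append(f"pattern {i}: unknown relation {p['relation']!r}")
    return problems


schema = json.loads(SCHEMA_JSON)
problems = find_schema_problems(schema)
print(problems)

# An edit that fixes the problem: declare the missing relation.
schema["relations"].append({"name": "acquired"})
print(find_schema_problems(schema))
```

Running a check like this before creating the graph keeps the edit loop short: fix the schema, re-run, and only then build.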

Not all of these steps are necessary or relevant in every scenario; which ones apply depends very much on the use-case, the specific data, and how familiar you are with your data and how specific you want your graph to be. At the end of the day, granular tooling for experimentation, iteration, and management is what is needed for incorporating more deterministic graph structures, and it will be exciting to see more systems incorporate knowledge graphs.

If you are interested in joining our closed Beta or would like to chat about schema generation, please find some time with us on our Calendly at WhyHow.AI or email us at

WhyHow.AI is building tools to help developers bring more determinism and control to their RAG pipelines using graph structures. Join our discussions about rules, determinism and knowledge graphs in RAG on our Discord.