One Click Knowledge Graph Creation Creates Bad Graphs
Recently, there has been a lot of hype around the promise of Knowledge Graphs in AI workflows. A large part of this is the growing realization that we want to structure context and knowledge in a way that makes it easy to pull in relevant information and to combine structured and unstructured search. Vector databases and vector RAG are the most popular approach for building custom AI solutions with your own data, but vector RAG alone is insufficient for complex workflows and use cases that have a high bar for accuracy. So there is high demand for an easy way to ‘connect the relevant dots.’
Knowledge Graphs, conceptually, are the definition of ‘connect the relevant dots’, even if there is no standard or broadly accepted definition of what a knowledge graph is.
Regardless, for those who have been working in search and data for a long time, it is clear that Knowledge Graphs are relevant, and some of the most successful search companies have been using structured data in some way, shape or form.
As part of this sentiment, there have been a few attempts to build one-click graph creation models and platforms. After much experimentation and experience building graph solutions in the real world, we do not believe the graphs created through these systems are robust enough to be used for anything outside of toy examples, and disappointing results from these platforms are more than likely to detract from the real value of using graph data structures in your workflows.
The main reason for this is that everyone’s use case and data requirements are different. The vast majority of enterprise use cases are sufficiently complex that they require a custom approach, meaning the supporting data structures, retrieval strategies, and data processing strategies must be tailored to the use case.
Knowledge Graphs are designed to deliver relevancy in ways that vector RAG alone cannot, and doing so requires a lot of expert input, entity resolution, data modeling, validation, and so on. The idea that one-click graphs could create 100% accurate general graphs is, in our opinion, conceptually impossible. We will get 100% accurate one-click vector RAG solutions before we get 100% accurate one-click Knowledge Graph RAG solutions.
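To make the point about entity resolution and validation concrete, here is a minimal sketch of the kind of checks that sit between raw extracted triples and a usable graph. Everything in it is an assumption made for illustration: the `Triple` shape, the `SCHEMA` patterns, and the `ALIASES` table are hypothetical and would normally come from a domain expert, not from an automated tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


# Hypothetical schema: the only (head type, relation, tail type) patterns we accept.
SCHEMA = {
    ("Patient", "DIAGNOSED_WITH", "Condition"),
    ("Patient", "PRESCRIBED", "Medication"),
}

# Hypothetical alias table, curated by a human, mapping surface forms to one canonical name.
ALIASES = {
    "t2dm": "Type 2 Diabetes",
    "type ii diabetes": "Type 2 Diabetes",
}


def resolve(name: str) -> str:
    """Map a raw entity name to its canonical form if a known alias exists."""
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    return ALIASES.get(key, name.strip())


def clean(triples):
    """Resolve entity names, reject triples that violate the schema, drop duplicates."""
    accepted, rejected, seen = [], [], set()
    for t in triples:
        resolved = Triple(resolve(t.head), t.head_type, t.relation,
                          resolve(t.tail), t.tail_type)
        pattern = (resolved.head_type, resolved.relation, resolved.tail_type)
        if pattern not in SCHEMA:
            rejected.append(resolved)  # left for a human to review
        elif resolved not in seen:
            seen.add(resolved)
            accepted.append(resolved)
    return accepted, rejected


if __name__ == "__main__":
    raw = [
        Triple("Jane Doe", "Patient", "DIAGNOSED_WITH", "T2DM", "Condition"),
        Triple("Jane Doe", "Patient", "DIAGNOSED_WITH", "Type II Diabetes", "Condition"),
        Triple("Jane Doe", "Patient", "LIKES", "Metformin", "Medication"),
    ]
    accepted, rejected = clean(raw)
    print("accepted:", accepted)
    print("needs review:", rejected)
```

Even in this toy form, the schema and the alias table carry all of the domain knowledge, which is why we do not think they can be skipped or fully automated.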
Coincidentally and independently, the founder of Neo4j recently shared a similar sentiment about attempts to automatically create a graph with an LLM.
A focus on Services for Graph Creation / Architecture Design
In light of this observation, we will be deprecating our automated triple creation feature (i.e. the create graph from schema button). This feature always required, and focused on providing workflows for, opinionated schema generation, with its own schema co-pilot tools. While the schema generation tools will continue to exist, we want to specifically tackle the expectation of perfectly created graphs from poorly developed schemas.
We have always believed in human-in-the-loop graph creation, and have broken graph creation up into its individual steps, with workflow features that support a human-in-the-loop process. However, there was still an unintended expectation from users that graph creation and schema design would take a few minutes at most. In contrast, see this case study about how we built an end-to-end Temporal Knowledge Graph system for a healthcare client, where 80% of the 25 hours spent was on schema design.
We believe a lot of these one-click graph solutions are fundamentally vaporware that unfortunately deliver AI hype over actual business outcomes, and we want to make a clear delineation.
WhyHow will remain the platform to help with modular graph orchestration, manipulation, and retrieval. For graph creation, we will continue our services helping startups and enterprises with ontology design and end-to-end multi-agent knowledge graph design and implementation.
We have released a number of case studies showing the work we have done with clients and design partners, showcasing our process for exploratory data analysis and the range of data structures built. You can see examples of these case studies here:
- Healthcare: https://medium.com/enterprise-rag/case-study-turning-doctor-transcripts-into-temporal-medical-record-knowledge-graphs-cf624d4927eb
- Finance: https://medium.com/enterprise-rag/knowledge-graphs-completeness-multi-document-retrieval-benchmark-6304905a0a6c
- Legal: https://medium.com/enterprise-rag/legal-document-rag-multi-graph-multi-agent-recursive-retrieval-through-legal-clauses-c90e073e0052
You can also still rely on existing triple creation packages and processes, such as LangChain’s LLMGraphTransformer, and load those triples into the WhyHow platform for manipulation. We have a range of notebooks that showcase different triple creation techniques being plugged into the WhyHow platform.
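As a rough illustration of the hand-off between an external triple creation step and a downstream platform, the sketch below flattens graph-document-style objects into plain triple dictionaries. The `Node`, `Relationship`, and `GraphDocument` classes here are local stand-ins that only mirror the general shape of LangChain's graph documents (nodes with an `id` and `type`, relationships with a `source`, `target`, and `type`), and the output dictionary layout is hypothetical. It is not the WhyHow import format; refer to the notebooks for the actual loading calls.

```python
from dataclasses import dataclass, field


# Local stand-ins for illustration only; in practice these objects would come
# from your triple creation package rather than being defined by hand.
@dataclass
class Node:
    id: str
    type: str = "Node"


@dataclass
class Relationship:
    source: Node
    target: Node
    type: str


@dataclass
class GraphDocument:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


def to_triples(graph_documents):
    """Flatten graph documents into a list of plain triple dicts (hypothetical layout)."""
    triples = []
    for doc in graph_documents:
        for rel in doc.relationships:
            triples.append({
                "head": {"name": str(rel.source.id), "label": rel.source.type},
                "relation": rel.type,
                "tail": {"name": str(rel.target.id), "label": rel.target.type},
            })
    return triples


if __name__ == "__main__":
    patient = Node("Jane Doe", "Patient")
    drug = Node("Metformin", "Medication")
    doc = GraphDocument(
        nodes=[patient, drug],
        relationships=[Relationship(patient, drug, "PRESCRIBED")],
    )
    for triple in to_triples([doc]):
        print(triple)
```

Keeping this conversion as an explicit step gives you a natural place to review, filter, or correct triples before they are loaded anywhere.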