Maximising business value from GenAI using LLMs and Knowledge Graphs

Aptitude Global
The Aptitude Data Blog
7 min read · Mar 11, 2024

GenAI is no longer hype, but is it effective?

Generative Artificial Intelligence (GenAI) built on Large Language Models (LLMs) is moving swiftly from hype to tangible business reality. By creating efficiencies and stimulating revenue growth, GenAI has assertively secured a place on executive agendas.

However, integrating LLMs into business processes is never easy and is fraught with challenges. Model safety and ethical concerns top this list, and a further noteworthy challenge is model accuracy. Inaccuracies surface as hallucinations, and these accuracy gaps can lead to a rapid loss of business trust and adoption while also causing unforeseeable outcomes. Be it the brief of non-existent historical legal decisions generated by GenAI in the Avianca airlines lawsuit (here) or the false claim that an Australian mayor had served a prison term (here), it is quite evident that hallucinations can cause major erosion of business value.

So how do you build trust in GenAI and improve the accuracy of LLMs? The answer to this accuracy challenge lies in using reliable data and leveraging smart technologies such as Knowledge Graphs.

Data — Lifeline of GenAI

From experience, 70% to 80% of AI projects, GenAI included, fail at the Data Preparation stage of the AI lifecycle. Industry research from the likes of McKinsey, Gartner, Forbes and Deloitte bears this out.

As seen in Figure 1, Data Preparation is a foundational and crucial step in the overall AI lifecycle, and any gaps have the potential to impact every subsequent step, including the eventual business outcome.

Figure 1 — The AI lifecycle

In the context of GenAI, Large Language Models (LLMs) enable business use cases by responding to natural language queries from users with relevant answers. The accuracy of these responses is crucial to building confidence in the model.

As indicated above, the data preparation stage plays a crucial role in enabling the success of such LLMs. It can improve LLM selection, prompt tuning and design, model training and fine tuning.

A key data-oriented solution to enhance the accuracy of LLMs is Retrieval Augmented Generation (RAG). RAG dynamically provides contextually relevant, factual information from trusted data sources to the model during the generation process. This reduces the computational cost and time of model training, since smaller, more relevant training datasets can be used, and model accuracy is then continuously enhanced through real-time updates of this data.

In this RAG approach, LLM accuracy is enhanced by giving the model a deeper understanding of the business context and other influencing factors such as language patterns. One way to implement this approach is to build an underlying structured database that continuously consolidates trustworthy data relevant to the business context, drawn from diverse formats: structured sources (e.g. relational databases) and unstructured sources such as emails, PDFs and social media posts. The LLM can then respond by querying this underlying structured database. This approach is covered in more detail later.
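
As a rough illustration of this flow, the sketch below retrieves trusted facts for a question and injects them into the prompt before generation. The `index.search` and `llm.complete` calls are hypothetical placeholders, not a specific product API.

```python
# A minimal RAG sketch: retrieve trusted facts relevant to the question,
# then pass them to the LLM alongside the question so the answer is grounded
# in business data rather than the model's internal knowledge alone.
# All function and object names here are illustrative placeholders.

def retrieve_context(question: str, index, top_k: int = 5) -> list[str]:
    """Return the top_k most relevant snippets from a trusted data store."""
    # In practice this would be a vector or keyword search over the
    # consolidated structured and unstructured business data.
    return index.search(question, limit=top_k)

def answer_with_rag(question: str, index, llm) -> str:
    snippets = retrieve_context(question, index)
    prompt = (
        "Answer the question using ONLY the facts below. "
        "If the facts are insufficient, say so.\n\n"
        "Facts:\n- " + "\n- ".join(snippets) + "\n\n"
        f"Question: {question}"
    )
    return llm.complete(prompt)  # hypothetical LLM client call
```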

In summary, the accuracy of responses from the LLM depends on the following key factors:

  • The quality, quantity and size of the training data being fed into the LLM
  • Enabling a business-contextual view of this data for the LLM
  • The LLM architecture, e.g. a pre-trained model can be more accurate given its existing contextual knowledge of the business domain
  • Effective prompt tuning, combining user questions with contextual data from underlying data stores to simplify prompts and improve their success

A business-contextual view refers to the metadata, mappings, transformations and ontologies that provide the LLM with business semantics and knowledge about the enterprise as well as the specific business use cases in consideration.
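
One hedged way to package such a contextual view is as plain metadata that travels with the prompt, as sketched below; the glossary entries, table names and mappings are invented purely for illustration.

```python
# Illustrative only: a "business-contextual view" expressed as metadata the
# LLM can read alongside the raw schema. Names and mappings are hypothetical.
business_context = {
    "glossary": {
        "churned_customer": "A customer with no purchase in the last 12 months",
        "net_revenue": "gross_revenue minus refunds and discounts",
    },
    "table_semantics": {
        "cust_mstr": "Master list of customers (one row per customer)",
        "txn_fct": "One row per completed sales transaction",
    },
    "column_mappings": {
        "cust_mstr.cust_sts_cd": "Customer status code: A=active, C=churned",
    },
}

def build_prompt(question: str, schema_ddl: str) -> str:
    """Combine the user question, technical schema and business semantics."""
    return (
        f"Schema:\n{schema_ddl}\n\n"
        f"Business context:\n{business_context}\n\n"
        f"Question: {question}\nWrite a SQL query that answers the question."
    )
```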

Knowledge Graphs — Filling the LLM Knowledge Gap

Knowledge Graphs (KGs) are an excellent technology solution that can reduce LLM hallucinations and thus enhance LLM accuracy, as we will see next.

As IBM defines it, a knowledge graph, also known as a semantic network, represents a network of real-world entities (objects, events, situations or concepts) and illustrates the relationships between them. Knowledge graphs add the missing business context to the underlying technical data. This information is usually stored in a graph database and visualised as a graph structure, prompting the term knowledge “graph” (here).

Figure 2 shows a sample knowledge graph from the sports industry. As can be seen from the diagram, all the technical data relating to people, events, locations, offices etc. is represented in context, along with the relationships between the entities.

For instance, people have been classified as fans and players, offices as clubs, locations as stadiums, and so on. The relationships between them are also depicted, such as “Player registered to a club” and “Stadium has a scheduled event”.
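
As a rough sketch of how such a graph might be populated, the snippet below uses Cypher through the Neo4j Python driver; the node labels, relationship types and connection details are assumptions based on Figure 2 rather than a reference model.

```python
from neo4j import GraphDatabase

# Illustrative connection details; replace with your own instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# A tiny slice of the sports graph from Figure 2: a player registered to a
# club, a stadium with a scheduled event, and a fan following the club.
SEED = """
MERGE (c:Club    {name: 'Rovers FC'})
MERGE (p:Player  {name: 'A. Silva'})
MERGE (f:Fan     {name: 'Jo Bloggs'})
MERGE (s:Stadium {name: 'Rovers Park'})
MERGE (e:Event   {name: 'Home Derby', date: date('2024-04-20')})
MERGE (p)-[:REGISTERED_TO]->(c)
MERGE (f)-[:FOLLOWS]->(c)
MERGE (c)-[:PLAYS_AT]->(s)
MERGE (s)-[:HOSTS]->(e)
"""

with driver.session() as session:
    session.run(SEED)
driver.close()
```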

This approach makes it possible to query such a knowledge graph in semantically rich language, e.g. “Which fans will be interested in the next event in a particular stadium?” or “How do club members engage with the club?”, rather than in a technical query language such as SQL, which would then require custom code to translate the results back into real-life context.
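
The first of those questions might then translate into a single graph pattern. The Cypher below is a hedged sketch against the illustrative graph created above, not production code.

```python
from neo4j import GraphDatabase

# Hedged sketch: fans who follow a club playing at the given stadium, plus
# the next events scheduled there. Labels match the illustrative graph above.
FANS_FOR_NEXT_EVENT = """
MATCH (f:Fan)-[:FOLLOWS]->(c:Club)-[:PLAYS_AT]->(s:Stadium {name: $stadium}),
      (s)-[:HOSTS]->(e:Event)
WHERE e.date >= date()
RETURN f.name AS fan, e.name AS event, e.date AS event_date
ORDER BY event_date
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(FANS_FOR_NEXT_EVENT, stadium="Rovers Park"):
        print(record["fan"], record["event"], record["event_date"])
driver.close()
```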

This support for semantic queries can significantly simplify the building of Natural Language Processing (NLP) based applications for information exchange, while also greatly enhancing the user experience.

Figure 2 — A Knowledge Graph showing Fan interactions with a club on Social Media

Let us now dive deeper into how such knowledge graphs can help enhance LLM accuracy by adding business context.

One common use case for LLMs is querying data in a database in natural language, converting the user’s query to SQL in order to generate a response. Figure 3 depicts the key steps involved in this process. The biggest challenge here is that the LLM has no context (other than the database schema) for the meaning of the tables and columns in the dataset, resulting in very poor accuracy. In this fundamental setup, as explained before, the business context needs to be added; if it is missing, the responses from the LLM can be inaccurate, leading to hallucinations.

Figure 3 — Querying data in a relational database using an LLM
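
A bare-bones version of this flow might look like the sketch below, using SQLite as the database; the `llm.complete` call is a hypothetical placeholder, and a real system would validate the generated SQL before executing it.

```python
import sqlite3

# Minimal text-to-SQL sketch (Figure 3): the LLM sees only the technical
# schema, so any business meaning not evident from table or column names
# is lost, which is where accuracy problems creep in.
def nl_to_sql_answer(question: str, db_path: str, llm) -> list[tuple]:
    conn = sqlite3.connect(db_path)
    # Extract the raw schema DDL to give the model some context.
    schema = "\n".join(
        row[0]
        for row in conn.execute("SELECT sql FROM sqlite_master WHERE type='table'")
        if row[0]
    )
    prompt = (
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "Return a single SQL query only."
    )
    sql = llm.complete(prompt)           # hypothetical LLM client call
    rows = conn.execute(sql).fetchall()  # in production, validate before running
    conn.close()
    return rows
```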

Figure 4 below now depicts the introduction of knowledge graphs into this mix.

Figure 4 — Knowledge Graph enabled LLMs

As seen from the diagram, the following additional steps now improve the accuracy of the LLM by leveraging knowledge graphs:

  • In parallel with consolidating diverse data sources into a structured database in step 2, the data from these sources is also used to build a knowledge graph, as shown in step 3, to decipher and hold the business semantics of this technical data.
  • Once the knowledge graph is created, a logical next step is to build mappings between the structured database and the knowledge graph, as shown in step 4. These mappings act as the key that connects the technical data to its associated business context.
  • The LLM now converts the natural language query to a knowledge graph query instead of a structured query, as shown in step 6. This is the most crucial step towards accuracy improvement: given its semantic nature, the knowledge graph query can be a significantly more accurate representation of the natural language query.
  • The knowledge graph then maps this query from the LLM to a corresponding structured query on the structured database, as shown in step 8. Again, this structured query is significantly more accurate, as it leverages the mappings already generated in step 4.

Using knowledge graphs, this augmented process enables the LLM to provide a significantly more accurate response to the natural language query posed to it.
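
To make the flow concrete, the sketch below strings these steps together. The mapping table, the `kg_layer.translate_to_sql` helper and the `llm.complete` call are all illustrative assumptions; real implementations typically rely on vendor tooling or ontology-to-relational mapping standards such as R2RML for step 4.

```python
# Illustrative end-to-end sketch of the Figure 4 flow:
# 1) the LLM writes a graph (Cypher) query against the ontology it is shown,
# 2) the knowledge graph layer translates that graph pattern to SQL using the
#    graph-to-relational mappings built in step 4,
# 3) the structured database answers the query.
# All names, mappings and client calls below are hypothetical.

GRAPH_TO_SQL_MAPPINGS = {
    # graph concept -> relational location (invented for illustration)
    "(:Fan)": "crm.contacts WHERE contact_type = 'FAN'",
    "(:Club)": "ref.organisations WHERE org_kind = 'CLUB'",
    "[:FOLLOWS]": "crm.contact_org_links (link_type = 'FOLLOWS')",
}

def answer_question(question: str, ontology: str, kg_layer, db, llm) -> list[tuple]:
    # Step 6: natural language -> semantic graph query
    cypher = llm.complete(
        f"Ontology:\n{ontology}\n\nQuestion: {question}\n"
        "Write a Cypher query that answers the question."
    )
    # Step 8: graph query -> SQL via the stored mappings
    sql = kg_layer.translate_to_sql(cypher, GRAPH_TO_SQL_MAPPINGS)
    return db.execute(sql).fetchall()
```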

Industry Research

data.world (link here) has conducted a benchmarking exercise on LLM accuracy improvement with knowledge graphs, and the results are impressive! The exercise was conducted on specific use cases from the insurance industry, implemented on a SQL structured database.

The exercise recorded an average 3x improvement in LLM response accuracy, with marked improvement across a combination of complex SQL schema and SQL query scenarios. The gain was particularly acute for high-schema-complexity questions, which yielded a 0% accuracy score without knowledge graphs.

Gartner has also conducted research on the impact of emerging technologies on GenAI model performance (link here), highlighting knowledge graphs as a significant enabler of LLM accuracy improvements. In fact, Gartner positions knowledge graphs right at the centre of its impact radar, with a timeline of NOW.

Looking Ahead: The Future of Knowledge Graphs

It is quite evident from my experience that Knowledge Graphs represent the new normal for enhancing GenAI solutions.

Multiple well-established technology products exist in the market to support the creation of knowledge graphs, such as Neo4j, Stardog, AnzoGraph, GraphDB and many more.

While knowledge graphs smooth the adoption of GenAI solutions, they do present a maintenance challenge. Keeping knowledge graphs in sync with continuous changes in the underlying data can be very complex and effort-intensive. This challenge can be addressed through automation in both the build and integration phases. Automating the build and updating of knowledge graphs is not easy, and advancements in this space will mark the next transformational step in the journey to intelligent, accurate, knowledge graph enabled GenAI solutions.

Watch out for more articles in this space in the near future!

Author: Balaji Kumar Venkatramani


Aptitude are specialists in Data & Analytics, Data Science and Machine Learning solutions