Build Knowledge Graph From TextData using LangChain | Under 2min

Mahimai Raja J
5 min read · Feb 4, 2024


The brain stores knowledge in the form of an information graph.

Hey folks! In this blog, I will take you through the Knowledge Graph and how to build one from your own text data.

What is a Knowledge Graph?

A Knowledge Graph, also known as a semantic graph, is an intelligent structure for storing data in an efficient manner. The data is stored in the form of nodes and edges. As depicted in Figure 1, the nodes represent objects and the edges denote the relationships between them. The data model represented by a knowledge graph is sometimes called the Resource Description Framework (RDF). RDF defines the way sites are interlinked in the World Wide Web.
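To make the node-and-edge idea concrete, here is a minimal pure-Python sketch (no libraries; the city facts are just illustrative) that stores each fact as a (subject, predicate, object) triple and looks up a node's outgoing edges:

```python
# Each fact is a (subject, predicate, object) triple:
# two nodes connected by a labelled edge.
triples = [
    ("Paris", "is the capital of", "France"),
    ("Eiffel Tower", "is in", "Paris"),
]

def outgoing_edges(node, triples):
    """Return the (predicate, object) pairs for every edge leaving `node`."""
    return [(p, o) for s, p, o in triples if s == node]

print(outgoing_edges("Paris", triples))
# [('is the capital of', 'France')]
```

The same subject/predicate/object shape is what we will extract from free text below.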

Fig 1. Sample Graph Data

Why Knowledge Graph?

In the entire story of the data, only a few datapoints are intrinsic enough to represent the whole dataset. Thus, a Knowledge Graph stores only the important datapoints. This significantly reduces both the retrieval time complexity and the space complexity.

Some of my favourite use cases for Knowledge Graphs are drug discovery and RAG-based virtual assistant chatbots.

IMPLEMENTATION

1. Install and Import Packages

(NOTE: We’ll use OpenAI’s GPT-3.5 to generate entities and relationships, so make sure you have your OpenAI API key ready.)

Install the packages using your favourite package manager. Here, I am using pip to install and manage the dependencies.

pip install -q langchain openai pyvis gradio==3.39.0

Import the installed packages.

from langchain.prompts import PromptTemplate
from langchain.llms.openai import OpenAI
from langchain.chains import LLMChain
from langchain.graphs.networkx_graph import KG_TRIPLE_DELIMITER
from pprint import pprint
from pyvis.network import Network
import networkx as nx
import gradio as gr

2. Setup API Keys

Using the API key copied from the OpenAI Platform dashboard, set up the API key environment variable. Here, I am passing the variable through Colab secrets. So, before running the cell, make sure you have assigned the secret variable with the API key value.

from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
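If you are not running on Colab, the same thing can be done with a plain environment variable; a minimal sketch (OPENAI_API_KEY is the conventional variable name the OpenAI client looks for):

```python
import os

# Read the key from an environment variable instead of Colab secrets;
# fall back to an empty string if it has not been set.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
```

Set the variable in your shell (e.g. `export OPENAI_API_KEY=...`) before running the script.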

3. Define the Prompt

It is crucial to ask the LLM the correct question so that it can generate what we need. Here, we are adding a few examples along with the instructions so that we can reduce hallucination during inference. This way of prompting is known as few-shot prompting. Feel free to read the prompt to get a clear idea of how it works.

# Prompt template for knowledge triple extraction
_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE = (
    "You are a networked intelligence helping a human track knowledge triples"
    " about all relevant people, things, concepts, etc. and integrating"
    " them with your knowledge stored within your weights"
    " as well as that stored in a knowledge graph."
    " Extract all of the knowledge triples from the text."
    " A knowledge triple is a clause that contains a subject, a predicate,"
    " and an object. The subject is the entity being described,"
    " the predicate is the property of the subject that is being"
    " described, and the object is the value of the property.\n\n"
    "EXAMPLE\n"
    "It's a state in the US. It's also the number 1 producer of gold in the US.\n\n"
    f"Output: (Nevada, is a, state){KG_TRIPLE_DELIMITER}(Nevada, is in, US)"
    f"{KG_TRIPLE_DELIMITER}(Nevada, is the number 1 producer of, gold)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "I'm going to the store.\n\n"
    "Output: NONE\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "Oh huh. I know Descartes likes to drive antique scooters and play the mandolin.\n"
    f"Output: (Descartes, likes to drive, antique scooters){KG_TRIPLE_DELIMITER}(Descartes, plays, mandolin)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "{text}"
    "Output:"
)

KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT = PromptTemplate(
    input_variables=["text"],
    template=_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE,
)
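Under the hood, PromptTemplate essentially performs string substitution of the {text} placeholder; a minimal stand-in (plain Python, no LangChain, using a shortened version of the template) shows the idea:

```python
# A shortened stand-in for the real template above; only {text} is substituted.
template = (
    "Extract all of the knowledge triples from the text.\n\n"
    "EXAMPLE\n"
    "{text}\n"
    "Output:"
)

# Filling the placeholder, as PromptTemplate.format() would.
prompt_text = template.format(text="Paris is the capital of France.")
print(prompt_text)
```

The few-shot examples stay fixed; only the final EXAMPLE block changes per input.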

4. Initialise the Chain

Using the descriptive prompt, initialise the chain using the LLMChain class.

llm = OpenAI(
    api_key=OPENAI_API_KEY,
    temperature=0.9
)

# Create an LLMChain using the knowledge triple extraction prompt
chain = LLMChain(llm=llm, prompt=KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT)

To build a Knowledge Graph, all you need is some inter-related text data. Here, I am loading the text from a string input. However, it is important to note that you can also load from some popular data formats such as PDFs, JSON, Markdown, etc. using data loaders [3] in Python.

# Run the chain with the specified text
text = "The city of Paris is the capital and most populous city of France. The Eiffel Tower is a famous landmark in Paris."
triples = chain.invoke(
    {'text': text}
).get('text')

Then parse the retrieved triples using this user-defined function:

def parse_triples(response, delimiter=KG_TRIPLE_DELIMITER):
    if not response:
        return []
    return response.split(delimiter)

triples_list = parse_triples(triples)

pprint(triples_list)

Output:

[' (Paris, is the capital of, France)',
'(Paris, is the most populous city in, France)',
'(Eiffel Tower, is a, famous landmark)',
'(Eiffel Tower, is in, Paris)']
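The raw strings above still carry surrounding parentheses and stray spaces; a small helper (hypothetical, not part of the original course code) can normalise them into clean (subject, predicate, object) tuples before graphing:

```python
def triples_to_tuples(raw_triples):
    """Turn strings like ' (Paris, is the capital of, France)' into clean tuples."""
    tuples = []
    for raw in raw_triples:
        # Drop surrounding whitespace/parentheses, then split into three parts.
        parts = [p.strip() for p in raw.strip().strip("()").split(",")]
        if len(parts) == 3:  # skip malformed or empty entries (e.g. "NONE")
            tuples.append(tuple(parts))
    return tuples

print(triples_to_tuples([" (Paris, is the capital of, France)"]))
# [('Paris', 'is the capital of', 'France')]
```

Note this simple split assumes no commas inside a subject, predicate, or object.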

5. Visualise the Built Knowledge Graph

Here, we will use PyVis to create an awesome visualisation of the built Knowledge Graph and display it interactively using the Gradio framework.

Here are some user-defined functions to make our task easier:

def create_graph_from_triplets(triplets):
    G = nx.DiGraph()
    for triplet in triplets:
        # Strip the surrounding parentheses so node labels stay clean
        subject, predicate, obj = triplet.strip().strip('()').split(',')
        G.add_edge(subject.strip(), obj.strip(), label=predicate.strip())
    return G

def nx_to_pyvis(networkx_graph):
    pyvis_graph = Network(notebook=True, cdn_resources='remote')
    for node in networkx_graph.nodes():
        pyvis_graph.add_node(node)
    for edge in networkx_graph.edges(data=True):
        pyvis_graph.add_edge(edge[0], edge[1], label=edge[2]["label"])
    return pyvis_graph

def generateGraph():
    triplets = [t.strip() for t in triples_list if t.strip()]
    graph = create_graph_from_triplets(triplets)
    pyvis_network = nx_to_pyvis(graph)

    pyvis_network.toggle_hide_edges_on_drag(True)
    pyvis_network.toggle_physics(False)
    pyvis_network.set_edge_smooth('discrete')

    html = pyvis_network.generate_html()
    html = html.replace("'", "\"")

    return f"""<iframe style="width: 100%; height: 600px; margin: 0 auto" name="result" allow="midi; geolocation; microphone; camera; display-capture; encrypted-media;" sandbox="allow-modals allow-forms allow-scripts allow-same-origin allow-popups allow-top-navigation-by-user-activation allow-downloads" allowfullscreen="" allowpaymentrequest="" frameborder="0" srcdoc='{html}'></iframe>"""

Display the HTML generated by PyVis using Gradio:

demo = gr.Interface(
    generateGraph,
    inputs=None,
    outputs=gr.outputs.HTML(),
    title="Knowledge Graph",
    allow_flagging='never',
    live=True,
)

demo.launch(
    height=800,
    width="100%"
)

Final Output:
Hurray! Here we displayed our Knowledge Graph using the Gradio framework, so the page can also be easily shared with anyone online via the generated link. By simply passing share=True to demo.launch(share=True), you can make the app visible to anyone.

Fig 2. Representation of the Knowledge Graph in the Gradio Interface

NOTE: You can improve the performance by using more advanced LLMs.

Thanks for reading!

You can find the complete code at the end of the page. See you again…

CREDITS

This blog is inspired by Activeloop’s LangChain Vector DB course. Heartfelt thanks for making this wonderful course available online.
