Creating a Sample Knowledge Graph from a Sentence using LLMs

Visheshtaposthali
3 min read · Jul 2, 2024


Knowledge graphs are a powerful way to represent information in a structured form, making it easier to retrieve, analyze, and visualize relationships between entities. With the advent of Large Language Models (LLMs) like those available from Hugging Face, building knowledge graphs from natural language text has become more accessible. In this blog, we will walk through the process of creating a sample knowledge graph from a sentence using Hugging Face models.

What is a Knowledge Graph?

A knowledge graph is a network of real-world entities (such as people, places, and things) and their interrelations, typically stored in a graph database. Each node represents an entity, and each edge represents a relationship between two entities.
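Before touching any models, it helps to see how small this idea really is: a knowledge graph boils down to a set of (subject, relation, object) triples. A minimal sketch in plain Python (the facts and relation names below are illustrative, not model output):

```python
# A knowledge graph as a list of (subject, relation, object) triples.
# The facts below are hand-written examples, not extracted by a model.
triples = [
    ("Barack Obama", "born_in", "Hawaii"),
    ("Barack Obama", "elected_president_in", "2008"),
]

# Build a simple adjacency view: entity -> list of (relation, neighbor).
graph = {}
for subj, rel, obj in triples:
    graph.setdefault(subj, []).append((rel, obj))

print(graph["Barack Obama"])
```

Everything that follows is about filling a structure like this automatically from text instead of by hand.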

Prerequisites

To follow along, you’ll need:

Python installed on your system.

The Hugging Face transformers library.

The networkx library for creating graphs, and matplotlib for visualizing them.

You can install the necessary libraries using pip:

pip install transformers torch networkx matplotlib

Step-by-Step Guide

Step 1: Import Libraries

First, let’s import the required libraries.

import networkx as nx
import matplotlib.pyplot as plt
from transformers import pipeline

Step 2: Load Pre-trained Model

We will use a pre-trained Named Entity Recognition (NER) model from Hugging Face to identify entities in the sentence.

ner_pipeline = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")

Step 3: Define the Sentence

Define a sentence from which we will extract entities and relationships.

sentence = "Barack Obama was born in Hawaii. He was elected president in 2008."

Step 4: Extract Entities

Use the NER pipeline to extract entities from the sentence.

entities = ner_pipeline(sentence)
print(entities)

The output will be a list of entities with their labels and positions in the sentence.
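One thing to watch for: BERT-based NER pipelines tokenize into subwords, so a word like "Hawaii" can come back in pieces marked with `##`. A small post-processing step merges them. The `sample` list below mimics the shape of the pipeline's output; the exact values are illustrative, not a real model transcript:

```python
# Merge subword tokens (prefixed with "##") back into whole words.
# `sample` mimics the shape of a Hugging Face NER pipeline result;
# the values are illustrative, not an actual model transcript.
sample = [
    {"word": "Barack", "entity": "B-PER"},
    {"word": "Obama", "entity": "I-PER"},
    {"word": "Ha", "entity": "B-LOC"},
    {"word": "##wai", "entity": "I-LOC"},
    {"word": "##i", "entity": "I-LOC"},
]

def merge_subwords(entities):
    merged = []
    for ent in entities:
        if ent["word"].startswith("##") and merged:
            # Glue this fragment onto the previous word.
            merged[-1]["word"] += ent["word"][2:]
        else:
            merged.append({"word": ent["word"], "entity": ent["entity"]})
    return merged

print(merge_subwords(sample))
```

Recent versions of transformers can also do this for you by passing `aggregation_strategy="simple"` to `pipeline(...)`, which groups subwords into whole entities.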

Step 5: Create the Knowledge Graph

Now, we will create a knowledge graph using the networkx library. For simplicity, we will create relationships based on proximity in the text.

G = nx.DiGraph()

# Add nodes
for entity in entities:
    G.add_node(entity['word'], label=entity['entity'])

# Add edges between consecutive entities, labeled by proximity
for i in range(len(entities) - 1):
    G.add_edge(entities[i]['word'], entities[i+1]['word'], label='near')

# Draw the graph
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_size=3000, node_color='skyblue', font_size=10, font_color='black')
labels = nx.get_edge_attributes(G, 'label')
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
plt.show()

This will create a simple knowledge graph where entities are connected based on their proximity in the sentence.
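Proximity edges carry no semantics, though. One lightweight improvement is to label each edge with the text that appears between the two entities, using the `start`/`end` character offsets the NER pipeline returns. A sketch on hand-written spans (the offsets below mimic the pipeline's fields and are illustrative):

```python
sentence = "Barack Obama was born in Hawaii. He was elected president in 2008."

# Entity spans mimicking the pipeline's start/end offsets (hand-written
# for illustration; in practice you would read them from the NER output).
spans = [
    ("Barack Obama", 0, 12),
    ("Hawaii", 25, 31),
    ("2008", 61, 65),
]

# Use the text between consecutive entities as a crude relation label.
edges = []
for (w1, _, e1), (w2, s2, _) in zip(spans, spans[1:]):
    relation = sentence[e1:s2].strip(" .")
    edges.append((w1, relation, w2))

print(edges)
```

This turns "Barack Obama ... Hawaii" into the triple ("Barack Obama", "was born in", "Hawaii"), which is far more informative than an unlabeled proximity edge, though still only a heuristic.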

Step 6: Enhancing the Knowledge Graph

For a more sophisticated approach, you could use dependency parsing to identify relationships more accurately. As a step in that direction, here we load an alternative pre-trained multilingual NER model from Hugging Face explicitly via its tokenizer and model classes (note that this is still an NER model, not a dependency parser).

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("Davlan/bert-base-multilingual-cased-ner-hrl")
model = AutoModelForTokenClassification.from_pretrained("Davlan/bert-base-multilingual-cased-ner-hrl")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
result = nlp(sentence)

entities = []
for res in result:
    entities.append((res['word'], res['entity']))

# Create a new graph
G = nx.DiGraph()

# Add nodes and edges
for i, entity in enumerate(entities):
    G.add_node(entity[0], label=entity[1])
    if i > 0:
        G.add_edge(entities[i-1][0], entity[0])

# Draw the graph
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_size=3000, node_color='skyblue', font_size=10, font_color='black')
plt.show()

Conclusion

In this blog, we demonstrated how to create a simple knowledge graph from a sentence using Hugging Face models. By leveraging the power of LLMs and the flexibility of the networkx library, you can build more complex and informative knowledge graphs tailored to your specific needs. This is just the beginning; integrating more advanced NLP techniques and richer datasets can lead to even more powerful knowledge representations.
