A Knowledge Graph understanding and implementation tutorial for beginners

Raman Kishore · Published in Analytics Vidhya · Nov 5, 2019
[Image: the DIKW (Data, Information, Knowledge, Wisdom) pyramid, from Wikimedia Commons]

Data vs Information vs Knowledge

Before building a Knowledge Graph, it is essential to understand the difference between data, information and knowledge (Wisdom is a topic for another day!).

Data generally represents a collection of facts. Upon massaging, filtering and transforming this data, we give it a structure and create Information. The understanding that can be derived from this information is called Knowledge.

Semantic Web

Let me now introduce you to a dream called the Semantic Web. This represents a state of the world where everything on the internet is completely understood by machines. That means all the text, pictures, video, audio, etc. in web pages is completely understood by machines. (This article deals only with text.)

When we realise this dream, it will open up a lot of possibilities on the internet. When computers can understand everything on the internet, the internet transforms from an information-sharing platform into a knowledge platform. Computers can then start helping you find content better suited to your needs. They can help you make better decisions by surfacing non-obvious insights. They can better figure out trends, anomalies, gaps, etc. in various domains. This will also open up a set of businesses that thrive on the internet becoming a knowledge platform.

The web is full of content written in natural language (e.g. English). To realise this dream of the Semantic Web, we need to standardise the way we represent information extracted from natural-language content. This standardisation will help all machines understand the extracted information. RDF (Resource Description Framework) and OWL (Web Ontology Language) are a few steps taken towards standardisation.
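To make the idea of an RDF fact concrete, here is a minimal sketch using the rdflib Python library. The library choice and the example.org namespace are my own illustrative assumptions; the article itself does not use rdflib.

```python
from rdflib import Graph, Namespace

# A hypothetical namespace for our example entities and relations.
EX = Namespace("http://example.org/")

g = Graph()
# The fact "Leonard Nimoy played Spock" as a Subject-Predicate-Object triple.
g.add((EX.LeonardNimoy, EX.played, EX.Spock))

# Serialise in Turtle, a common textual RDF format.
print(g.serialize(format="turtle"))
```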

Knowledge Graphs are a step in the direction of realising Semantic Web!

Machine Learning vs Structured Relational Learning

This section is just to give a broader picture. It is perfectly okay if you do not follow this section.

Machine Learning (ML) operates on data matrices, with each row representing an object and its features. Structured Relational Learning (SRL), on the other hand, works with the assumption that the objects themselves are related to each other, so the representation is most naturally a graph.

Applying learning techniques to Knowledge Graphs is one way of doing Structured Relational Learning!

What is a Knowledge Graph?

A knowledge base is any collection of information. A Knowledge Graph is a structured Knowledge Base.

Knowledge Graphs store facts in the form of relations between different entities.

Remember, we learnt that understanding of information translates to knowledge. So, by extracting facts from a knowledge base and representing these facts in the form of entities and relations, a knowledge graph claims to have an understanding of the information.

Many knowledge graphs currently represent extracted facts in the form of Subject-Predicate-Object (SPO) triples, which is in line with the standard prescribed by RDF (Resource Description Framework).

So, to take an example, let’s consider the sentence:

Leonard Nimoy was an actor who played the character Spock in the science-fiction movie Star Trek

The SPO triples (facts) that can be extracted from this sentence are:

(LeonardNimoy, profession, Actor)
(LeonardNimoy, starredIn, StarTrek)
(LeonardNimoy, played, Spock)
(Spock, characterIn, StarTrek)
(StarTrek, genre, ScienceFiction)

Source: A Review of Relational Machine Learning for Knowledge Graphs, Nickel, Murphy, Tresp, Gabrilovich (https://arxiv.org/pdf/1503.00759.pdf)

The above facts, when represented graphically, becomes a knowledge graph:

[Figure: the triples above drawn as a directed graph, with entities as nodes and predicates as labelled, directed edges. Source: A Review of Relational Machine Learning for Knowledge Graphs, Nickel, Murphy, Tresp, Gabrilovich (https://arxiv.org/pdf/1503.00759.pdf)]

Now that we have understood what a simple Knowledge Graph (KG) looks like, let’s list down the steps involved in building a KG (a basic one!).

  1. Knowledge Extraction:
    a. Extraction of SPO triples (facts) from text. This uses Natural Language Processing (NLP) techniques like dependency parsing; NLP is the backbone of forming a good knowledge graph from textual information. (A minimal extraction sketch follows this list.)
    b. Entity Recognition & Linking:
    - This is the step that maps Leonard N, L Nimoy, Leo Nimoy, etc. all to a single entity, i.e. Leonard Nimoy.
    - In this tutorial, DBpedia, the structured data store of Wikipedia, is used as a single global store of all entities. So Leonard Nimoy would get mapped to http://dbpedia.org/page/Leonard_Nimoy.
  2. Graph Construction:
    a. Removing ambiguities and storing the SPO triples in a graph database. Here, a fact represented as an SPO triple conveys that the Subject is related to the Object through the relationship described by the Predicate.
    b. Another step here would be to process the graph to achieve things like filling in missing links, clustering entities, etc.
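As a flavour of step 1a, here is a naive SPO extraction sketch built on spaCy's dependency parse. This is my own illustrative simplification, not the code from the repo below, and it will miss many grammatical constructions:

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_spo(sentence):
    """Naive SPO extraction: treat each verb as a predicate, its nominal
    subject as the Subject and its direct object as the Object."""
    doc = nlp(sentence)
    triples = []
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [w for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
        objects = [w for w in token.rights if w.dep_ in ("dobj", "attr")]
        for s in subjects:
            for o in objects:
                triples.append((s.text, token.lemma_, o.text))
    return triples

print(extract_spo("Startup companies create jobs and innovation."))
# e.g. [('companies', 'create', 'jobs')]
```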

I strongly encourage you to read up on NLP to further understand the techniques that can be used in the knowledge extraction step (to extract SPO triples), and to understand the pros, cons, and accuracy levels of the different techniques.

Let’s come to the fun part, i.e. the demo!

Implementation of Knowledge Graph

The code is available at:

https://github.com/kramankishore/Knowledge-Graph-Intro

  1. Knowledge Extraction: SPO triple extraction using the spaCy library in Python.
    Check the file knowledgeExtraction.py in the repo to see the code.
  2. Entity Linking: Using the DBpedia API to take all the recognised entities and link them to their DBpedia URLs. (See the linking sketch after this list.)
    Check the file entityRecognitionLinking.py in the repo to see the code.
  3. Map the SPO triples from step 1 to their corresponding DBpedia URLs from step 2. A few entities might not have a direct DBpedia match; more advanced entity disambiguation and linking techniques from NLP would need to be applied here to solve this problem.
  4. The SPO triples mapped with their DBpedia entity links are then stored in a graph database. I am using neo4j as the graph database and the neomodel library in Python (though I am not super impressed with neomodel and its documentation). (See the population sketch after this list.)
    Check the file graphPopulation.py in the repo to see the code.
  5. After the graph data is ingested into neo4j, you can see a visualisation of the graph in the neo4j browser, usually accessed at http://localhost:7474/browser/
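For step 2, one publicly available service for this kind of linking is DBpedia Spotlight. The sketch below is my assumption of how such a lookup could be wired up; the repo's entityRecognitionLinking.py may do this differently:

```python
import requests

# DBpedia Spotlight's public annotation endpoint (assumed reachable).
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

def link_entities(text):
    """Return a mapping from surface forms in `text` to DBpedia URIs."""
    resp = requests.get(
        SPOTLIGHT_URL,
        params={"text": text, "confidence": 0.5},
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    resources = resp.json().get("Resources", [])
    return {r["@surfaceForm"]: r["@URI"] for r in resources}

print(link_entities("Bill Gates supports entrepreneurship."))
# e.g. {'Bill Gates': 'http://dbpedia.org/resource/Bill_Gates'}
```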
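For step 4, here is a minimal neomodel sketch of how SPO triples could be stored, assuming a local neo4j instance with the placeholder credentials shown. Since the predicate varies per triple, this sketch stores it as a property on a generic relationship, which may differ from what graphPopulation.py actually does:

```python
from neomodel import (StructuredNode, StructuredRel, StringProperty,
                      RelationshipTo, config)

# Placeholder connection string; substitute your own neo4j credentials.
config.DATABASE_URL = "bolt://neo4j:password@localhost:7687"

class Predicate(StructuredRel):
    # The predicate of the SPO triple, stored on the relationship itself.
    name = StringProperty()

class Entity(StructuredNode):
    name = StringProperty(unique_index=True)
    dbpedia_uri = StringProperty()
    links = RelationshipTo("Entity", "RELATED_TO", model=Predicate)

def store_triple(s, p, o):
    """Upsert subject and object nodes, then connect them via the predicate."""
    subj = Entity.get_or_create({"name": s})[0]
    obj = Entity.get_or_create({"name": o})[0]
    subj.links.connect(obj, {"name": p})

store_triple("companies", "create", "jobs")
```

Once ingested, running a Cypher query such as MATCH (n)-[r]->(m) RETURN n, r, m in the neo4j browser (step 5) will render the stored graph.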

The code in the above repo is executed for the text input:

Startup companies create jobs and innovation. Bill Gates supports entrepreneurship.

And the resulting Knowledge Graph looks like this:

[Image: the resulting graph as rendered in the neo4j browser]

This is just the beginning, and the code will start showing funny results as you try different inputs!

Now that you have an overview of what a Knowledge Graph is and how to construct one, you can start exploring each layer of knowledge extraction and graph population more deeply, to improve accuracy and make it suitable for your application.

Thank you!

References:

  1. A Review of Relational Machine Learning for Knowledge Graphs (Nickel, Murphy, Tresp, Gabrilovich): https://arxiv.org/pdf/1503.00759.pdf
  2. https://kgtutorial.github.io/
  3. https://www.analyticsvidhya.com/blog/2017/12/introduction-computational-linguistics-dependency-trees/
  4. https://github.com/BrambleXu/knowledge-graph-learning
