NLP-Knowledge Graph

Sarang Mete
3 min read · Nov 1, 2022


Explore different libraries and create production-ready code

Photo by fabio on Unsplash

Building a Knowledge Graph (KG) is one of the most important NLP tasks. A KG is a way of representing the information extracted from text as relationships, each one a (subject, object, relation) triple.

In this article, we’ll explore a process for creating a KG.

Steps in creating a Knowledge Graph:

  1. Coreference Resolution
  2. Named Entity Recognition
  3. Entity Linking
  4. Relationship Extraction
  5. Knowledge Graph Creation

We’ll use the following input text to create the KG:

Tesla CEO Elon Musk sold $6.9 billion worth of shares , 
he said that the funds could be used to finance Twitter if he loses a legal battle with it.
Tesla factory in Austin was shut for two days, Laila Shahrokhshahi reports.
Tesla Inc has hired former Hewlett Packard lawyer Derek Windham to helm its legal team.

1. Coreference Resolution

Replace pronouns with the nouns they refer to. You can read more about it in my project.

Coreference resolution Output: Text in Bold is resolved

Tesla CEO Elon Musk sold $6.9 billion worth of shares , Musk said that the funds could be used to finance Twitter if Musk loses a legal battle with Twitter .Tesla factory in Austin was shut for two days, Laila Shahrokhshahi reports.Tesla Inc has hired former Hewlett Packard lawyer Derek Windham to helm Inc legal team.
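To make the substitution step concrete, here is a minimal sketch in Python. It assumes a coreference model has already returned clusters of character spans. The `clusters` value and the `resolve_coreferences` helper are hypothetical and hand-written for illustration; they are not the API of any particular library, and the short example sentence is a simplified stand-in for the input text.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def resolve_coreferences(text: str, clusters: List[List[Span]]) -> str:
    """Replace every later mention in a cluster with the cluster's first mention."""
    replacements = []
    for cluster in clusters:
        head_start, head_end = cluster[0]
        head = text[head_start:head_end]
        for start, end in cluster[1:]:
            replacements.append((start, end, head))
    # Apply the replacements right to left so earlier offsets stay valid.
    for start, end, head in sorted(replacements, reverse=True):
        text = text[:start] + head + text[end:]
    return text


text = "Elon Musk sold shares, he said he may buy Twitter."

# Hypothetical model output: one cluster linking "Elon Musk" with both "he"s.
first_he = text.index(" he ") + 1
second_he = text.index(" he ", first_he) + 1
clusters = [[(0, len("Elon Musk")),
             (first_he, first_he + 2),
             (second_he, second_he + 2)]]

resolved = resolve_coreferences(text, clusters)
print(resolved)
```

Choosing the first mention as the replacement is only one possible policy. A real resolver may pick a different representative mention, which is why the output above uses "Musk" rather than the full name.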

2. Named Entity Recognition (NER)

We can skip this step and simply extract all relationships. However, sometimes you’ll need only certain entity types and their relationships. We can extract default entities such as NAME, PERSON, etc. with many of the available libraries, or we can build our own NER model. I’ve created a project that builds a custom NER model for the entity types PERSON, ORG, PLACE, and ROLE. For this knowledge graph, however, I extract all relationships. Refer to my Custom NER project.

Output of custom NER

Tesla CUSTOM_ORG
CEO CUSTOM_ROLE
Elon Musk CUSTOM_PERSON
Twitter CUSTOM_ORG
Tesla CUSTOM_ORG
Austin CUSTOM_PLACE
Laila Shahrokhshahi CUSTOM_PERSON
Tesla Inc CUSTOM_ORG
Hewlett CUSTOM_ORG
Packard lawyer CUSTOM_ROLE
Derek Windham CUSTOM_PERSON
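When only certain entity types matter, the NER output can act as a filter on the extracted relationships. The following is a minimal sketch, assuming entities arrive as (text, label) pairs like the output above and relationships as (subject, relation, object) tuples. The `filter_triples` helper is hypothetical and is not taken from the project.

```python
from typing import Iterable, List, Set, Tuple

Entity = Tuple[str, str]       # (text, label)
Triple = Tuple[str, str, str]  # (subject, relation, object)


def filter_triples(triples: Iterable[Triple],
                   entities: Iterable[Entity],
                   wanted_labels: Set[str]) -> List[Triple]:
    """Keep triples whose subject and object both carry a wanted label."""
    allowed = {text for text, label in entities if label in wanted_labels}
    return [t for t in triples if t[0] in allowed and t[2] in allowed]


entities = [
    ("Tesla", "CUSTOM_ORG"),
    ("Elon Musk", "CUSTOM_PERSON"),
    ("Twitter", "CUSTOM_ORG"),
    ("Austin", "CUSTOM_PLACE"),
]
triples = [
    ("Elon Musk", "owner of", "Twitter"),
    ("Tesla", "headquarters location", "Austin"),
]

# Keep only relationships between people and organizations.
kept = filter_triples(triples, entities, {"CUSTOM_PERSON", "CUSTOM_ORG"})
print(kept)
```

Here the Tesla/Austin relationship is dropped because Austin is a CUSTOM_PLACE. Matching on exact strings is a simplification; mentions would normally be normalized first.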

3. Entity Linking / Entity Disambiguation

Different words or noun phrases can refer to the same entity, for example U.S., United States of America, and America. All of these should be treated as one entity. We can achieve this by mapping each mention to its root ID in a knowledge base. Here, we use Wikipedia as the knowledge base, which is why entity linking is often also called wikification.
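The idea of mapping several surface forms to one root entry can be sketched as follows. This is only an illustration: the `ALIASES` dictionary is a hand-written stand-in for a real knowledge base lookup, and `normalize` and `link_entity` are hypothetical helper names.

```python
from typing import Optional

# Hand-written stand-in for a knowledge base; a real system would query
# Wikipedia (or a similar resource) instead of a hard-coded dictionary.
ALIASES = {
    "u.s": "United States",
    "united states of america": "United States",
    "america": "United States",
}


def normalize(mention: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing periods."""
    return " ".join(mention.lower().split()).rstrip(".")


def link_entity(mention: str) -> Optional[str]:
    """Return the canonical entry for a mention, or None if it is unknown."""
    return ALIASES.get(normalize(mention))


mentions = ["U.S", "United States of America", "America", "Tesla"]
linked = {mention: link_entity(mention) for mention in mentions}
print(linked)
```

The three aliases resolve to the same entry, so they would become a single node in the graph, while the unknown mention "Tesla" returns None and can be kept as-is or sent to a fallback.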

4. Relationship Extraction

This means extracting the relationships expressed in the text.

I’ve explored a couple of libraries: Stanford Open IE and REBEL. Please check the notebook.

I selected REBEL for my final implementation because the Stanford Open IE output was somewhat redundant and the library was slow.

Output of rebel relationship extraction:

   subject        object           relation
0  Elon Musk      Twitter          owner of
1  Twitter        Elon Musk        owned by
2  Tesla          Austin           headquarters location
3  Derek Windham  Hewlett Packard  employer
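REBEL is a sequence-to-sequence model, so its raw output is text that has to be parsed into triples. The sketch below assumes the linearized format described on the REBEL model card, in which each triple is written as `<triplet> subject <subj> object <obj> relation`. The `parse_rebel_output` helper and the sample string are hand-written for illustration; the string is not actual model output and the helper is not necessarily the parsing code used in the project.

```python
from typing import Dict, List


def parse_rebel_output(decoded: str) -> List[Dict[str, str]]:
    """Parse '<triplet> subject <subj> object <obj> relation' sequences."""
    triples: List[Dict[str, str]] = []
    subject: List[str] = []
    object_: List[str] = []
    relation: List[str] = []
    current = None  # the list that ordinary tokens are appended to

    def flush() -> None:
        # Emit a triple only when all three parts have been collected.
        if subject and object_ and relation:
            triples.append({
                "subject": " ".join(subject),
                "object": " ".join(object_),
                "relation": " ".join(relation),
            })

    cleaned = decoded
    for special in ("<s>", "</s>", "<pad>"):
        cleaned = cleaned.replace(special, " ")

    for token in cleaned.split():
        if token == "<triplet>":    # a new subject starts
            flush()
            subject, object_, relation = [], [], []
            current = subject
        elif token == "<subj>":     # a new object for the same subject
            flush()
            object_, relation = [], []
            current = object_
        elif token == "<obj>":      # the relation follows
            relation = []
            current = relation
        elif current is not None:
            current.append(token)
    flush()
    return triples


# Hand-written sample in the assumed format (not real model output).
sample = ("<s><triplet> Elon Musk <subj> Twitter <obj> owner of "
          "<triplet> Tesla <subj> Austin <obj> headquarters location</s>")
parsed = parse_rebel_output(sample)
for row in parsed:
    print(row)
```

The resulting list of dictionaries has the same subject, object, and relation columns as the table above, so it can be loaded into a DataFrame or passed straight to the graph-building step.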

5. Knowledge Graph Creation

I explored py2neo (a Python wrapper for neo4j) and networkx in a notebook, and selected networkx simply because it makes visualization easy. If you want to use a graph database and perform further analysis, the more powerful neo4j is the better choice, but we are not doing that here.
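As a rough illustration of the networkx route, the extracted relationships can be loaded into a directed graph with the relation stored as an edge attribute. This is a minimal sketch using the four relationships from the relationship extraction step; the variable names are my own and the project's actual code may differ.

```python
import networkx as nx

# (subject, relation, object) triples from the relationship extraction step.
triples = [
    ("Elon Musk", "owner of", "Twitter"),
    ("Twitter", "owned by", "Elon Musk"),
    ("Tesla", "headquarters location", "Austin"),
    ("Derek Windham", "employer", "Hewlett Packard"),
]

graph = nx.DiGraph()
for subject, relation, obj in triples:
    # Nodes are created implicitly; the relation is kept on the edge.
    graph.add_edge(subject, obj, relation=relation)

# Mapping of (subject, object) -> relation, useful as edge labels in a plot.
edge_labels = nx.get_edge_attributes(graph, "relation")

print(graph.number_of_nodes(), graph.number_of_edges())
for (subject, obj), relation in edge_labels.items():
    print(f"{subject} --[{relation}]--> {obj}")
```

With matplotlib installed, `nx.draw_networkx` and `nx.draw_networkx_edge_labels` can render this graph. Note that a `DiGraph` keeps only one edge per ordered pair of nodes, so `nx.MultiDiGraph` is the safer choice if two entities can be connected by several relations in the same direction.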

Output of networkx:

Image by Author

Sample output of py2neo for a different text:

Image by Author

I’ve created a complete end-to-end project covering Knowledge Graph creation through deployment. The project is production-ready. You can refer to it here.

The main challenges I’ve solved in this project:

  1. Exploring different libraries for the different tasks in KG creation and integrating them into the final solution.
  2. Creating production-ready code for a KG NLP use case.

If you liked the article or have any suggestions/comments, please share them below!

Let’s connect and discuss on LinkedIn
