Querying using simple knowledge graphs

Vishnu Nandakumar
Published in Analytics Vidhya
3 min read · Feb 19, 2021

People tend to be fascinated when they get an answer to a question they have. There are many ways to create a simple querying or Q&A system that can be deployed as a lightweight information application backed by a database. Here I have tried to create a querying app using some popular Python packages.

What is a knowledge graph?

In simple words, a knowledge graph is a database that stores data points and the connections between them. Its main components are nodes and edges. To create a knowledge graph from text, you only need basic grammar concepts, such as what the subject and object of a sentence are and what a verb is. The nodes denote the subject and the object, and they are linked by an edge that represents the verb of the sentence. If you are comfortable with part-of-speech concepts, you can easily build a knowledge graph for your textual documents. For example, take the sentence “Arsenal plays football”: the image below shows the knowledge graph built for it, where “Arsenal” and “football” are nodes linked by the edge “plays”.
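To make the idea concrete, here is a minimal sketch in plain Python with no libraries. It stores the “Arsenal plays football” graph as a (subject, relation, object) triple and builds an adjacency map from it. The names `triples` and `build_graph` are hypothetical helpers for illustration. With networkx, `DiGraph.add_edge("Arsenal", "football", relation="plays")` would store the same information, with the verb kept as an edge attribute.

```python
from collections import defaultdict

# Each fact is a (subject, relation, object) triple:
# subject and object become nodes, the relation (verb) becomes the edge label.
triples = [("Arsenal", "plays", "football")]


def build_graph(facts):
    """Build an adjacency map: node -> list of (relation, neighbour) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in facts:
        graph[subject].append((relation, obj))
    return dict(graph)


graph = build_graph(triples)
print(graph)  # {'Arsenal': [('plays', 'football')]}
```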

Building the knowledge graph

We are going to use a news headline dataset to build the graph database for this solution. Headlines tend to be short, simple sentences, which makes them a good fit for creating the knowledge base. We will use the headlines from the dataset to create our knowledge graph. The image below gives a gist of the dataset.

Here we use part-of-speech (POS) tagging from the spaCy package to identify the different parts of each sentence. Our logic should extract the subject, the object, and the verb that links the two.

So the pattern we look for is “Noun-Verb-Noun”. In the code snippet above, I have inserted a separator ‘<’ between the verb and the noun part of the sentence. This is a simple way to create a ‘subject-edge-object’ set; it can be made more sophisticated and accurate by using more POS tags and dependency-parsing logic.
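Here is a minimal sketch of one way to apply the Noun-Verb-Noun pattern. It assumes the tokens are already POS-tagged as (word, tag) pairs, for instance from spaCy via `[(t.text, t.pos_) for t in nlp(text)]`, where `pos_` holds coarse Universal tags such as `NOUN`, `PROPN` and `VERB`. The helper names `extract_triple` and `to_record` are hypothetical, and so are a few rules: treating `PROPN` as a noun, keeping verbs on the object side, and placing ‘<’ between the verb and the object. These are my assumptions, not necessarily what the original code did. The tags below are hand-written stand-ins, and a real tagger may assign some of them differently.

```python
# Coarse Universal POS tags treated as nouns (assumption: proper nouns count too).
NOUN_TAGS = {"NOUN", "PROPN"}
# Tags kept on the object side, so "stop patrolling the island" keeps "patrolling".
OBJECT_TAGS = NOUN_TAGS | {"VERB"}
SEPARATOR = "<"


def extract_triple(tagged):
    """Return (subject, verb, object) from (word, pos) pairs, or None.

    Subject: nouns before the first verb; verb: that first verb;
    object: nouns/verbs after it (a Noun-Verb-Noun pattern).
    """
    verb_idx = next((i for i, (_, pos) in enumerate(tagged) if pos == "VERB"), None)
    if verb_idx is None:
        return None
    subject = [w for w, pos in tagged[:verb_idx] if pos in NOUN_TAGS]
    obj = [w for w, pos in tagged[verb_idx + 1:] if pos in OBJECT_TAGS]
    if not subject or not obj:
        return None  # not a Noun-Verb-Noun sentence
    return " ".join(subject), tagged[verb_idx][0], " ".join(obj)


def to_record(triple):
    """Join a triple into one string with '<' between the verb and the object."""
    subject, verb, obj = triple
    return f"{subject} {verb} {SEPARATOR} {obj}".lower()


# Hand-written tags standing in for spaCy output.
headline = [("Mumbai", "PROPN"), ("cops", "NOUN"), ("stop", "VERB"),
            ("patrolling", "VERB"), ("the", "DET"), ("island", "NOUN"),
            ("due", "ADP"), ("to", "ADP"), ("lack", "NOUN"), ("of", "ADP"),
            ("boats", "NOUN")]

triple = extract_triple(headline)
print(triple)             # ('Mumbai cops', 'stop', 'patrolling island lack boats')
print(to_record(triple))  # mumbai cops stop < patrolling island lack boats
```

Splitting on the first verb keeps the rule easy to follow. Headlines with several clauses are exactly where the dependency-parsing refinements mentioned above would help.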

The code snippet above queries the dataframe with the opening words of the question we ask. For example, assume there is a headline “Mumbai cops stop patrolling the island due to lack of boats”. If we want to ask something like “What did cops in Mumbai stop?”, we can rephrase it as “Mumbai cops stop -”. I have set the threshold score to 75%, and the resulting answer is given below.

The query function takes two inputs: the string to be queried and a confidence threshold used to keep only the best matches. The score is the average fuzzy similarity between the list of words stored in the dataframe and the tokens in the query string. The results are improved further by removing stopwords while creating the “subject-relation-object” sets.
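A minimal sketch of such a query function follows. It assumes each record is stored as a “subject verb < object” string. Only the part before the ‘<’ is scored, and the part after it is returned as the answer. For each query token it takes the best similarity against the record's words and then averages those values, on a 0 to 100 scale so that the 75% threshold becomes `threshold=75`. The standard library's `difflib.SequenceMatcher` stands in for a dedicated fuzzy-matching package. The stopword list and the names `tokens`, `score` and `query` are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher

# Small illustrative stopword list (assumption; spaCy and NLTK ship fuller lists).
STOPWORDS = {"a", "an", "the", "in", "of", "to", "is", "are"}
SEPARATOR = "<"


def tokens(text):
    """Lower-case, split on whitespace, drop stopwords and the '-' placeholder."""
    return [w for w in text.lower().split() if w not in STOPWORDS and w != "-"]


def similarity(a, b):
    """Fuzzy similarity of two words on a 0-100 scale (difflib stand-in)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def score(query_tokens, record_tokens):
    """Average, over query tokens, of each token's best match in the record."""
    if not query_tokens or not record_tokens:
        return 0.0
    best = [max(similarity(q, r) for r in record_tokens) for q in query_tokens]
    return sum(best) / len(best)


def query(question, records, threshold=75):
    """Return (score, answer) pairs whose subject+verb side scores >= threshold."""
    q = tokens(question)
    results = []
    for record in records:
        left, _, right = record.partition(SEPARATOR)
        s = score(q, tokens(left))
        if s >= threshold:
            results.append((round(s, 1), right.strip()))
    return sorted(results, reverse=True)


records = [
    "mumbai cops stop < patrolling island lack boats",
    "arsenal plays < football",
]
print(query("Mumbai cops stop -", records))     # [(100.0, 'patrolling island lack boats')]
print(query("cops in Mumbai stop -", records))  # same answer: "in" is a stopword
```

Averaging each query token's best match makes the score ignore word order. That is why “cops in Mumbai stop -” finds the same headline once the stopword “in” is dropped. The trade-off is that the score carries no grammatical information.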

We can also visualize the graph data using the pandas and networkx libraries. The snippet below shows how subjects and objects are linked; each line in the plot indicates a link. I have used “declared” as the relation to search for.
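Here is a minimal plain-Python sketch of the filtering step behind such a plot. It keeps only the triples whose relation is “declared” and groups the linked objects by subject. The triples are made-up toy data, not taken from the dataset, and `edges_for` and `adjacency` are hypothetical helpers. To draw the result, the same pairs could be put into a pandas DataFrame with `subject` and `object` columns. From there, `networkx.from_pandas_edgelist(df, source="subject", target="object")` builds the graph and `networkx.draw(G, with_labels=True)` plots it, with each drawn line being one link.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples for illustration, not taken from the dataset.
triples = [
    ("government", "declared", "holiday"),
    ("club", "declared", "profit"),
    ("government", "declared", "lockdown"),
    ("arsenal", "plays", "football"),
]


def edges_for(relation, facts):
    """Keep only (subject, object) pairs linked by the given relation."""
    return [(s, o) for s, r, o in facts if r == relation]


def adjacency(edge_list):
    """Group objects by subject, i.e. which nodes each subject links to."""
    linked = defaultdict(list)
    for subject, obj in edge_list:
        linked[subject].append(obj)
    return dict(linked)


declared = edges_for("declared", triples)
for subject, objects in adjacency(declared).items():
    print(subject, "->", ", ".join(objects))
# government -> holiday, lockdown
# club -> profit
```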

So guys, we have created a graph database and seen how to make it queryable for simple Q&A. This is a very simple, primitive way to build and query a knowledge graph; it can be made more efficient and more sophisticated. Please leave your valuable feedback.

Machine Learning Engineer, Cloud Computing (AWS), Arsenal Fan. Have a look at my page: https://bit.ly/m/vishnunandakumar