Knowledge Graphs — What are they?

Charles Samuel R · Published in featurepreneur · 5 min read · Oct 8, 2020

This post explains what knowledge graphs are and how they help, along with a worked example of knowledge graphs in Python.


So picture this:

You open up Google and start typing something you want to look up (me being a football/soccer fan, I am going to say Manchester United). You hit Enter and you’re filled to the brim with at least a billion results, in record speed too (under a second, to be precise).

The big question here is:

How is this possible?

Google uses something called a Knowledge Graph to relate results to each other and to display similar searches that people can also check out. The definition of the Google Knowledge Graph is best given here (from Wikipedia):

The Google Knowledge Graph is a knowledge base used by Google and its services to enhance its search engine’s results with information gathered from a variety of sources. The information is presented to users in an infobox next to the search results. These infoboxes were added to Google’s search engine in May 2012, starting in the United States, with international expansion by the end of the year. Google has referred to these infoboxes, which appear to the right (top on mobile) of search results, as “knowledge panels”.

Let me show an example of this:

Remember, I said my search term was Manchester United. Here is how the search turned out:

Infobox of Manchester United

See the section “People also search for”? This is a knowledge graph example.

I’m also going to refer to the team as just United for now.

  • United play in the Premier League.
  • United’s bitter rivals are Manchester City.
  • Leicester City F.C. once had a very humiliating result against United, so that could be one explanation (I am unsure of this one, however).
  • A few players very recently moved from United to Inter Milan.

This infobox is a knowledge graph in action at Google. Every single search result now has an infobox relating the term to other terms or factors connected with whatever you’re searching for (be it a team or anything else). For a movie search, you mostly get similar movies from the same genre:

Infobox for the Goal movie

Sorry for going full-on football freak here 😅.

You can see that most of the movie reference terms include Beckham (Beckham played in one of the movies, hence Bend It Like Beckham), Soccer, Football, and so on.

This is how knowledge graphs work. Now let us see an example where I plot the knowledge graphs of the FAANG companies (Facebook, Amazon, Apple, Netflix, Google).

Let’s get into it

What you’re gonna need:

  • Your trusty OS (Linux, macOS, Windows)
  • Python ≥3.7
  • The packages, which I will explain below

So, the example here involves using data from Wikidata to get our info about the companies.

To get the data from the site, Python has a package known as qwikidata, which has useful methods for pulling data from Wikidata. And before you ask me the Holy Grail question:

Is the data clean?

I am happy to say that the data from Wikidata is very clean 🎉🎉🎉

Cue heavenly music. 😇

Ok then, we first install the package:

pip install qwikidata

So, the next set of packages is just a staple of every data scientist, but make sure you have them installed as well (preferably in Colab rather than Jupyter):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

The final checks of installation are these three lines:

from qwikidata.entity import WikidataItem, WikidataLexeme, WikidataProperty
from qwikidata.linked_data_interface import get_entity_dict_from_api
from tqdm.notebook import tqdm

And this last one for visualizing the graphs:

import networkx as nx

Once everything is set we can get into our code:

An example of this is given in a post that I read, which you can find here.

First, we specify the companies we need to get:

KG_companies = ["Facebook", "Amazon", "Apple Inc.", "Netflix", "Google"]

Now, the function for getting data from Wikidata is a bit complicated. Each company has a Q-value. A sample looks like this:

Google has the Q-value Q95

So you need to know the company’s Q-value before starting work, because the function only accepts the company’s Q-value.
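
If you want to sanity-check a Q-value from Python, qwikidata can fetch the entity and give you its English label. A minimal sketch, using Q95 (Google’s Q-value, as mentioned above):

# Fetch the entity dict for a Q-value and read its English label
from qwikidata.entity import WikidataItem
from qwikidata.linked_data_interface import get_entity_dict_from_api

item = WikidataItem(get_entity_dict_from_api("Q95"))
print(item.get_label())  # should print "Google"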

The function to extract the (subject, predicate, object) triples can be found below. Full credit to the article from Auquan for this:

Get data from Wikidata function
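
The function appears as an embedded snippet in the original post. Here is a minimal sketch of what it might look like: the get_truthy_claim_groups and label-lookup calls are the real qwikidata API, but the filtering logic is my reconstruction, not necessarily the original code.

def get_triples_from_wikidata(item_ids, property_ids):
    """Collect (subject, predicate, object) label triples for the
    given Wikidata item IDs, restricted to the given property IDs."""
    subjects, predicates, objects = [], [], []
    for item_id in tqdm(item_ids):
        item = WikidataItem(get_entity_dict_from_api(item_id))
        claim_groups = item.get_truthy_claim_groups()
        for prop_id in property_ids:
            if prop_id not in claim_groups:
                continue
            # Resolve the predicate's P-code to its English label
            prop_label = WikidataProperty(
                get_entity_dict_from_api(prop_id)).get_label()
            for claim in claim_groups[prop_id]:
                # Skip claims with no concrete value (e.g. "unknown")
                if claim.mainsnak.snaktype != "value":
                    continue
                # The object of these predicates is another item;
                # resolve its Q-code to a label as well
                obj_id = claim.mainsnak.datavalue.value["id"]
                obj_label = WikidataItem(
                    get_entity_dict_from_api(obj_id)).get_label()
                subjects.append(item.get_label())
                predicates.append(prop_label)
                objects.append(obj_label)
    return subjects, predicates, objects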

Now we specify two things:

  • Company code list
  • Predicate list

The company code list of FAANG and predicate list is given below:

companies_list = ["Q355", "Q3884", "Q312", "Q907311", "Q95"]
predicate_list = ["P31", "P17", "P361", "P452", "P112", "P169", "P463", "P355", "P1830", "P1056"]

The company code list is in the same order as the main company list specified earlier.

The predicates that are defined are:

  • P31 — Instance of
  • P17 — Country
  • P361 — Part of
  • P452 — Industry
  • P112 — Founded by
  • P169 — Chief Executive Officer
  • P463 — Member of
  • P355 — Subsidiary
  • P1830 — Owner of
  • P1056 — Product or Material produced

You can look through Wikidata to find which predicate to use.
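
If you’re unsure what a P-code stands for, you can resolve it the same way we resolved Q-values, using the already-imported WikidataProperty (a quick check, with P452 as the example):

# Resolve a P-code to its English label to verify the predicate
prop = WikidataProperty(get_entity_dict_from_api("P452"))
print(prop.get_label())  # should print "industry"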

We now use this function to get the triples and store them in a Pandas Dataframe:

subjects, predicates, objects = get_triples_from_wikidata(companies_list, predicate_list)
wiki_triples_df = pd.DataFrame({"subject": subjects, "predicate": predicates, "object": objects})

The DataFrame now looks like this:

Sample of DataFrame

Now we can create a graph for each of these companies. I packaged the code into a function, as shown below:
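
Since that function also appears as an embedded snippet in the original post, here is a minimal sketch of what create_graph could look like, assuming the wiki_triples_df built above. The layout and styling parameters are my choices, not necessarily the original’s:

def create_graph(company):
    """Draw the knowledge graph for one company with networkx."""
    # Keep only the triples whose subject is this company
    # (assumes the subject labels match the names in KG_companies)
    triples = wiki_triples_df[wiki_triples_df["subject"] == company]

    # Directed graph, with the predicate stored as an edge attribute
    G = nx.from_pandas_edgelist(
        triples, source="subject", target="object",
        edge_attr="predicate", create_using=nx.DiGraph())

    plt.figure(figsize=(12, 12))
    pos = nx.spring_layout(G, k=0.5)
    nx.draw(G, pos, with_labels=True, node_color="skyblue",
            node_size=1500, edge_color="gray", font_size=10)
    nx.draw_networkx_edge_labels(
        G, pos, edge_labels=nx.get_edge_attributes(G, "predicate"))
    plt.title(company)
    plt.show()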

And now for the final code to get the graphs:

for comp in KG_companies:
    create_graph(comp)

Here are just some examples of the knowledge graphs:

The good
The complicated
And the overly complex one

And that’s it for this example.

You can find the full notebook over here:
