A brief summary of Knowledge Graphs

Emre Yüksel
Oct 10, 2022


What is a Knowledge Graph?

Knowledge representation and reasoning is a field of artificial intelligence dedicated to methods for building complex systems by modeling how people represent information and solve problems. It is one of the core technologies behind the Semantic Web, which combines concepts from knowledge representation and reasoning with XML-based markup languages. The Resource Description Framework (RDF) and the Web Ontology Language (OWL) provide methods for modeling knowledge-based objects and their semantics.

This is where the knowledge graph comes in! There are many ways to represent raw data, and graphs are one of the richest. A graph is obtained by adding relationships between items in structured, unstructured, or semi-structured data. Applying semantics to an existing graph yields a semantically rich graph, called a Knowledge Graph (KG).

A knowledge graph is a semantic network that represents interlinked descriptions of concepts in a given domain according to an ontology and makes the relationships between them explicit. It is a directed graph built from three components: nodes, edges, and labels. In the Semantic Web, RDF is used to represent the KG. RDF is a method for describing and exchanging graph data. This graph-based representation is what makes a knowledge graph more expressive for knowledge representation than purely ontological or relational models.

Figure 1. Knowledge Graph example, image credit

The figure above shows an example of a KG. It contains multiple triplets (subject-predicate-object). To give an example of triplets:

  1. Bob (subject) is a friend of (predicate) Alice (object)
  2. Bob (subject) is interested in (predicate) the Mona Lisa (object)

We can also write the triplets as pseudocode (taken from [5]):

<Bob> <is a> <person>.
<Bob> <is a friend of> <Alice>.
<Bob> <is born on> <the 4th of July 1990>.
<Bob> <is interested in> <the Mona Lisa>.
<the Mona Lisa> <was created by> <Leonardo da Vinci>.
<the video 'La Joconde à Washington'> <is about> <the Mona Lisa>.

Although this is a simple example, you can see that the nodes and relationships span various subjects and fields. This helps represent many aspects of the data together, including information that would otherwise stay hidden in it.
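To make the node, edge, and label structure concrete, here is a minimal Python sketch that stores the triplets above as plain tuples and indexes them by subject. The helper names (`build_graph`, `mentions`) are made up for this illustration; a real project would more likely use an RDF library or a graph database.

```python
from collections import defaultdict

# The triplets from the pseudocode above, as (subject, predicate, object) tuples.
TRIPLES = [
    ("Bob", "is a", "person"),
    ("Bob", "is a friend of", "Alice"),
    ("Bob", "is born on", "the 4th of July 1990"),
    ("Bob", "is interested in", "the Mona Lisa"),
    ("the Mona Lisa", "was created by", "Leonardo da Vinci"),
    ("the video 'La Joconde à Washington'", "is about", "the Mona Lisa"),
]


def build_graph(triples):
    """Index outgoing edges by subject: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def mentions(triples, node):
    """Return every triplet in which `node` appears as subject or object."""
    return [t for t in triples if node in (t[0], t[2])]


kg = build_graph(TRIPLES)

# Walk the directed, labeled edges that start at the node "Bob".
for label, target in kg["Bob"]:
    print(f"Bob --[{label}]--> {target}")
```

Calling `mentions(TRIPLES, "the Mona Lisa")` returns the facts that touch the painting from either direction, which is how one node ends up connecting people, artworks, and videos.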

Characteristics of Knowledge Graphs

As discussed in the first section, a graph is one way of representing data. A knowledge graph is a data graph enriched with representations of schema, identity, context, ontologies, and/or rules.

Schema

Schemas are used to indicate a high-level structure or semantics that the graph follows. They are often used to describe the meanings of high-level terms, which facilitates reasoning on graphs. For example, RDF graphs use RDF Schema to define subclasses, sub-properties, domains, and ranges for classes and properties. The semantics of these terms can also be defined in much more depth with the Web Ontology Language standard for RDF graphs.
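As a rough illustration of how a schema supports reasoning, the sketch below applies three RDF Schema ideas (subclass, domain, and range) to a tiny data graph. The class and property names (`Painter`, `Artist`, `created`, and so on) are hypothetical, and the function covers only a small part of what a real RDFS reasoner does.

```python
# Schema triples: hypothetical classes and properties, for illustration only.
SCHEMA = [
    ("Painter", "rdfs:subClassOf", "Artist"),
    ("Artist", "rdfs:subClassOf", "Person"),
    ("created", "rdfs:domain", "Artist"),
    ("created", "rdfs:range", "Artwork"),
]

# Data triples that follow the schema.
DATA = [
    ("Leonardo", "rdf:type", "Painter"),
    ("Raphael", "created", "The School of Athens"),
]


def infer_types(schema, data):
    """Derive (node, class) facts from subClassOf, domain, and range declarations."""
    superclasses, domains, ranges = {}, {}, {}
    for s, p, o in schema:
        if p == "rdfs:subClassOf":
            superclasses.setdefault(s, set()).add(o)
        elif p == "rdfs:domain":
            domains[s] = o
        elif p == "rdfs:range":
            ranges[s] = o

    types = set()
    for s, p, o in data:
        if p == "rdf:type":
            types.add((s, o))
        if p in domains:   # the subject of p belongs to p's domain class
            types.add((s, domains[p]))
        if p in ranges:    # the object of p belongs to p's range class
            types.add((o, ranges[p]))

    # Climb subClassOf edges until no new fact appears.
    changed = True
    while changed:
        changed = False
        for node, cls in list(types):
            for parent in superclasses.get(cls, ()):
                if (node, parent) not in types:
                    types.add((node, parent))
                    changed = True
    return types


inferred = infer_types(SCHEMA, DATA)
for node, cls in sorted(inferred):
    print(f"{node} is a {cls}")
```

Nothing in the data says that Raphael is a person; that fact follows from the domain of `created` together with the subclass chain.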

Figure 2. Example schema graph, image credit

Identity

In graphs, nodes can represent persons, places, or events. However, a node's label can refer to more than one thing. To avoid such ambiguities, nodes can be given globally unique identifiers, or external identity links that tie a node to the corresponding entity in an external source.
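A small sketch of the idea, assuming two places that share the label "Paris". All IRIs here are made up for the example, and `owl:sameAs` is used as one common way of writing an external identity link (the choice of property is an assumption, not something the text above prescribes).

```python
# Two different entities share the label "Paris"; the IRIs are hypothetical.
NODES = {
    "http://example.org/place/paris-france": {"label": "Paris", "country": "France"},
    "http://example.org/place/paris-texas": {"label": "Paris", "country": "United States"},
}

# External identity links: a local node and an external node denote the same thing.
# The target IRI is a placeholder, not a real identifier.
SAME_AS = [
    ("http://example.org/place/paris-france", "owl:sameAs",
     "http://external.example/entity/123"),
]


def find_by_label(nodes, label):
    """A label alone may match several nodes; the IRIs tell them apart."""
    return sorted(iri for iri, props in nodes.items() if props["label"] == label)


def external_links(same_as, iri):
    """Return the external identifiers declared equivalent to `iri`."""
    return [o for s, p, o in same_as if s == iri and p == "owl:sameAs"]


candidates = find_by_label(NODES, "Paris")
print(candidates)  # two distinct IRIs for one ambiguous label
print(external_links(SAME_AS, candidates[0]))
```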

Context

The facts presented in the data graph can be considered true in a given context. Making the context explicit allows the data to be interpreted from different perspectives, for example to understand when or where a fact holds. The context for graph data can be thought of at different levels: individual nodes, individual edges, or sets of edges (subgraphs). There are several ways to make the context explicit, such as direct representation, reification, higher-arity representation, and annotations.
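Reification is perhaps the easiest of these to sketch: the edge itself becomes a node, so that context can be attached to it. The example below uses the RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`); the statement identifier `stmt1` and the context properties `valid from` and `stated by`, along with their values, are invented for illustration.

```python
BASE_TRIPLE = ("Bob", "is a friend of", "Alice")

STRUCTURAL = {"rdf:type", "rdf:subject", "rdf:predicate", "rdf:object"}


def reify(statement_id, triple):
    """Turn one edge into a node described by four triples."""
    s, p, o = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]


def context_of(triples, statement_id):
    """Collect the non-structural properties attached to a reified statement."""
    return {
        p: o
        for s, p, o in triples
        if s == statement_id and p not in STRUCTURAL
    }


# Context about the edge itself (hypothetical values).
reified = reify("stmt1", BASE_TRIPLE) + [
    ("stmt1", "valid from", "2015"),
    ("stmt1", "stated by", "Alice"),
]

print(context_of(reified, "stmt1"))
```

The price is verbosity: one fact now takes four structural triples plus one per piece of context, which is part of why the other approaches in the list exist.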

Ontologies

An ontology serves to represent the concepts and relationships that are used in a knowledge graph. Concepts such as Bob and the Mona Lisa, and relationships such as ‘is interested in’ and ‘is born on’, constitute an ontology. Inferences can be made by using these ontologies in knowledge graphs. Like knowledge graphs, ontologies are represented with Resource Description Framework (RDF) triplets. To make data on the web more interoperable and reusable, the World Wide Web Consortium (W3C) standardized a family of knowledge representation languages, with the aim of making information on the Internet easier to gather.

A Turtle script for the knowledge graph given in Figure 1

As you can see in the example above, predefined ontologies are used for creating references. For example, foaf, an abbreviation of Friend of a Friend, is a machine-readable ontology describing persons. The prefix declared for this ontology on line 2 provides a shortcut for writing Internationalized Resource Identifiers (IRIs). In this way, we can refer to the ontology by its abbreviation on line 9. The lines that follow the PREFIX declarations describe the knowledge graph given in Figure 1.
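The shortcut mechanism can be sketched in a few lines of Python. The sketch assumes the prefix `foaf` is bound to the FOAF namespace `http://xmlns.com/foaf/0.1/` and `xsd` to the XML Schema namespace; the `expand` and `shorten` helpers are hypothetical and handle only simple `prefix:local` names, not the full Turtle grammar.

```python
# Prefix table, playing the role of the PREFIX declarations in a Turtle file.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:Person' into a full IRI."""
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {term!r}")
    return prefixes[prefix] + local


def shorten(iri, prefixes=PREFIXES):
    """Inverse of expand: abbreviate an IRI if a known namespace matches."""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri


print(expand("foaf:Person"))
print(shorten("http://xmlns.com/foaf/0.1/knows"))
```

Writing `foaf:Person` instead of the full IRI keeps the script readable while each term still resolves to a globally unique identifier.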

Real World Example

Knowledge graphs play a major role in organizing information on the Internet. For example, when you search for a few words on Google, you see more than one piece of information about that subject. This information is obtained by querying knowledge graphs stored in databases.

When you search Google for Ankara, the capital of Turkey, you will see additional information extracted from Wikipedia infoboxes, such as the area, elevation, weather, and events, displayed on the right side next to the photo. This extra information about the searched subject is obtained by querying a KG such as Wikidata. Wikidata is just one example of an open knowledge graph.

Knowledge Graph in Practice

There are two types of knowledge graphs in practice: open knowledge graphs and enterprise knowledge graphs. Open knowledge graphs publish their content online, making it accessible to the public, while enterprise knowledge graphs are typically internal to a company and used for commercial use cases. Examples of each are given below.

Open Knowledge Graphs

Many open knowledge graphs are published in the form of Linked Open Datasets, which are RDF graphs published under the Linked Data principles. These graphs typically offer access to their data via dumps (RDF), node lookups (Linked Data), and graph patterns (SPARQL).

  1. Wikidata: A collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation.
  2. Freebase: A knowledge base created mostly by community members, containing structured data collected from many sources, including user-submitted wiki contributions.
  3. DBpedia: A project that aims to extract structured content from the information created in the Wikipedia project.
  4. YAGO: An open-source semantic knowledge base developed at the Max Planck Institute for Computer Science in Saarbrücken. It is extracted automatically from Wikipedia and other sources. As of 2019, YAGO3 has knowledge of more than 10 million entities and contains more than 120 million facts about these entities.

Enterprise Knowledge Graphs

  1. Web search (e.g. Google, Yahoo, Bing)
  2. Commerce (e.g. Amazon, Airbnb, eBay, Uber)
  3. Social networks (e.g. Facebook, Twitter, Instagram, LinkedIn)
  4. Finance (e.g. Accenture, Bloomberg)

Enterprise knowledge graphs are built by companies from their own data and are usually not published.

Querying graphs

Many languages have been proposed for querying graphs, such as SPARQL for RDF graphs, and Cypher, Gremlin, and G-CORE for property graphs.

SPARQL is the standard query language and protocol for Linked Open Data, and it enables users to query information from databases or any data source that can be viewed as RDF. Much as SQL is used to retrieve and manipulate data in relational databases, SPARQL is used to retrieve and manipulate data stored as RDF, for example in NoSQL graph databases. It supports aggregation, subqueries, negation, creating values by expressions, extensible value testing, and constraining queries by source RDF graph.
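At the heart of a SPARQL query is a graph pattern: a set of triplets in which some positions are variables. The pure-Python sketch below imitates that matching step on the Bob and Mona Lisa triplets from the first section. It is a toy illustration, not a SPARQL engine, and the function names are made up for this example.

```python
TRIPLES = [
    ("Bob", "is a friend of", "Alice"),
    ("Bob", "is interested in", "the Mona Lisa"),
    ("the Mona Lisa", "was created by", "Leonardo da Vinci"),
    ("the video 'La Joconde à Washington'", "is about", "the Mona Lisa"),
]


def match_pattern(pattern, triple, bindings):
    """Unify one pattern with one triplet; terms starting with '?' are variables."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Informal SPARQL-like reading of the query below:
#   SELECT ?work ?creator WHERE {
#     Bob  "is interested in"  ?work .
#     ?work  "was created by"  ?creator .
#   }
QUERY = [
    ("Bob", "is interested in", "?work"),
    ("?work", "was created by", "?creator"),
]

results = select(QUERY, TRIPLES)
print(results)
```

The shared variable `?work` is what joins the two patterns, so the query follows two edges of the graph in a single step.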

Knowledge Graph in Machine Learning

With the rise of artificial intelligence in recent years, machine learning studies that use knowledge graphs have become more common in the literature. Examples of the use of knowledge graphs in machine learning include question-answering (a subfield of natural language processing), information retrieval, recommendation systems, store research, and supply chain management.

The success of machine learning depends on the quantity and quality of the data. Being able to convey the semantic information in the data correctly to the algorithms makes them more reliable, explainable, and robust. Knowledge graphs are powerful for these kinds of algorithms because they capture and persist contextual information, make it usable, and expose the rich relationships in the data.

Conclusion

This is just a summary of knowledge graphs! We talked about how knowledge graphs are data graphs that aim to collect and convey real-world information, how they are represented, how they can be accessed and used, their types, and their role in today’s artificial intelligence field. I hope it was useful.

References

[1] Hogan, Aidan, et al. “Knowledge graphs.” ACM Computing Surveys (CSUR) 54.4 (2021): 1–37.

[2] Chaudhri, V. K., Chittar, N., Genesereth, M. “An Introduction to Knowledge Graphs.” The Stanford AI Lab Blog, May 10, 2021.

[3] Knowledge Graph

[4] Knowledge representation and reasoning

[5] RDF 1.1 Primer
