Knowledge Graphs

Dr. Nimrita Koul
Apr 24, 2024

This tutorial is based on the excellent and comprehensive survey on Knowledge Graphs by Hogan et al.

Source Credit: Hogan et al., Knowledge Graphs, https://arxiv.org/pdf/2003.02320.pdf

In the real world around us, objects and the relationships between them can be represented by graphs: a set of objects, together with the connections between them, is naturally expressed as a graph.

A knowledge graph (KG) is an abstract data structure used to represent structured, related information extracted from multiple data sources. E.g., KGs can be used to organise the vast amounts of related knowledge available on the Internet and to integrate the data present within an enterprise. The information represented in a KG is expected to be easily understood and verified by humans.

  • A KG is a directed labeled graph made up of four components: a set of nodes, a set of edges connecting the nodes, a set of labels, and an assignment function that associates each edge with a label.
Source Credit: https://ai.stanford.edu/blog/introduction-to-knowledge-graphs/

In the above directed labeled graph, A and C are nodes representing entities A and C, and the edge E = (A, C) has the label B. This assignment of label B to edge E can be written as the triple (A, B, C), as shown in the graph above. In this triple, we refer to A, B and C as the subject, the predicate and the object respectively.

  • Entities in the graph can be persons, places, companies, objects like a computer or a chair, events, or abstract concepts, i.e., the entities that make up our physical or mental world. The nodes and edges have associated domain-specific meanings.
  • Labels of the edges indicate the relationship/semantics between the entities they connect, e.g., a relationship of friendship between two people, a relationship of container and contained between two objects, or a semantic similarity between two sentences of text.
  • A knowledge graph can use ontologies to define the entities, their properties and the allowed relationships between them, enabling logical inference to retrieve implicit knowledge beyond what is explicitly stored.
  • The use of graphs to store related information is not new. Directed graphs have been used to represent data flow graphs, decision diagrams, state charts, conceptual graphs, description logics, rule languages, probabilistic graphical models, Bayesian networks, etc.
  • Recently, KGs have proved successful in improving applications like search engines, recommendation systems, chatbots, and other applications in natural language processing and computer vision.
  • Knowledge graphs can be used as input to ML algorithms to represent domain knowledge. To do so, a KG first needs to be converted into numerical vectors known as embeddings: knowledge graph embedding techniques transform entities and relationships into low-dimensional vector representations that machine learning (ML) algorithms can process. Domain knowledge expressed in KGs can improve the predictions of ML algorithms.
  • Artificial Intelligence agents use knowledge graphs (semantic networks) to represent real world information and use this information to reason about the real world.
  • KGs can also be produced as output by natural language processing and computer vision applications, e.g., for tasks such as entity recognition, object detection, image understanding, and visual question answering.
  • You can think of a knowledge graph as a mind-map used by an ML algorithm to store and organise structured but related information.

Graph Data Models

Data like images, text, molecules of matter, social networks, citation networks, objects in a picture, programming code, machine learning models, math equations can all be represented as graphs. We can model our data in the form of several kinds of graphs:

1. Directed edge-labelled graphs:

This graph has a set of nodes and a set of edges between those nodes. Nodes represent entities, edges represent binary relations between those entities.

Modeling data as a graph makes it easier to integrate data from new sources than the standard relational model (RDBMS), which requires the data to conform to a schema. Graph data models are also more flexible than organizing data in the form of trees, as in XML or JSON formats, because a graph does not require data to be organized hierarchically and cycles are allowed.
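A minimal sketch of a directed edge-labelled graph in plain Python (the entity and relation names are made up for illustration): the graph is just a set of (subject, predicate, object) triples, and a new source can be integrated by adding more triples without changing any schema.

```python
# A directed edge-labelled graph as a set of (subject, predicate, object) triples.
# Entity and relation names are illustrative only.
triples = {
    ("Santiago", "capital_of", "Chile"),
    ("Chile", "borders", "Peru"),
    ("Arica", "city_in", "Chile"),
}

def outgoing(node, graph):
    """Return all (predicate, object) pairs on edges leaving `node`."""
    return [(p, o) for (s, p, o) in graph if s == node]

# Integrating a new source is just adding triples; no fixed schema is required.
triples.add(("Peru", "borders", "Ecuador"))
print(outgoing("Chile", triples))   # [('borders', 'Peru')]
```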

2. Heterogeneous graphs:

A heterogeneous graph (or heterogeneous information network) is a graph in which each node and edge is assigned a type. An edge is homogeneous if it connects two nodes of the same type; otherwise it is heterogeneous. This model allows nodes to be partitioned according to their type.

3. Property Graphs

Source Credit: Hogan et al., Knowledge Graphs, https://arxiv.org/pdf/2003.02320.pdf

Property graphs can model more complex relations by allowing property-value pairs and a label to be associated with both nodes and edges, in addition to the four components of a directed labelled graph.

Popular graph databases like Neo4j use property graphs for modelling data.
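A minimal sketch of a property graph using plain Python dictionaries (identifiers, labels and property values are illustrative, not taken from the source): both nodes and edges carry a label plus property-value pairs, which is essentially what a database such as Neo4j stores.

```python
# Property graph: nodes and edges each have a label plus property-value pairs.
# All identifiers and values below are illustrative.
nodes = {
    "n1": {"label": "City", "properties": {"name": "Santiago"}},
    "n2": {"label": "Country", "properties": {"name": "Chile", "iso_code": "CL"}},
}
edges = {
    "e1": {
        "source": "n1",
        "target": "n2",
        "label": "CAPITAL_OF",
        "properties": {"since": 1818},   # illustrative value
    },
}

# Example lookup: every CAPITAL_OF edge together with its properties.
capital_edges = [e for e in edges.values() if e["label"] == "CAPITAL_OF"]
print(capital_edges[0]["properties"])   # {'since': 1818}
```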

4. A Graph Dataset

A graph dataset is a collection of named graphs and a default graph. A named graph is a pair of a graph ID and a graph; the default graph has no ID and is referenced by default. Graph datasets are used to manage and query linked data composed of interlinked Resource Description Framework (RDF) graphs published over the Internet.

5. A Hypergraph

A hypergraph is a graph in which complex edges (hyperedges) connect sets of nodes rather than pairs of nodes.

6. A Graph Store

Graph stores are databases used to store and index graphs for efficient querying. Directed labelled graphs can be stored in relational databases as a single relation of arity three (a triple table), as a binary relation for each property, or as n-ary relations for entities of a given type (property tables). Graph stores may also allow graphs to be distributed over multiple machines.
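As a rough illustration of the "single relation of arity three" option, the sketch below stores a small graph in an in-memory SQLite table using only the Python standard library; the table and column names are my own choice, not a standard.

```python
import sqlite3

# Store a directed labelled graph as a single relation of arity three.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [("Santiago", "capital_of", "Chile"), ("Chile", "borders", "Peru")],
)

# Querying the graph then becomes an ordinary SQL query over that relation.
for row in conn.execute(
    "SELECT subject, object FROM triple WHERE predicate = 'borders'"
):
    print(row)   # ('Chile', 'Peru')
```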

Querying Graph Data

Graph query languages like SPARQL (for RDF graphs) and G-CORE (for property graphs) use common primitives like basic graph patterns, relational operators, and path expressions to retrieve matching data from a graph.

1. Basic Graph Patterns

A basic graph pattern is a template to match against a larger data graph. Like a data graph, a basic graph pattern has nodes and edges, but it also contains variables that act as placeholders for the unknown values you are querying. Evaluating a basic graph pattern over a data graph produces matchings: a matching maps each variable in the pattern to a constant value in the data graph such that the pattern, with its variables replaced, is found in the data graph.
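A minimal sketch of a basic graph pattern using the rdflib library (assumed available) and made-up IRIs: the variable ?city is a placeholder, and each row returned by the query is one matching of that variable to a constant in the data graph.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Santiago, EX.capital_of, EX.Chile))
g.add((EX.Arica, EX.city_in, EX.Chile))

# Basic graph pattern: a single triple pattern containing the variable ?city.
query = """
PREFIX ex: <http://example.org/>
SELECT ?city WHERE { ?city ex:city_in ex:Chile . }
"""
for row in g.query(query):
    print(row.city)   # http://example.org/Arica
```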

2. Complex Graph Patterns

Complex graph patterns combine and manipulate the results of basic graph patterns using relational algebra operators like projection, selection and renaming (unary operators), or join, union and difference (binary operators). To handle duplicate results, query languages use two semantics: bag semantics (preserves duplicates) or set semantics (eliminates duplicates).
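Continuing the same toy data, a minimal sketch of a complex pattern in SPARQL via rdflib (illustrative IRIs): two basic patterns are combined with UNION, and SELECT DISTINCT switches from bag semantics to set semantics.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Santiago, EX.capital_of, EX.Chile))
g.add((EX.Arica, EX.city_in, EX.Chile))

# Union of two basic graph patterns; DISTINCT removes duplicate solutions.
query = """
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?place WHERE {
  { ?place ex:city_in ex:Chile }
  UNION
  { ?place ex:capital_of ex:Chile }
}
"""
for row in g.query(query):
    print(row.place)   # ex:Arica and ex:Santiago, in no guaranteed order
```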

3. Navigational Graph Patterns

You can use path expressions in a query written in a graph query language. A path expression is a regular expression that matches arbitrary-length paths between two nodes.

To build path expressions we use these rules (a short SPARQL sketch follows the list below):

  • a single edge with label r represents a direct path from one node to another.
  • path expressions can be combined using the following operators. For path expressions r, r1 and r2:

  • r^ indicates the reverse path (in opposite direction)
  • r* indicates zero or more repetitions of the path r
  • r1|r2 indicates a disjunction of paths r1 or r2.
  • r1.r2 indicates concatenation of paths r1 and r2 sequentially.
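A minimal sketch of a navigational pattern using SPARQL property-path syntax through rdflib (illustrative IRIs): in SPARQL, `^` is the inverse path, `/` is concatenation, `|` is disjunction, and `*` is zero-or-more repetition, matching the operators listed above.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Chile, EX.borders, EX.Peru))
g.add((EX.Peru, EX.borders, EX.Ecuador))

# ex:borders* matches paths of any length (including zero) along ex:borders.
path_query = """
PREFIX ex: <http://example.org/>
SELECT ?country WHERE { ex:Chile ex:borders* ?country }
"""
for row in g.query(path_query):
    print(row.country)   # Chile (zero steps), Peru (one step), Ecuador (two steps)
```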

Schema of a Knowledge Graph

  • A semantic schema establishes the meaning behind the terms (vocabulary) used within a knowledge graph. This enables reasoning over the graph’s data using these defined terms. The schema can define classes to categorize entities in the graph, it can capture relationships between classes. The schema can also define the meaning of edge labels (properties) within the graph.
  • A validating schema ensures that the data in the graph is complete and adheres to specific rules and constraints. Large-scale knowledge graphs often contain diverse and incomplete data; validation helps guarantee that essential information exists for entities. Validating schemas define constraints on the data. A common approach for defining validating schemas is through shapes: a shape targets a set of nodes and specifies constraints on their properties (e.g., the number of values allowed, data types). A minimal SHACL sketch follows this list.
  • An emergent schema is extracted automatically from a knowledge graph by discovering the structures inherent in the data graph.
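A minimal sketch of a validating shape, written in SHACL and checked with the pySHACL library (assuming pySHACL is installed; the class and property names are illustrative): the shape targets nodes of class ex:Event and requires exactly one ex:venue value.

```python
from rdflib import Graph
from pyshacl import validate   # assumption: pySHACL is installed (pip install pyshacl)

shapes_ttl = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .

ex:EventShape a sh:NodeShape ;
    sh:targetClass ex:Event ;
    sh:property [ sh:path ex:venue ; sh:minCount 1 ; sh:maxCount 1 ] .
"""

data_ttl = """
@prefix ex: <http://example.org/> .

ex:SomeConcert a ex:Event .   # no ex:venue, so validation should fail
"""

conforms, _, report_text = validate(
    Graph().parse(data=data_ttl, format="turtle"),
    shacl_graph=Graph().parse(data=shapes_ttl, format="turtle"),
)
print(conforms)   # False: ex:SomeConcert violates the minCount constraint
```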

Identity in a knowledge graph

  • Identity is used to disambiguate entities within a knowledge graph. Assigning unique identifiers to each node ensures no naming conflicts when integrating external data into the knowledge graph. E.g., using digital object identifiers (DOIs) for documents, ORCID iDs for researchers, ISBNs for books, Alpha-2 codes for countries. Identity can also be provided by linking nodes to external sources (e.g., Wikipedia pages) to provide a reference point for disambiguation.

Data Types

Knowledge graphs can use multiple data types — numbers, strings, booleans, spatial points, temporal values.

Lexicalization of knowledge graphs

  • Lexicalization refers to the process of adding human-readable labels and annotations to a knowledge graph to increase readability.
  • Knowledge graphs often use globally unique identifiers (GUIDs) to represent entities. These identifiers can be human-readable (e.g., chile:Santiago) or not human-readable by design (e.g., wd:Q2887 in Wikidata).

Existential Nodes

Existential nodes represent relationships involving unknown entities in knowledge graphs. These nodes are typically drawn as blank circles (blank nodes in RDF). They assert that some entity exists and participates in a relationship without identifying which entity it is.

Context in a knowledge graph

Most facts within a knowledge graph are contextual, i.e., they are true only within a specific context (contextual facts). E.g.,

  • Temporal Context: Events happen within a specific time frame (e.g., India being an independent country since 1947).
  • Geographic Context: E.g., the floods in Japan.
  • Provenance Context: Information about where data originated (e.g., data on a specific node coming from a specified Wikipedia article).

Context acts as the scope of truth for a particular piece of data, it clarifies the conditions under which the data is considered valid. Context can be applied at different granularities within a knowledge graph:

  • Individual Nodes (e.g., specific time of existence for a city)
  • Individual Edges (e.g., validity of a connection based on source)
  • Sets of Edges (Subgraphs)

Context can be specified:

1. Directly just like any other data point in the graph.

2. As a reification — making statements about statements (or edges about edges) in a general way (see the reification sketch after this list).

3. As a higher-arity representation that encodes context directly within the edge structure, using named graphs or property graphs.

4. As annotations for context and reasoning: annotations define mathematical models representing specific contextual domains. These models can then be used for automated reasoning within those contexts.
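A minimal sketch of option 2 (reification) using rdflib and its RDF vocabulary (illustrative IRIs): a statement resource describes the edge, and the temporal context from the earlier example is attached to that resource rather than to the edge itself.

```python
from rdflib import Graph, Namespace, Literal, BNode
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()

# The plain edge: (India, status, IndependentCountry).
g.add((EX.India, EX.status, EX.IndependentCountry))

# Reification: a statement node describing that edge, carrying temporal context.
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.India))
g.add((stmt, RDF.predicate, EX.status))
g.add((stmt, RDF.object, EX.IndependentCountry))
g.add((stmt, EX.since, Literal(1947)))   # the temporal scope of the statement

print(len(g))   # 6 triples: the edge plus five describing/contextualising it
```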

Extracting Deductive Knowledge from Knowledge Graphs

Deduction can be used to obtain knowledge from the information represented in a knowledge graph. Machines need formal rules and premises to perform deductions like humans. These rules define what can be logically concluded from a given set of statements (entailment regimes).

Two approaches for capturing these rules are sub-class relations in a schema and ontologies.

Ontologies

An ontology is a formal definition of terms within a specific context (domain). E.g., An event ontology might specify that an event has one venue and start time, or it might allow for multiple venues and start times.

  • Ontologies contain information about — interpretation, individuals, properties, classes, and other features in a knowledge graph.
  • Ontologies guide data modeling within a knowledge graph based on the defined terminology. They automate entailment by establishing rules based on ontology definitions. They improve consistency within a single knowledge graph and interoperability between multiple graphs through shared understanding of terms.
  • Ontologies also indicate entailments in a graph. We say that one graph entails another if and only if any model of the former graph is also a model of the latter graph. I.e., that the latter graph has no new information over the former graph and thus holds as a logical consequence of the former graph.

Inference Rules for Deductive Reasoning in Knowledge Graphs

  • Inference rules are a way to capture if-then style relationships for automated reasoning.
  • These rules are composed of a body (conditions) and a head (conclusion). If the body pattern matches a subgraph in the data graph, the head pattern is entailed (considered a valid deduction).
  • These rules capture entailments based on ontological conditions and enable automated reasoning within knowledge graphs.
  • Applying rules iteratively to a graph, adding the entailed edges back into the graph until no new information can be generated, is known as materialization (a minimal sketch follows this list). The resulting graph can then be queried directly.
  • Examples of rule languages: Datalog, Horn clauses, OWL 2 RL/RDF.
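A minimal sketch of rule application and materialisation in plain Python (the rule and data are illustrative): a single if-then rule whose body matches two connected edges entails a new edge, and the loop re-applies the rule until a fixpoint is reached.

```python
# Rule (illustrative): if (?x, located_in, ?y) and (?y, located_in, ?z)
# then (?x, located_in, ?z) -- transitivity of located_in.
triples = {
    ("Santiago", "located_in", "Chile"),
    ("Chile", "located_in", "South_America"),
}

def apply_rule(graph):
    """Return the edges entailed by the rule that are not yet in the graph."""
    inferred = set()
    for (x, p1, y) in graph:
        for (y2, p2, z) in graph:
            if p1 == p2 == "located_in" and y == y2:
                inferred.add((x, "located_in", z))
    return inferred - graph

# Materialisation: add entailed edges back in until nothing new is produced.
while True:
    new_edges = apply_rule(triples)
    if not new_edges:
        break
    triples |= new_edges

print(("Santiago", "located_in", "South_America") in triples)   # True
```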

Description Logics (DLs) for Knowledge Graphs

  • DLs formalize the meaning of elements in knowledge structures.
  • DLs are a family of logics built upon three elements: individuals, classes (concepts), and properties (roles).

Extracting Inductive Knowledge from Knowledge Graphs

Extracting deductive knowledge involves the use of rules, whereas inductive knowledge extraction involves generalising patterns from a given set of input observations, which can then be used to make predictions.

  • We can apply unsupervised, self-supervised or supervised learning to learn from graphs.
  • In the case of unsupervised methods, we can use clustering algorithms on graphs to detect communities or find central nodes and edges. We can use self-supervised learning to learn graph embeddings (low-dimensional numeric representations of a knowledge graph); a graph embedding model maps an input edge to an output plausibility score indicating the likelihood of the edge being true. In the case of supervised learning, we can use graph neural networks to learn over the graph structure and make predictions.

While the above techniques learn numerical models, we can use symbolic learning to learn symbolic models i.e., logical formulae in the form of rules or axioms from a graph in a self-supervised manner.

Graph Analytics

Techniques for graph analytics (a small networkx sketch follows the list):

1. Centrality — used to identify the most important nodes or edges. Specific node centrality measures include degree, betweenness, closeness, eigenvector, PageRank, HITS, and Katz centrality.

2. Community detection — used to identify communities in a graph i.e., subgraphs that are more densely connected internally than to the rest of the graph.

3. Connectivity — to estimate how well connected the graph is.

4. Node similarity — to find nodes similar to other nodes by the way these nodes are connected within their neighbourhood.

5. Path finding — to find paths between specified nodes in a graph.
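A rough sketch of a few of these analytics using the networkx library (assumed available; the toy graph is illustrative): centrality, community detection, connectivity and path finding each correspond to a one-line call.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy undirected graph.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")])

print(nx.degree_centrality(G))                 # 1. centrality (degree)
print(nx.pagerank(G))                          #    centrality (PageRank)
print(list(greedy_modularity_communities(G)))  # 2. community detection
print(nx.is_connected(G))                      # 3. connectivity
print(nx.shortest_path(G, "A", "E"))           # 5. path finding: ['A', 'C', 'D', 'E']
```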

Many frameworks are available for distributed large scale graph analytics. E.g., Apache Spark (GraphX), GraphLab, Pregel, Shark etc.

Strategies for analytics on data graphs:

1. Projection — projecting a graph by optionally selecting a sub-graph from the data graph from which all edge meta data are removed.

2. Weighting — converting edge meta data into numerical values according to some function.

3. Transformation — transforming graph to a lower arity model.

4. Customization — changing the analytical procedure to incorporate edge meta-data.

Graph Query Languages can project or transform a graph suitable for a particular analytical task. Query languages such as SPARQL, Cypher, and G-CORE allow for outputting graphs, where such queries can be used to select sub-graphs for analysis. Analytics have also been used to rank query results over large graphs, selecting the most important results for presentation to the user.

Supervised learning using graphs through knowledge graph embeddings and graph neural networks

Knowledge Graph Embeddings

  • Knowledge graph embedding techniques create a dense representation of the graph in a continuous, low-dimensional vector space (typically 50 ≤ d ≤ 1000 dimensions) that can be used for machine learning tasks.
  • The graph embedding is composed of an entity embedding for each node: a vector with d dimensions that we denote by e; and a relation embedding for each edge label: a vector with d dimensions that we denote by r. The goal of these vectors is to abstract and preserve latent structures in the graph.

To compute a graph embedding, the most common approach is:

  • given an edge from node s to node o with label p, use a scoring function that accepts the entity embedding of node s, the relation embedding of edge label p and the entity embedding of node o, and computes the plausibility of the edge: how likely it is to be true (a minimal numerical sketch follows this list).
  • given a data graph, the goal is then to compute the embeddings of dimension d that maximise the plausibility of positive edges (typically edges in the graph) and minimise the plausibility of negative examples (typically edges in the graph with a node or edge label changed such that they are no longer in the graph) according to the given scoring function.
  • The embeddings generated with the above procedure can then be used for a number of low-level tasks involving the nodes and edge labels of the graph from which they were computed.
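A minimal numerical sketch of this idea with a TransE-style (translational) scoring function in NumPy (the embeddings below are random and untrained, purely for illustration): the plausibility of an edge (s, p, o) is higher when the subject embedding translated by the relation embedding lands close to the object embedding.

```python
import numpy as np

d = 50                                              # embedding dimension
rng = np.random.default_rng(0)
entity_emb = {e: rng.normal(size=d) for e in ["Santiago", "Chile", "Peru"]}
relation_emb = {r: rng.normal(size=d) for r in ["capital_of", "borders"]}

def plausibility(s, p, o):
    """TransE-style score: smaller distance of s + p from o means a more plausible edge."""
    return -np.linalg.norm(entity_emb[s] + relation_emb[p] - entity_emb[o])

# Training would adjust the vectors so that positive edges (from the graph) score
# higher than corrupted negative edges; here the vectors are still random.
print(plausibility("Santiago", "capital_of", "Chile"))   # positive example
print(plausibility("Santiago", "capital_of", "Peru"))    # corrupted (negative) example
```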

Popular techniques for graph embeddings:

  • Translational Models — These models interpret edge labels (relations) as transformations between nodes (entities). This technique learns vector representations for entities and relations and aims to minimize the distance between the sum of source entity and relation vectors and the target entity vector for positive examples. It maximizes the distance for negative examples.
  • Tensor decomposition models extract latent factors approximating the graph’s structure. A knowledge graph can be encoded as a 3-order tensor, in which the elements indicate connections between entities with specific relations. This tensor can be decomposed to yield factors or embeddings for entities and relationships. E.g., element-wise multiplication between entity and relation vectors can be used to score the plausibility of edges and create embeddings.
  • Neural models for embeddings — Neural models use non-linear functions to compute plausibility scores for the edges of a graph. E.g., one method uses convolutional kernels: it generates a matrix from the subject and relation embeddings of an edge by wrapping each vector over several rows and concatenating the two matrices. The concatenated matrix serves as the input to a set of (2D) convolutional layers, which return a feature map tensor. The feature map tensor is vectorised and projected into d dimensions using a parametrised linear transformation. The plausibility score is then computed as the dot product of this vector and the entity embedding of the object node.
  • Language models like GPT and Gemini compute embeddings of the text they are trained on, and such language models can also be used to compute graph embeddings. However, there is a difference between a graph and a text sequence: a graph consists of an unordered set of sequences of three terms (i.e., a set of edges), whereas text in natural language consists of arbitrary-length sequences of terms (i.e., sentences of words). Based on the text embedding model word2vec, the model RDF2Vec performs biased random walks on the graph and records the paths (the sequences of nodes and edge labels traversed) as “sentences”, which are then fed as input into the word2vec model. Another model, KGloVe, is based on the GloVe model: just as GloVe considers words that co-occur frequently in windows of text to be more related, KGloVe uses personalised PageRank to determine the nodes most related to a given node, and those results are then fed into the GloVe model.
  • Entailment-aware embedding models create joint embeddings that consider both the data graph and its ontology (the rules).

Graph Neural Networks

By creating graph embeddings, we can use graphs with existing machine learning models. However, we can also build custom ML models adapted to work with graphs as input and output. One such model is known as a Graph Neural Network (GNN).

A graph neural network is a neural network that takes as input a directed graph where nodes and edges are associated with feature vectors that can capture node and edge labels, weights etc.

Image Credit: https://blogs.nvidia.com/blog/what-are-graph-neural-networks/

A neural network already corresponds to a weighted, directed graph, where nodes serve as artificial neurons and edges as weighted connections. But there are differences between a conventional feed-forward neural network and a neural network that works over graphs. A conventional feed-forward neural network is homogeneous, with sequential layers of nodes where each node in one layer is connected to all nodes in the next layer. A graph, in contrast, is heterogeneous: its structure is determined by the entities and the relationships between them that its edges represent.

A graph neural network (GNN) is a neural network architecture based on the topology of the data graph, i.e., nodes are connected to their neighbours as per the data graph.

GNNs support end-to-end supervised learning for specific tasks: given a set of labelled examples, GNNs can be used to classify elements of the graph or the graph itself. GNNs have been used to perform classification over graphs encoding compounds, objects in images, documents, etc.; as well as to predict traffic, build recommender systems, verify software, etc.

Given labelled examples, GNNs can even replace graph algorithms; for example, GNNs have been used to find central nodes in knowledge graphs in a supervised manner.

There are three general types of prediction tasks on graphs: graph-level, node-level, and edge-level. In a graph-level task, we predict a single property for a whole graph. For a node-level task, we predict some property for each node in a graph. For an edge-level task, we want to predict the property or presence of edges in a graph.

Types of GNNs:

  • Recursive Graph Neural Networks: A recursive graph neural network takes as input a directed graph where nodes and edges are associated with feature vectors that can capture node and edge labels, weights, etc. These feature vectors remain fixed throughout the process. Each node in the graph is also associated with a state vector, which is recursively updated based on information from the node’s neighbours, i.e., the feature and state vectors of the neighbouring nodes and the feature vectors of the edges extending to/from them, using a parametric function called the transition function. A second parametric function, called the output function, computes the final output for a node based on its own feature and state vectors. These functions are applied recursively up to a fixpoint. Both parametric functions can be implemented using neural networks where, given a partial set of supervised nodes in the graph (i.e., nodes labelled with their desired output), parameters for the transition and output functions can be learnt that best approximate the supervised outputs. To ensure convergence up to a fixpoint, certain restrictions are applied, namely that upon each application of the function, points in the numeric space are brought closer together.
  • Convolutional graph neural networks: The core idea from image processing of applying small kernels to local regions of an image is applied to a node and its neighbours in the graph. Such GNNs are known as Convolutional Graph Neural Networks (ConvGNNs): the transition function is implemented through convolutions. Spectral or spatial representations of the graph can be used to define the neighbourhood, or an attention mechanism can be used to learn which neighbours’ features are most important to the current node. (A minimal one-layer sketch follows this list.)
  • Recursive GNNs (RecGNNs) aggregate information from neighbours recursively up to a fixpoint and use the same function parameters in uniform steps, while ConvGNNs apply a fixed number of convolutional layers, and different convolutional layers of a ConvGNN can apply different kernels/weights at each distinct step.
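A minimal sketch of a single convolutional GNN layer in NumPy (untrained and illustrative, not any particular published architecture): each node’s new representation aggregates the feature vectors of the node and its neighbours through the adjacency matrix, followed by a weight matrix and a non-linearity.

```python
import numpy as np

# Toy graph with 4 nodes; adjacency matrix with self-loops added.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float) + np.eye(4)

H = np.random.default_rng(0).normal(size=(4, 8))   # node feature vectors (d = 8)
W = np.random.default_rng(1).normal(size=(8, 8))   # layer weights (learned in practice)

# One message-passing step: H' = ReLU(D^-1 A H W), i.e. degree-normalised
# aggregation over each node's neighbourhood followed by a linear map and ReLU.
D_inv = np.diag(1.0 / A.sum(axis=1))
H_next = np.maximum(D_inv @ A @ H @ W, 0.0)
print(H_next.shape)   # (4, 8): one updated state vector per node
```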

Symbolic Learning on Knowledge Graphs

  • Supervised techniques like knowledge graph embeddings and graph neural networks learn numerical models over graphs, but such models lack explainability.
  • Symbolic learning is a more explainable approach. It involves learning hypotheses in a symbolic language that “explain” a given set of positive and negative edges. The hypotheses then serve as interpretable models that can be used for further deductive reasoning. Rule mining and axiom mining are two types of symbolic learning.

Creation of Knowledge Graphs

  • The method used to create a knowledge graph depends on the domain, the actors involved, the applications, and the available data sources.
  • You can build a knowledge graph incrementally, starting with an initial core that can be incrementally enriched from other sources as required.
  • E.g., initially we can just include main entities and their relationships we clearly know, and incrementally add entities and relationships as we discover them.

Collecting data for your knowledge graph:

  • You need collaboration with other humans (employees, domain experts, the general public) to collect the data.
  • For text data, sources like corpora from newspapers, books, scientific journals, social media, etc. can be used. The text data needs to be preprocessed with steps like tokenization, part-of-speech tagging, dependency parsing, named entity recognition, entity linking, and relation extraction.

Markup data sources: You can extract markup data from the web as follows:

  • Wrapper based extraction to locate and extract useful information from markup documents. Web table extraction to extract tables embedded in HTML webpages.
  • Deep web crawling to search for information on web forums.
  • Collecting data from structured sources of data: CSV, JSON, XML, relational databases. You need to map the relations in a relational database to a graph; similarly, JSON and XML usually have a tree structure which you need to map to graph elements. You may also extract knowledge for a graph from other graphs.

Creation of Schema/Ontology for your knowledge graph

Once the data has been extracted to create your knowledge graph, you need to create a schema for it, either using ontology engineering methods or by learning an ontology automatically. There are several methods to systematically and manually create an ontology based on your data; automated ontology learning does not require manual intervention.

Assessing quality of your knowledge graphs

  1. Accuracy: Accuracy refers to the extent to which entities and relations encoded by nodes and edges in the graph correctly represent real-life phenomena. Accuracy can be further sub-divided into three dimensions: syntactic accuracy, semantic accuracy, and timeliness.
  • Syntactic accuracy is the degree to which the data are accurate with respect to the grammatical rules defined for the domain and/or data model.
  • Semantic accuracy is the degree to which data values correctly represent real world phenomena.
  • Timeliness is the degree to which the knowledge graph is currently up-to-date with the real world state.

2. Coverage — Coverage refers to avoiding the omission of domain-relevant elements, which otherwise may yield incomplete query results or entailments, biased models, etc.

  • Completeness: refers to the degree to which all required information is present in a particular dataset. It includes schema completeness, property completeness, population completeness, linkability completeness.
  • Representativeness focuses on assessing high-level biases in what is included in or excluded from the knowledge graph. This metric assumes that the knowledge graph is a sample of an ideal knowledge graph and asks how biased this sample is.

3. Coherency: how well the knowledge graph conforms to or is coherent with the formal semantics and constraints defined at the schema-level.

  • Consistency means that a knowledge graph is free of (logical/formal) contradictions with respect to the particular logical entailment considered.
  • Validity means that the knowledge graph is free of constraint violations, such as captured by shape expressions.

4. Succinctness: refers to the inclusion only of relevant content that is represented in a concise and intelligible manner.

  • Conciseness refers to avoiding the inclusion of schema and data elements that are irrelevant to the domain
  • Representational-conciseness refers to the extent to which content is compactly represented in the knowledge graph.
  • Understandability refers to the ease with which data can be interpreted without ambiguity by human users.

Refinement of Knowledge graphs

Once a KG is created, it has scope for refinement through completion of missing information and repair of inconsistent knowledge.

  • Completion: To fill in the missing edges of a knowledge graph, i.e., edges that are deemed correct but are neither given nor entailed by the knowledge graph. This is done using link prediction techniques such as general link prediction using knowledge graph embeddings, rule/axiom mining, type-link prediction, and identity-link prediction (predicting identity links involves searching for nodes that refer to the same entity). Identity-link prediction typically uses value matchers and context matchers: value matchers determine how similar the values of two entities (strings, numbers, dates, objects, etc.) are on a given property, while context matchers consider the similarity of entities based on the surrounding nodes and edges.

Correction of Knowledge Graphs

While completion finds new edges in a knowledge graph, correction identifies and removes existing incorrect edges in the knowledge graph. Two main approaches to correction are fact validation and inconsistency repairs.

  • Fact validation: The task of fact validation involves assigning plausibility or veracity scores to facts/edges, typically between 0 and 1. An ideal fact-checking function assumes an ideal knowledge graph as ground truth and would return 1 for a fact that exists in the ground truth and 0 for a fact that does not.
  • Inconsistency repairs: Inconsistencies can arise in knowledge graphs due to axioms such as disjointness, e.g., when an entity is assigned to two disjoint classes. We need to detect and repair such inconsistencies. Simple inconsistencies can be repaired by removing the disjointness axiom or one of the conflicting type assignments; automated repair methods can also be used.

Publication of your knowledge graphs

  • If you desire to publish your knowledge graph publicly, follow the FAIR principles of Findability, Accessibility, Interoperability and Reusability, and the Linked Data Principles: (1) Use Internationalized Resource Identifiers (IRIs) as names for things. (2) Use HTTP IRIs so those names can be looked up. (3) When an HTTP IRI is looked up, provide useful content about the entity that the IRI names using standard data formats. (4) Include links to the IRIs of related entities in the content returned.

Access Protocols for knowledge graphs.

To allow the public to interact with your knowledge graph, you need to use protocols that define the requests that agents can make and the response that they can expect as a result. For public access, this protocol should be open, free, and universally implementable. Some access protocols for KGs are:

  1. Dumps — Knowledge graphs (KGs) often provide access to their data through dumps. Dumps are essentially files (or collections of files) containing the knowledge graph’s content in a specific format. Dumps offer a relatively easy way to download a complete snapshot of a knowledge graph at a specific point in time. The format of the dump file depends on the knowledge graph itself; common formats include RDF (Resource Description Framework) serialisations, JSON (JavaScript Object Notation), and custom formats specific to certain knowledge graphs.

You can obtain a dump of a KG using:

  • Download Links: direct download on the websites of knowledge graphs.
  • APIs: A knowledge graph might offer programmatic access to dumps through APIs.
  • Requesting Access: you might need to contact the knowledge graph provider directly to request access to dumps.

2. Node lookups: Protocols for performing node lookups accept a node (ID) request and return a (sub-)graph describing that node (see the sketch below).
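A rough sketch of a node lookup over HTTP using the requests library against Wikidata’s public entity-data endpoint (a real endpoint; the entity ID reuses the wd:Q2887 example mentioned earlier): the response is a JSON document describing the requested node.

```python
import requests

entity_id = "Q2887"   # the Wikidata identifier used as an example earlier
url = f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json"

# The endpoint returns a (sub-)graph describing the node, serialised as JSON.
data = requests.get(url, timeout=30).json()
print(data["entities"][entity_id]["labels"]["en"]["value"])
```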

Controlling usage of your knowledge graph

  • Licensing: The W3C Open Digital Rights Language (ODRL) provides an information model and related vocabularies to specify permissions, duties, and prohibitions with respect to actions relating to your knowledge graphs.
  • Usage Policies: You can restrict access to parts of a knowledge graph using WebAccessControl framework.
  • Encrypting parts of your published knowledge graph.
  • Anonymization of parts of your knowledge graph to protect privacy.

Prominent Knowledge Graphs

  1. Open Knowledge Graphs:
  • DBpedia: The DBpedia project was developed to extract a graph-structured representation of the semi-structured data embedded in Wikipedia articles enabling the integration, processing, and querying of these data in a unified manner. The resulting knowledge graph is further enriched by linking to external open resources, including images, webpages, and external datasets such as DailyMed, DrugBank, GeoNames, MusicBrainz, New York Times, and WordNet.
  • YAGO: Yet Another Great Ontology (YAGO) extracts graph-structured data from Wikipedia, which are then unified with the hierarchical structure of WordNet.
  • Wikidata: Wikidata is a centralised, collaboratively edited knowledge graph that supplies Wikipedia and arbitrary other clients with data.
  • Domain specific open knowledge graphs — OpenCitations, SciGraph, Microsoft Academic Knowledge Graph, LinkedGeoData, Bio2RDF.

2. Proprietary Knowledge Graphs

  • Google Knowledge Graph for web search.
  • Amazon, Airbnb, Uber built knowledge graphs for commerce.
  • Meta and LinkedIn built knowledge graphs for social media.
  • Thomson Reuters, Accenture, Capital One, Wells Fargo built knowledge graphs for finance.
  • IBM, AstraZeneca built KGs for health care.

Future Directions in Knowledge Graph Research

The goal of research in knowledge graphs is to extract maximum valuable knowledge from diverse sources of data.

This research includes the fields of graph databases, knowledge representation, logic, machine learning, graph algorithms, ontology engineering, data quality, natural language processing, information extraction, privacy and security, etc.

Current topics in KG research include formal semantics for property graphs, reasoning and querying over contextual data, similarity based query relaxation, shape induction, contextual knowledge graph embeddings, entailment-aware knowledge graph embeddings, expressive graph neural networks, rule and axiom mining. The goal is to improve scalability, quality, diversity, usability of graphs.

References:

  1. Hogan et al., Knowledge Graphs, https://arxiv.org/pdf/2003.02320.pdf

2. https://ai.stanford.edu/blog/introduction-to-knowledge-graphs/

3. https://distill.pub/2021/gnn-intro/
