Unlocking the Power of Knowledge Graph Embedding: An Introduction — Part 1

Learn and understand the basic principles of Knowledge Graph Embedding (KGE) and the simple idea behind it, using a translation model.

Nadif Fadlurrahman Wahdi
Kitalulus Engineering
6 min read · Feb 22, 2023



The Knowledge Graph

P.S. In case you are really clueless about what a graph is, you can read my previous article about graphs.

A knowledge graph is a special kind of graph: it is a semantic network of entities that illustrates the relationships between them. Formally, it is defined as a set G = {E, R, F} that contains a collection of entities E, relations R, and facts F:

A fact is a triple (h, r, t) ∈ F (‘∈’ means ‘is a member of’, so you can say that the triple is a member of the collection F) that indicates a relation r ∈ R between the head h ∈ E and the tail t ∈ E of the triple. Another common way to express the triple (or fact) is <head, relation, tail>.

Or… maybe, if you get overwhelmed by the formal definition above: a knowledge graph is a representation that denotes the relations between two or more entities. That’s it.

For example, let’s say you have this kind of dataset:

city          | country    | university
Bandung       | Indonesia  | ITB
Singapore     | Singapore  | NTU
Kuala Lumpur  | Malaysia   | UM

From the dataset above, there are several relations we can derive. Let’s say we take two of them: _is_in_country and _has_university. The first column (the city column) will be the head and the rest will be the tails, so the triples that can be explicitly concluded are:

  • <Bandung, _is_in_country, Indonesia>
  • <Bandung, _has_university, ITB>
  • <Singapore, _is_in_country, Singapore>
  • <Singapore, _has_university, NTU>
  • <Kuala Lumpur, _is_in_country, Malaysia>
  • <Kuala Lumpur, _has_university, UM>

and so on… The KG illustration that represents the triples above is shown below.
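If you want to derive these triples programmatically, a minimal sketch in Python with pandas could look like this (the DataFrame simply mirrors the example dataset above; the column names are my own assumptions):

import pandas as pd

# the example dataset from above
df = pd.DataFrame({
    "city": ["Bandung", "Singapore", "Kuala Lumpur"],
    "country": ["Indonesia", "Singapore", "Malaysia"],
    "university": ["ITB", "NTU", "UM"],
})

# the city column is the head; each remaining column yields one relation
triples = [(row.city, "_is_in_country", row.country) for row in df.itertuples()]
triples += [(row.city, "_has_university", row.university) for row in df.itertuples()]

for triple in triples:
    print(triple)   # e.g. ('Bandung', '_is_in_country', 'Indonesia')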

And of course, you can take it to the ‘next level’; let’s say you make the knowledge graph a little more complicated, as below:

  • <ITB, _is_located_in, Bandung> , <Bandung, _is_in, Indonesia>
  • <NTU, _is_located_in, Singapore> , <Singapore, _is_in, Singapore>
  • <UM, _is_located_in, Kuala Lumpur>, <Kuala Lumpur, _is_in, Malaysia>

And the illustration that represents the triples above is shown below.
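As a quick sketch of how such a graph could be built in code, here is one way to load these facts into a directed graph with the networkx library (the triple list just restates the bullets above):

import networkx as nx

# the two-hop facts from the list above
triples = [
    ("ITB", "_is_located_in", "Bandung"), ("Bandung", "_is_in", "Indonesia"),
    ("NTU", "_is_located_in", "Singapore"), ("Singapore", "_is_in", "Singapore"),
    ("UM", "_is_located_in", "Kuala Lumpur"), ("Kuala Lumpur", "_is_in", "Malaysia"),
]

G = nx.DiGraph()
for head, relation, tail in triples:
    G.add_edge(head, tail, relation=relation)  # entities become nodes, relations become edge labels

print(G.number_of_nodes(), "entities and", G.number_of_edges(), "facts")
# nx.draw_networkx(G) would draw the graph (requires matplotlib)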

The Embeddings

TL;DR: The training starts by initializing random k-dimensional vectors for the entities e and the relations r. Then, while the stop condition has not been reached, the algorithm continuously samples a batch of size b from the training set, and for each triple of the batch it samples a random corrupted fact by changing the head and/or tail of the triple so that the fact becomes false. The original triple and the corrupted triple are then added to the training batch, which is used to optimize the scoring function (by minimizing the loss function).

Knowledge Graph Embedding (KGE for short) is a machine learning task of representing both the entities and the relations of a knowledge graph in a low-dimensional space. It can be used for various applications, such as link prediction, triple classification, entity recognition, clustering, and relation extraction. A KGE is characterized by four different aspects:

  1. Representation space, the low-dimensional space in which the entities and relations are represented,
  2. Scoring function, a measure of the quality of our embedded triple representation,
  3. Encoding models, the modality in which the embedded representations of the entities and relations interact with each other,
  4. Additional information that can enrich the embedded representation. Usually, an ad hoc scoring function is integrated into the general scoring function for each piece of additional information.

The general pseudocode for the embedding procedure is shown below

algorithm Compute entity and relation embeddings is
    input: the training set S = {(h, r, t)},
           the entity set E,
           the relation set R,
           the embedding dimension k
    output: entity and relation embeddings

    initialization: the embeddings (vectors) of the entities e and the relations r are initialized randomly

    while the stop condition is not reached do
        S_batch <- sample(S, b)       // from the training set S, randomly sample a batch of size b
        T_batch <- {}                 // the training batch starts out empty
        for each (h, r, t) in S_batch do
            (h', r, t') <- sample(S') // sample a corrupted version of the triple (a false fact)
            T_batch <- T_batch ∪ {((h, r, t), (h', r, t'))}
        end for
        update the embeddings by minimizing the loss function over T_batch
    end while

First of all, we initialize random vectors of dimension k for the entities e and the relations r. Then, starting from the training set S, the algorithm continuously optimizes the scoring function (by minimizing the loss function) until the stop condition is reached; the stop condition is usually given by overfitting on the training set. In each iteration, a batch of size b is sampled from the training set, and for each triple of the batch a random corrupted fact is sampled by changing the head and/or the tail of the triple so that the fact becomes false. The original triples and the corrupted triples are added to the training batch, and then the embeddings are updated to optimize the scoring function (by minimizing the loss function).
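To make the procedure concrete, here is a minimal sketch of a TransE-style training loop in plain numpy. Everything in it (the toy triples, the hyperparameters, the squared-L2 distance, the unfiltered negative sampling) is an illustrative choice of mine rather than the reference implementation; the original TransE also normalizes the entity embeddings after each step, which is omitted here for brevity.

import numpy as np

# toy training set: the triples from the example above
triples = [
    ("ITB", "_is_located_in", "Bandung"), ("Bandung", "_is_in", "Indonesia"),
    ("NTU", "_is_located_in", "Singapore"), ("Singapore", "_is_in", "Singapore"),
    ("UM", "_is_located_in", "Kuala Lumpur"), ("Kuala Lumpur", "_is_in", "Malaysia"),
]
entities = sorted({x for h, _, t in triples for x in (h, t)})
relations = sorted({r for _, r, _ in triples})

k, margin, lr, epochs = 8, 1.0, 0.01, 500     # embedding dimension, margin, learning rate
rng = np.random.default_rng(42)

# initialization: random k-dimensional vectors for the entities e and the relations r
E = {e: rng.normal(scale=0.1, size=k) for e in entities}
R = {r: rng.normal(scale=0.1, size=k) for r in relations}

def corrupt(h, r, t):
    # corrupt a fact by replacing its head or its tail with a random entity
    if rng.random() < 0.5:
        return rng.choice(entities), r, t
    return h, r, rng.choice(entities)

for _ in range(epochs):
    for h, r, t in triples:                    # the whole toy set acts as one batch here
        h2, _, t2 = corrupt(h, r, t)
        pos = E[h] + R[r] - E[t]               # translation residual of the true triple
        neg = E[h2] + R[r] - E[t2]             # residual of the corrupted triple
        loss = margin + pos @ pos - neg @ neg  # margin ranking loss with squared L2 distance
        if loss > 0:                           # update only if the corrupted triple scores too well
            E[h]  -= lr * 2 * pos
            E[t]  += lr * 2 * pos
            R[r]  -= lr * 2 * (pos - neg)
            E[h2] += lr * 2 * neg
            E[t2] -= lr * 2 * neg

# after training, h + r should land close to t for true facts and far from false ones
print(np.linalg.norm(E["ITB"] + R["_is_located_in"] - E["Bandung"]))   # should be small
print(np.linalg.norm(E["ITB"] + R["_is_located_in"] - E["Malaysia"]))  # should be larger

The update rule is just stochastic gradient descent on the margin loss; a real implementation would batch the triples, filter out corrupted triples that happen to be true, and normalize the entity vectors.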

The Models (Translation)

KG Embedding Model Publication Timeline

After reading a long explanation of the idea behind KGE, you may still not have a clear picture of how it works. Well, maybe it can be better explained when we talk about the models. There are actually several embedding models for KGE, such as RESCAL, TransE, DistMult, TransH, HolE, TransR, TransD, TransA, ComplEx, STransE, ANALOGY, TorusE, ConvKB, ConvE, SimplE, CapsE, and so on. But in this article, we will use a simple one to help you understand the embedding process. In this case, we will use a translation-based model: TransE.

Translational models are inspired by the idea of translation invariance introduced in word2vec. This kind of model relies on the fact that entities end up close to each other after applying a proper relation in the geometric space in which they are defined. In other words, when the embedding of the head is translated by the embedding of the relation, the expected result should be the embedding of the tail. The closeness of the translated head to the tail embedding is given by some distance measure.

TransE, in other words, uses a scoring function based on a simple vector-sum equation for each fact in which the embeddings appear: h + r = t. Below is a beautiful illustration of how TransE works.

Beautiful illustration of TransE embeddings: (left) the entity embeddings, (right) the relation embeddings. (source)

You can see in the picture how each node is connected by a single edge (because there is only one relation in that graph). In the illustration above, each entity and each relation has already been embedded. If we translate the embedding of node “0” by the relation vector “pred”, we arrive at the embedding of node “1”, and so on. That is how h + r = t works in TransE.
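To put numbers on the h + r = t idea, here is a tiny sketch with hand-picked 2-dimensional vectors (the values are made up purely for illustration):

import numpy as np

h = np.array([1.0, 0.0])   # embedding of the head, e.g. node "0"
r = np.array([0.5, 1.0])   # embedding of the relation "pred"
t = np.array([1.4, 1.1])   # embedding of the tail, e.g. node "1"

# TransE scores a triple by the distance between the translated head and the tail:
# the smaller ||h + r - t||, the more plausible the fact
print(np.linalg.norm(h + r - t))        # ~0.14: h translated by r lands close to t

fake_t = np.array([-2.0, 0.3])          # an unrelated entity
print(np.linalg.norm(h + r - fake_t))   # ~3.57: the corrupted fact scores much worse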

This article is just a glimpse of an introduction to KG Embedding. In the next article, we will discuss KG Embedding further using the pyKEEN library (and of course, the hands-on!). If you have any comments, ideas, or questions, just let me know! Cheers.

References:

Ji, Shaoxiong; Pan, Shirui; Cambria, Erik; Marttinen, Pekka; Yu, Philip S. (2021). “A Survey on Knowledge Graphs: Representation, Acquisition, and Applications”. IEEE Transactions on Neural Networks and Learning Systems. PP (2): 494–514.

Bordes, Antoine; Usunier, Nicolas; Garcia-Durán, Alberto; Weston, Jason; Yakhnenko, Oksana (May 2013). “Translating embeddings for modeling multi-relational data”. Proceedings of the 26th International Conference on Neural Information Processing Systems — Volume 2. NIPS’13. Lake Tahoe, Nevada: Curran Associates Inc.: 2787–2795.

Dai, Yuanfei; Wang, Shiping; Xiong, Neal N.; Guo, Wenzhong (May 2020). “A Survey on Knowledge Graph Embedding: Approaches, Applications and Benchmarks”. Electronics. 9 (5): 750.

Rossi, Andrea; Barbosa, Denilson; Firmani, Donatella; Matinata, Antonio; Merialdo, Paolo (2020). “Knowledge Graph Embedding for Link Prediction: A Comparative Analysis”. ACM Transactions on Knowledge Discovery from Data. 15 (2): 1–49.
