GNNs Beyond Triples

Representation Learning on RDF* and LPG Knowledge Graphs

Hyper-relational KGs encode much more knowledge than triple KGs. We adopt recent advances in Graph ML and propose a GNN encoder for such graphs along with a new benchmark.

Michael Galkin
Towards Data Science
12 min read · Sep 24, 2020


Image by Author.

Knowledge graphs (KGs) are a cornerstone of modern NLP and AI applications — recent works include Question Answering, Entity & Relation Linking, Language Modeling, Information Extraction, and even playing text RPGs with Reinforcement Learning. Furthermore, KGs are already widely adopted in industry, e.g., in a line-up of works from the recent Knowledge Graph Conference (KGC).

Triples vs The World

Traditionally, KGs are encoded as <subject, predicate, object> (RDF) triples, and many publicly available KGs like DBpedia and YAGO initially followed this paradigm, backed by expressive logical formalisms (remember the times when DL referred to Description Logics? 👵) and standards like RDF and OWL.

Triple-based facts. Did Einstein attend both universities at the same time? Image by Author.

Using triples, the above example describing which universities Albert Einstein attended can be encoded as two triples:

Albert Einstein, educated at, ETH Zurich
Albert Einstein, educated at, University of Zurich

Well, it looks okayish for simple applications, but it turns out our world is a bit too complex to fit everything into triples. For instance, do the two triples imply that Albert Einstein was educated at the two places simultaneously? Or that he was awarded the same degree at both?

In fact, Einstein received a Bachelor’s degree at ETH Zurich with a major in mathematics, whereas at the University of Zurich he obtained a doctoral degree with a major in physics.

Can we have a mechanism to express facts in more detail?

Yes! In the KG world, there are at least two ways to do that — RDF* Graphs and Labeled Property Graphs (LPG). Both of them allow qualifying each fact further by attaching auxiliary key-value (relation-entity) pairs to edges in a KG. And both of them are already supported by major vendors on the graph database market 👍.

In the LPG world, both nodes and edges can naturally have key-value attributes. Neo4j is probably the biggest name in the LPG family. You can query LPGs with (Open)Cypher. A recent COVID-19 KG is available as a Neo4j dump.

RDF*, originally proposed by Olaf Hartig (his blog is a good starting source to get into RDF* and related tech), aims to alleviate many issues of the infamous RDF reification mechanism (check out this survey by Frey et al for a comprehensive overview of reification) while retaining the reasoning capabilities pertaining to RDF graphs. Backed by solid theoretical foundations, RDF* provides a handful of ways to enrich triples with more details. You can query RDF* graphs with SPARQL* (an extension of SPARQL for working with RDF*). Apache Jena, RDF4J, N3.js, Blazegraph, AnzoGraph, Stardog, and GraphDB support RDF* and SPARQL*.

Our example in the RDF* syntax can look like this:

<< Albert_Einstein educated_at ETH_Zurich >>
    academic_degree Bachelor ;
    academic_major Maths .
<< Albert_Einstein educated_at University_of_Zurich >>
    academic_degree Doctorate ;
    academic_major Physics .

Hyper-relational graph or hypergraph?

What is a suitable term for such KGs? There was a small vocabulary gap until Rosso et al, in their recent WWW’20 work, suggested using “hyper-relational graph”. The term “hypergraph” is still commonly misused for such KGs, so we’d like to advocate for “hyper-relational” graphs as well. 🤝

The main difference is in the representation of a fact. Hypergraphs assume there is one (named) hyper-edge unifying several entities:

education(Albert Einstein, ETH Zurich, Bachelor, Mathematics)
education(Albert Einstein, University of Zurich, Doctorate, Physics)

Hyperedges consisting of 4 nodes each. Note that we lose the predicates pertaining to academic major and academic degree. Image by Author.

Looks like an n-ary relation, right? 🤨 We have several problems here:

  1. We lose the typed relations academic_degree and academic_major associated with Bachelor/Doctorate and Mathematics/Physics, respectively. Instead, the type of a hyperedge is an abstraction (or a rather weird semantic mixture 🤯) of educated_at, academic_degree, and academic_major. What if one fact also contains an auxiliary predicate academic_supervisor? We’d need to define a new hyper-edge, say, education1(), mixing those 4 relations, and the number of such hyper-edge types explodes exponentially with the number of predicates and qualifiers. 📈

  2. Moreover, we lose the auxiliary character of degrees and majors, i.e., the fact that they are intended to describe their main triples. For example, Bachelor and Mathematics are auxiliary w.r.t. Albert Einstein and ETH Zurich and, hence, should be treated in this manner, whereas a tuple of entities in a hyper-edge assumes equal importance of its elements (see the sketch below).
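To make the contrast concrete, here is a minimal Python sketch (plain data structures, purely illustrative) of the same fact in both paradigms:

# A hypergraph fact: one named hyper-edge over a tuple of entities.
# The typed qualifier predicates (academic_degree, academic_major)
# are gone - they are baked into the hyper-edge type "education".
hyperedge = ("education",
             ("Albert_Einstein", "ETH_Zurich", "Bachelor", "Mathematics"))

# A hyper-relational fact: a main triple plus a set of (relation, entity)
# qualifier pairs. The typed relations survive, and the auxiliary role of
# Bachelor/Mathematics w.r.t. the main triple stays explicit.
statement = {
    "triple": ("Albert_Einstein", "educated_at", "ETH_Zurich"),
    "qualifiers": {("academic_degree", "Bachelor"),
                   ("academic_major", "Mathematics")},
}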

Given these issues, in the following parts we’ll adhere to hyper-relational approaches.

Hyper-Relational Knowledge Graphs in the Wild

As of 2020, most open-domain KGs make extensive use of hyper-relational facts. Wikidata, with its statement model, is a good example of a hyper-relational KG. Each fact in Wikidata is a Statement with a main triple and a set of auxiliary qualifiers — entity-relation pairs. With Wikidata statements, our Albert Einstein example can be modeled as easily as follows:

Hyper-relational facts. Image by Author.

In those statements, (academic_degree, Bachelor) and (academic_major, Mathematics) are qualifiers for the triple <Albert Einstein, educated_at, ETH Zurich>.

It’s important to notice that Wikidata (and the hyper-relational paradigm in general) does not distinguish between entities and predicates used exclusively in main triples and those used only in qualifiers, i.e.,

all predicates and entities can be used either in triple terms or qualifiers

(albeit some entities and relations can be seen only in qualifiers in the current Wikidata version). We will use this property in the following sections.

As for other KGs: starting from 2018, new releases of DBpedia contain reified statements similar to those in Wikidata. YAGO 4 adopted the RDF* encoding of facts as well. 🙌

What about Freebase? 🤔 Well, in 2020 you should probably not practice necromancy 🧟‍♂️, as Freebase is no longer supported nor updated. However, Compound Value Type (CVT) nodes in Freebase do resemble reified triples [though they look more like n-ary relations].

Graph Representation Learning

Our task here is to learn representations of hyper-relational graphs.

By representations we refer to entity (node) and relation (typed edge) embeddings. The embeddings can then be used in downstream tasks like link prediction, node classification, entity alignment, and many more, with applications across NLP, CV, and other AI areas. 🍀

The field of graph representation learning (GRL) is one of the fastest-growing 🚀 areas of machine learning: there is a handful of articles (a series of posts by Michael Bronstein, reviews (mine, Sergey’s) of ICLR’20 and NeurIPS’19 papers), books (by William Hamilton, by Ma and Tang), courses (CS224W, COMP 766, ESE 680), and even a GraphML Telegram channel (subscribe 😉) covering basic and advanced topics.

In the encoder-decoder paradigm, an encoder is usually a GNN (graph neural net), and a decoder is a function of embeddings that returns a value or a vector pertaining to a certain downstream task, e.g., a probability of an entity being an object of a given <subject, predicate> pair.
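As a toy illustration, a decoder for that object-prediction task could be as simple as a DistMult-style scorer over the encoder’s outputs (the names and shapes below are illustrative assumptions, not a particular paper’s API):

import torch

def score_objects(entity_emb, relation_emb, s_idx, r_idx):
    # entity_emb:   [num_entities, dim] node embeddings from a GNN encoder
    # relation_emb: [num_relations, dim] relation embeddings
    # Returns [num_entities] logits scoring each entity as the object
    # of the given <subject, predicate> pair.
    h_s, h_r = entity_emb[s_idx], relation_emb[r_idx]
    return (h_s * h_r) @ entity_emb.t()  # DistMult-style composition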

What’s there for triple-based KGs?

Quite a lot! 😍

Encoders: a family of multi-relational GNN encoders like R-GCN (Schlichtkrull et al, ESWC 2018) and CompGCN (Vashishth et al, ICLR 2020) that extend the original Graph Convolutional Network (GCN) algorithm within a message passing framework.

Decoders: actually, traditional KG embedding algorithms like TransE, ConvE, RotatE, etc. are good examples of decoders for the link prediction task. Originally, they were designed as decoder-only models optimized end-to-end directly 👉 on link prediction.

What’s there for hyper-relational KGs?

Hmm, not that much 🤔 (as of Fall 2020)

Encoders: ???

Decoders: HINGE, proposed by Rosso et al, is an end-to-end CNN-based model for link prediction on hyper-relational graphs.

Well, we couldn’t cope with such a glaring abyss 🕳 in the GNN encoders part and proposed ⭐️ StarE ⭐️ in our recent EMNLP’20 paper “Message Passing for Hyper-Relational Knowledge Graphs” written together with Priyansh Trivedi, Gaurav Maheshwari, Ricardo Usbeck, and Jens Lehmann.

StarE is a multi-relational GNN encoder that extends CompGCN for hyper-relational KGs.

The name is inspired by RDF* 😊. StarE is designed with the following features in mind:

An example of aggregating qualifiers in StarE. Image by Author.
  • Explicit modeling of relations, including qualifying relations;
  • Separation of auxiliary entities and relations in qualifiers from entities and relations in main triples;
  • Still, any entity and any relation can be used in main triples as well as in qualifiers;
  • Permutation invariance to the order of qualifiers — they do not exhibit any particular order and can be freely re-arranged. That is, it does not matter for the main triple <<Albert Einstein, educated at, ETH Zurich>> whether (academic degree, Bachelor) comes before (academic major, Mathematics) or after.

A Bit of Math for Aficionados 👩‍🔬

Let’s trace the evolution of relation-aware GNN encoders in their neighborhood aggregation schemes:

Message passing algorithms. Image by Author.

In StarE, a main triple relation h_r appearing between nodes u and v is augmented with an aggregated vector of qualifiers h_q via a function gamma() which can be a weighted sum, multiplication, concat, or any other binary function (we went for the weighted sum).

We obtain the vector h_q via:

h_q = W_q · Σ_{(qr, qv)} phi_q(h_{qr}, h_{qv})

That is, we first pool each qualifier relation and entity embedding, h_{qr} and h_{qv} respectively, into a single vector through a composition function phi_q(), which could be a scoring function from the KG embedding family, e.g., RotatE. We then apply a permutation-invariant aggregation function (sum, although we also probed multiplication) to pool an arbitrary number of qualifiers into a single vector, and finally project it through a transformation matrix W_q.

Since all entities and relations can be generally seen in main triples as well as qualifiers, W_q is intended to learn qualifier-specific representations of entities and relations.

We still retain the CompGCN components: phi() is a composition function similar to phi_q(), but now it merges a node with an enriched edge representation; W_λ is a weight parameter for incoming, outgoing, and self-looping relations.
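Putting it together, here is a simplified PyTorch sketch of the qualifier aggregation (with the Hadamard product standing in for phi_q() and a weighted sum as gamma(); see our repo for the full implementation):

import torch

def compose(h_rel, h_ent):
    # phi_q(): element-wise composition of a qualifier relation and entity;
    # a stand-in for KG-embedding functions such as RotatE.
    return h_rel * h_ent

def enrich_relation(h_r, h_qr, h_qv, W_q, alpha=0.8):
    # h_r:  [dim]     main-triple relation embedding
    # h_qr: [n, dim]  embeddings of the n qualifier relations
    # h_qv: [n, dim]  embeddings of the n qualifier entities
    # W_q:  [dim, dim] qualifier-specific projection
    composed = compose(h_qr, h_qv)           # phi_q() per pair: [n, dim]
    h_q = W_q @ composed.sum(dim=0)          # permutation-invariant sum + projection
    return alpha * h_r + (1 - alpha) * h_q   # gamma(): weighted sum

Because the sum ignores the order of qualifier pairs, shuffling (academic degree, Bachelor) and (academic major, Mathematics) leaves h_q unchanged, which is exactly the permutation invariance we asked for above.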

Sparse Encoding of Hyper-Relational KGs

GNNs operate on sparse matrices for efficiency reasons. For multi-relational triple-based KGs, the following example triples

Q937, P69, Q11942
Q937, P69, Q206702

can be represented in the COO format as a [2, num_edges] tensor with an additional row for edge types:

[Q937, Q937]
[Q11942, Q206702]
[P69, P69]

Hyper-relational facts with qualifiers can be written as follows:

Q937, P69, Q206702, P812, Q413, P512, Q849697
s, r, o, qr1, qv1, qr2, qv2, …, qrN, qvN

where the first three entries always denote the “main” triple and the subsequent pairs are qualifiers in no particular order (remember the order invariance in Wikidata).

What can be a sparse representation of hyper-relational KGs where each “column” of the COO matrix might have an arbitrary number of qualifiers? In the paper, we propose the following encoding:

Sparse Encoding of Hyper-Relational KGs. Image by Author.

That is, we have two COO matrices:

  1. A normal “triple” COO with an implicit column index k
  2. A “qualifier” COO of the shape [3, num_qualifiers], where the first row contains indices of the columns in the “triple” COO, the second contains qualifier relations, and the third — qualifier entities. The index row connects a column of qualifiers to the main triple; that is, the columns in the “qualifier” COO that share the same index k belong to the same k-th triple in the “triple” COO matrix. This keeps us linear in memory w.r.t. the number of qualifiers in the KG, with an overall memory cost of O(|edges| + |qualifiers|) ⚖️ (see the sketch below).
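In code, the two matrices for our example statements could look as follows (a sketch using the raw Wikidata numeric ids as stand-ins for entity/relation indices):

import torch

# "Triple" COO: one column per statement k = 0, 1.
edge_index = torch.tensor([[937,   937],      # subjects:  Q937,   Q937
                           [11942, 206702]])  # objects:   Q11942, Q206702
edge_type  = torch.tensor([69, 69])           # relations: P69,    P69

# "Qualifier" COO of shape [3, num_qualifiers]: row 0 points to column k
# of the triple COO, rows 1-2 hold qualifier relations and entities.
qualifier_index = torch.tensor([[1,   1],        # both attach to triple k = 1
                                [812, 512],      # qr: P812, P512
                                [413, 849697]])  # qv: Q413, Q849697

# Gathering all qualifiers of the k-th triple is a boolean mask away:
k = 1
quals_of_k = qualifier_index[1:, qualifier_index[0] == k]  # shape [2, 2]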

We Need to Talk about Datasets

We briefly touched upon encoding hyper-relational facts as a sequence of entities and relations. But are there already reliable datasets for experimenting on such KGs? Traditionally, KG embeddings were evaluated on the link prediction task, while Graph ML tasks include node classification, graph classification, entity matching, and many more.

So far, there exist only two link prediction datasets: WikiPeople, proposed by Guan et al (a dump of Wikidata describing, well, people), and JF17K, an export from Freebase 🧟‍♂️. However, we identified major drawbacks 😞 with both:

  • WikiPeople has too many qualifiers with literals (years). It is not advisable to treat literals as just another sort of entity, as numbers are continuous values and should be treated in this way (well, it is a general problem with literals in the KG embedding literature 🤔). As a result, in most cases such qualifiers are simply dropped. This leaves a dataset where only 2% of facts have qualifiers, and 80% of those facts have only one qualifier pair :/
  • JF17K has test set leakages. In fact, the authors themselves identified ‘significantly many redundant triples’ and do not recommend using it in experiments. Originally more of an n-ary dataset, it was transformed by the HINGE authors into a hyper-relational format with auxiliary predicates. We conducted a further study and found that more than 40% of the test statements share the same (s, r, o) main triple with the train set (a check like the sketch below). That is, in the subject/object prediction task, a simple triple-only heuristic can outperform 🏆 all previous hyper-relational approaches, as we show in the paper.
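The leakage check itself boils down to comparing main triples across splits; a simplified sketch (not our exact evaluation code):

def main_triple_leakage(train_statements, test_statements):
    # Each statement is a tuple (s, r, o, qr1, qv1, ...); only the first
    # three positions, the main triple, matter for this check.
    train_triples = {st[:3] for st in train_statements}
    leaked = sum(st[:3] in train_triples for st in test_statements)
    return leaked / len(test_statements)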

Since both datasets are not that suitable for evaluating hyper-relational approaches, we sampled WD50K from Wikidata with the following guidelines:

Image by Author.
  • Retain a Wikidata-like distribution of qualifiers: in the vanilla WD50K, about 13% of statements have qualifiers (close to the 17% of all statements in Wikidata)
  • All qualifiers are entity-relation pairs, no literals
  • Entities and relations can be seen in main triples and qualifiers
  • 99% of statements have at most 6 qualifier pairs

For further experiments, we sampled 3 additional datasets:

  • WD50K (33) — about 33% of statements have qualifiers
  • WD50K (66) — about 66% of statements have qualifiers
  • WD50K (100) — all statements have qualifiers

Naturally, those datasets are smaller than the original WD50K with more qualifier-unique entities and relations.

StarE in Link Prediction

At this step, we finally have a StarE encoder and suitable link prediction datasets for the experiments. Our main research question:

Do qualifiers help to predict subjects and objects of hyper-relational facts?

StarE + decoder for link prediction. Image by Author.

That is, given the subject, the predicate, and all qualifiers, we predict the object (or, vice versa, the subject given the object). For that, we linearize a given fact into a sequence, as shown in the illustration, and use a 2-layer Transformer with average pooling and a final FC layer as a decoder.

The Transformer also allows us to feed sequences of different lengths using padding tokens, which are masked 👺 from the self-attention computations.
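A minimal sketch of that masking trick with PyTorch’s nn.TransformerEncoder (toy vocabulary sizes and dimensions, not our actual model config):

import torch
import torch.nn as nn

PAD = 0  # a hypothetical padding index

# Two linearized facts [s, r, qr1, qv1, ...] as token ids, padded to the
# same length; the object entity is held out as the prediction target.
seqs = torch.tensor([[5, 2, 9, 7, PAD, PAD],
                     [5, 2, 9, 7, 11,  3]])

emb = nn.Embedding(64, 32, padding_idx=PAD)
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

pad_mask = seqs == PAD  # True at padded positions: excluded from attention
out = encoder(emb(seqs), src_key_padding_mask=pad_mask)  # [2, 6, 32]

# Average only over real (non-pad) positions, then a final FC layer
# scores all candidate entities.
keep = (~pad_mask).unsqueeze(-1).float()
pooled = (out * keep).sum(dim=1) / keep.sum(dim=1)  # [2, 32]
logits = nn.Linear(32, 100)(pooled)                 # [2, num_entities]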

For comparison, we applied a decoder-only HINGE and a decoder-only 2-layer Transformer to the same task to measure whether the StarE encoder brings any benefits.

Well, turns out it does! 🎆

Subject/Object Prediction Accuracy on various WD50K datasets. The more qualifiers — the bigger the gains! Image by Author.

What we observe 🕵 :

  • StarE drastically increases the link prediction performance compared to decoder-only approaches;
  • StarE is even more effective (the performance gap is larger) when there are more qualifiers in the dataset;
  • Given qualifiers, hyper-relational approaches do help to predict subjects and objects better than triple-only baselines.

How many qualifiers do you need to see the quality improve? Just 2 are enough 😉

As few as 2 qualifiers give an observable performance boost. Image by Author.

Our full experimental program, with exact numbers and interactive charts, is reported with Weights & Biases here 📈

So the takeaways 🔖 to the KG community:

  1. Seek to assign descriptive qualifiers to more triple facts in the graph — the more the better
  2. If you do assign qualifiers — add 2 or more!

Conclusions and Resources

  • Hyper-relational graphs are closer to reality and describe facts in more detail than plain triples
  • RDF* and LPG provide means to build hyper-relational KGs
  • A hyper-relational graph is different from a hypergraph
  • Hyper-relational KGs are already in use — both in open-domain KGs and industry
  • RDF* motivated StarE — a GNN encoder for hyper-relational KGs that can be paired with a decoder for downstream tasks
  • StarE improves link prediction compared to decoder-only approaches
  • The WD50K datasets family better captures the challenges of link prediction on hyper-relational KGs

You can find more information in:

Our EMNLP’20 paper: Message Passing for Hyper-Relational Knowledge Graphs (arxiv pre-print)
Code: Github Repo
Dataset: Zenodo page
Weights & Biases Report: here

