
Unsolved Machine Learning Problems That You Can Solve

Machine Learning for Knowledge Graphs is an incomplete and exciting field

James Fletcher
Jul 9, 2019 · 9 min read

Grakn lets us create Knowledge Graphs from our data. But what challenges do we encounter where querying alone won’t cut it? What library can address these challenges?

Knowledge Graph Tasks

  • Relation Prediction (a.k.a. Link Prediction)
  • Attribute Prediction
  • Subgraph Prediction
  • Building Concept Embeddings
  • Rule Mining (a.k.a. Association Rule Learning)
  • Ontology Merging
  • Automated Knowledge Graph Creation
  • Expert Systems
  • Optimal Pattern Finding
  • System Design Automation and Configuration Automation
  • Fuzzy Pattern Matching
  • Querying and Responding via Natural Language

Many of these tasks are open research problems, thus far “unsolved” for the general case.

We describe these tasks in more detail below. Where a solution is readily available in KGLIB, it is listed against the relevant task(s).

We openly invite collaboration to solve these problems! All contributions are welcome — code, issues, ideas, discussions, pointers to existing tools, and relevant datasets will all help this project evolve!

If you wish to discuss your ideas more conversationally, and to follow the development conversation, please join the Grakn Slack, and join the #kglib channel. Alternatively, start a new topic on the Grakn Discussion Forum.

All of the solutions in KGLIB require that you have migrated your data into a Grakn Core or Grakn KGMS instance. There is an official examples repo showing how to go about this, and more information on migration in the Grakn docs.

We identify the following categories of tasks that need to be performed over KGs: Knowledge Graph Completion, Decision-Making, and Soft Searching.

Knowledge Graph Completion

Relation Prediction (a.k.a. Link Prediction)

When predicting Relations, there are several scenarios we may have. When predicting binary Relations between the members of one set and the members of another set, we may need to predict them as:

  • One-to-one
  • One-to-many
  • Many-to-many

For example, predicting which disease(s) a patient has is a one-to-many problem, whereas predicting which drugs in the KG treat which diseases is a many-to-many problem.

We anticipate that solutions working well for the one-to-one case will also be applicable (at least to some extent) to the one-to-many case and cascade also to the many-to-many case.

In KGLIB, Knowledge Graph Convolutional Networks (KGCNs) can help us with one-to-one binary Relation prediction. This requires extra implementation, for which two approaches are apparent:

  • Create two KGCNs, one for each of the two Roleplayers in the binary Relation. Extend the neural network to compare the embeddings of each Roleplayer, and classify the pairing according to whether a Relation should exist or not.
  • Feed Relations directly to a KGCN, and classify their existence. (KGCNs can accept Relations as the Things of interest just as well as Entities). To do this we also need to create hypothetical Relations, labelled as negative examples, and feed them to the KGCN alongside the positively labelled known Relations. Note that this extends well to ternary and N-ary Relations.
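The negative-example step of the second approach can be sketched in plain Python. This is a minimal sketch with hypothetical pair data and function names, not the KGCN API: known Relations are corrupted into pairings that do not exist in the KG, and those become the negatively labelled examples.

```python
import random

def sample_negative_relations(positive_pairs, drugs, diseases, n, seed=0):
    """Corrupt known (drug, disease) pairs into hypothetical pairings that
    can be labelled as negative examples. Assumes n does not exceed the
    number of possible negative pairings."""
    rng = random.Random(seed)
    positives = set(positive_pairs)
    negatives = set()
    while len(negatives) < n:
        pair = (rng.choice(drugs), rng.choice(diseases))
        if pair not in positives:  # must not be a known Relation
            negatives.add(pair)
    return sorted(negatives)

positives = [("aspirin", "headache"), ("metformin", "diabetes")]
negatives = sample_negative_relations(
    positives, ["aspirin", "metformin"], ["headache", "diabetes"], n=2)
```

The positive and negative pairs together form the labelled training set fed to the classifier.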

Notice also that recommender systems are one use case of one-to-many binary Relation prediction.

Attribute Prediction

In KGLIB, Knowledge Graph Convolutional Networks (KGCNs) can be used to learn Attributes directly for any Thing. Attribute prediction is already fully supported.

Subgraph Prediction

Building Concept Embeddings

In KGLIB, Knowledge Graph Convolutional Networks (KGCNs) can be used to build general-purpose embeddings. This requires additional functionality, since a generic loss function is needed to train the model. At its simplest, this can be achieved by measuring the shortest distance across the KG between two Things, which Grakn computes trivially with compute path.
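As a sketch of that simplest training signal: the shortest distance between two Things is a breadth-first search over the graph. Here a toy adjacency dict stands in for the KG (in Grakn itself, compute path does this work for you); an embedding model could then be trained so that distances between embeddings approximate these hop counts.

```python
from collections import deque

def shortest_distance(graph, start, goal):
    """Breadth-first search: number of hops between two Things,
    usable as a training target for an embedding distance."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # unreachable

# Toy KG: patient -- diagnosis -- disease -- drug
kg = {
    "patient": ["diagnosis"],
    "diagnosis": ["patient", "disease"],
    "disease": ["diagnosis", "drug"],
    "drug": ["disease"],
}
```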

Rule Mining (a.k.a. Association Rule Learning)

In Grakn, a mined Rule would take the form:

mined-rule sub rule,
when {
[antecedent (the conditions)]
}, then {
[consequent (the conclusions)]
};
We deem Rule mining a form of inductive reasoning, as opposed to the deductive reasoning built into Grakn (the method by which the induced Rules are applied).

Rule mining is a very important field for KG completion. Finding and verifying rules can augment the existing knowledge at scale, since the Rule will be applied wherever the antecedent is found to be true.

We anticipate that the validity of these rules needs to be checked by hand, since once committed to the graph they are assumed to be correct and will be applied across the KG.
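Before any hand-verification, candidate rules are usually ranked by statistics such as support and confidence. Here is a minimal sketch over hypothetical (subject, predicate, object) triples, for rules of the simple shape "if (x, antecedent, y) then (x, consequent, y)"; real rule miners handle far richer antecedents.

```python
def rule_confidence(triples, antecedent, consequent):
    """Score the candidate rule: if (x, antecedent, y) then (x, consequent, y).
    support    = pairs where both antecedent and consequent hold
    confidence = support / pairs where the antecedent holds"""
    facts = set(triples)
    ante_pairs = {(s, o) for (s, p, o) in facts if p == antecedent}
    support = sum(1 for (s, o) in ante_pairs if (s, consequent, o) in facts)
    confidence = support / len(ante_pairs) if ante_pairs else 0.0
    return support, confidence

# Hypothetical facts: "prescribed-for" mostly implies "treats", but not always.
kg_triples = [
    ("aspirin", "prescribed-for", "headache"),
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "prescribed-for", "fever"),
    ("ibuprofen", "treats", "fever"),
    ("placebo", "prescribed-for", "headache"),
]
support, confidence = rule_confidence(kg_triples, "prescribed-for", "treats")
```

A rule with high support and confidence is a stronger candidate for hand-checking and committing to the graph.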

Ontology Merging

Grakn’s highly flexible knowledge representation means that this isn’t challenging at all if the two ontologies contain non-overlapping Entities, even if the Entities of the two KGs are interrelated.

The challenge here is to find a mapping between the structure of the two KGs. If Types or Things (Type instances) overlap between the two KGs, then they need to be merged.

This reduces the problem to one of matching between the two KGs. Grakn’s schema helps with this task, since we can use it to match the structures of the two KGs and thereby find a mapping between them. Matching of data can be framed either as link prediction or as a comparison of graph embeddings.
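One crude way to bootstrap that structural matching is to compare Types by the overlap of their attribute signatures. The schemas and names below are hypothetical, and a real matcher would also use type names, hierarchies and instance data; this sketch uses Jaccard similarity alone.

```python
def jaccard(a, b):
    """Overlap of two sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def best_type_matches(schema_a, schema_b):
    """For each Type in schema_a, pick the Type in schema_b whose
    attribute set overlaps most (a crude structural matcher)."""
    return {
        t_a: max(schema_b, key=lambda t_b: jaccard(attrs_a, schema_b[t_b]))
        for t_a, attrs_a in schema_a.items()
    }

# Hypothetical schemas: Type name -> set of attribute names.
schema_a = {"person": {"name", "dob", "email"},
            "company": {"name", "vat-id"}}
schema_b = {"individual": {"name", "dob", "phone"},
            "organisation": {"name", "vat-id", "address"}}
matches = best_type_matches(schema_a, schema_b)
```

The proposed mapping then becomes a candidate list for a human (or a learned model) to confirm before merging.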

Automated Knowledge Graph Creation

Data with tabular structure is best migrated to Grakn manually until this field is far more mature: a human reader can infer the meaning of the information, but with only a field or column name to go on, automated processes lack the context necessary to perform well. Combining data from many sources in this fashion is a core strength of Grakn, and is not hard to achieve, as per the Grakn docs.

Creating KGs from unstructured information such as text, however, is far more tractable, thanks to the rich context in the data and the wide array of open-source NLP and NLU frameworks available. The output of these frameworks can be ingested into a KG with ease, and used semi-autonomously to build a KG of the domain described by the unstructured data. See this post for more details on how to go about this. For discussions on integrating NLP and NLU with Grakn, check out the #nlp channel on the Grakn Slack.


Expert Systems

For many applications, Expert Systems can be built entirely with the features provided by Grakn out-of-the-box. They make extensive use of deductive reasoning by utilising Grakn’s Rules, where data describing the scenario is added to the KG programmatically with insert statements (made via one of the Grakn Clients). match queries can then return the right course of action given the scenario.
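That deductive loop can be illustrated with a toy forward-chaining sketch in plain Python (hypothetical facts and rules, not the Grakn API): facts describing the scenario go in, rules fire wherever their condition holds, and the conclusion is then read off, much as a match query would return it.

```python
def forward_chain(facts, rules):
    """Repeatedly apply when -> then rules until no new facts are derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for when, then in rules:
            for fact in list(facts):
                if when(fact):
                    derived = then(fact)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts

# Scenario facts, as insert statements would add them to the KG.
scenario = {("temperature", "high"), ("headache", "present")}

# Toy rules: high temperature implies fever; fever implies seeing a doctor.
rules = [
    (lambda f: f == ("temperature", "high"), lambda f: ("fever", "present")),
    (lambda f: f == ("fever", "present"), lambda f: ("advice", "see-doctor")),
]
result = forward_chain(scenario, rules)
```

In Grakn, the reasoner performs this chaining for you at query time, so the rules are declared once in the schema rather than coded by hand.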

In KGLIB we need to create examples of Expert Systems, and outline any best practices for building them. This includes recommendations for when symbolic logic alone doesn’t provide sufficient answers, and needs to be combined with ML.

Optimal Pattern Finding

Beyond compute path, this problem concerns optimisation and is not necessarily best solved via ML methods, although it may be a candidate for an approach based on Reinforcement Learning.

System Design Automation and Configuration Automation

This should be wholly or partially solvable with Grakn’s automated reasoning. This task is particularly exciting, since solving it brings us closer to automating engineering design processes.

In KGLIB we need to create examples of such methodologies, along with best-practice guidance.

Soft Searching

Fuzzy Pattern Matching

Querying and Responding via Natural Language


This is our motivation for KGLIB, where we will keep this list of tasks up-to-date alongside solutions to fulfil them. We would be delighted if you would help us to progress KGLIB. You can help in so many ways:

  • Starring the repo to let us know you like it ;)
  • Telling us which tasks are important for you so that we can prioritise them — tell us in the comments!
  • Telling us about tasks and problems that we have omitted
  • Getting involved with the code and contributing to the solutions that interest you!

Thanks for reading, see you there!

You can reach me directly by email: james[at], on Twitter @jmsfltchr, or on LinkedIn.

