Artwork credit: manwithnoname.space

Unsolved Machine Learning Problems That You Can Solve

Machine Learning for Knowledge Graphs is an incomplete and exciting field

James Fletcher
Jul 9, 2019

TypeDB lets us create Knowledge Graphs from our data. But what challenges do we encounter where querying alone won’t cut it? What library can address these challenges?

Knowledge Graph Tasks

Below is a set of tasks to be conducted over Knowledge Graphs (KGs) that we have identified from real TypeDB use cases. The objective of KGLIB is to implement a portfolio of solutions for these tasks for TypeDB Knowledge Graphs.

  • Relation Prediction (a.k.a. Link Prediction)
  • Attribute Prediction
  • Subgraph Prediction
  • Building Concept Embeddings
  • Rule Mining (a.k.a. Association Rule Learning)
  • Ontology Merging
  • Automated Knowledge Graph Creation
  • Expert Systems
  • Optimal Pattern Finding
  • System Design Automation and Configuration Automation
  • Fuzzy Pattern Matching
  • Querying and Responding via Natural Language

Many of these tasks are open research problems, thus far “unsolved” for the general case.

We describe these tasks in more detail below. Where a solution is readily available in KGLIB, it is listed against the relevant task(s).

We openly invite collaboration to solve these unsolved problems in machine learning! All contributions are welcome — code, issues, ideas, discussions, pointers to existing tools, and relevant datasets will all help this project evolve!

If you wish to discuss your ideas more conversationally, and to follow the development conversation, please join the #kglib channel on the Vaticle Discord. Alternatively, start a new topic on the Vaticle Discussion Forum.

All of the solutions in KGLIB require that you have migrated your data into a TypeDB or TypeDB Cluster instance. There is an official examples repo for how to go about this, and information available on migration in the TypeDB docs.

We identify the following categories of tasks that need to be performed over KGs: Knowledge Graph Completion, Decision-Making, and Soft Searching.

Knowledge Graph Completion

Here we term any task which creates new facts for the KG as Knowledge Graph Completion.

Relation Prediction (a.k.a. Link Prediction)

We often want to find new connections in our Knowledge Graphs. Often, we need to understand how two concepts are connected. This is the case of binary Relation prediction, which is the focus of the literature we have seen. TypeDB’s data model is a Hypergraph, where Relations are Hyperedges. Therefore, in general, the Relations we want to predict may be ternary (3-way) or even N-ary (N-way), which goes beyond the research we have seen in this domain.

When predicting Relations, there are several scenarios we may have. When predicting binary Relations between the members of one set and the members of another set, we may need to predict them as:

  • One-to-one
  • One-to-many
  • Many-to-many

Examples: Predicting which disease(s) a patient has is a one-to-many problem, whereas predicting which drugs in the KG treat which diseases is a many-to-many problem.

We anticipate that solutions working well for the one-to-one case will also be applicable (at least to some extent) to the one-to-many case and cascade also to the many-to-many case.

In KGLIB, Knowledge Graph Convolutional Networks (KGCNs) can help us with one-to-one binary Relation prediction. This requires extra implementation, for which two approaches are apparent:

  • Create two KGCNs, one for each of the two Roleplayers in the binary Relation. Extend the neural network to compare the embeddings of each Roleplayer, and classify the pairing according to whether a Relation should exist or not.
  • Feed Relations directly to a KGCN, and classify their existence. (KGCNs can accept Relations as the Things of interest just as well as Entities). To do this we also need to create hypothetical Relations, labelled as negative examples, and feed them to the KGCN alongside the positively labelled known Relations. Note that this extends well to ternary and N-ary Relations.
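The second approach depends on generating hypothetical Relations to act as negative examples. Below is a minimal sketch of that step in plain Python, independent of KGLIB and TypeDB. The function name `sample_negative_relations` and the toy drug and disease data are our own illustrative assumptions. Note that a pair absent from the KG is not guaranteed to be false, so negatives sampled this way are noisy labels.

```python
import random
from itertools import product


def sample_negative_relations(left, right, known, k, seed=0):
    """Return up to k hypothetical (left, right) pairs that are not in `known`."""
    candidates = [pair for pair in product(left, right) if pair not in known]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    rng.shuffle(candidates)
    return candidates[:k]


# Hypothetical toy data: Roleplayers and known "treats" Relations.
drugs = ["aspirin", "ibuprofen", "metformin"]
diseases = ["headache", "diabetes"]
known_treats = {
    ("aspirin", "headache"),
    ("ibuprofen", "headache"),
    ("metformin", "diabetes"),
}

negatives = sample_negative_relations(drugs, diseases, known_treats, k=2)

# Labelled examples for a classifier: 1 = known Relation, 0 = hypothetical.
examples = [(pair, 1) for pair in sorted(known_treats)]
examples += [(pair, 0) for pair in negatives]
```

In practice the ratio of negative to positive examples is a tuning choice, and the same idea extends to N-ary Relations by sampling tuples of Roleplayers instead of pairs.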

Notice also that recommender systems are one use case of one-to-many binary Relation prediction.

Attribute Prediction

We would like to predict one or more Attributes of a Thing, which may also include predicting whether that Attribute should be present at all.

In KGLIB, Knowledge Graph Convolutional Networks (KGCNs) can be used to directly learn Attributes for any Thing. Attribute prediction is already fully supported.

Subgraph Prediction

We can extend N-ary Relation and Attribute prediction to include Entity prediction, and from there to predicting entire subgraphs: connected graphs of Entities, Relations, and Attributes. It may be possible to determine that such a subgraph is missing from an existing, partially complete Knowledge Graph.

Building Concept Embeddings

Embeddings of Things and/or Types are universally useful for performing other downstream machine learning or data science tasks. Their usefulness comes in storing the context of a Concept in the graph as a numerical vector. These vectors are easy to ingest into other ML pipelines. The benefit of building general-purpose embeddings is therefore to make use of them in multiple other pipelines. This reduces the expense of traversing the Knowledge Graph, since this task can be performed once and the output re-used more than once.

In KGLIB, Knowledge Graph Convolutional Networks (KGCNs) can be used to build general-purpose embeddings. This requires additional functionality, since a generic loss function is required in order to train the model. At its simplest, the training signal for such a loss can be the shortest distance across the KG between two Things.
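As an illustration of the shortest-distance idea, here is a minimal breadth-first search sketch in plain Python. It assumes the neighbourhood of each Thing has already been fetched into an adjacency dictionary; the graph contents and the function name `shortest_distance` are hypothetical. The resulting hop counts could serve as targets that embedding distances are trained to reflect.

```python
from collections import deque


def shortest_distance(adjacency, start, goal):
    """Breadth-first search: number of hops from start to goal, or None."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # goal is not reachable from start


# Hypothetical toy graph: Relations are modelled as nodes between Roleplayers.
graph = {
    "alice": ["employment-1"],
    "employment-1": ["alice", "acme"],
    "acme": ["employment-1", "london"],
    "london": ["acme"],
    "bob": [],
}

distance = shortest_distance(graph, "alice", "london")
```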

Rule Mining (a.k.a. Association Rule Learning)

This task is largely known as Association Rule Learning in the literature; here we refer specifically to Horn Clause Rule Mining. The objective is to search the Knowledge Graph for new TypeQL Rules that may be applicable, in the form of:

rule mined-rule: when {
    [antecedent (the conditions)]
} then {
    [consequent (the conclusions)]
};

We deem Rule mining a form of inductive reasoning, as opposed to the deductive reasoning built into TypeDB (the method by which the induced Rules are applied).

Rule mining is a very important field for KG completion. Finding and verifying rules can augment the existing knowledge at scale, since the Rule will be applied wherever the antecedent is found to be true.

We anticipate that the validity of these rules needs to be checked by hand, since once committed to the graph they are assumed to be correct and will be applied across the KG.
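One way to prioritise candidate Rules for manual review is to score them with the support and confidence measures used in Association Rule Learning. The sketch below is a simplification under our own assumptions: facts are flattened to (subject, relation, object) triples, and the candidate Rule has a single-relation antecedent and consequent over the same pair. The function name and the data are hypothetical.

```python
def rule_support_confidence(triples, antecedent, consequent):
    """Score the candidate rule: (x antecedent y) implies (x consequent y).

    support    = number of (x, y) pairs where both relations hold
    confidence = support / number of pairs where the antecedent holds
    """
    body = {(s, o) for s, r, o in triples if r == antecedent}
    head = {(s, o) for s, r, o in triples if r == consequent}
    support = len(body & head)
    confidence = support / len(body) if body else 0.0
    return support, confidence


# Hypothetical toy facts.
triples = [
    ("alice", "born-in", "uk"), ("alice", "citizen-of", "uk"),
    ("bob", "born-in", "uk"), ("bob", "citizen-of", "uk"),
    ("carol", "born-in", "france"), ("carol", "citizen-of", "uk"),
    ("dave", "born-in", "spain"),
]

support, confidence = rule_support_confidence(triples, "born-in", "citizen-of")
```

Here the antecedent holds for four pairs and the consequent also holds for two of them, giving a support of 2 and a confidence of 0.5. A missing consequent may simply be a fact the KG does not yet contain, so a low confidence is a hint for the reviewer rather than a verdict.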

Ontology Merging

Merging ontologies is a relatively common problem. Most often, users wish to merge their own proprietary Knowledge Graph with a public Knowledge Graph, such as ConceptNet, the Gene Ontology (GO), or the Disease Ontology (DO).

TypeDB’s highly flexible knowledge representation features mean that this isn’t challenging at all if the two ontologies contain non-overlapping Entities, even if the Entities of the two KGs are interrelated.

The challenge here is to find a mapping between the structure of the two KGs. If Types or Things (Type instances) overlap between the two KGs, then they need to be merged.

This reduces the problem to one of matching between the two KGs. TypeDB’s schema helps with this task, since we can use it to match the structures of the two KGs and thereby find a mapping between them. Matching of data can be framed as either link prediction or a comparison of graph embeddings.
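As a crude first pass at structural matching, Type labels from the two schemas can be compared by string similarity. The sketch below uses Python's standard `difflib`; the label lists, the threshold, and the function names are assumptions for illustration only. Label similarity says nothing about meaning, so any proposed match would still need checking against the Roles and Attributes of each Type.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_types(labels_a, labels_b, threshold=0.8):
    """For each label in A, propose its most similar label in B if above threshold."""
    matches = []
    for a in labels_a:
        best = max(labels_b, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        if score >= threshold:
            matches.append((a, best, score))
    return matches


# Hypothetical Type labels from two schemas.
schema_a = ["person", "company", "disease"]
schema_b = ["Person", "organisation", "diseases"]

matches = match_types(schema_a, schema_b)
matched_pairs = [(a, b) for a, b, _ in matches]
```

With these labels, `person` and `disease` find counterparts, while `company` and `organisation` go unmatched despite meaning much the same thing, which shows why label matching alone is not sufficient.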

Automated Knowledge Graph Creation

Often, TypeDB users want to build a KG (or a subgraph of a KG) from raw data sources. These could be CSVs, SQL databases, bodies of text, or information crawled from the web.

Data with tabular structure is best migrated to TypeDB manually until this field is far more mature. A human reading the structure can infer the meaning of the information, whereas an automated process with only a field name or column name to go on lacks the context necessary to perform well. Combining data from many sources in this fashion is a core strength of TypeDB, and is not hard to achieve, as per the TypeDB docs.

Creating KGs from unstructured information such as text, however, is achievable, thanks to the rich context in the data and the wide array of open-source NLP and NLU frameworks available. The output of these frameworks can be ingested into a KG with ease, and used semi-autonomously to build a KG of the domain described by the unstructured data. See this post for more details on how to go about this. For discussions on integrating NLP and NLU with TypeDB, check out the #nlp channel on the TypeDB Discord.

Decision-Making

Querying a complete knowledge graph may not be enough to inform complex or difficult decisions; we require methods specifically to help us find the right decision to make.

Expert Systems

Given a scenario, Expert Systems automatically make decisions or prompt the best course of action.

For many applications, Expert Systems can be built entirely with the features provided by TypeDB out-of-the-box. They make extensive use of deductive reasoning through TypeDB’s Rules: data describing the scenario is added to the KG programmatically with insert statements (made via one of the TypeDB Clients), and match queries then return the right course of action for that scenario.
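To make the deductive pattern concrete, here is a toy forward-chaining sketch in plain Python. It is a propositional simplification written for illustration, not a description of how TypeDB's reasoner works; the facts, the rules, and the function name `forward_chain` are all hypothetical.

```python
def forward_chain(facts, rules):
    """Apply (conditions, conclusion) rules until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in derived and set(conditions) <= derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical rules: each maps a tuple of conditions to one conclusion.
rules = [
    (("temperature-high", "pressure-rising"), "overheating-risk"),
    (("overheating-risk",), "action:shut-down-pump"),
]

# Facts describing the scenario (in TypeDB these would be inserted data).
scenario = {"temperature-high", "pressure-rising"}
conclusions = forward_chain(scenario, rules)

# Querying for the recommended action (in TypeDB this would be a match query).
actions = sorted(fact for fact in conclusions if fact.startswith("action:"))
```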

In KGLIB we need to create examples of Expert Systems, and outline any best practices for building them. This includes recommendations for when symbolic logic alone doesn’t provide sufficient answers, and needs to be combined with ML.

Optimal Pattern Finding

In general, this problem is one of finding an optimal path between two Things in a KG. This generally means taking account of the cost of traversing edges and nodes. If the cost of every traversal is equal, then compute path (now deprecated), one of TypeDB’s built-in distributed algorithms, will already perform this task. Here, the subgraph that can be traversed can be constrained with the in keyword.

Beyond compute path, this problem concerns optimisation and is not necessarily best solved via ML methods; although it may be a candidate for an approach based on Reinforcement Learning.
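When traversal costs differ but are non-negative, a classical non-ML baseline is Dijkstra's algorithm. The following is a minimal sketch in plain Python, assuming the relevant subgraph and its costs have already been extracted from the KG into a dictionary; the graph and the function name `cheapest_path` are hypothetical.

```python
import heapq


def cheapest_path(graph, start, goal):
    """Dijkstra's algorithm over non-negative edge costs.

    graph maps each node to a list of (neighbour, cost) pairs.
    Returns (total_cost, path) or None if goal is unreachable.
    """
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, step in graph.get(node, ()):
            if neighbour not in settled:
                heapq.heappush(queue, (cost + step, neighbour, path + [neighbour]))
    return None


# Hypothetical weighted graph extracted from a KG.
graph = {
    "a": [("b", 1), ("c", 5)],
    "b": [("c", 1), ("d", 7)],
    "c": [("d", 2)],
    "d": [],
}

result = cheapest_path(graph, "a", "d")
```

Here the direct-looking routes through `c` or `b` alone cost 7 and 8, while the route `a, b, c, d` costs 4. Node costs can be folded into this scheme by adding each node's cost to its incoming edges.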

System Design Automation and Configuration Automation

This problem typically arises in engineering, most often in system design, where many systems need to be constructed, each optimally with respect to the task it must fulfil and the constraints upon it.

This should be wholly or partially solvable with TypeDB’s automated reasoning. This task is particularly exciting, since solving it brings us closer to automating engineering design processes.

In KGLIB we need to build examples of how to build such methodologies, along with best practice guidance.

Soft Searching

Fuzzy Pattern Matching

TypeQL is a highly expressive language that we can use to query TypeDB. Included in TypeQL is the ability to make ambiguous queries. In some cases, however, we may want to retrieve a ranked list of the best matches rather than an equally-weighted list of exact matches. This requires a solution that goes beyond TypeQL.
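One simple way to picture ranked matching is to score each candidate by the fraction of pattern constraints it satisfies, then sort by score. The sketch below does this in plain Python over candidates already retrieved from the KG; the attribute names, the data, and the function name `rank_matches` are hypothetical, and a real solution might weight constraints or use learned similarity instead.

```python
def rank_matches(candidates, pattern):
    """Rank candidates by the fraction of pattern constraints they satisfy."""
    scored = []
    for name, attributes in candidates.items():
        hits = sum(1 for key, value in pattern.items() if attributes.get(key) == value)
        scored.append((hits / len(pattern), name))
    # Highest score first; names break ties alphabetically.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical candidates and the pattern we would ideally match exactly.
candidates = {
    "p1": {"city": "London", "role": "engineer", "employer": "Acme"},
    "p2": {"city": "London", "role": "designer", "employer": "Acme"},
    "p3": {"city": "Paris", "role": "designer", "employer": "Other"},
}
pattern = {"city": "London", "role": "engineer", "employer": "Acme"}

ranked = rank_matches(candidates, pattern)
ranked_names = [name for _, name in ranked]
```

An exact-match query would return only `p1`; the ranked list also surfaces `p2` as a near miss.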

Querying and Responding via Natural Language

This problem reduces to converting between natural language and TypeQL. TypeQL is expressive and closely resembles spoken English. However, there is still a gap between natural language and TypeQL. Finding a bridge for this gap will allow non-technical users to ask questions of the KG in natural language, and receive answers in natural language. This has wide applications, with particular rising interest in building chatbots for the web. At present, the most favourable solution architecture is to use a readily available NLU component, and translate the intentions that this component identifies into TypeQL.
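The architecture described above can be sketched as a lookup from intents to TypeQL templates. In the sketch below, the intent name, the slot name, and the schema (person, employment, name) are hypothetical, and the NLU component is assumed to have already produced an intent and its slots. It only builds a query string and does not connect to TypeDB.

```python
# Hypothetical mapping from NLU intents to TypeQL query templates.
TEMPLATES = {
    "find_employer": (
        'match $p isa person, has name "{name}"; '
        "(employee: $p, employer: $c) isa employment; "
        "$c has name $n; get $n;"
    ),
}


def escape(value):
    """Escape backslashes and double quotes before placing text in a string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def intent_to_typeql(intent, slots):
    """Translate an identified intent and its slots into a TypeQL query string."""
    template = TEMPLATES.get(intent)
    if template is None:
        raise ValueError(f"No TypeQL template for intent: {intent}")
    return template.format(**{key: escape(val) for key, val in slots.items()})


# Assumed NLU output for "Who does Alice work for?"
nlu_output = {"intent": "find_employer", "slots": {"name": "Alice"}}
query = intent_to_typeql(nlu_output["intent"], nlu_output["slots"])
```

The reverse direction, turning query answers back into natural language, could use response templates in the same way.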

Summary

We see that the tasks that we want to perform over Knowledge Graphs are varied. Having at our disposal a toolbox of methods to perform these tasks is a very exciting prospect! This would be an enabler across many industries.

This is our motivation for KGLIB, where we will keep this list of tasks up-to-date alongside solutions to fulfil them. We would be delighted if you would help us to progress KGLIB. You can help in so many ways:

  • Starring the repo to let us know you like it ;)
  • Telling us which tasks are important for you so that we can prioritise them — tell us in the comments!
  • Telling us about tasks and problems that we have omitted
  • Getting involved with the code and contributing to the solutions that interest you!

Thanks for reading, see you there!

You can reach out to me directly by email: james[at]vaticle.com, on Twitter @jmsfltchr, or on LinkedIn

James Fletcher

Principal Scientist at Vaticle. Researching the intelligent systems of the future, enabled by Knowledge Graphs.