Enable Intelligent Query with Biological NLP and Knowledge Graphs

Nicholas Morley
Nov 20, 2018

Biology is a domain in which a huge amount of information is encoded in written form. We demonstrate the automatic construction of a knowledge graph from scientific text, so that we can query scientific content by the biological relationships it conveys. This article is a minimal technical demonstration of an approach that, applied at scale, could make it faster and easier to bring existing information to bear on biological questions. We use the Luscient API to extract mechanistic relationships from biomedical text, and Grakn to store these relationships and reveal the connections that emerge from them.

You can reproduce the steps in this article with instructions and code from the accompanying repository.

Text to Graph

Mechanistic Information from Text

We will take the following three sentences for this example (truncated in parts for simplicity):

You can use a text substrate of your choice by following the instructions here.

Parts and Interactions

Functional Relationships

Implementing the Graph

Grakn Schema

Where:

  • ‘driven-concept’ represents the upward or downward change in the ‘drive’ of a biological concept. The name of the biological concept and the direction of the drive change (i.e., ‘UP’ or ‘DOWN’) are given by the attributes ‘name’ and ‘valence’, respectively.
  • ‘triggering-relationship’ represents a functional relationship between two such drive change events, where a manipulation in one biological entity or concept triggers another. The provenance of each relationship is given by the attributes ‘source-text’ (the text from which it was extracted), ‘source-name’ (e.g., PubMed Central), and ‘source-id’ (e.g., PMC3174648).
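
To make the two schema elements concrete outside Grakn, here is a minimal Python sketch of the same data model. It is an illustration only, not the Grakn schema itself: the class names, the role names `trigger` and `response`, and the placeholder provenance values are assumptions introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Valence(Enum):
    """Direction of a drive change."""
    UP = "UP"
    DOWN = "DOWN"


@dataclass(frozen=True)
class DrivenConcept:
    """An upward or downward change in the drive of a biological concept."""
    name: str
    valence: Valence


@dataclass(frozen=True)
class TriggeringRelationship:
    """One drive change (`trigger`) bringing about another (`response`).

    The role names `trigger` and `response` are hypothetical; the three
    provenance fields mirror the attributes described above.
    """
    trigger: DrivenConcept
    response: DrivenConcept
    source_text: str  # text from which the relationship was extracted
    source_name: str  # e.g. "PubMed Central"
    source_id: str    # e.g. a PMC identifier


# One relationship mentioned in this article. The provenance values are
# placeholders, not the real extracted sentence or identifier.
toxin_up = DrivenConcept("B. fragilis toxin", Valence.UP)
smo_up = DrivenConcept("spermine oxidase", Valence.UP)
rel = TriggeringRelationship(
    trigger=toxin_up,
    response=smo_up,
    source_text="<source sentence>",
    source_name="<source name>",
    source_id="<source id>",
)

print(f"{rel.trigger.valence.value} {rel.trigger.name} -> "
      f"{rel.response.valence.value} {rel.response.name}")
```

Freezing the dataclasses makes each drive change hashable, so the same (name, valence) pair can act as a single node shared by many relationships.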

Loading the Information

(The script used to insert the data is found here.)

Let’s inspect one of our relationships, for instance the one between B. fragilis and spermine oxidase. (From here on we’ll use Grakn’s query language, Graql, to interact with the graph.)

Query

Answer

Inferring New Relationships

  • B. fragilis toxin induces spermine oxidase.
  • Spermine oxidase itself leads to several effects (i.e., ↑ reactive oxygen species, ↑ DNA damage, ↑ cancer).

We therefore want our system to tell us that increasing the drive of B. fragilis toxin might, by extension, also trigger these effects.

To allow Grakn to carry out this form of deduction, we define the rule:

Our system will now be able to fill in the gaps. For example, it can draw the line between B. fragilis and DNA damage — a relationship that does not explicitly appear in the underlying data:
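
Outside Grakn, the same kind of deduction can be sketched in plain Python as a transitive closure over the explicit relationships. This is not the Graql rule used in the article, only an illustration of the logic it encodes. The edge list is an assumption based on the two bullets above, with each listed effect treated as a direct consequence of increased spermine oxidase.

```python
from collections import defaultdict

# Explicit relationships as (trigger, response) pairs of (name, valence).
# Assumption: each effect listed above is a direct edge from spermine oxidase.
EXPLICIT = [
    (("B. fragilis toxin", "UP"), ("spermine oxidase", "UP")),
    (("spermine oxidase", "UP"), ("reactive oxygen species", "UP")),
    (("spermine oxidase", "UP"), ("DNA damage", "UP")),
    (("spermine oxidase", "UP"), ("cancer", "UP")),
]


def infer_transitive(edges):
    """Return relationships implied by chaining that are not stated explicitly.

    If A triggers B and B triggers C, then A is inferred to trigger C.
    The loop repeats until no new pair can be added.
    """
    known = set(edges)
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Index the current closure by trigger for quick chaining.
        by_trigger = defaultdict(set)
        for a, b in closure:
            by_trigger[a].add(b)
        for a, b in list(closure):
            for c in by_trigger.get(b, ()):
                if a != c and (a, c) not in closure:
                    closure.add((a, c))
                    changed = True
    return closure - known


inferred = infer_transitive(EXPLICIT)
for trigger, response in sorted(inferred):
    print(trigger, "->", response)
```

With these edges, the inferred set includes the link from increased B. fragilis toxin to increased DNA damage, which appears nowhere in the explicit list.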

Asking Questions

  • What might be the consequences of increasing or decreasing the drive of X?
  • What sequences of changes could bring about a given outcome Y?
  • What sets of observations are consistent with, or might ‘explain’ observation Z?
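
Each of these three questions maps onto a graph traversal. The sketch below shows one possible reading in plain Python rather than Graql: consequences as forward reachability, paths as backward walks from the outcome, and explanations as the set of upstream changes. The function names and the edge list are assumptions made for illustration. The edges restate the B. fragilis example from this article.

```python
from collections import defaultdict

# (trigger, response) pairs of (name, valence); an assumed encoding of the
# B. fragilis example, not data exported from the graph.
EDGES = [
    (("B. fragilis toxin", "UP"), ("spermine oxidase", "UP")),
    (("spermine oxidase", "UP"), ("reactive oxygen species", "UP")),
    (("spermine oxidase", "UP"), ("DNA damage", "UP")),
    (("spermine oxidase", "UP"), ("cancer", "UP")),
]

forward = defaultdict(list)   # change -> changes it triggers
backward = defaultdict(list)  # change -> changes that trigger it
for trigger, response in EDGES:
    forward[trigger].append(response)
    backward[response].append(trigger)


def reachable(start, adjacency):
    """All nodes reachable from `start` by following `adjacency`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def consequences(change):
    """Question 1: what might follow from this change?"""
    return reachable(change, forward)


def paths_to(outcome):
    """Question 2: every chain of changes that ends in `outcome`."""
    paths = []

    def walk(node, suffix):
        for prev in backward.get(node, ()):
            if prev in suffix:  # skip cycles
                continue
            path = [prev] + suffix
            paths.append(path)
            walk(prev, path)

    walk(outcome, [outcome])
    return paths


def explanations(observation):
    """Question 3: upstream changes that could account for `observation`."""
    return reachable(observation, backward)


print(consequences(("B. fragilis toxin", "UP")))
print(paths_to(("DNA damage", "UP")))
print(explanations(("cancer", "UP")))
```

Keeping both a forward and a backward index means each question is a single traversal, at the cost of storing every edge twice.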

Consequences of a Change

Query

Answer

Paths to an Outcome

Query

Answer

Explain an Observation

Query

Answer

Conclusion

This is one example of how we can move beyond the current standard of keyword-based search towards a deeper type of search driven by an understanding of biological parts and interactions. We believe this shift will improve our ability to realise connections and see further in our quest to understand biology.

Author email: nick.morley111@gmail.com

Author LinkedIn: https://www.linkedin.com/in/nick-morley-32181110b

Project website: www.luscient.io

Project email: luscient.tech@gmail.com

Thanks to Beni Bienz and Tim Daly for their insightful contributions, Marco Scoppetta and Soroush Saffari for their brilliance, Tomas Sabat for his initiative and support, and to Paul Bradley and Gordon Baxter, who inspired this work.
