Drug Repurposing with a Grakn Knowledge Graph for Bioinformatics

Tomas Sabat
Oct 22, 2018

Drug repositioning (also known as drug repurposing) is the application of existing drugs and compounds to new disease indications. It is an attractive field because developing a new drug is costly and slow (by some estimates, it can take over a decade to get a new drug approved). Repurposing an existing drug can therefore lead to faster clinical impact and, in turn, to new business development opportunities for biotechnology enterprises.

In response to this opportunity, various computational methods have been developed. Many of them integrate large volumes of public biomedical data on chemical compounds, drugs, protein sequences and so on, in order to generate new insights.

In this article, I want to look at drug repositioning and explore how the biomedical knowledge graph I created in an earlier article (link) could be applied to this field.

In that knowledge graph, I integrated nine publicly available biomedical databases and explored how it could be used in a biomedical workflow. For those who are not familiar with knowledge graphs, there are several reasons why using a knowledge graph as part of larger predictive systems for hypothesis generation in bioinformatics can be very useful:

  1. We can predict new relationships between biological components that are not explicitly stored in the knowledge graph.
  2. We can quickly and easily ingest and integrate new biomedical data.
  3. We can bring biological context to newly generated insight from, for example, high throughput systems or sequencing algorithms.

With that in mind, for this article, I wanted to see what insights for drug repositioning I could obtain from this knowledge graph.

The drug I wanted to look at for repositioning was Ilomastat, which is known to alleviate lung inflammation and fibrosis (source). Leveraging the current knowledge encoded in my knowledge graph, I wrote a Graql query (see below) to search for diseases associated with that drug. Remember (see original article): the original data does not include any drug-disease associations, so Grakn will use its reasoning engine to answer this query.

match
$dr isa drug, has drug-name "ILOMASTAT";
$di isa disease;
($dr, $di) isa drug-disease-association; get;

The query returns two candidate diseases: Rheumatoid Arthritis and Contact Dermatitis. As mentioned, the relationships between the drug and the diseases were inferred by Grakn; they don't explicitly exist in my dataset. To understand why this answer was returned, we can double-click on each relationship, and the graph will expand, eventually showing us the underlying data.

Now that the graph has expanded, we can see that a gene can:

  • be targeted by a drug,
  • be associated with a disease, and
  • encode proteins with sequence similarities.

Because inferring the drug-disease relationship requires a relatively complex traversal, Grakn chains three different rules in this example. I have listed the logic and code of these rules below to illustrate this.

Rule 1: Protein-disease associations
If we discover a gene that encodes a certain protein, and that gene is also associated with a disease, then we can consider that protein and disease to be associated. This is the logic that the rule below tries to capture. In other words, if Grakn finds this set of conditions in the underlying dataset to be true, then it will create a new relationship — a protein-disease association.

# Rule 1: Infers protein-disease associations 
when-gene-associated-disease-then-protein-disease-association sub rule,
when {
$g isa gene;
$pr isa protein;
$di isa disease;
(associated-disease: $di, associated-gene: $g) isa gene-disease-association;
(encoding-gene: $g, encoded-protein: $pr) isa gene-protein-encoding;
} then {
(associated-protein: $pr, associated-disease: $di) isa protein-disease-association;
};
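
To make the logic of Rule 1 concrete outside of Grakn, here is a minimal Python sketch of the same join. It is not how Grakn evaluates rules internally, and the identifiers (gene-A, protein-A, disease-X and so on) are hypothetical placeholders, not data from the knowledge graph.

```python
# Toy facts with hypothetical identifiers (not taken from the article's datasets).
# Each pair mirrors one stored relationship:
#   gene_disease -> gene-disease-association (gene, disease)
#   gene_protein -> gene-protein-encoding    (gene, protein)
gene_disease = {("gene-A", "disease-X"), ("gene-B", "disease-Y")}
gene_protein = {("gene-A", "protein-A"), ("gene-C", "protein-C")}


def infer_protein_disease(gene_disease, gene_protein):
    """Join both relations on the shared gene, as the `when` block of Rule 1 does."""
    return {
        (protein, disease)
        for gene, disease in gene_disease
        for encoding_gene, protein in gene_protein
        if gene == encoding_gene
    }


protein_disease = infer_protein_disease(gene_disease, gene_protein)
print(sorted(protein_disease))  # [('protein-A', 'disease-X')]
```

Only gene-A appears in both relations, so only one protein-disease association is inferred; gene-B and gene-C each satisfy just one of the two conditions and produce nothing.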

Rule 2: Protein-drug associations
Similarly, we want to create a relationship between a protein and a drug when we discover a gene that encodes the protein and is also targeted by a drug. This logic can be expressed as a rule in Grakn:

# Rule 2: Creates protein-drug interactions
when-gene-interacts-drug-then-protein-drug-interaction sub rule,
when {
$g isa gene;
$pr isa protein;
$dr isa drug;
($dr, target-gene: $g) isa drug-gene-interaction;
(encoding-gene: $g, encoded-protein: $pr) isa gene-protein-encoding;
} then {
(target-protein: $pr, interacted-drug: $dr) isa drug-protein-interaction;
};
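
The same idea can be sketched in Python for Rule 2. This version first indexes proteins by gene, which makes it easy to see what happens when one gene encodes several proteins. Again, this is only an illustration under assumed toy data; the drug, gene and protein names are hypothetical.

```python
from collections import defaultdict

# Toy facts with hypothetical identifiers (not taken from the article's datasets).
#   drug_gene    -> drug-gene-interaction (drug, target gene)
#   gene_protein -> gene-protein-encoding (gene, protein)
drug_gene = [("drug-1", "gene-A"), ("drug-2", "gene-B")]
gene_protein = [("gene-A", "protein-A1"), ("gene-A", "protein-A2")]


def infer_drug_protein(drug_gene, gene_protein):
    """For every drug that targets a gene, link the drug to each protein that gene encodes."""
    proteins_by_gene = defaultdict(set)
    for gene, protein in gene_protein:
        proteins_by_gene[gene].add(protein)
    return {
        (drug, protein)
        for drug, gene in drug_gene
        for protein in proteins_by_gene.get(gene, ())
    }


drug_protein = infer_drug_protein(drug_gene, gene_protein)
print(sorted(drug_protein))  # [('drug-1', 'protein-A1'), ('drug-1', 'protein-A2')]
```

In this toy data drug-2 targets gene-B, but no encoded protein is recorded for gene-B, so no interaction is inferred for it.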

Rule 3: Drug-disease associations
Finally, we also want to create associations between drugs and diseases when two different proteins share a sequence similarity, and one protein has been associated with a disease (inferred through Rule 1) while the other interacts with a drug (inferred through Rule 2). This logic is encoded in Grakn as a rule like so:

# Rule 3: Creates drug-disease associations
when-sequence-similarity-then-drug-disease-relationship sub rule,
when {
$di isa disease;
$pr isa protein;
$pr2 isa protein;
$pr != $pr2;
$dr isa drug;
(associated-disease: $di, associated-protein: $pr) isa protein-disease-association;
(similar-protein: $pr, similar-protein: $pr2) isa protein-similarity;
(target-protein: $pr2, interacted-drug: $dr) isa drug-protein-interaction;
} then {
(affected-disease: $di, therapeutic: $dr) isa drug-disease-association;
};
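
Rule 3 is the step that ties the other two together, so here is a small Python sketch of it as well. It takes the kind of facts Rules 1 and 2 would infer as input. The data is hypothetical, and I am assuming the similarity relationship should be read in both directions, since both role players in the rule play the same similar-protein role.

```python
# Toy inputs with hypothetical identifiers (not taken from the article's datasets).
protein_disease = {("protein-A", "disease-X")}   # what Rule 1 would infer
drug_protein = {                                  # what Rule 2 would infer
    ("drug-1", "protein-B"),
    ("drug-2", "protein-A"),
}
protein_similarity = {("protein-B", "protein-A")}  # stored in one direction only


def infer_drug_disease(protein_disease, protein_similarity, drug_protein):
    """Link a drug to a disease via two distinct, sequence-similar proteins."""
    # Assumption: similarity is symmetric, so add the reversed pairs.
    symmetric = protein_similarity | {(b, a) for a, b in protein_similarity}
    return {
        (drug, disease)
        for disease_protein, disease in protein_disease
        for p1, p2 in symmetric
        for drug, drug_target in drug_protein
        # p1 != p2 mirrors the `$pr != $pr2` condition in the rule.
        if p1 == disease_protein and p2 == drug_target and p1 != p2
    }


drug_disease = infer_drug_disease(protein_disease, protein_similarity, drug_protein)
print(sorted(drug_disease))  # [('drug-1', 'disease-X')]
```

Note that drug-2, which interacts with the disease-associated protein itself, is not returned here: this rule only fires across two different proteins that share a sequence similarity.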

Final Remarks
I hope this article shows how a knowledge graph can help accelerate the knowledge discovery workflow, specifically by leveraging Grakn's reasoning engine for drug repositioning. The rules above could be extended by using Machine Learning to find patterns and insert them as rules into Grakn, which could substantially increase the insights we can obtain for drug repositioning.

Finally, I would also like to recommend the work (link) that Soroush Saffari has done, on how to use protein sequencing algorithms and insert this type of data into a knowledge graph, which will generate new insights for drug repositioning through the rules defined above.

If you have any questions, comments or would like to collaborate, please shoot me an email at tomas@grakn.ai or tweet me at @tasabat. You can also talk to us and discuss your ideas with the Grakn community.
