Drug Repurposing with a TypeDB Knowledge Graph for Bioinformatics

Tomas Sabat
Vaticle
Oct 22, 2018

Drug repositioning (also known as drug repurposing) is the application of existing drugs and compounds to treat new disease indications. It is an attractive field because new drug development is costly and slow (by some estimates, it can take over a decade to get a new drug approved). Repurposing an existing drug can therefore lead to fast clinical impact and, in turn, to new business development opportunities for biotechnology enterprises.

In response to this opportunity, various computational methods have been developed. Many of them integrate large public biomedical datasets on chemical compounds, drugs, protein sequences, etc., in order to generate new insights.

That’s why in this article, I want to look at drug repositioning, and explore how the biomedical knowledge graph I created in an earlier article (link) could be applied to this field.

In that knowledge graph, I integrated nine publicly available biomedical databases and explored how it could be used in a biomedical workflow. For those who are not familiar with knowledge graphs, there are several reasons why using a knowledge graph as part of larger predictive systems for hypothesis generation in bioinformatics can be very useful:

  1. We can predict new relationships between biological components that are not explicitly stored in the knowledge graph.
  2. We can quickly and easily ingest and integrate new biomedical data.
  3. We can bring biological context to newly generated insight from, for example, high throughput systems or sequencing algorithms.

With that in mind, for this article, I wanted to see what insights for drug repositioning I could obtain from this knowledge graph.

The drug I wanted to look at for repositioning was Ilomastat, which is known to alleviate lung inflammation and fibrosis (source). Leveraging the current knowledge encoded in my knowledge graph, I wrote a TypeQL query (see below) to search for diseases associated with that drug. Remember (see original article): the original data does not include any drug-disease associations, so TypeDB will use its reasoning engine to answer this query.

match
$dr isa drug, has drug-name "ILOMASTAT";
$di isa disease;
($dr, $di) isa drug-disease-association;
get;

The result of the query, visualised. The red node represents the drug ILOMASTAT, while yellow nodes represent diseases. Note: the visualisation is limited to just two results.

As you can see above, two candidate diseases have been returned: Rheumatoid Arthritis and Contact Dermatitis. As mentioned, the relationships between the drug and the diseases were inferred by TypeDB; they don’t explicitly exist in my data set. To understand why this answer was returned, we can double click on each relationship, and the graph will expand, eventually showing us the underlying data. This is what that looks like:

After double clicking on both relationships the following appears: green nodes are proteins, and blue nodes are genes.

Now that the graph has expanded, we can see that a gene can:

  • be targeted by a drug,
  • be associated with a disease, and
  • encode proteins with sequence similarities.
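
To make the expanded path more concrete, here is a minimal Python sketch (not TypeQL, and not how TypeDB stores or expands data) that walks the same kind of path over a handful of toy facts. All identifiers such as drug-D, gene-A and protein-1 are hypothetical placeholders rather than real biomedical data, and explain is an illustrative helper name.

```python
# Toy facts; every identifier is a hypothetical placeholder, not real data.
drug_gene = {("drug-D", "gene-B")}                 # drug targets gene
gene_protein = {("gene-A", "protein-1"),           # gene encodes protein
                ("gene-B", "protein-2")}
protein_similarity = {("protein-1", "protein-2")}  # stored once, read both ways
gene_disease = {("gene-A", "disease-X")}           # gene associated with disease


def explain(drug, disease):
    """List every path drug -> gene -> protein ~ protein <- gene -> disease."""
    # Sequence similarity is symmetric, so add the reversed pairs.
    similar = protein_similarity | {(b, a) for a, b in protein_similarity}
    paths = []
    for dr, target_gene in sorted(drug_gene):
        if dr != drug:
            continue
        for g1, target_protein in sorted(gene_protein):
            if g1 != target_gene:
                continue
            for g2, disease_protein in sorted(gene_protein):
                if (target_protein, disease_protein) not in similar:
                    continue
                if (g2, disease) in gene_disease:
                    paths.append((drug, target_gene, target_protein,
                                  disease_protein, g2, disease))
    return paths


paths = explain("drug-D", "disease-X")
for path in paths:
    print(" -> ".join(path))
```

Each returned tuple plays the role of one expanded explanation in the visualiser: the drug, the gene it targets, the two similar proteins, the disease-associated gene, and the disease.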

Because inferring the drug-disease relationship requires a relatively complex traversal, TypeDB chains three different rules in this example. I have listed the logic and code of each rule below.

Rule 1: Protein-disease associations
If we discover a gene that encodes a certain protein, and that gene is also associated with a disease, then we can consider that protein and disease to be associated. This is the logic that the rule below tries to capture. In other words, if TypeDB finds this set of conditions in the underlying dataset to be true, then it will create a new relationship — a protein-disease association.

# Rule 1: Infers protein-disease associations 
rule when-gene-associated-disease-then-protein-disease-association:
when {
$g isa gene;
$pr isa protein;
$di isa disease;
(associated-disease: $di, associated-gene: $g) isa gene-disease-association;
(encoding-gene: $g, encoded-protein: $pr) isa gene-protein-encoding;
} then {
(associated-protein: $pr, associated-disease: $di) isa protein-disease-association;
};
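
For readers less familiar with rule syntax, the condition in Rule 1 amounts to a join on the shared gene. The following is a minimal Python sketch of that idea under stated assumptions: the facts are hypothetical placeholders, infer_protein_disease is an illustrative name, and the code says nothing about how TypeDB's reasoner is actually implemented.

```python
# Toy facts; every identifier is a hypothetical placeholder, not real data.
gene_disease = {("gene-A", "disease-X"), ("gene-B", "disease-Y")}
gene_protein = {("gene-A", "protein-1"),
                ("gene-A", "protein-2"),
                ("gene-C", "protein-3")}


def infer_protein_disease(gene_disease, gene_protein):
    """Pair each protein with the diseases of the gene that encodes it."""
    return {
        (protein, disease)
        for gene, disease in gene_disease
        for encoding_gene, protein in gene_protein
        if gene == encoding_gene  # the shared $g in the rule's `when` block
    }


protein_disease = infer_protein_disease(gene_disease, gene_protein)
print(sorted(protein_disease))
```

In this toy data, gene-B encodes no protein and gene-C has no disease, so only the two proteins encoded by gene-A end up associated with a disease.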

Rule 2: Protein-drug associations
Similarly, we also want to create a relationship between a protein and a drug when we discover a gene that encodes the protein and is also targeted by the drug. This logic can be expressed as a rule in TypeDB:

# Rule 2: Creates protein-drug interactions
rule when-gene-interacts-drug-then-protein-drug-interaction:
when {
$g isa gene;
$pr isa protein;
$dr isa drug;
($dr, target-gene: $g) isa drug-gene-interaction;
(encoding-gene: $g, encoded-protein: $pr) isa gene-protein-encoding;
} then {
(target-protein: $pr, interacted-drug: $dr) isa drug-protein-interaction;
};

Rule 3: Drug-disease associations
Finally, we also want to create associations between drugs and diseases when two different proteins share a sequence similarity, and one protein has been associated with a disease (inferred through Rule 1) and the other with a drug (inferred through Rule 2). This logic is encoded in TypeDB as a rule like so:

# Rule 3: Creates drug-disease associations
rule when-sequence-similarity-then-drug-disease-relationship:
when {
$di isa disease;
$pr isa protein;
$pr2 isa protein;
not { $pr is $pr2; };
$dr isa drug;
(associated-disease: $di, associated-protein: $pr) isa protein-disease-association;
(similar-protein: $pr, similar-protein: $pr2) isa protein-similarity;
(target-protein: $pr2, interacted-drug: $dr) isa drug-protein-interaction;
} then {
(affected-disease: $di, therapeutic: $dr) isa drug-disease-association;
};
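
To show how the three rules compose, here is a minimal Python sketch that applies them one after another to toy data. It is an illustration under stated assumptions rather than TypeDB's reasoning algorithm: the sketch derives everything eagerly, all identifiers are hypothetical placeholders, and the function names rule_1, rule_2, rule_3 and infer_drug_disease are chosen only for this example.

```python
# Toy facts; every identifier is a hypothetical placeholder, not real data.
gene_disease = {("gene-A", "disease-X")}
drug_gene = {("drug-D", "gene-B")}
gene_protein = {("gene-A", "protein-1"), ("gene-B", "protein-2")}
protein_similarity = {("protein-1", "protein-2")}  # stored once, read both ways


def rule_1(gene_disease, gene_protein):
    """Protein-disease associations via the encoding gene."""
    return {(p, di) for g, di in gene_disease
            for g2, p in gene_protein if g == g2}


def rule_2(drug_gene, gene_protein):
    """Drug-protein interactions via the targeted gene."""
    return {(p, dr) for dr, g in drug_gene
            for g2, p in gene_protein if g == g2}


def rule_3(protein_disease, protein_drug, protein_similarity):
    """Drug-disease associations via two distinct, similar proteins."""
    similar = protein_similarity | {(b, a) for a, b in protein_similarity}
    return {
        (dr, di)
        for p1, di in protein_disease
        for p2, dr in protein_drug
        if p1 != p2 and (p1, p2) in similar
    }


def infer_drug_disease(gene_disease, drug_gene, gene_protein, protein_similarity):
    """Chain the rules: the outputs of Rules 1 and 2 feed Rule 3."""
    return rule_3(rule_1(gene_disease, gene_protein),
                  rule_2(drug_gene, gene_protein),
                  protein_similarity)


associations = infer_drug_disease(gene_disease, drug_gene,
                                  gene_protein, protein_similarity)
print(associations)
```

Removing the similarity fact leaves the result empty, which mirrors the point of Rule 3: the drug and the disease are only linked through the similarity between the two proteins.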

Final Remarks
I hope this article shows how a knowledge graph, and TypeDB’s reasoning engine in particular, can help accelerate knowledge discovery for drug repositioning. The rules above could be extended by using machine learning to find patterns in the data and inserting them into TypeDB as new rules, which should considerably increase the insights we can obtain for drug repositioning.

Finally, I would also like to recommend the work (link) that Soroush Saffari has done, on how to use protein sequencing algorithms and insert this type of data into a knowledge graph, which will generate new insights for drug repositioning through the rules defined above.

If you have any questions, comments or would like to collaborate, please shoot me an email at tomas@vaticle.com or tweet me at @tasabat. You can also talk to us and discuss your ideas with the TypeDB community.
