Introducing LINK: the Open Targets Literature Knowledge Graph

Andrea Pierleoni
Published in opentargets
5 min read · Jan 18, 2018

Explore more than 500 million connections

Today we release a new tool called LINK (LIterature coNcept Knowledgebase) that allows the exploration of half a billion relations between genes, diseases, drugs and key concepts extracted from PubMed abstracts using NLP (Natural Language Processing).

Once MEDLINE relaxed their license for obtaining and analysing publication data last year, we started looking for novel ways to mine their data. We wanted to exploit the biomedical knowledge often buried in the literature to help scientists generate new hypotheses for the identification of new drug targets.

For this purpose, we have built Library, an open source ecosystem comprising:

  • a pipeline that allows us to quickly run a large scale NLP analysis [github]
  • an API that serves the resulting data [github]
  • a user interface to explore this data [github]

Our pipeline annotates genes, diseases and drugs present in PubMed abstracts, and extracts key concepts. If you want more details on it, stay tuned for our upcoming posts.

We use the same framework to extract semantic relations between these entities in the form of subject-predicate-object. Running this analysis on the current PubMed release generates over half a billion (!!) of these connections.
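To make the subject-predicate-object idea concrete, here is a minimal sketch of how such relations could be stored and aggregated into weighted connections. The `Relation` type, the `edge_weights` helper and the example rows are all hypothetical placeholders for illustration; they are not the Library data model or real LINK output.

```python
from collections import Counter
from typing import NamedTuple


class Relation(NamedTuple):
    """One subject-predicate-object relation found in a sentence (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str
    sentence: str


def edge_weights(relations):
    """Count how many annotated sentences support each unordered pair of entities."""
    weights = Counter()
    for rel in relations:
        pair = frozenset((rel.subject, rel.obj))
        if len(pair) == 2:  # ignore relations of an entity with itself
            weights[pair] += 1
    return weights


# Placeholder rows, not real extracted sentences.
relations = [
    Relation("BRCA1", "is associated with", "breast cancer", "placeholder sentence 1"),
    Relation("breast cancer", "is linked to", "BRCA1", "placeholder sentence 2"),
    Relation("entity X", "interacts with", "BRCA1", "placeholder sentence 3"),
]
weights = edge_weights(relations)
```

Each key in `weights` is a pair of entities and each value is the number of sentences supporting that pair, which is the kind of count a graph edge could be drawn from.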

Taken together, the relationships between genes, diseases and drugs form a comprehensive graph of biomedical knowledge.

What we release today is a tool to explore the knowledge graph we built, and to let our users drill down to every sentence in which we found a relation.

Getting started

You can try LINK now at https://link.opentargets.io. Type in the entity or concept you are interested in, select it from the dropdown list and press GO.

You can also search for more than one entity at the same time, or any free text really.

You will get a graph of entities related to your queries, so that you can start to explore your hypotheses.

Get an overview of a topic

There are many different ways to use LINK. The most immediate is to start by querying a broad topic. For instance, this could be a known gene related to breast cancer, such as BRCA1.

In the resulting graph you can see the genes, diseases, drugs and key concepts related to your query. By default, the size of each node reflects its importance (we use the PageRank score), and the width of each edge reflects the amount of underlying information available.
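For readers unfamiliar with PageRank, the sketch below shows a textbook power-iteration version on a tiny undirected graph. It is only an illustration of the scoring idea: the `pagerank` function, its parameters and the toy graph (including the placeholder nodes "entity X" and "entity Y") are assumptions, and LINK's actual implementation may differ.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank on a graph given as {node: set_of_neighbours}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                # Isolated node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
                continue
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


# Toy graph; edges are placeholders, not LINK data.
graph = {
    "BRCA1": {"breast cancer", "entity X", "entity Y"},
    "breast cancer": {"BRCA1", "entity X"},
    "entity X": {"BRCA1", "breast cancer"},
    "entity Y": {"BRCA1"},
}
scores = pagerank(graph)
most_important = max(scores, key=scores.get)
```

In this toy graph the best-connected node ends up with the highest score, so it would be drawn as the largest node.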

The entities are also listed for convenience below the graph, so that it is easier to read all the labels and select the ones you want to explore more in depth.

By selecting a node you can see more details about that entity, while entities that are not directly connected to it will disappear from the graph and will be greyed out in the list. If another node is selected, the shortest path connecting the two will be highlighted.

To validate that connection you can then drill down to the detail of each entity connection by clicking on the graph edges or the red arrows in the selected path.
This will display all the sentences that we have annotated with a semantic relation between the two nodes selected.
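The shortest-path highlighting described above can be pictured with a plain breadth-first search. This is a sketch under assumptions: `shortest_path` is a hypothetical helper, the toy graph only contains the ADAM10 links mentioned later in this post, and it is not necessarily how LINK computes paths.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Toy undirected graph for illustration only.
graph = {
    "ADAM10": {"Alzheimer's disease", "melanoma"},
    "Alzheimer's disease": {"ADAM10"},
    "melanoma": {"ADAM10"},
    "BRCA1": set(),
}
path = shortest_path(graph, "Alzheimer's disease", "melanoma")
```

Here `path` runs from Alzheimer's disease through ADAM10 to melanoma, and each consecutive pair in it corresponds to an edge whose supporting sentences you could then inspect.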

Advanced Search

We made the design decision to present only the most significant entities for each query by default. However, LINK lets you customise the view in many ways: you can increase the number of entities displayed, show just a subset of them, or change the scoring and significance methods.

You can, for example, increase the number of nodes and restrict the entity types to show only genes.

The output of this query is a graph of gene-to-gene relations, which gives you an overview of the genes involved and can be explored further in detail to support or generate hypotheses.
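Conceptually, these two options amount to filtering by entity type and keeping the top-scoring nodes. The sketch below illustrates that idea only; `filter_nodes`, the node dictionaries and the scores are hypothetical and do not reflect the LINK API or real results.

```python
def filter_nodes(nodes, entity_types=None, max_nodes=10):
    """Keep the highest-scoring nodes, optionally restricted to given entity types."""
    kept = [
        node for node in nodes
        if entity_types is None or node["type"] in entity_types
    ]
    kept.sort(key=lambda node: node["score"], reverse=True)
    return kept[:max_nodes]


# Made-up nodes and scores, for illustration only.
nodes = [
    {"name": "BRCA1", "type": "gene", "score": 0.9},
    {"name": "breast cancer", "type": "disease", "score": 0.8},
    {"name": "ADAM10", "type": "gene", "score": 0.4},
    {"name": "melanoma", "type": "disease", "score": 0.2},
]
genes_only = filter_nodes(nodes, entity_types={"gene"}, max_nodes=5)
gene_names = [node["name"] for node in genes_only]
```

Raising `max_nodes` corresponds to showing more entities, and passing `entity_types=None` keeps every type.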

Another way we found LINK very useful is to start from a very specific point, such as a publication that we liked or a SNP id, and use it to understand the general landscape for that topic.
For example, you could start from a paper that you have just read, as we did for this one, which discusses the role of the gene ADAM10 in brain diseases such as Alzheimer’s disease.

If you paste the PubMed id of the paper into the search box and press GO, LINK will start with the relations found in that paper and extend the search to those related to them.
By focusing just on genes and diseases (with the advanced search options), we can get a graph describing the genes and diseases linked to ADAM10.

Clicking on the Alzheimer’s disease node makes it easy to see that the involvement of ADAM10 in brain diseases is just part of its role in the disease space.

Looking at the full graph, you might easily spot that ADAM10 also has a role in melanoma, for example.

Want to get into more detail about it? Follow the link to the Open Targets platform to read everything we know about these associations.

LINK is released as a proof of concept. We are very interested in getting feedback on it, so please let us know what you think in the comments below or just send us an email.

Originally published at blog.opentargets.org on January 18, 2018.
