Open Targets API Tutorial: Getting Started

Eliseo Papa
Published in opentargets · 10 min read · Apr 18, 2016

Since release 1.1, the Open Targets Platform, previously known as the Target Validation Platform, exposes a public REST API for programmatic retrieval of the data served at targetvalidation.org. This is the same API that powers our website, and it gives full access to the data we use to build the Open Targets Platform user interface.

The available methods are divided into:

  • public — Methods that serve the core set of our data and that we will keep stable and support.
  • private — Methods used by the web app to serve additional data. These methods change often and thus should not be relied upon.
  • utils — Methods to get statistics and technical data about the API.

Each of the methods is described in detail in the API documentation, where each API call can be tested using an interactive interface powered by swagger-ui.

Before diving into the tutorial, it is worth touching briefly on which tools are available to start using our API. If you are familiar with REST APIs and HTTP calls you can probably skip ahead to the next section.

Pasting a query URL such as:

https://api.opentargets.io/v3/platform/public/utils/stats

in an internet browser will return statistics on our data. The default format is JSON.

For more complex queries the browser becomes impractical, but you can query our API from the command line, from Python, or from within your own workflow.

On the command line, the classic option is to use the curl command. A typical curl call would be:

curl -X GET <query url>

So to check the version of the latest available API, the call would be:

curl -X GET https://api.opentargets.io/v3/platform/public/utils/version

which returns:

3

To get some statistics, the call would be:

curl -X GET https://api.opentargets.io/v3/platform/public/utils/stats

which returns the following JSON data:

{"associations": {"datatypes": {"literature": {"total": 1031205, "datasources": {"europepmc": {"total": 1031205}}}, "rna_expression": {"total": 711158, "datasources": {"expression_atlas": {"total": 711158}}}, "genetic_association": {"total": 107993, "datasources": {"eva": {"total": 27356}, "gwas_catalog": {"total": 58851}, "uniprot": {"total": 12739}, "uniprot_literature": {"total": 45185}}}, "somatic_mutation": {"total": 20019, "datasources": {"cancer_gene_census": {"total": 19488}, "eva_somatic": {"total": 990}}}, "known_drug": {"total": 51316, "datasources": {"chembl": {"total": 51316}}}, "animal_model": {"total": 613231, "datasources": {"phenodigm": {"total": 613231}}}, "affected_pathway": {"total": 3198, "datasources": {"reactome": {"total": 3198}}}}, "total": 2175851}, "evidencestrings": {"datatypes": {"literature": {"total": 3166744, "datasources": {"europepmc": {"total": 3166744}}}, "rna_expression": {"total": 433809, "datasources": {"expression_atlas": {"total": 433809}}}, "genetic_association": {"total": 55964, "datasources": {"eva": {"total": 19857}, "gwas_catalog": {"total": 25536}, "uniprot": {"total": 4786}, "uniprot_literature": {"total": 5785}}}, "somatic_mutation": {"total": 10247, "datasources": {"cancer_gene_census": {"total": 9790}, "eva_somatic": {"total": 457}}}, "known_drug": {"total": 29812, "datasources": {"chembl": {"total": 29812}}}, "animal_model": {"total": 395407, "datasources": {"phenodigm": {"total": 395407}}}, "affected_pathway": {"total": 9468, "datasources": {"reactome": {"total": 9468}}}}, "total": 4101451}, "targets": {"total": 24716}, "diseases": {"total": 8051}}

A more user-friendly alternative is to use the httpie tool. It allows a simpler syntax to query the API methods, and it formats the response to improve the readability.

An easy way to construct more complex queries is to head to the interactive interface of our Open Targets API documentation, which allows you to input parameters for each method and visualise a nicely formatted response, as well as the URL for each call.

Another very popular option for getting data programmatically from the API is Python, using the requests library. You can build a GET request and, for each response, read the status code, headers and content, either as a string or parsed from JSON into a Python object:

>>> import requests
>>> r = requests.get('https://api.opentargets.io/v3/platform/public/utils/stats')
>>> r.status_code
200
>>> r.headers['content-type']
'application/json'
>>> r.text
u'{"associations": {"datatypes": { ...'
>>> r.json()
{"associations": {"datatypes": {"literature": {"total": 1029501, "datasources": {"europepmc": {"total": 1029501}}}, "rna_expression": {"total": 711158 ... }
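Once the response has been parsed, the statistics are an ordinary nested dictionary. As a minimal sketch, the snippet below ranks the datatypes by number of evidence strings. The `stats` dictionary is a hand-trimmed copy of the response shown earlier, standing in for `r.json()`, and `rank_datatypes` is a hypothetical helper name, not part of the API.

```python
# Hand-trimmed copy of the "evidencestrings" section of the stats response
# shown earlier; in practice this would come from r.json().
stats = {
    "evidencestrings": {
        "datatypes": {
            "literature": {"total": 3166744},
            "rna_expression": {"total": 433809},
            "genetic_association": {"total": 55964},
            "somatic_mutation": {"total": 10247},
            "known_drug": {"total": 29812},
            "animal_model": {"total": 395407},
            "affected_pathway": {"total": 9468},
        },
        "total": 4101451,
    }
}


def rank_datatypes(stats_json, section="evidencestrings"):
    """Return (datatype, total) pairs sorted by total, largest first."""
    datatypes = stats_json[section]["datatypes"]
    return sorted(
        ((name, info["total"]) for name, info in datatypes.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


for name, total in rank_datatypes(stats):
    print(f"{name:20s} {total:>9d}")
```

The same function works on the "associations" section by passing `section="associations"`, assuming the full response is supplied.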

Finding the identifier (ID) of a target or a disease

As described in the About the Open Targets Platform page, we bring together evidence to associate potential drug targets with diseases.

To access target information through the API, it is necessary to use the Ensembl gene ID for the corresponding gene (e.g. the Ensembl gene identifier for NOD2 is ENSG00000167207).

We map diseases to terms in the Experimental Factor Ontology (EFO). Each disease is linked to an EFO ID (e.g. the EFO ID for “inflammatory bowel disease” is EFO_0003767).

To make the mapping of IDs to the desired target easier, we can use the /public/search method of the API. This method replicates the search box on the home page of the Open Targets Platform. Using this method, you can search for a gene or protein using its symbol, common name or any synonym. The response will contain the ID, together with summary data about what is known about the target.

If we search for the NOD2 gene and limit our attention to the first result by setting the size parameter to 1, we get:

http https://api.opentargets.io/v3/platform/public/search q==NOD2 size==1 filter==target

HTTP/1.1 200 OK
Content-Type: application/json

{
    "data": [
        {
            "data": {
                "approved_name": "nucleotide binding oligomerization domain containing 2",
                "approved_symbol": "NOD2",
                "association_counts": { "direct": 221, "total": 380 },
                "biotype": "protein_coding",
                "description": "Involved in ...",
                "ensembl_gene_id": "ENSG00000167207",
                ...
                "name_synonyms": [ "Inflammatory bowel disease protein 1", "nucleotide-binding oligomerization domain, ... ],
                "symbol_synonyms": [ "BLAU", "CD", ... ],
                "top_associations": {
                    "direct": [ { "id": "ENSG00000167207-EFO_0000701", "score": 1.0 }, ... ],
                    "total": [ { "id": "ENSG00000167207-EFO_0000540", "score": 1.0 }, ... ]
                },
                "type": "target",
                "uniprot_accessions": [ "Q9HC29",... ]
            },
            ...
}

A tool such as the jq command can be useful to parse the resulting JSON responses on the command line. You can isolate specific fields by typing a . followed by the field name you want to filter from your JSON.

Thus the same query can be piped to jq to obtain the Ensembl gene ID:

http https://www.targetvalidation.org/api/latest/public/search q==NOD2 size==1 filter==target | jq '.data[] | .id'
"ENSG00000167207"

There are other ways to find the Ensembl gene identifier of a target, such as mapping human gene symbols to Ensembl gene IDs using the Ensembl REST API, curl and jq, but we recommend looking up IDs using our own API to ensure consistency with subsequent queries.

You can obtain the same result in Python by parsing the JSON as a dictionary and finding the correct index:

>>> import requests
>>> from pprint import pprint
>>> r = requests.get('https://api.opentargets.io/v3/platform/public/search', params={"q":"NOD2","size":1})
>>> pprint(r.json())
{'data': [{'data': {'approved_name': 'nucleotide binding oligomerization '
                                     'domain containing 2',
                    'approved_symbol': 'NOD2',
                    'association_counts': {'direct': 221, 'total': 380},
                    'biotype': 'protein_coding',
                    'description': 'Involved in ...',
                    'ensembl_gene_id': 'ENSG00000167207',
                    ...

which returns the expected JSON object. To select specific keys one needs to traverse the resulting dictionary.

>>> r.json()['data'][0]['id']
'ENSG00000167207'
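Indexing with `[0]` raises an `IndexError` if the search returns no hits. The sketch below wraps the lookup so that case is handled. It assumes an empty search yields an empty `data` list, which this tutorial does not confirm. `first_result_id` is a hypothetical helper, and `sample` is a hand-trimmed stand-in for `r.json()`.

```python
def first_result_id(search_json):
    """Return the 'id' of the top search hit, or None if there are no hits.

    Assumption: a search with no matches returns an empty "data" list.
    """
    hits = search_json.get("data", [])
    return hits[0]["id"] if hits else None


# Hand-trimmed stand-in for r.json() from the NOD2 search above.
sample = {
    "data": [
        {
            "id": "ENSG00000167207",
            "data": {"approved_symbol": "NOD2", "type": "target"},
        }
    ]
}

print(first_result_id(sample))        # ENSG00000167207
print(first_result_id({"data": []}))  # None
```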

The process to find disease IDs is very similar, although for less common diseases or those that have many synonyms it is advisable to return more than one result at a time and then pick the most appropriate EFO ID.

>>> r = requests.get('https://api.opentargets.io/v3/platform/public/search', params={"q":"inflammatory bowel disease","size":1})
>>> r.json()['data'][0]['id']
'EFO_0003767'

Just as for targets above, it is possible to find EFO IDs by querying the ontology directly through the Ontology Lookup Service API, but we recommend using our API, since the EFO version served there may at times be out of sync with the one we use in targetvalidation.org.
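The lookups in this section all have the same shape: a base URL, a method path and a set of query parameters. As a minimal sketch, a query URL can be assembled with the Python standard library alone. `build_query_url` is a hypothetical helper introduced here for illustration. The base URL and parameter names are the ones used in the examples above.

```python
from urllib.parse import urlencode

# Base URL as used throughout this tutorial.
BASE_URL = "https://api.opentargets.io/v3/platform"


def build_query_url(method, **params):
    """Join the base URL, a method path and optional query parameters."""
    url = f"{BASE_URL}/{method.strip('/')}"
    if params:
        # urlencode escapes values, e.g. spaces in disease names become "+".
        url += "?" + urlencode(params)
    return url


print(build_query_url("public/utils/stats"))
print(build_query_url("public/search", q="NOD2", size=1, filter="target"))
print(build_query_url("public/search", q="inflammatory bowel disease", size=1))
```

The resulting strings can be pasted into a browser, passed to curl, or handed to `requests.get`.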

Finding associations between target and disease

With an ID for a target and an ID for a disease, it is possible to query the API for any associations linking the two using the public/association/filter method.
Continuing with the inflammatory bowel disease (IBD) example:

http https://api.opentargets.io/v3/platform/public/association/filter target==ENSG00000167207 disease==EFO_0003767

returns the association JSON object summarising the data present in targetvalidation.org that links NOD2 to IBD:

{ "data": [ { "association_score": { "datasources": { "cancer_gene_census": 0.0, "chembl": 0.0, "disgenet": 0.0, "europepmc": 0.3746566730517112, "eva": 0.0, "eva_somatic": 0.0, "expression_atlas": 0.024180000000000004, "gwas_catalog": 0.7180094552025357, "phenodigm": 0.17046, "reactome": 0.0, "uniprot": 0.0, "uniprot_literature": 1.0 }, "datatypes": { "affected_pathway": 0.0, "animal_model": 0.17046, "genetic_association": 1.0, "known_drug": 0.0, "literature": 0.3746566730517112, "rna_expression": 0.024180000000000004, "somatic_mutation": 0.0 }, "overall": 1.0 }, "disease": { "efo_info": { "label": "inflammatory bowel disease", "path": [ [ "EFO_0000405", "EFO_0003767" ], [ "EFO_0000540", "EFO_0005140", "EFO_0003767" ] ], "therapeutic_area": { "codes": [ "EFO_0000405", "EFO_0000540" ], "labels": [ "immune system disease", "digestive system disease" ] } }, "id": "EFO_0003767" }, "evidence_count": { "datasources": { "cancer_gene_census": 0.0, "chembl": 0.0, "disgenet": 0.0, "europepmc": 1028.0, "eva": 0.0, "eva_somatic": 0.0, "expression_atlas": 1.0, "gwas_catalog": 13.0, "phenodigm": 3.0, "reactome": 0.0, "uniprot": 0.0, "uniprot_literature": 2.0 }, "datatypes": { "affected_pathway": 0.0, "animal_model": 3.0, "genetic_association": 15.0, "known_drug": 0.0, "literature": 1028.0, "rna_expression": 1.0, "somatic_mutation": 0.0 }, "total": 1047.0 }, "id": "ENSG00000167207-EFO_0003767", "is_direct": true, "target": { "gene_info": { "name": "nucleotide binding oligomerization domain containing 2", "symbol": "NOD2" }, "id": "ENSG00000167207" } } ], "facets": {}, "from": 0, "size": 1, "therapeutic_areas": [], "took": 21, "total": 1 }
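Most of the per-datasource scores in a response like this are zero, so it can help to filter them out. The sketch below lists only the datasources that contribute to the NOD2 and IBD association, strongest first. The `association` dictionary is copied by hand from the first element of the `data` array above, and `supporting_datasources` is a hypothetical helper name.

```python
# Datasource scores copied from the NOD2-IBD association response above;
# in practice this would be r.json()["data"][0].
association = {
    "id": "ENSG00000167207-EFO_0003767",
    "association_score": {
        "datasources": {
            "cancer_gene_census": 0.0, "chembl": 0.0, "disgenet": 0.0,
            "europepmc": 0.3746566730517112, "eva": 0.0, "eva_somatic": 0.0,
            "expression_atlas": 0.024180000000000004,
            "gwas_catalog": 0.7180094552025357, "phenodigm": 0.17046,
            "reactome": 0.0, "uniprot": 0.0, "uniprot_literature": 1.0,
        },
        "overall": 1.0,
    },
}


def supporting_datasources(assoc):
    """Return (datasource, score) pairs with a non-zero score, strongest first."""
    scores = assoc["association_score"]["datasources"]
    return sorted(
        ((name, score) for name, score in scores.items() if score > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )


for name, score in supporting_datasources(association):
    print(f"{name:20s} {score:.3f}")
```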

The content of the returned association object provides a good illustration of the underlying structure of the data in the Open Targets Platform. The JSON response includes a data array, whose content is divided into:

  • Association score: for each target we compute an association score indicating the strength of the available evidence connecting target to disease. We can use the score to rank target-to-disease links with respect to each other, as we can see when looking at inflammatory bowel disease on targetvalidation.org.

The scoring is explained in detail elsewhere, but it is worth summarising here to better interpret the JSON response.

For each data source we compute a score based on the evidence linking target to disease. Similar data sources are then grouped into datatypes, and an association score per datatype is computed using a harmonic sum. The overall association score for a target and a disease is calculated as the sum of the harmonic series of the individual datatype scores, adjusting the contribution of each datatype using a heuristic weighting.

  • Disease: contains the EFO ID and EFO information about the disease, including its position in the ontology hierarchy and a list of therapeutic areas.
  • Evidence count: the association object does not include all the evidence, only a summary of how many pieces of evidence were found in each category. Below we will look at how the API can be used to get further details on each evidence item.
  • id: a unique identifier for the association, in the form <ensembl target ID>-<disease EFO id>.
  • is_direct: a true/false variable indicating whether the connection between target and disease is directly observed or inferred from the relationship the current disease has with some other disease in the ontology.
  • target: contains information about the gene or protein, as well as its ID.

{
    "association_score": { "datasources": {...}, "datatypes": {...}, "overall": 1.0 },
    "disease": { "efo_info": { ... }, "id": "EFO_0003767" },
    "evidence_count": { "datasources": { ... }, "datatypes": { ... }, "total": 1047.0 },
    "id": "ENSG00000167207-EFO_0003767",
    "is_direct": true,
    "target": { "gene_info": { ... }, "id": "ENSG00000167207" }
}
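The harmonic sum mentioned in the scoring summary above is not spelled out in this tutorial, so the sketch below is an illustration under stated assumptions, not the Platform's implementation. It assumes the scores are sorted from highest to lowest, that the i-th score is divided by i squared, and that the result is capped at 1. The heuristic datatype weighting is not modelled. `harmonic_sum` is a hypothetical function name.

```python
def harmonic_sum(scores, cap=1.0):
    """Combine scores so that each additional one contributes less.

    Assumptions (not taken from the tutorial): scores are ranked high to
    low, the i-th ranked score is divided by i**2, and the total is capped
    at `cap`. Pass cap=None to see the uncapped value.
    """
    ordered = sorted(scores, reverse=True)
    total = sum(score / (rank ** 2) for rank, score in enumerate(ordered, start=1))
    return min(total, cap) if cap is not None else total


print(harmonic_sum([1.0, 0.5], cap=None))  # 1.0 + 0.5/4 = 1.125
print(harmonic_sum([1.0, 0.5]))            # capped at 1.0
```

Under these assumptions a single strong score dominates the result, while many weak scores add little. That would be consistent with the NOD2 and IBD example, where one datasource scoring 1.0 is enough for an overall score of 1.0.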

It is also possible to query the public/association/filter endpoint with just a target or a disease identifier, and choose to have just some part of the response returned. So to get the 3 diseases most strongly associated with the BRAF target:

http https://api.opentargets.io/v3/platform/public/association/filter target==ENSG00000157764 size==3 field==disease

{
    "data": [
        { "disease": { "efo_info": { "efo_id": "http://www.ebi.ac.uk/efo/EFO_0000616", "label": "neoplasm", "path": [ [ "EFO_0000616" ] ], "therapeutic_area": { "codes": [], "labels": [] } }, "id": "EFO_0000616" } },
        { "disease": { "efo_info": { "efo_id": "http://www.ebi.ac.uk/efo/EFO_0000311", "label": "cancer", "path": [ [ "EFO_0000616", "EFO_0000311" ] ], "therapeutic_area": { "codes": [ "EFO_0000616" ], "labels": [ "neoplasm" ] } }, "id": "EFO_0000311" } },
        { "disease": { "efo_info": { "efo_id": "http://www.ebi.ac.uk/efo/EFO_0000313", "label": "carcinoma", "path": [ [ "EFO_0000616", "EFO_0000311", "EFO_0000313" ] ], "therapeutic_area": { "codes": [ "EFO_0000616" ], "labels": [ "neoplasm" ] } }, "id": "EFO_0000313" } }
    ],
    "facets": {},
    "from": 0,
    "size": 3,
    "therapeutic_areas": [],
    "took": 22,
    "total": 763
}
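A response restricted to the disease field is easy to reduce to a list of IDs and labels. In the sketch below, `response` is a hand-trimmed copy of the BRAF output above, keeping only the fields that are used, and `disease_labels` is a hypothetical helper name.

```python
# Hand-trimmed copy of the BRAF response above; in practice this is r.json().
response = {
    "data": [
        {"disease": {"efo_info": {"label": "neoplasm"}, "id": "EFO_0000616"}},
        {"disease": {"efo_info": {"label": "cancer"}, "id": "EFO_0000311"}},
        {"disease": {"efo_info": {"label": "carcinoma"}, "id": "EFO_0000313"}},
    ],
    "size": 3,
    "total": 763,
}


def disease_labels(assoc_json):
    """Return (disease ID, label) pairs in the order the API returned them."""
    return [
        (item["disease"]["id"], item["disease"]["efo_info"]["label"])
        for item in assoc_json["data"]
    ]


for efo_id, label in disease_labels(response):
    print(efo_id, label)
```

Note that `total` (763 here) counts all matching associations, while `data` holds only the `size` items requested.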

Get all the available evidence for a target-disease association

For a given target-disease pair the object returned by the public/association/filter endpoint is a summary of all the evidence available.

The public/evidence/filter endpoint will instead serve every single piece of information for a target-disease relationship in the form of evidence objects.

Evidence objects are an enriched version of our input data described in the Open Targets JSON schema.

Continuing with the NOD2 and IBD example, we can construct a query to retrieve basic information about the evidence available. By default the API returns the strongest 10 pieces of evidence, but that can be changed with the size parameter.

http https://api.opentargets.io/v3/platform/public/evidence/filter target==ENSG00000167207 disease==EFO_0003767 datastructure==simple

{
    "data": [
        { "disease.efo_info.label": "ulcerative colitis", "disease.id": "EFO_0000729", "id": "442adb2d1b13c4f2ccf7389988aefde5", "scores.association_score": "1.0", "sourceID": "uniprot_literature", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "genetic_association" },
        { "disease.efo_info.label": "Crohn's disease", "disease.id": "EFO_0000384", "id": "b854a299c31156c3b14baf1b6638041c", "scores.association_score": "1.0", "sourceID": "uniprot_literature", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "genetic_association" },
        { "disease.efo_info.label": "Autosomal recessive early-onset inflammatory bowel disease", "disease.id": "Orphanet_238569", "id": "14be78930ae2c3275aaf6d2ebaf94b9d", "scores.association_score": "0.4972", "sourceID": "phenodigm", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "animal_model" },
        { "disease.efo_info.label": "Autosomal recessive early-onset inflammatory bowel disease", "disease.id": "Orphanet_238569", "id": "cb218a9ee90fc091c88662039c2591a9", "scores.association_score": "0.43", "sourceID": "phenodigm", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "animal_model" },
        { "disease.efo_info.label": "Autosomal recessive early-onset inflammatory bowel disease", "disease.id": "Orphanet_238569", "id": "f73ba0702491e426e76c4c5cb130c5ea", "scores.association_score": "0.4203", "sourceID": "phenodigm", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "animal_model" },
        { "disease.efo_info.label": "Crohn's disease", "disease.id": "EFO_0000384", "id": "cb48f4e17df89aaf89a3cf8e025d981c", "scores.association_score": "0.414", "sourceID": "europepmc", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "literature" },
        { "disease.efo_info.label": "inflammatory bowel disease", "disease.id": "EFO_0003767", "id": "d82be30eda25577a247c5eabc5e00807", "scores.association_score": "0.376", "sourceID": "europepmc", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "literature" },
        { "disease.efo_info.label": "Crohn's disease", "disease.id": "EFO_0000384", "id": "95fbdb0bd5e06d0883ae387b3b663c3d", "scores.association_score": "0.35600000000000004", "sourceID": "europepmc", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "literature" },
        { "disease.efo_info.label": "Crohn's disease", "disease.id": "EFO_0000384", "id": "4392f242b6646459bf58505178739416", "scores.association_score": "0.34", "sourceID": "europepmc", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "literature" },
        { "disease.efo_info.label": "Crohn's disease", "disease.id": "EFO_0000384", "id": "6a41a2481cbb047ebf61bd2eaf3fde86", "scores.association_score": "0.314", "sourceID": "europepmc", "target.gene_info.symbol": "NOD2", "target.id": "ENSG00000167207", "type": "literature" }
    ],
    "facets": null,
    "from": 0,
    "size": 10,
    "therapeutic_areas": [],
    "took": 387,
    "total": 1047
}
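With the simple data structure each evidence item is a flat dictionary, so the items are straightforward to tally. As a minimal sketch, the snippet below counts the ten items above by datasource and by datatype. The `evidence` list keeps only the `sourceID` and `type` fields, copied by hand from the response, and `count_by` is a hypothetical helper name.

```python
from collections import Counter

# "sourceID" and "type" of the ten evidence items shown above, in order;
# in practice this list would be r.json()["data"].
evidence = (
    [{"sourceID": "uniprot_literature", "type": "genetic_association"}] * 2
    + [{"sourceID": "phenodigm", "type": "animal_model"}] * 3
    + [{"sourceID": "europepmc", "type": "literature"}] * 5
)


def count_by(items, key):
    """Count evidence items by the value stored under `key`."""
    return Counter(item[key] for item in items)


for source, n in count_by(evidence, "sourceID").most_common():
    print(f"{source:20s} {n}")
```

These counts describe only the page that was returned. The `total` field (1047 here) reports how many evidence items exist overall.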

This tutorial only scratches the surface of what is possible with the Open Targets REST API. Very complex queries can be created to cover many usage scenarios. We will cover those in upcoming tutorials on this blog.

Importantly, we will never track the content of your API request but instead monitor the overall usage of the API.

Also, don’t forget to let us know what you think of the API after using it — we’d love to hear how we could make the API more useful in our next release.

Originally published at blog.opentargets.org on April 18, 2016.
