Bioinformatics, Ontologies, Linked Data, and Data De-Silo-fication

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog
3 min read · Aug 7, 2018

Ontologies provide powerful “context lenses” through which to understand data, even more so when the data in question originated from disparate sources.

Developing an ontology takes a lot of time, effort, and niche knowledge, especially when starting from scratch. Any time a previously developed ontology can be reused, in part or in total, on its own or with embellishment, the cost of the new project will be lowered substantially.

Such ontology reuse requires some level of conceptual relevance and overlap between what exists and what’s being derived. An ontology developed for geological purposes is unlikely to be useful when working with genetics, because there’s little crossover between these fields. However, a genetics ontology developed by a spider geneticist is likely to be very useful to a primate geneticist, because genetic research is very similar regardless of the species involved.

As suggested above, the realm of Bioinformatics shows quite advanced and demonstrable ontology reuse between the SIB Swiss Institute of Bioinformatics and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), including domain-specific knowledge bases (KBs) such as UniProt, Rhea, and Identifiers.org, to name a few. All of these contribute 5-Star Linked Data to a powerful Semantic Web that is available for fine-grained query access using the SPARQL Query Language via their respective SPARQL Query Web Service Endpoints.

SPARQL Query Web Service Endpoints

  1. EMBL-EBI
  2. UniProt
  3. Rhea
  4. Identifiers.org
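
As a rough illustration of what "query access via a SPARQL endpoint" means in practice, here is a minimal Python sketch that builds a SPARQL Protocol request and flattens a SPARQL JSON results document. The endpoint URL is an assumption (the links in the list above are not reproduced here; UniProt's public endpoint is used for illustration), and the helper names `build_sparql_request` and `bindings_to_rows` are hypothetical. The request is only constructed, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: this URL is not taken from the post; it is UniProt's public
# SPARQL endpoint, used here purely as an example target.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"

# A deliberately generic query: fetch a handful of arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the endpoint for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json: str) -> list:
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


request = build_sparql_request(UNIPROT_ENDPOINT, QUERY)
```

To actually run the query, the request could be passed to `urllib.request.urlopen(request)` and the decoded response body handed to `bindings_to_rows`. The same two helpers should work unchanged against any endpoint that speaks the standard SPARQL Protocol, which is the point: one query language and one result format across otherwise separate knowledge bases.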

Visible Effects of Ontologies, Knowledge Bases, and Linked Data — a simple walkthrough exercise

Linked Data Discovery and Exploration Demo

This exercise demonstrates how advanced use of Ontologies, combined with Semantic Webs of Linked Data in the realm of Biology, has created powerful integration across disparate data sources that would otherwise exist as conventional data silos.
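
To make the "integration across silos" idea concrete, here is a toy Python sketch. It assumes two independently maintained datasets that happen to use the same IRI for the same protein; every IRI, predicate, and value below is hypothetical (example.org), and the helpers `follow` and `describe` are illustrative names, not part of any real tool.

```python
# Toy triples standing in for two independently published datasets.
# All example.org IRIs and literal values are hypothetical.
PROTEIN = "http://example.org/id/protein/P1"
STRUCTURE = "http://example.org/structure/S1"

structure_db = [
    (STRUCTURE, "http://example.org/vocab/describes", PROTEIN),
    (STRUCTURE, "http://example.org/vocab/method", "X-ray diffraction"),
]
annotation_db = [
    (PROTEIN, "http://example.org/vocab/organism", "Homo sapiens"),
    (PROTEIN, "http://example.org/vocab/label", "Example protein"),
]


def follow(subject, predicate, *sources):
    """Return the objects reachable from `subject` via `predicate`."""
    return [
        o
        for triples in sources
        for s, p, o in triples
        if s == subject and p == predicate
    ]


def describe(iri, *sources):
    """Collect every (predicate, object) pair about `iri` across all sources."""
    return sorted(
        (p, o) for triples in sources for s, p, o in triples if s == iri
    )


# Start in the structure dataset, follow a link, and land in the other one.
linked = follow(STRUCTURE, "http://example.org/vocab/describes", structure_db)
facts = [describe(iri, structure_db, annotation_db) for iri in linked]
```

Because both datasets name the protein with the same IRI, no mapping table or bespoke integration code is needed: following the link is the join. Real Linked Data sources do this with HTTP IRIs that can also be looked up on the Web.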

(0) Before you begin, install OSDS, the OpenLink Structured Data Sniffer. This browser extension improves understanding of both the power of Linked Data and the KB-building prowess of well-defined ontologies.

(1) Start at the Lectin3D V2 curated database.

(2) Scroll down to 1B09 homo sapiens Homo sapiens (about the 7th block), and click “View the 3D structure.”

(3) Look for the big green buttons (currently found just above the "SWISS MODEL PLIP INTERACTIONS VIEWER" section).

(4) Click on your choice of —

Conclusion

Applying Linked Data principles to data in any domain of interest produces a Semantic Web of Linked Data, the underlying quality of which ultimately reflects the domain expertise of its contributors. As this post demonstrates, the Bioinformatics industry is making advanced strides in this area, putting Computational Biology at the forefront of Artificial Intelligence, Machine Learning, and anything else that ultimately seeks to exploit increased access to otherwise-siloed data within and across domains of specialization.

None of this is even marginally achievable with a conventional single-model RDBMS that only supports SQL. Likewise, none of it is achievable via the new genre of single-model DBMS projects associated with the “Graph Databases” banner. What you need, in all cases, is a multi-model RDBMS that understands and takes full advantage of Linked Data principles, i.e., the fundamental technological essence that makes both the World Wide Web and its Semantic Web enhancement tick!

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog

CEO, OpenLink Software: High-Performance Data Centric Technology Providers.