Building an OWL ontology for Genesis

Rushikesh Halle
Thoughtworks: e4r™ Tech Blogs
7 min read · Jan 24, 2024
Photo by Joshua Sortino on Unsplash

Introduction

The most difficult part of working with RDF databases is building ontologies. An ontology provides a schema as well as inference rules for an RDF database. RDF databases are widely used in the biomedical field because of the highly interconnected nature of the data. With automated laboratories being the trend of the decade, the need for RDF databases has increased even more, as such laboratories demand data in a structured format.

Adopting RDF databases comes with its own challenges. Of those, the development of a rich and reusable ontology often comes at the top (Ref. Ontology Engineering). While working with one such RDF database named Genesis-DB, we developed a new ontology incorporating some relevant aspects of the related domain. While doing so, we came across multiple challenges. We will discuss a few of them in this article, along with the methods and tools that we adopted to create the ontology.

Ontology development

The following diagram gives an overview of the steps taken to develop the Genesis ontology:

Fig. 1. Ontology Development Process

Step 1: Listing the competency questions

One of the primary steps towards building an ontology is listing competency questions. These are questions that the RDF database should be able to answer through queries. Listing them gives a good idea of what kinds of entities need to be represented in the database and the relationships between them. The following are some examples of questions that Genesis-DB was expected to answer:

  • Give all unique combinations of experimental variables (temperature, pH, flow-rate, growth media, stir-rate, rpm, inlet-ratio) for which gene-ids [g1, g2, g3] had a count > 5,000.
  • Give the set of strain-ids involved in experiments in which metabolite-ids [m1, m2] were not identified in any mass spectrometry data.

Step 2: Creating RDFS Schema

Once the questions are agreed upon, an exhaustive list of terms for entities and relationships can be created. This list of terms forms the RDFS schema for the data.

This involves creating unique IRIs for all the terms, labeling them with appropriate names, adding a definition for each of them, and finally arranging them in a hierarchy. Each relationship needs to have a domain and a range defined.
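As a rough illustration of what such a schema fragment might look like, the sketch below defines two classes in a hierarchy and one relationship with a domain and range. The term names (:organism, :strain, :experiment, :involves_strain), the definition text, and the http://example.org/ontology# namespace are assumptions chosen for illustration; they are not taken from the actual Genesis ontology.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/ontology#> .

# Hypothetical classes, each with a label; :strain sits under :organism
:organism rdf:type rdfs:Class ;
    rdfs:label "organism" .

:strain rdf:type rdfs:Class ;
    rdfs:label "strain" ;
    rdfs:comment "A genetic variant of an organism used in an experiment." ;
    rdfs:subClassOf :organism .

:experiment rdf:type rdfs:Class ;
    rdfs:label "experiment" .

# Hypothetical relationship with its domain and range
:involves_strain rdf:type rdf:Property ;
    rdfs:label "involves strain" ;
    rdfs:domain :experiment ;
    rdfs:range :strain .
```

Here rdfs:comment stands in for the definition; an ontology may use a dedicated definition property instead.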

Note: To assist with this step, an entity relationship diagram can be drawn. It comes in very handy for finding missing relationships and entities, and for introducing possible optimizations such as shortcut paths for querying. The OBI ontology site has some good examples of diagrams for ontologies based on the BFO classification. The entity relationship diagrams used for creating the first version of the Genesis ontology are available here: draw.io.

Listing of terms can be done in one of the following ways:

  1. Use terms from existing ontologies:

There is a high possibility that the term you are looking for already exists in previously built ontologies in the corresponding domain. You can search for ontologies on the listing website for the related domain. For example, the Ontobee site lists most of the ontologies found in the biomedical domain. It also has a search bar for keyword lookups across the ontologies. If the lookup is successful, the term definitions can be downloaded using the Ontofox tool.

  2. Generate your own:

For terms that are not found from the above step, you can generate your own.

The term IRIs can be of the form:

http://[domain]#[Entity_ID]

Usually, the Entity_IDs are not human-readable. They are of fixed length and have incremental integers as a suffix. This comes with the benefit of having simple, short, and uniform IRIs, which are unique and can be automatically generated using a simple IRI generator.
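To make this concrete, the following sketch shows terms with opaque, fixed-length identifiers whose human-readable names live in rdfs:label. The GEN_ prefix, the seven-digit width, and the example.org namespace are assumptions made for illustration; the identifier scheme actually used for Genesis may differ.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix gen:  <http://example.org/ontology#> .

# Hypothetical opaque IDs: a fixed prefix plus a zero-padded, incrementing integer
gen:GEN_0000001 a owl:Class ;
    rdfs:label "strain" .

gen:GEN_0000002 a owl:Class ;
    rdfs:label "experiment" .

gen:GEN_0000003 a owl:ObjectProperty ;
    rdfs:label "involves strain" .
```

Because the identifier carries no meaning, a term can be relabeled later without changing its IRI.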

Doing all of this manually is challenging. Existing tools, such as Protégé, partially support the above steps but lack a few useful and required features, such as collaborative development.

To assist with this step, we created a Google Sheets plugin and a GitHub pipeline.

The plugin has options for generating new IRIs for terms and for committing the final sheet of terms to a GitHub repository through the GitHub pipeline. The pipeline automatically converts the sheet of terms to an OWL ontology in Turtle format. For a successful conversion, the user only needs to configure mapping.ttl, which maps each column to its corresponding type and creates cross-connections between columns. Other settings, such as the ontology version and IRI prefixes, can be set using the configuration sheet.

Demo video of plugin: insert_iri.mp4

Link to the pipeline code and the example sheet.

Note: The above example sheet contains a README with additional information for configuring and running the ontology plugin.

Fig. 2. Ontology plugin for Google Sheets

Step 3: Adding Inference Rules

Once the RDFS ontology is generated, it can be constrained with additional OWL rules using GUI tools such as Protégé. The more rules there are, the more inferences can be drawn, and hence the richer the data. Inference rules help in knowledge discovery, and they also help in simplifying SPARQL queries. The following example shows a simple case of knowledge discovery:

Data without inference along with query and query results:

Fig. 3. RDF data, query and the corresponding results without inference

Adding inference rules for automatic knowledge discovery:

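PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# the empty prefix must be declared before :derived_from can be used
PREFIX : <http://example.org/ontology#>

BASE <http://example.org/ontology#>

# declare :derived_from as an object property that is also transitive
:derived_from a owl:ObjectProperty ;
    rdf:type owl:TransitiveProperty ;
    rdfs:label "derived_from" .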

Data with inference along with query and query results:

Fig. 4. RDF data, query and the corresponding results with inference
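A small sample can show the effect of a transitive property in text form. The strain names below are hypothetical and are not taken from the figures; only the :derived_from property comes from the rule above.

```turtle
@prefix : <http://example.org/ontology#> .

# Asserted triples (hypothetical sample data)
:strain_c :derived_from :strain_b .
:strain_b :derived_from :strain_a .

# Because :derived_from is an owl:TransitiveProperty, a reasoner can infer
# the following triple, which is never stored explicitly:
# :strain_c :derived_from :strain_a .
```

Without inference, a query for everything that :strain_c is derived from would return only :strain_b. With inference enabled, it would also return :strain_a, without the query having to walk the chain itself.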

Step 4: Verifying the Ontology

After adding the rules, the ontology can be checked for consistency using an appropriate reasoner, for example Pellet.

Example of an inconsistent ontology:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://example.org/ontology#> .
@prefix : <http://example.org/ontology#> .

# declare the ontology IRI
<http://example.org/ontology> rdf:type owl:Ontology .

# declare a new relationship named :is_about with :information_content_entity as its domain
:is_about rdf:type owl:ObjectProperty ;
    rdfs:domain :information_content_entity .

# declare a new class named :material_entity
:material_entity rdf:type owl:Class .

# declare a new class named :quality; every :quality is related to at least one :material_entity through the :is_about relationship
:quality rdf:type owl:Class ;
    rdfs:subClassOf [ rdf:type owl:Restriction ;
                      owl:onProperty :is_about ;
                      owl:someValuesFrom :material_entity
                    ] .

# declare a new class named :information_content_entity which is disjoint with :quality, i.e. no entity can be both an :information_content_entity and a :quality
:information_content_entity rdf:type owl:Class ;
    owl:disjointWith :quality .

Explanation:

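Rule 1:

The “is_about” property has “information_content_entity” as its domain, so anything that has an “is_about” relation is an “information_content_entity”.

Rule 2:

Every member of the “quality” class is related to at least one “material_entity” through the “is_about” relation.

Rule 3:

“information_content_entity” and “quality” are disjoint classes.

Together, these rules are contradictory: by Rule 2 every “quality” has an “is_about” relation, so by Rule 1 every “quality” is also an “information_content_entity”, which Rule 3 forbids. The “quality” class therefore cannot have any members, and a reasoner reports an inconsistency as soon as an individual is asserted to be a “quality”.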
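As a hedged illustration, adding a single individual to the ontology above should be enough for a reasoner to flag the problem. The individual name :quality_1 is an assumption introduced only for this sketch.

```turtle
@prefix : <http://example.org/ontology#> .

# Hypothetical individual asserted to be a :quality
:quality_1 a :quality .
```

The reasoner would derive that :quality_1 has some “is_about” relation, hence is an “information_content_entity”, hence belongs to two disjoint classes at once.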

Step 5: Publishing Ontology

The ontology is now ready to be enriched with additional synonyms, examples, and comments for each term. Finally, the ontology can be published with a proper version number on the listing site for the corresponding domain, such as Ontobee for the biomedical domain. The Google Sheets plugin can be configured to commit the ontology with the correct version:

Fig. 5. Ontology version configuration

Publishing might involve additional verification steps required by the corresponding community before the ontology appears on the listing site. We are in the process of adding the Genesis ontology to the Ontobee listing.

Ontology versioning for stability:

When working with autonomous systems, the database schema needs to be stable to ensure that query results are correct. Even slight changes in the underlying data structure, which may not be apparent to human users, can break the link between the database and upstream software agents.

One way to overcome stability issues is to follow the right ontology development practices, such as ontology versioning. An ontology can be linked to a specific version number, and RDF data can be connected to a specific version of the ontology. This way, changes in the data structure are easily detected, and the data with the correct version can be accessed.

As an example, deprecated classes are defined as obsolete rather than deleted in version updates.
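One common way to mark a term as obsolete is the built-in OWL annotation property owl:deprecated. The sketch below assumes hypothetical class names (:old_growth_medium, :growth_medium) and is not taken from the Genesis ontology.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/ontology#> .

# Hypothetical class kept in the ontology but flagged as obsolete
:old_growth_medium a owl:Class ;
    rdfs:label "obsolete growth medium" ;
    owl:deprecated true ;
    rdfs:comment "Deprecated; use :growth_medium instead." .
```

Existing data that refers to the old IRI still resolves to a defined term, while tools and users can see that the term should no longer be used.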

There are multiple ways of tracking versioning information. One of the best is to store it in the ontology itself, using the built-in OWL annotation property owl:versionInfo.
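A minimal sketch of an ontology header carrying its own version is shown below. The version string "1.2.0" and the version IRI are placeholder values assumed for illustration; owl:versionIRI is an optional OWL 2 addition that gives each release its own address.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Ontology header with a placeholder version number
<http://example.org/ontology> rdf:type owl:Ontology ;
    owl:versionInfo "1.2.0" ;
    owl:versionIRI <http://example.org/ontology/1.2.0> .
```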

Semantic versioning (a software development standard) can be used to assign version numbers. The following figure demonstrates semantic versioning for an ontology.

Fig. 6. Semantic Versioning for Ontology

Conclusion:

Ontology development is the most critical step in working with RDF databases, because the richness and reusability of the RDF database depend on how rich the ontology is. Although it comes with its own challenges, it can be simplified by following the right practices and using the right tools.

In this article, we demonstrated the following:

  • Google Sheets, along with a custom plugin, simplifies RDFS schema development and adds benefits such as collaborative development.
  • The custom plugin also provides a feature to commit the ontology to a GitHub pipeline, which pulls the related term annotations from ontology listing server endpoints, converts the Google Sheets file to Turtle RDF format using a configurable map, and finally commits the ontology with a configurable version.
  • The committed ontology can be further enriched with inference rules, additional term synonyms, etc., using tools such as Protégé.
  • Finally, the ontology can be published to the ontology listing site with an appropriate version number.

References:

  1. Ontology Development 101: A Guide to Creating Your First Ontology
  2. Semantic web ontologies (YouTube video)
  3. OBI ontology
  4. Ontobee ontology listing site

Disclaimer: The statements and opinions expressed in this blog are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
