Unleashing Keywords: Powering Search with Cross-Ontology Synonym Fusion

Pawan Verma
Sep 18, 2023

Motivation

In the previous article, we demonstrated the integration of biomedical ontologies by leveraging known relationships between biological entities. We showed through examples how linking biological entities can enable users to write complex queries easily using a knowledge graph on Polly.

Another important aspect of enhancing search is making user input more generalizable, i.e., a user need not provide the exact search keywords to find relevant documents. An exact search term will almost certainly return documents relevant to the user's intent, but it can miss related documents that contain keywords synonymous (alike in meaning) with the input keyword.

Figure 1: Query expansion enables relevant keywords to be expanded to their synonymous terms.
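
To make the idea of query expansion concrete, here is a minimal sketch in Python. The synonym table and the `expand_query` helper are hypothetical and only illustrate the principle; they are not the implementation used on Polly.

```python
from typing import Dict, List

# Hypothetical synonym table; in practice this would be built from one or more ontologies.
SYNONYMS: Dict[str, List[str]] = {
    "acute myeloid leukemia": ["AML", "acute myelogenous leukemia"],
}


def expand_query(keyword: str, synonyms: Dict[str, List[str]]) -> str:
    """Return a boolean OR query covering the keyword and its known synonyms."""
    terms = [keyword] + synonyms.get(keyword.lower(), [])
    seen = set()
    unique = []
    # Deduplicate case-insensitively while preserving the original order.
    for term in terms:
        key = term.lower()
        if key not in seen:
            seen.add(key)
            unique.append(term)
    return " OR ".join(f'"{term}"' for term in unique)


print(expand_query("Acute Myeloid Leukemia", SYNONYMS))
```

A keyword with no entry in the table is passed through unchanged, so expansion never narrows the original search.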

In this blog, we address a specific challenge of the single-ontology approach. Within a given domain, several biomedical ontologies have often been developed independently, and each of them tends to model that domain from a different point of view.

Ontologies from the OBO Foundry include external references, which take two forms: direct cross-references to other ontologies, and logical definitions that correspond to composite references to two or more other ontologies. Both are manually curated, high-quality knowledge sources that can be reused.

Below are some of the challenges of a single-ontology approach that we aim to address, followed by a solution that bridges the gaps in single-ontology synonyms.

Challenges with single-ontology synonyms

  1. Incompleteness: Single-ontology synonym systems often rely on a predefined vocabulary or ontology to map terms to synonyms. These ontologies may not encompass all possible terms and synonyms used in the field of biomedicine.
    Consider the term “myocardial infarction.” While it might be the preferred term in a biomedical ontology, healthcare professionals and researchers might also write “heart attack,” “MI,” or the older “coronary thrombosis,” which may not be covered by the ontology. Related but clinically distinct terms such as “cardiac arrest” should not be treated as synonyms.
  2. User Intent: Different users or contexts may have varying preferences for terminology, and single-ontology synonym systems may not always capture the intended meaning accurately.
    In the biomedical field, a user searching for “CAD” could be referring to “coronary artery disease” or “computer-aided diagnosis.” Without considering user intent, a single-ontology system might not provide the desired results.
  3. Acronym and Abbreviation Variations: Biomedicine is rife with acronyms and abbreviations, and different sources or authors may use different variations for the same concept. A single-ontology system may not account for these variations.
    “HIV” stands for “human immunodeficiency virus,” but some sources write “HIV-1” or “HIV-2,” which denote distinct types of the virus. A single-ontology system might not handle these variations consistently, potentially causing confusion or incorrect information retrieval.

To address these challenges in the biomedical domain, it’s important to consider the following strategies:

  1. Integration of Multiple Ontologies: Combining multiple biomedical ontologies can help mitigate the incompleteness issue. For example, using both SNOMED CT and MeSH (Medical Subject Headings) can provide a broader coverage of terms and synonyms.
  2. Contextual Disambiguation: Implementing algorithms that take into account user context and intent can enhance the accuracy of synonym matching. For instance, by analyzing the surrounding text, a system can determine if “CAD” in a particular context refers to coronary artery disease or computer-aided diagnosis.
  3. Synonym Expansion: Using algorithms that identify acronyms and terms and expand them into their full forms and similar terms when appropriate. For instance, recognizing that “HIV-1” relates to “human immunodeficiency virus” can improve information retrieval.
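
As a rough illustration of the second strategy, contextual disambiguation can be approximated by counting cue words around an ambiguous acronym. The cue-word sets and the `disambiguate` function below are hypothetical assumptions made for this sketch; a production system would more likely rely on curated or learned context models.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical cue words for each sense of "CAD"; chosen only for illustration.
SENSES: Dict[str, Set[str]] = {
    "coronary artery disease": {"artery", "heart", "cardiac", "stenosis", "patients"},
    "computer-aided diagnosis": {"image", "imaging", "algorithm", "detection", "software"},
}


def disambiguate(context: str, senses: Dict[str, Set[str]]) -> Optional[str]:
    """Pick the sense whose cue words overlap most with the surrounding text."""
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(tokens & cues) for sense, cues in senses.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_sense, best_score = ranked[0]
    # Return None when there is no evidence or when the top two senses tie.
    if best_score == 0 or (len(ranked) > 1 and ranked[1][1] == best_score):
        return None
    return best_sense


print(disambiguate("CAD patients with severe artery stenosis", SENSES))
```

Returning `None` for ties or missing evidence lets the caller fall back to showing results for every sense instead of guessing.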

In the case study below, we use the first strategy, integrating multiple ontologies, to help overcome the challenge of incompleteness.

Case Study: Cross-ontology fusion of synonyms using MeSH disease terms

Workflow

Figure 2: Synonym integration using BioPortal

We use the BioPortal Web Ontology Service APIs [1] to expand the synonyms provided by the MeSH thesaurus. BioPortal is an integrated database of more than 1,000 biomedical ontologies, with over 70 million mappings across more than 14 million classes.

Using the search API endpoint provided by BioPortal, an HTTP request retrieves matches for the input MeSH term across all other ontologies, as shown in Figure 2. For example, the request

http://data.bioontology.org/search?q=Acute Myeloid Leukemia&include=synonym&require_exact_match=true

returns the response payload

"collection": [
{
"@id": "http://purl.bioontology.org/ontology/MEDDRA/10000886",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/"
}
},
{
"@id": "http://purl.bioontology.org/ontology/LNC/LA26787-4",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/"
}
},
{
"synonym": [
"Acute myeloid leukemia (morphologic abnormality)",
"Acute non-lymphocytic leukemia",
"Acute myeloid leukaemia",
"Acute myelocytic leukemia",
"Acute myelocytic leukaemia",
"Acute myelogenous leukemia",
"Acute non-lymphocytic leukaemia",
"Acute granulocytic leukaemia",
"Acute granulocytic leukemia",
"Acute myelogenous leukaemia"
],
"@id": "http://purl.bioontology.org/ontology/SNOMEDCT/1162928000",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/",
"synonym": "http://data.bioontology.org/metadata/skossynonym"
}
},
{
"@id": "http://purl.bioontology.org/ontology/OMIM/MTHU007776",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/"
}
},
{
"synonym": [
"acute myelogenous leukemia",
"ANLL",
"leukemia, acute nonlymphocytic",
"acute myeloblastic leukemia",
"AML - Acute Myeloid Leukemia",
"leukemia, acute myeloid",
"AML",
"acute myelogenous leukemias"
],
"@id": "http://purl.bioontology.org/ontology/PDQ/CDR0000043424",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/",
"synonym": "http://data.bioontology.org/metadata/skossynonym"
}
},
{
"synonym": [
"Acute myelocytic leukemia",
"acute myeloblastic leukemia",
"Acute Granulocytic Leukemia",
"acute myeloid leukemia",
"Acute myelogenous leukemia",
"AML - Acute Myeloid Leukemia",
"Acute Nonlymphocytic Leukemia",
"Hematopoeitic - Acute Myleogenous Leukemia (AML)",
"acute myelogenous leukemia",
"Acute myeloblastic leukemia",
"ANLL",
"acute nonlymphocytic leukemia",
"Acute Myelogenous Leukemia",
"Acute granulocytic leukemia",
"Acute Myeloid Leukemia (AML)",
"AML",
"Acute myeloid leukemia (AML)",
"Acute Myelocytic Leukemia",
"Acute Myelogenous Leukemias",
"Acute Myeloblastic Leukemia",
"Acute Myeloid Leukemia"
],
"@id": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C3171",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/",
"synonym": "http://data.bioontology.org/metadata/skossynonym"
}
},
{
"synonym": [
"leukemia, acute myeloid, susceptibility to, autosomal dominant, somatic mutation",
"leukemia, acute myeloid, somatic",
"acute myeloblastic leukemia",
"acute myeloid leukemia",
"acute myelogenous leukemia",
"acute myelocytic leukemia",
"ANLL",
"acute granulocytic leukemia",
"leukemia, acute myeloid, reduced survival in, somatic",
"acute nonlymphocytic leukemia",
"AML - acute myeloid leukemia",
"leukemia, acute myeloid, autosomal dominant, somatic mutation",
"acute Nonlymphocytic leukemia",
"AML",
"hematopoeitic - acute Myleogenous leukemia (AML)",
"acute myeloid leukemia (AML)",
"acute myelogenous leukemias",
"myeloid leukemia, acute",
"leukemia, myelocytic, acute",
"acute myeloid leukemia, somatic",
"myeloid leukemia, acute, M4/M4Eo subtype, somatic",
"leukemia, acute myeloid, susceptibility to",
"leukemia, acute myelogenous",
"acute non lymphoblastic leukemia",
"leukemia, acute myeloid"
],
"@id": "http://www.ebi.ac.uk/efo/EFO_0000222",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/",
"synonym": "http://data.bioontology.org/metadata/skossynonym"
}
},
{
"synonym": [
"急性骨髄球性白血病",
"Leukemia, Myeloid, Acute",
"Leukemia, Myelocytic, Acute",
"Acute myeloid leukaemia",
"Acute myelocytic leukemia",
"Acute myelocytic leukaemia",
"acute myeloid leukemia",
"Acute myelogenous leukemia",
"急性骨髄白血病",
"Acute granulocytic leukaemia",
"Acute leukemic myelosis",
"Acute granulocytic leukemia",
"急性骨髄性白血病",
"AML"
],
"@id": "http://purl.jp/bio/4/id/200906073896258749",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/",
"synonym": "http://data.bioontology.org/metadata/skossynonym"
}
},
{
"@id": "http://purl.org/obo/owl/HP#HP_0004808",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/"
}
},
{
"synonym": [
"acute myelogenous leukemia",
"acute myelogenous leukaemia",
"acute myeloblastic leukemia",
"Leukemia, Myelocytic, acute",
"acute myeloid leukaemia",
"AML - acute Myeloid Leukemia",
"acute myeloblastic leukaemia"
],
"@id": "http://purl.obolibrary.org/obo/DOID_9119",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/",
"synonym": "http://data.bioontology.org/metadata/skossynonym"
}
},
{
"@id": "http://purl.obolibrary.org/obo/NCIT_C3171",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/"
}
},
{
"@id": "http://purl.obolibrary.org/obo/DOID_9119",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/"
}
},
{
"@id": "http://purl.obolibrary.org/obo/DOID_9119",
"@type": "http://www.w3.org/2002/07/owl#Class",
"links": {
...
"@context": {
...
}
},
"@context": {
"@vocab": "http://data.bioontology.org/metadata/"
}
},
{
"synonym": [
"leukemia, acute myeloid, susceptibility to, autosomal dominant, somatic mutation",
"leukemia, acute myeloid, somatic",
"acute myeloblastic leukemia",
"acute myeloid leukemia",
"acute myelogenous leukemia",
"acute myelocytic leukemia",
"ANLL",
"acute granulocytic leukemia",
"leukemia, acute myeloid, reduced survival in, somatic",
"acute nonlymphocytic leukemia",
"AML - acute myeloid leukemia",
"leukemia, acute myeloid, autosomal dominant, somatic mutation",
"acute Nonlymphocytic leukemia",
"AML",
"hematopoeitic - acute Myleogenous leukemia (AML)",
"acute myeloid leukemia (AML)",
"acute myelogenous leukemias",
"myeloid leukemia, acute",
"leukemia, myelocytic, acute",
"acute myeloid leukemia, somatic",
"myeloid leukemia, acute, M4/M4Eo subtype, somatic",
"leukemia, acute myeloid, susceptibility to",
"leukemia, acute myelogenous",
"acute non lymphoblastic leukemia",
"leukemia, acute myeloid"
],
...
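
For reference, a request like the one that produced this payload could be issued from Python roughly as follows. This is a sketch under stated assumptions: the `q`, `include`, and `require_exact_match` parameters come from the URL in this article, while the use of `https`, the `apikey` query parameter, and the helper names `build_search_url` and `search` are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host taken from the article's example URL; https is an assumption.
BASE_URL = "https://data.bioontology.org/search"


def build_search_url(term: str, api_key: str) -> str:
    """Build a search URL with the term properly URL-encoded."""
    params = {
        "q": term,
        "include": "synonym",
        "require_exact_match": "true",
        # Assumption: the API key is passed as a query parameter.
        "apikey": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def search(term: str, api_key: str) -> dict:
    """Call the search endpoint and decode the JSON response (needs network access)."""
    with urlopen(build_search_url(term, api_key), timeout=30) as response:
        return json.load(response)


print(build_search_url("Acute Myeloid Leukemia", "YOUR_API_KEY"))
```

Encoding the term with `urlencode` avoids sending raw spaces in the query string, which the hand-written URL contains.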

From the above response, we can see that ontologies such as MEDDRA, LNC, OMIM, and HP returned no synonyms for Acute Myeloid Leukemia, whereas others such as SNOMEDCT, PDQ, EFO, and DOID did. Integrating the results across ontologies therefore increases coverage and yields better keyword expansion.
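
The fusion step itself can be sketched as a union over the `synonym` lists of every match in `collection`. The `fuse_synonyms` helper and the shortened sample below are illustrative assumptions; only the `collection`, `@id`, and `synonym` keys are taken from the payload above.

```python
from typing import Dict, List


def fuse_synonyms(collection: List[dict]) -> List[str]:
    """Union the synonym lists of all matches, deduplicated case-insensitively."""
    fused: Dict[str, str] = {}
    for match in collection:
        # Matches from ontologies without synonyms simply lack the "synonym" key.
        for synonym in match.get("synonym", []):
            fused.setdefault(synonym.casefold(), synonym)
    return sorted(fused.values(), key=str.casefold)


# Shortened sample modeled on the payload above.
sample = [
    {"@id": "http://purl.bioontology.org/ontology/MEDDRA/10000886"},
    {
        "@id": "http://purl.bioontology.org/ontology/PDQ/CDR0000043424",
        "synonym": ["AML", "ANLL", "acute myeloblastic leukemia"],
    },
    {
        "@id": "http://purl.obolibrary.org/obo/DOID_9119",
        "synonym": ["acute myeloblastic leukemia", "acute myeloid leukaemia"],
    },
]

print(fuse_synonyms(sample))
```

Keeping the first spelling seen for each case-folded key means the result depends on the order of matches, which is acceptable when the terms are only used for search.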

Additionally, we performed a series of string-cleaning operations to remove non-English terms and terms containing special characters.
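
The exact cleaning rules are not listed here, so the following is only one possible sketch. It assumes that "non-English" can be approximated by non-ASCII characters and that the allowed character set is letters, digits, spaces, and hyphens; both choices, and the `clean_synonyms` name, are assumptions.

```python
import re
from typing import List

# Assumed whitelist: ASCII letters, digits, spaces, and hyphens.
ALLOWED = re.compile(r"[A-Za-z0-9 \-]+")


def clean_synonyms(terms: List[str]) -> List[str]:
    """Drop terms with characters outside the whitelist and remove duplicates."""
    cleaned: List[str] = []
    seen = set()
    for term in terms:
        # Collapse repeated whitespace and trim the ends.
        normalized = " ".join(term.split())
        if not ALLOWED.fullmatch(normalized):
            continue
        key = normalized.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


raw = [
    "AML",
    "急性骨髄性白血病",
    "Acute myeloid leukemia (AML)",
    "acute non-lymphocytic leukemia",
    "aml",
]
print(clean_synonyms(raw))
```

Note that a whitelist this strict also drops inverted forms such as "leukemia, acute myeloid"; whether commas and parentheses count as special characters is a design decision to make per use case.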

Concluding Remarks

Addressing these challenges is essential for ensuring the accuracy and effectiveness of information retrieval systems in the biomedical domain. By embracing a combination of these strategies, we can better bridge the gap between the diverse terminologies used in biomedicine and provide more comprehensive and contextually relevant information to users.

Thanks to: Nayanika Kalita

References

  1. Whetzel PL, Noy NF, Shah NH, Alexander PR, Nyulas C, Tudorache T, Musen MA. BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Res. 2011 Jul;39(Web Server issue):W541–5. Epub 2011 Jun 14.

Pawan Verma

Bioinformatics Engineer, Post Grad in Bioinformatics and Computational Biology