Building Open-Source Terminology Systems for Homecare Informatics

A general-purpose Python/Django framework for SNOMED CT

Chimezie Ogbuji
Feb 5, 2023

In a previous professional life, I was a medical informatician. My area of specialty was clinical research informatics, and I wrote quite a bit on the topic. That is a mouthful, but it describes what I did.

Photo by rosefirerising on Flickr

My preferred definition and explanation of these terms are from Hersh¹, who describes informatics as a "discipline focused on the acquisition, storage, and use of information in a specific setting or domain." This definition is distinct from the more commonly used term information technology, which refers to the operation of computers and their related technologies. Informatics is usually rooted in a domain; for me, the domain was the clinical research process.

A prominent part of this work was the use of medical terminology systems to address the well-documented lack of interoperability between medical information systems, not just for research but in general. The consensus, then and now, is that using ontologies is a critical step in the evolution of these systems and in how they should be built.

An ontology captures² a conceptualization of a domain. It typically consists of definitions of an organized hierarchy of concepts in the domain and of the relationships between them. Domain experts curate these definitions, which capture the nomenclature or terminology of the domain in a way that computer software can process and understand. Medical ontologies are probably one of the most underappreciated examples of how Artificial Intelligence technologies are used in medical information systems.

Another point of convergence in the vast body of literature on this topic was SNOMED-CT, the international medical terminology system and standard. The many advantages (and disadvantages) of using SNOMED-CT are beyond the scope of what I want to discuss and demonstrate here, and are perhaps a subject for another article. However, its main advantage is that it is an international, standardized, and formal ontology of the medical domain that captures the meaning of much of medical terminology in a language-neutral manner.

When I left the world of clinical research informatics, specifically in the inpatient setting (i.e., primarily dealing with patients staying in a hospital while under treatment), and later started working in the outpatient setting (i.e., related to care received by patients outside the hospital, such as in Homecare, the domain where I currently work), I found myself again gravitating to SNOMED-CT. Hospital systems have long since evolved towards using payment systems based on diagnosis-related groups, or DRGs³. These systems organize patients

with similar clinical and treatment characteristics into groups, where patients in the same group are expected to use similar amounts of resources, thus incentivizing providers to enable effective cost management [..] calculating DRGs is a time-consuming process requiring expert efforts to manually identify information from patient records, standardize it to ICD (International Classification of Diseases) [..]

On January 1, 2020, just before the beginning of the COVID-19 pandemic in the United States, Medicare adopted a new and disruptive DRG-based payment system for home care providers called the Patient-Driven Groupings Model (PDGM)⁴. Combined with the fact that "40 percent of home health care providers are expected to be in debt"⁵, according to the Institute of Medicine’s assessment of the industry back in 2015, this has made technologies that facilitate the use of ICD extremely important in homecare informatics during the very disruptive period the industry is going through. I believe that, as the dust settles, technologies and systems that use SNOMED-CT and similar medical terminology systems together with ICD will inevitably become important, both for easing the time-consuming work of identifying and calculating DRGs and for increasing revenue from insurance reimbursement.

For much of my career, I developed software systems in Python, and the most widely used and robust framework for building websites and web-based applications in that language is Django. Django has a very impressive and feature-rich Object Relational Mapper (ORM). An ORM provides an application programming interface (API) to a relational database in a programming language. Creators of ORMs often design them to simplify the daunting process of creating and dispatching queries to a database and collecting the results of those queries. So, when I began analyzing PDGM with the software I developed for Amara Home Care, I inevitably began looking for quality open-source libraries for working with Django, SNOMED-CT, and ICD.
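To make the idea of an ORM concrete, here is a minimal sketch of what one hides from the caller. It uses only Python's built-in sqlite3 module rather than Django, and the table, the ToyConcept class, and its filter_name_contains method are all hypothetical names made up for illustration; none of them come from Django or django-snomed-ct.

```python
import sqlite3
from dataclasses import dataclass

# Toy schema and row for illustration only; not the django-snomed-ct schema.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE concept (id INTEGER PRIMARY KEY, name TEXT)")
connection.execute(
    "INSERT INTO concept (id, name) VALUES (?, ?)",
    (1, "Aortic valve stenosis"),
)


@dataclass
class ToyConcept:
    """Hypothetical stand-in for an ORM model class."""

    id: int
    name: str

    @classmethod
    def filter_name_contains(cls, fragment):
        # The SQL lives here, so callers only ever deal with Python objects.
        rows = connection.execute(
            "SELECT id, name FROM concept WHERE name LIKE ? ORDER BY id",
            (f"%{fragment}%",),
        )
        return [cls(*row) for row in rows]


matches = ToyConcept.filter_name_contains("stenosis")
print(matches)
```

A real ORM such as Django's goes much further (schema migrations, joins, lazy query evaluation), but the division of labor is the same: the query is built and dispatched behind a Python method, and the results come back as objects.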

I found Arkadiusz Szydełko’s django-snomed-ct GitHub project. However, as there hadn’t been any active development over the last seven years, I created a fork. It is available as one of my GitHub repositories. The main goal was to have it work with the latest versions of SNOMED-CT and to take full advantage of Python and Django to provide a greatly simplified API for use in the analysis of medical terminology.

A secondary goal was to create a framework for taking advantage of mappings between SNOMED-CT and ICD-10 to facilitate the automated analysis of ICD-10 terminology in general and its use with homecare informatics specifically, as the field and its industry shift to become more DRG-based. Mappings between medical terminology systems can provide⁶ additional, biologically meaningful paths between their terms, and this can especially be the case for terminology systems expressed as ontologies.
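As a rough sketch of what an "additional path" can look like, the toy example below links two ICD-10 codes because the SNOMED-CT concepts they map from share a more general concept. Everything here is an assumption made for illustration: the pairing of names to codes is not taken from the SNOMED International map release, the shared parent is an invented grouper, and linked_icd10_codes is a hypothetical helper rather than part of django-snomed-ct.

```python
from itertools import combinations

# Toy data for illustration; not taken from a SNOMED-CT or map release.
TOY_ICD10_MAP = {  # SNOMED-CT concept name -> ICD-10 code
    "Aortic stenosis, non-rheumatic": "I35.0",
    "Congenital subaortic stenosis": "Q24.4",
}
TOY_PARENT = {  # concept name -> a more general (invented) concept
    "Aortic stenosis, non-rheumatic": "Aortic outflow stenosis (toy grouper)",
    "Congenital subaortic stenosis": "Aortic outflow stenosis (toy grouper)",
}


def linked_icd10_codes(icd10_map, parent):
    """Yield pairs of ICD-10 codes whose mapped concepts share a parent."""
    for first, second in combinations(sorted(icd10_map), 2):
        shared = parent.get(first)
        if shared is not None and shared == parent.get(second):
            yield icd10_map[first], icd10_map[second]


links = list(linked_icd10_codes(TOY_ICD10_MAP, TOY_PARENT))
print(links)
```

The two codes sit in different ICD-10 chapters, so the connection between them only becomes visible by going through the ontology side of the mapping.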

I run the software on Ubuntu 22.04.01, using Python 3.10.6, Django 3.2.13, and (as of the time of this writing) the September 2022 release of SNOMED-CT US Edition(R). This open-source project can be checked out to your local machine via Git with the following command:

$ git clone https://github.com/chimezie/django-snomed-ct.git

This project provides a rich, dynamic database-access API to the SNOMED-CT medical terminology system once you have loaded it into the Django database. The project's main Wiki has a section on how to use the load_snomed_ct_data custom django-admin command to load a SNOMED-CT release into a Django database (MySQL, Postgres, etc.). The command also includes an option for loading the SNOMED-CT to ICD-10 mappings produced and maintained by SNOMED International.

SNOMED-CT contains concepts, each representing a universal medical concept with a "formal logic-based [definition], organized into hierarchies"⁷. These concepts each have descriptions that provide human-readable explanations and labels for them. SNOMED-CT also contains relationships that provide links between concepts that help fully or partially define each concept's formal meaning (or semantics).

So, for example, you can use the interface to fetch instances of snomed_ct.models.Concept (a Django model and Python class) whose fully specified name matches the regular expression "aort.+stenosis":

concepts = Concept.by_fully_specified_name(term__iregex='aort.+stenosis')

This will include the 'Aortic valve stenosis' concept, for example, as well as the 'Aortic stenosis, non-rheumatic' and 'Congenital subaortic stenosis' concepts. Searching by regular expressions is a powerful way to find terminology via string matching against parts or patterns in their names, allowing for variations common in the scientific naming of medical concepts. In this case, the by_fully_specified_name method will enable you to match concepts by their fully specified SNOMED-CT name.
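The matching behavior can be tried out without a database. The sketch below uses Python's re module as a stand-in for the database's regular expression engine, which is an assumption: each backend has its own regex dialect, though a case-insensitive, unanchored search is the common behavior. The "(disorder)" tags on the names and the search_names helper are also assumptions for illustration.

```python
import re

# Names from the article, with an assumed "(disorder)" tag, plus one non-match.
names = [
    "Aortic valve stenosis (disorder)",
    "Aortic stenosis, non-rheumatic (disorder)",
    "Congenital subaortic stenosis (disorder)",
    "Mitral valve stenosis (disorder)",
]


def search_names(pattern, candidates):
    """Case-insensitive, unanchored search over a list of names."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [name for name in candidates if compiled.search(name)]


# No leading wildcard: "aort" may appear anywhere, including at the start.
unanchored = search_names(r"aort.+stenosis", names)

# A leading ".+" demands at least one character before "aort".
leading_wildcard = search_names(r".+aort.+stenosis", names)

print(unanchored)
print(leading_wildcard)
```

One detail worth noticing is that a leading ".+" requires at least one character before "aort", so in this sketch only the "subaortic" name satisfies that stricter pattern, while the names beginning with "Aortic" need the pattern without it.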

You can also fetch a single concept by its SNOMED-CT identifier. Once you have a concept, the interface provides access to both its entire fully specified name and a version without the category tag that SNOMED-CT appends, in parentheses, to the end of the fully specified name:

concept = Concept.by_id(194733006)
concept.fully_specified_name_no_type
concept.fully_specified_name
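For readers curious about what separating the name from its category tag involves, here is a small standalone sketch. It is not the library's implementation; split_fully_specified_name is a hypothetical helper, and it assumes the tag is the final parenthesized group at the end of the name.

```python
import re

# A name, a space, then a final parenthesized tag with no nested parentheses.
SEMANTIC_TAG = re.compile(r"^(?P<name>.+) \((?P<tag>[^()]+)\)$")


def split_fully_specified_name(fsn):
    """Return (name, tag); tag is None when no trailing tag is present."""
    match = SEMANTIC_TAG.match(fsn)
    if match is None:
        return fsn, None
    return match.group("name"), match.group("tag")


name, tag = split_fully_specified_name("Aortic valve stenosis (disorder)")
print(name)
print(tag)
```

Anchoring the pattern at the end of the string matters, because names can contain parentheses of their own earlier on, and only the last group is the tag.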

You can programmatically navigate the relationships between concepts using the interface. Of particular importance is the ISA SNOMED-CT relationship, which captures the semantics of a subsumption (or generalization) relationship between concepts in formal mathematical logic. A hierarchy or taxonomy of medical concepts can be represented and navigated using this relationship. The software provides an isa attribute on Concept instances that you can use to iterate over or walk through medical concepts that are more general than the one you start with:

for general_concept in concept.isa:
    print("\t", general_concept)
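The idea of walking up a hierarchy can be sketched in plain Python, independently of the library. The hierarchy below is a simplified toy, not actual SNOMED-CT content, and the ancestors function is a hypothetical helper. Whether the library's isa attribute returns only direct parents or all ancestors is not something this sketch assumes.

```python
from collections import deque

# Simplified toy hierarchy (child -> direct parents); not real SNOMED-CT data.
TOY_ISA = {
    "Aortic valve stenosis": ["Aortic valve disorder", "Heart valve stenosis"],
    "Aortic valve disorder": ["Heart valve disorder"],
    "Heart valve stenosis": ["Heart valve disorder"],
    "Heart valve disorder": ["Heart disease"],
    "Heart disease": [],
}


def ancestors(concept, isa):
    """Yield every more general concept once, nearest first (breadth-first)."""
    seen = set()
    queue = deque(isa.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # reached earlier through another parent
        seen.add(parent)
        yield parent
        queue.extend(isa.get(parent, []))


general_concepts = list(ancestors("Aortic valve stenosis", TOY_ISA))
for general_concept in general_concepts:
    print("\t", general_concept)
```

Because a concept can have more than one parent, the hierarchy is a graph rather than a tree, which is why the sketch tracks the concepts it has already visited.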

Similarly, and in reverse, you can use the specializations attribute to iterate over concepts that are more specific than a given one. You can navigate all the relationships that a given concept is involved in using the outbound_relationships() method. This method returns all relationships that begin with the given concept (that is, where it is the subject of the relationship), as snomed_ct.models.Relationship instances. Each of these has type and destination attributes, which provide the kind of relationship it is and the object of the relationship, respectively. Both of these are Concept instances themselves.

for rel in concept.outbound_relationships():
    print("\t- {} -> {}".format(rel.type, rel.destination))
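To show the shape of the data involved, the following standalone sketch models relationships as simple subject, type, destination triples. The ToyRelationship class, the sample triples, and the outbound function are all made up for illustration, and in the real library the type and destination are Concept instances rather than strings.

```python
from typing import NamedTuple


class ToyRelationship(NamedTuple):
    """Hypothetical triple: the source concept, the kind of link, the target."""

    source: str
    type: str
    destination: str


# Toy triples for illustration; not copied from a SNOMED-CT release.
TOY_RELATIONSHIPS = [
    ToyRelationship("Aortic valve stenosis", "Is a", "Heart valve stenosis"),
    ToyRelationship("Aortic valve stenosis", "Finding site", "Aortic valve structure"),
    ToyRelationship("Mitral valve stenosis", "Finding site", "Mitral valve structure"),
]


def outbound(concept, relationships):
    """Return the relationships whose subject is the given concept."""
    return [rel for rel in relationships if rel.source == concept]


for rel in outbound("Aortic valve stenosis", TOY_RELATIONSHIPS):
    print("\t- {} -> {}".format(rel.type, rel.destination))
```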

The project also provides, and documents how to use, interfaces for navigating the SNOMED-CT to ICD-10 mappings once they are loaded, which is the primary motivation for this project. It also documents how to load and use transitive closures over the ISA relationship. These take advantage of the formal, mathematical, and machine-readable manner in which SNOMED-CT specifies medical terminology to support logical entailment, reasoning, and query answering. However, these will be the subject of later articles on this project.
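As a preview of why a transitive closure is useful, here is a minimal sketch that precomputes every ancestor of every concept, so that asking whether one concept is a kind of another becomes a single set lookup. The hierarchy is a toy, the function names are hypothetical, and the sketch assumes the hierarchy has no cycles. It says nothing about how the project itself stores or loads its closure.

```python
# Toy hierarchy (child -> direct parents); not actual SNOMED-CT content.
TOY_ISA = {
    "Aortic valve stenosis": ["Heart valve stenosis"],
    "Heart valve stenosis": ["Heart valve disorder"],
    "Heart valve disorder": ["Heart disease"],
    "Heart disease": [],
}


def transitive_closure(isa):
    """Map each concept to the set of all its ancestors (assumes no cycles)."""
    closure = {}

    def all_ancestors(concept):
        if concept not in closure:
            result = set()
            for parent in isa.get(concept, ()):
                result.add(parent)
                result |= all_ancestors(parent)
            closure[concept] = result
        return closure[concept]

    for concept in isa:
        all_ancestors(concept)
    return closure


def is_subsumed_by(specific, general, closure):
    """True when `general` is an ancestor of `specific` in the closure."""
    return general in closure.get(specific, set())


closure = transitive_closure(TOY_ISA)
print(is_subsumed_by("Aortic valve stenosis", "Heart disease", closure))
print(is_subsumed_by("Heart disease", "Aortic valve stenosis", closure))
```

The trade-off is the usual one: the closure costs extra storage and must be rebuilt when the hierarchy changes, but subsumption queries no longer need to walk the hierarchy at query time.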

[1] Hersh, William. "A stimulus to define informatics and health information technology." BMC Medical Informatics and Decision Making 9 (2009): 1–6.

[2] Ogbuji, Chimezie. "A framework ontology for computer-based patient record systems." Proceedings of the 2nd International Conference on Biomedical Ontology (ICBO '11), pp. 217–223, Buffalo, NY, USA, 2011.

[3] Liu, J., et al. "Early prediction of diagnostic-related groups and estimation of hospital cost by processing clinical notes." NPJ Digital Medicine 4 (2021): 103.

[4] Plummer, Elizabeth, and William F. Wempe. "Home health agencies: empirical evidence on the patient-driven groupings model's expected effects on agency reimbursements." Home Health Care Management & Practice 33.3 (2021): 183–192.

[5] Forum on Aging, Disability, and Independence; Board on Health Sciences Policy; Division on Behavioral and Social Sciences and Education; Institute of Medicine; National Research Council. "The Future of Home Health Care: Workshop Summary." Washington (DC): National Academies Press (US); 2015 August 4. PMID: 26355186.

[6] Ogbuji, Chimezie, and Rong Xu. "Integrating large, disparate biomedical ontologies to boost organ development network connectivity." Data Integration in the Life Sciences: 8th International Conference, DILS 2012, College Park, MD, USA, June 28–29, 2012. Proceedings 8. Springer Berlin Heidelberg, 2012. https://link.springer.com/chapter/10.1007/978-3-642-31040-9_7

[7] Data Analytics with SNOMED CT. http://snomed.org/analytics

Chimezie Ogbuji

An informatics engineer, data scientist, inventor, and business owner