Data Scientist in a War — Identifying Human Bodies Through Dental X-Ray Images
Since the darkest day in Israel’s history, my entire country and I have been profoundly changed (this year, Saturday, October 7th marked the end of the “long October holidays,” which encompassed the “Simhat Torah holiday,” symbolizing the joy of Torah, even though there might have been little joy and Torah during that time). There’s a noticeable decrease in happiness, trust, security, and all the positive aspects of life. The sirens deafen my ears, leaving me with remarkably negative emotions. The term “war” has flooded the news, and I carefully follow the guidance regarding the situation. Deciding whether to stay at home, find safety in secure spaces, or even send my daughter to school is no longer clear — our lives are enveloped in uncertainty from now on. I began reaching out to my family, friends, and colleagues to inquire about their health and well-being. Even the most cheerful individuals are no longer wearing their usual smiles. Fear and sadness have become part of our daily lives, and the world we once knew has transformed. Now, we are all striving to navigate this new reality.
Before the crisis, I was actively involved in three to five professional social groups related to Artificial Intelligence (AI). During the ongoing crisis, however, dozens of new groups have sprung up, and I’ve now tripled the number of groups I participate in. Nowadays in Israel there is tremendous enthusiasm for volunteering, with “cool” projects spanning the entire machine learning spectrum. These projects focus especially on tackling issues like identifying fake news and presenting a more accurate picture of our reality to the world.
About a week following that Saturday, I came across an article in the news discussing the necessity of identifying dead individuals through their dental records. It occurred to me that I could contribute by leveraging my experience in medical image processing. Despite the emotional weight of this task, I found the technical aspect to be intriguing. It represents a classic example of an AI task that is both important and capable of significantly reducing the workload for dentists.
Without much hesitation, my initial step was to search for research papers and prior work, a habit I’ve always followed. I was astonished by the abundance of existing research in this area. It seems that this challenge is quite common, and dental images are chosen because teeth are among the most enduring parts of the human body. This is why dental images are considered a superior method for identification: even more resilient than DNA, tattoos, fingerprints, and similar identifiers.
Once I comprehended the scope of the problem, and after delving into a book about the impact of social networks on project development, I realized that my next step was to establish a group dedicated to addressing this challenge. This group would consist of professionals in medical imaging, dentists, product managers, programmers, and more. Additionally, it would include individuals with valuable connections, such as police and IDF officers. I composed a message outlining the concept and my initial project work and sent it to the MeDS (Medical Data Science) group.
Remarkably, in less than an hour, I managed to assemble a group of over 60 members. Each member was carefully selected after I reviewed their professional descriptions and LinkedIn profiles, to ensure a community of genuine experts and exclude any potential impostors. This group exceeded my initial expectations, and the connections we established were invaluable. Everyone was willing to contribute and help us grow. We are now collaborating intensively to build and train an identification model.
About the Israeli Forensic Groups
Before delving into technical details, I would like to express my deep appreciation for the remarkable spirit of volunteerism that characterizes the Israeli people. It’s important to grasp the difficult circumstances Israelis are currently facing:
- They have had to put aside their profound concerns and fears arising from the war.
- As a result of these horrible events, parents are unable to send their children to kindergarten or school. They find themselves on high alert 24/7, striving to maintain a sense of normalcy. Deep down, they are ordinary human beings who experience their own fears and anxieties. Living under these conditions is an immense challenge.
- With their children at home, they are unable to work, and even if they attempt to, they find it exceedingly difficult to really focus on their tasks. This situation has led to concerns about their salaries, and they worry whether they will be able to provide for their families.
- They remain constantly connected to the news, seeking updates on the safety and well-being of their families and friends.
Despite these daunting circumstances, they somehow manage to find the time and the strength to offer their expertise and services for free to their fellow citizens. They are motivated by a deep commitment to their community and a sense of duty, without any financial gain. This selflessness holds significant meaning and deserves recognition.
Understanding the Identification Process
Given that I am not a dentist, I would like to provide an overview of three primary types of dental medical imaging:
1. Panoramic X-ray: This imaging technique captures a wide, detailed view of the entire oral cavity, including both upper and lower jaws.
2. Projective X-rays: Images such as bitewing and periapical X-rays provide detailed views of specific areas within the mouth. Bitewing X-rays are often used to examine the crowns of upper and lower teeth and detect cavities, while periapical X-rays focus on a single tooth, showing it from crown to root.
3. CT Scan: CT scans offer 3D views of the oral and maxillofacial regions. They are particularly useful for complex dental procedures, implant planning, and assessing conditions such as impacted teeth, fractures, and tumors. Because CT imaging allows detailed examination of dental structures without requiring a seated position, it is a valuable tool for forensic dental identification, especially when handling deceased individuals.
The primary objective in forensic dental identification is to compare CT and X-ray images. The transformation from CT to X-ray, which reduces the data from three dimensions to two, can be achieved by determining the appropriate projection for the CT scan. This alignment ensures that the two sets of images are accurately comparable for forensic analysis. This is only the initial step; the next two tasks, identification and segmentation, are discussed more thoroughly below.
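To make the dimensionality reduction concrete, here is a minimal sketch of how a CT volume could be collapsed into an X-ray-like image, assuming a simple parallel-beam approximation (integrating attenuation along one axis). The function name and the toy volume are illustrative only; a real pipeline would have to match the scanner’s actual projection geometry.

```python
import numpy as np

def ct_to_projection(volume: np.ndarray, axis: int = 1) -> np.ndarray:
    """Collapse a 3D CT volume (z, y, x) into a 2D X-ray-like image by
    integrating attenuation along one axis (a crude parallel-beam
    approximation of a digitally reconstructed radiograph)."""
    projection = volume.sum(axis=axis)
    # Rescale to [0, 1] so the result is visually comparable to an X-ray.
    projection = projection - projection.min()
    rng = projection.max()
    return projection / rng if rng > 0 else projection

# Hypothetical volume: 64 axial slices of 128x128 voxels.
ct = np.random.rand(64, 128, 128)
xray_like = ct_to_projection(ct, axis=1)  # project front-to-back
print(xray_like.shape)  # (64, 128)
```

Choosing `axis` here stands in for choosing the projection direction; in practice that direction must be estimated so the projected CT lines up with the dental X-ray being compared against.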
Methodologies
From this point on, I will dive into the technical aspects, specifically the architecture presented in the paper by Liang et al. I believe this is the most crucial part of the work. Here, I will provide detailed explanations for each element of the architectural building blocks, as depicted in Figure 2.
Input Data
First, it’s important to acknowledge a significant property of this work: it does not require paired data for comparison (no labeling, which is advantageous). Experienced practitioners will recognize at this point that each instance is represented in an embedding space, which is what allows instances to be compared. However, an additional image is introduced (the same one for all instances) containing landmarks. A deformed image is then computed based on the disparity between the instance image and the landmark image; I refer to this as the “reference” image. As you may infer, the rationale here is that this reference image makes the embeddings relative to one another.
Attention Localization & Alignment
The FoID (Forensic IDentification) system, inspired by dental practice, uses an attention mechanism to focus on teeth and on specific landmarks such as the condyle, the angle of the mandible, and the maxillary sinus. It employs semantic segmentation for teeth localization, using a Mask R-CNN pre-trained on a panoramic segmentation dataset, which produces a teeth segmentation map for each input instance. Landmarks are detected with a registration-based method: an atlas is aligned to the instance using SyN registration. The input is then cropped into attention patches, forming the attention stack used for representation. Missing anatomical parts are padded with zero-valued patches.
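As a sketch of the cropping and zero-padding step, the following assumes each instance yields a fixed number of anatomical slots (teeth plus landmarks). The patch size, slot count, and helper names are my own illustrative choices, not the paper’s.

```python
import numpy as np

PATCH = 32       # assumed patch size
NUM_SLOTS = 36   # assumed fixed number of anatomical slots

def build_attention_stack(image, centers):
    """Crop a fixed-size patch around each detected tooth/landmark
    centre; a missing anatomy (centre is None) stays a zero patch,
    so every instance yields a stack of identical shape."""
    stack = np.zeros((NUM_SLOTS, PATCH, PATCH), dtype=image.dtype)
    half = PATCH // 2
    for i, c in enumerate(centers):
        if c is None:
            continue  # missing tooth/landmark -> keep zero padding
        y, x = c
        y0, x0 = max(0, y - half), max(0, x - half)
        patch = image[y0:y0 + PATCH, x0:x0 + PATCH]
        stack[i, :patch.shape[0], :patch.shape[1]] = patch
    return stack

img = np.random.rand(512, 1024)  # a dummy panoramic X-ray
centers = [(100, 200), None] + [(256, 40 + 25 * i) for i in range(NUM_SLOTS - 2)]
stack = build_attention_stack(img, centers)
print(stack.shape)  # (36, 32, 32)
```

The zero patches keep the downstream model’s input shape constant regardless of how much anatomy survives in a given instance.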
Contrastive Learning with DSA (Domain-Specific Augmentations)
To create meaningful representations from unpaired data, the FoID system utilizes a self-supervised approach with domain-specific augmentations (DSA). Here’s a breakdown of how it works:
- FoID applies a set of augmentations designed to be aware of anatomical structures and mimic forensic scenarios. These augmentations are randomly applied to an instance, generating different views for subsequent reasoning.
- In practice, a mini-batch of stacked attentions from various entities is used. The domain-specific augmentations are applied multiple times to these attention stacks. In each iteration, a sample view is paired with another augmented view of itself, creating a positive pair so the model learns that both come from the same instance; a negative pair is formed by pairing the sample with a view of a different instance, encouraging differentiation between instances. This approach requires no paired instances or identification annotations, and it can be extended to use such information if available.
- To ensure effective representation learning, the DSA augmentations are designed to account for potential anatomical variations as seen in clinical settings. The proposed DSA includes four types of augmentations inspired by forensic observations, and they are independently applied to each attention patch (Figure 2, d-h and next Figure):
- Random tooth reduction: simulates tooth loss due to injury or other factors.
- Random artifact addition: adds patches of flexible shapes to crown or tooth regions to simulate common dental artifacts like fillings, implants, and braces.
- Random rigid patch transform: applies transformations within a range of angles and displacements. This not only encourages view comparison but also simulates changes in the overall arrangement of teeth, such as those caused by orthodontics.
- Random color disturbance: introduces contrast shifts and Gaussian noise within specified ranges to simulate variations in scanning setups and machine noises.
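The four augmentation types above can be sketched roughly as follows. The parameters, shapes, and simplifications (for example, the rigid transform is reduced to a translation) are my own assumptions for illustration; two views produced by `dsa_view` from the same stack would form a positive pair for the contrastive objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_tooth_reduction(stack, p=0.1):
    """Zero out whole patches to simulate tooth loss."""
    out = stack.copy()
    out[rng.random(len(stack)) < p] = 0.0
    return out

def random_artifact(patch, size=6):
    """Paste a bright rectangle to mimic a filling, implant, or brace."""
    out = patch.copy()
    y = rng.integers(0, patch.shape[0] - size)
    x = rng.integers(0, patch.shape[1] - size)
    out[y:y + size, x:x + size] = out.max()
    return out

def random_rigid(patch, max_shift=3):
    """Small translation (rotation omitted here for brevity)."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(np.roll(patch, dy, axis=0), dx, axis=1)

def random_color(patch, contrast=0.2, noise=0.05):
    """Contrast shift plus Gaussian noise to mimic scanner variation."""
    c = 1.0 + rng.uniform(-contrast, contrast)
    return patch * c + rng.normal(0.0, noise, patch.shape)

def dsa_view(stack):
    """One augmented view of an attention stack."""
    out = random_tooth_reduction(stack)
    return np.stack([random_color(random_rigid(random_artifact(p)))
                     for p in out])

stack = rng.random((36, 32, 32))
view_a, view_b = dsa_view(stack), dsa_view(stack)  # a positive pair
```

Each helper maps to one bullet above; applying them independently per patch, as the paper describes, keeps the anatomy-aware structure of the stack intact.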
Self-Attention for Feature Aggregation
To build the representation of an instance in the FoID system, it employs a two-step process:
1. A CNN encoder is used to embed each attention patch from the input. This encoder is designed to extract essential features from an input patch while accounting for potential anatomical variations. It’s adaptable to various convolutional encoders.
2. The generated attention embeddings go through a transformer-based self-attention module, which dynamically determines the importance of each attention patch. The significance of different dental structures is weighed, with less attention given to those that fall within the general distribution and more attention to those that are more identifying. This mirrors the experience of forensic dentists, who pay more attention to the more distinctive anatomies of an instance.
The sequence of attentions is initially projected into a fixed token dimension using a linear encoder. A two-layer BERT encoder is then employed for bidirectional context learning. Notably, no positional information is incorporated into the attention embeddings, for two main reasons:
- it is assumed that the semantic category of anatomical structures does not provide a prior for determining importance.
- dental structures often share similar patterns, and adding positional information can introduce noise.

The feature outputs from the final transformer layer are further reduced to a smaller dimension through a non-linear encoder, and the reduced outputs are concatenated to form the final embedding for the instance.
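The aggregation steps above can be sketched as follows. The dimensions here (256-d CNN patch embeddings, 128-d tokens, 16-d reduced outputs, 36 patches) are illustrative guesses, not the paper’s values, and a plain PyTorch transformer encoder stands in for the two-layer BERT encoder.

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """Sketch of the aggregation step: project per-patch CNN embeddings
    to a fixed token size, run them through a small transformer encoder
    with no positional encoding (as described above), then reduce each
    token non-linearly and concatenate into one instance embedding."""
    def __init__(self, in_dim=256, token_dim=128, out_dim=16):
        super().__init__()
        self.proj = nn.Linear(in_dim, token_dim)      # linear token encoder
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.reduce = nn.Sequential(nn.Linear(token_dim, out_dim), nn.ReLU())

    def forward(self, patch_embeddings):        # (B, 36, 256)
        tokens = self.proj(patch_embeddings)    # (B, 36, 128)
        tokens = self.encoder(tokens)           # self-attention, no pos. enc.
        reduced = self.reduce(tokens)           # (B, 36, 16)
        return reduced.flatten(1)               # (B, 36 * 16) per instance

model = AttentionAggregator()
emb = model(torch.randn(2, 36, 256))
print(emb.shape)  # torch.Size([2, 576])
```

Because no positional encoding is added, the embedding is insensitive to the ordering of the attention patches, matching the two design reasons listed above.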
Teeth Segmentation
I believe this post would be incomplete without a vital aspect of the work: the segmentation process. In the following section, I will explain this crucial component in detail, focusing on the theoretical aspects of the Mask R-CNN architecture (as depicted in Figure 3).
Mask R-CNN adds a mask branch of convolutional networks to a two-stage detector. Features extracted by a ResNet-101 backbone are used to construct a feature pyramid network (FPN). A region proposal network (RPN) then scores anchors over the FPN features and extracts regions of interest (RoIs). The RoIs are aligned to a consistent size (RoIAlign). Finally, each fixed-size feature undergoes the following processes:
- classification as tooth or background
- localization through regression of bounding box coordinates
- per-pixel segmentation through a fully convolutional network (FCN) within each detected tooth bounding box (masks).
Appendix
Apart from the works I mentioned earlier, I’ve also discovered some other segmentation (and detection) models, which I have attached here for your reference:
- This Jupyter notebook demonstrates how to train a teeth segmentation model with Detectron2, starting from a Meta pretrained model. https://github.com/hedzd/Tooth-Numbering/blob/main/down_numbering_detectron2.ipynb
- Online Tooth Detection Tool: This tool allows you to choose a model from the Detectron Model Zoo, process an input image using that model, and view the result online. https://universe.roboflow.com/university-of-malaya-8ljxr/tooth-detection-and-numbering-detectron2
Closure
We are seeking help to build the software I have just described theoretically. Please do not hesitate to contact me if you need any further details (via LinkedIn or email). I hope that by next time, the war will have come to an end, and peace will embrace us all.
Reference
- Liang, Yuan, et al. “Exploring forensic dental identification with deep learning.” Advances in Neural Information Processing Systems 34 (2021): 3244–3258.
- Jader, Gil, et al. “Deep instance segmentation of teeth in panoramic X-ray images.” 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE, 2018.