Panoramic x-ray dataset augmentation using generative AI

Edgar · Published in about ai · Oct 29, 2023
Panoramic x-ray sample from the data source [10].

Deep learning has proven to be an effective method to improve condition identification in healthcare using x-ray images [1]. However, developing deep learning models can be difficult because accessing appropriate training datasets is often complicated by the high cost of expert labeling or by privacy restrictions on sensitive patient information [2]. Generative AI has the potential to overcome these limitations by producing synthetic medical images that closely resemble real patient data [3]. In this post I review some generative AI models that have been used to generate images and explain how one of them, diffusion models, can augment an x-ray dataset using the Medical Open Network for Artificial Intelligence (MONAI) platform.

What are generative AI models?

Generative AI models are a category of artificial intelligence models that are designed to generate new data samples, such as images, text, audio, or other types of content, that are similar to or indistinguishable from data samples in a given dataset. Generative AI models are typically trained on large datasets to learn patterns, structures, and features present in the data, enabling them to create new, original data samples that resemble the training data. Some examples of generative AI models applied to images are:

1. Variational Autoencoders (VAEs): VAEs are probabilistic generative models that learn a latent data representation. They can generate new images by sampling from the learned latent space. VAEs are known for their ability to generate diverse images but may lack some fine-grained control.

2. Generative Adversarial Networks (GANs): GANs consist of a generator and a discriminator network that compete in a game-like setting. The generator tries to produce realistic images, while the discriminator tries to distinguish between real and generated images.

3. PixelCNN and PixelRNN: PixelCNN and PixelRNN models generate images pixel by pixel, capturing dependencies between pixels. They are autoregressive models that consider the entire image context while generating the next pixel.

4. Flow-Based Models: Flow-based models are designed to model complex probability distributions. They use invertible transformations to map a simple distribution to a complex data distribution.

5. AutoRegressive Transformers: Models like OpenAI’s GPT-2 and its variants, originally designed for natural language processing, have been adapted for image generation. Instead of predicting the next token given a context, they predict the next pixel in an image.

6. Diffusion Models: These models use iterative diffusion transformations to generate data samples. They have gained popularity for their ability to generate high-quality and diverse images.

Below, I will expand on this last type of model, and at the end I will show an example of how to generate dental panoramic x-rays using MONAI.

What are diffusion models?

Diffusion models are a class of machine learning models used for generative tasks, particularly in the context of generating realistic images, audio, or other data samples. They are named after the concept of gas diffusion in physics [4,5].

The basic idea behind these models is to gradually “diffuse” an image (x0) into noise (xT) by applying simple transformations (the forward process). The trick is that by learning to undo these simple transformations, the opposite (reverse) process can be learned too, meaning that it is possible to gradually transform noise (xT) back into an image (x0) (Figure 1).

Figure 1. Diffusion process as a Markov chain. Adapted from https://arxiv.org/pdf/2006.11239.pdf

More formally, a diffusion probabilistic model can be expressed as a parameterized Markov chain trained using variational inference to produce samples that resemble the original data. “When the diffusion consists of small amounts of Gaussian noise, it is sufficient to set the sampling chain transitions to conditional Gaussians too, allowing for a particularly simple neural network parameterization” [5].
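Concretely, following the notation of [5], both directions are Gaussian transitions, where βt is a small fixed variance schedule and μθ, Σθ are learned by the network:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\; \sqrt{1-\beta_t}\, x_{t-1},\; \beta_t \mathbf{I}\right)  % forward (adds noise)
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\; \mu_\theta(x_t, t),\; \Sigma_\theta(x_t, t)\right)  % reverse (removes noise)
```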

Once the network (model) has learned this reverse process, it can be used to iteratively transform samples of the noise distribution into samples that resemble real data.
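To make this concrete, here is a minimal PyTorch sketch of the two directions, assuming a `model` that has already been trained to predict the added noise. The schedule values and function names are illustrative, following the sampling rule of [5]:

```python
import torch

# Illustrative DDPM-style linear noise schedule, as in [5]
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

def forward_diffuse(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Jump straight to step t of the forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * noise

@torch.no_grad()
def reverse_sample(model, shape):
    """Start from pure noise x_T and iteratively remove the predicted noise."""
    x = torch.randn(shape)  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = model(x, torch.tensor([t]))  # the network predicts the noise
        mean = (x - betas[t] * eps / (1.0 - alphas_bar[t]).sqrt()) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise  # sigma_t = sqrt(beta_t)
    return x  # a sample that should resemble the training images
```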

Why use diffusion models to augment image datasets?

Traditionally, data augmentation involves simple operations like rotations, flips, and brightness adjustments. While these techniques are useful, they can only take you so far in diversifying your dataset. Diffusion models, on the other hand, bring a whole new level of sophistication.
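For comparison, a classic pixel-level pipeline built with MONAI transforms might look like the sketch below. The specific transforms and parameters are illustrative choices, not a recommendation:

```python
from monai.transforms import Compose, RandRotate, RandFlip, RandAdjustContrast

# Classic pixel-level augmentations: each transform is applied
# randomly with the given probability
classic_augment = Compose([
    RandRotate(range_x=0.1, prob=0.5),               # small random rotations (radians)
    RandFlip(spatial_axis=1, prob=0.5),              # random horizontal flips
    RandAdjustContrast(gamma=(0.9, 1.1), prob=0.5),  # brightness/contrast jitter
])
# augmented = classic_augment(image)  # image: channel-first tensor/ndarray
```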

The basic idea of diffusion-based augmentation is the following:

1. Start with your existing dataset of images.

2. Use a diffusion model to generate new images that are variations of your original data.

3. These new images aren’t just pixel-level tweaks; they’re entirely novel creations that potentially extend the characteristics of your dataset in a controlled manner.

Benefits of diffusion-based augmentation in medical image datasets:

  • Realism: The generated images look remarkably real, adding authenticity to your dataset.
  • Diversity: You can create an almost endless variety of images, helping your model generalize better.
  • Hard-to-Find Data: Diffusion models can fill in the gaps for rare or hard-to-find examples.
  • Imbalance: Address class imbalance issues by generating augmented samples for underrepresented classes or rare medical conditions.

Considerations and limitations in augmenting medical image datasets

  • Data Privacy and Security: Maintain data privacy and comply with healthcare regulations. Ensure that generated images do not contain any patient-specific information.
  • Clinical validation: Collaborate with medical experts to validate the augmented dataset and assess the model performance in real clinical scenarios.
  • Ethical Considerations: Be mindful of ethical considerations when working with medical data, ensuring that the use of augmented data aligns with ethical guidelines and regulations.
  • Dataset Size: In most cases, the dataset required to train a good diffusion model is large. A common criticism of this approach is therefore circular: if you already have a dataset large enough to train the generator, why would you need to augment it?

Latent diffusion for panoramic x-ray augmentation using MONAI

MONAI (Medical Open Network for Artificial Intelligence) is an open-source library for applying AI to medical imaging. I am going to use its Generative Models extension to generate x-rays. This part is inspired by this great post by Walter Hugo Lopez Pinaya.
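If you want to follow along, the building blocks used in the rest of this post come from that extension. At the time of writing it was distributed as a separate package (`monai-generative`); the package layout may have changed in later MONAI releases:

```python
# Install with: pip install monai monai-generative
# (package layout at the time of writing; it may have moved into core MONAI since)
from generative.networks.nets import AutoencoderKL, DiffusionModelUNet
from generative.networks.schedulers import DDPMScheduler
from generative.inferers import LatentDiffusionInferer
```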

Figure 2. Example of “diffusing” an x-ray into noise (red arrows) and the opposite operation, in which random pixels (noise) are iteratively transformed into an x-ray (green arrows).

The general idea of diffusion models is to diffuse an image (or transform samples of the distribution of x-ray images) into samples of the distribution of noise pixels, and then use a DNN to learn the opposite operation (Figures 1 and 2). Basically, this reverse operation consists of predicting the noise in the image so that it can be subtracted iteratively. The idea behind using such models to generate novel x-rays is that, once we have a model that can transform random noise into x-rays, we can sample this distribution to produce new x-ray images.

However, training such models to transform samples of the distribution of x-rays might be too complex in terms of memory and the required network architecture. Instead, we can train a model to perform these iterative transformations on the distribution of a latent space rather than the original pixel space. This lower-dimensional space is smaller and easier to run the transformations on. To find such a latent space, we can train an autoencoder. The intuitive idea is to train a network that can represent or “encode” the original image x into a smaller-dimensional space (the latent space) such that it can then be “decoded” into an image x′ that is as close as possible to the original image (Figure 3).

Figure 3. Illustration of an autoencoder neural network. This model is trained to “compress” an image into a smaller representation (latent space) and “decompress” such representation into a reconstructed image that closely resembles the original x-ray.

Once we have an autoencoder to map x-rays into the latent space, we need to train a model to diffuse and to “denoise” (or iteratively transform noise) into an x-ray. You can see this notebook for an example of how to train an autoencoder using MONAI on a small public dataset of panoramic x-rays [10], which you can run on Google Colab.
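As a rough sketch of what the autoencoder definition looks like in MONAI Generative (the hyperparameters below are illustrative, loosely following the official tutorials; exact argument names may differ between versions):

```python
import torch
from generative.networks.nets import AutoencoderKL

# KL-regularized autoencoder: compresses a 1-channel x-ray into a small
# latent representation and reconstructs it back
autoencoder = AutoencoderKL(
    spatial_dims=2,                          # 2D images
    in_channels=1,                           # grayscale panoramic x-rays
    out_channels=1,
    num_channels=(128, 128, 256),            # channels per resolution level
    latent_channels=3,                       # depth of the latent space
    num_res_blocks=2,
    attention_levels=(False, False, False),
)

x = torch.randn(1, 1, 256, 512)              # dummy x-ray batch
x_recon, z_mu, z_sigma = autoencoder(x)      # reconstruction x' + latent statistics
```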

Figure 4. Example of panoramic x-rays generated by sampling the latent space of a variational autoencoder implemented in MONAI.

Latent diffusion model

Back to the latent diffusion model: as we saw above, we can simply sample the latent space generated by the autoencoder, so why do we need anything else? The answer has to do with the generative part. In the example above, we sampled the latent space in an arbitrary way and “decoded” the samples into image space. When we want to generate novel images, we want to do it in a controlled manner, for example by providing instructions in text like “generate panoramic x-rays with implants”. Diffusion models are a popular solution to this problem: they produce good results and have desirable properties, such as being easier to train and scale than other options like GANs and VAEs [5–7]. For a good explanation of the math behind diffusion models, I recommend this video.

Now let’s look a bit more closely at the idea behind latent diffusion models for controlled image synthesis. As mentioned above, instead of working in the image space, this approach uses a latent space that is less complex [7–8] and therefore easier to work in. For this, as shown above, a variational autoencoder is used, and for the reverse process (removing noise to produce an image), a U-Net is used. To add control over the generated images, this approach uses a conditioning input y (for example, text) that is mapped into the latent space and used to guide the sampling in a controlled manner [7].

Figure 5. Diagram adapted from the High-Resolution Image Synthesis with Latent Diffusion Models paper [7]. For training, in red is the forward process of diffusing images x into noise, and in green is the denoising (reverse) process, in which a U-Net is trained to remove noise and produce a similar image x′. For generation, a condition y (e.g. text) is mapped into the latent space, which in turn is denoised into the novel generated image.
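In MONAI Generative, the pieces of Figure 5 map onto a diffusion U-Net that operates on the latents, a noise scheduler, and an inferer that ties them to the autoencoder. Below is a hedged sketch reusing the `autoencoder` from above; the hyperparameters and the placeholder conditioning embedding `y` are illustrative, and API details may vary between versions:

```python
import torch
from generative.networks.nets import DiffusionModelUNet
from generative.networks.schedulers import DDPMScheduler
from generative.inferers import LatentDiffusionInferer

# U-Net that learns to denoise latents; with_conditioning enables
# cross-attention on an external embedding y (e.g. encoded text)
unet = DiffusionModelUNet(
    spatial_dims=2,
    in_channels=3,                     # matches latent_channels of the autoencoder
    out_channels=3,
    num_channels=(128, 256, 512),
    attention_levels=(False, True, True),
    num_res_blocks=2,
    num_head_channels=(0, 256, 512),
    with_conditioning=True,
    cross_attention_dim=64,            # size of the conditioning embedding
)

scheduler = DDPMScheduler(num_train_timesteps=1000)
inferer = LatentDiffusionInferer(scheduler, scale_factor=1.0)

# Generation: start from latent noise and denoise it, guided by condition y,
# then decode the resulting latent back to image space with the autoencoder
latent_noise = torch.randn(1, 3, 64, 128)  # latent size of a 256x512 x-ray here
y = torch.randn(1, 1, 64)                  # placeholder conditioning embedding
synthetic_xray = inferer.sample(
    input_noise=latent_noise,
    autoencoder_model=autoencoder,
    diffusion_model=unet,
    scheduler=scheduler,
    conditioning=y,
)
```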

You can check out this tutorial to generate brain images using a latent diffusion model implemented in MONAI. Soon I will implement a similar approach using panoramic x-rays.

Conclusions

Generative AI is a young but very promising area of AI that is evolving rapidly. The ideas and techniques presented in this post have the potential to revolutionize how AI is applied to medical imaging and, therefore, to have a great impact on diagnostic processes in healthcare.

Future work

I plan to extend this work by training the diffusion model to generate images with different levels of noise, artifacts, and imaging conditions that mimic real-world situations, which could be used to train more robust diagnostic models.

Another extension of this work could be to leverage the diversity of diffusion-based augmentation to simulate different patient scenarios, including variations in disease progression or biological factors like age. This addition could have great potential in developing more powerful diagnostic and prognostic models in the future.

References

[1] Litjens, G., Kooi, T., Ehteshami Bejnordi, B., Adiyoso Setio, AA., Ciompi, F., Ghafoorian, M., van der Laak, J.A.W.M., van Ginneken, B., Sánchez, CI. (2017) A survey on deep learning in medical image analysis. Medical Image Analysis.
[2] Tsuneki, M. (2022). Deep learning models in medical image analysis. Journal of Oral Biosciences.
[3] Koohi-Moghadam, M., Bae, K.T. (2023) Generative AI in Medical Imaging: Applications, Challenges, and Ethics. J Med Syst 47, 94.
[4] Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N., & Ganguli, S. (2015). Deep Unsupervised Learning using Nonequilibrium Thermodynamics (arXiv:1503.03585). arXiv. http://arxiv.org/abs/1503.03585
[5] Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models (arXiv:2006.11239). arXiv. http://arxiv.org/abs/2006.11239
[6] Dhariwal, P., & Nichol, A. (2021). Diffusion Models Beat GANs on Image Synthesis (arXiv:2105.05233). arXiv. http://arxiv.org/abs/2105.05233
[7] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B. (2022) High-Resolution Image Synthesis with Latent Diffusion Models. CVPR 2022 [PDF]
[8] Zhang, L., Rao, A., Agrawala, M. (2023) Adding Conditional Control to Text-to-Image Diffusion Models. arXiv [PDF]
[9] Lopez Pinaya, W.H., Tudosiu, P.D., Dafflon, J., Da Costa, P.F., Fernandez, V., Nachev, P., Ourselin, S., Cardoso, M.J. (2022) Brain Imaging Generation with Latent Diffusion Models. arXiv [PDF]
[10] Abdi, A. H., Kasaei, S., and Mehdizadeh, M. (2015) Automatic segmentation of mandible in panoramic x-ray. J. Med. Imaging.

