Generating brain CT images using Disentangled Variational Autoencoders

Alena Ponomareva · nico.lab · Jan 22, 2021

This is the first post written during our research sessions at NICO.LAB.

Can we generate CT slices of a brain? Or interpolate between two different brains? Or… if we represent the brain image as a set of numbers, can we change brain size, rotation, anatomy just by tweaking these numbers?

I asked these questions when I first read the paper about Disentangled Variational Autoencoders (β-VAE).

According to the paper, you can encode an image into a small numerical vector in such a way that each component of the vector is responsible for one independent and interpretable visual feature. For faces, it can be skin color, age, gender, or image saturation. What would these features be for brain images? Let’s find out.

The impact of latent representation units on the visual features of the image (skin color, age/gender, image saturation). Source: β-VAE paper

Variational Autoencoder

Before diving into Disentangled Variational Autoencoders (β-VAEs), let’s take a look at regular autoencoders. The mechanism behind autoencoders is simple: you train a network that compresses high-dimensional data into a low-dimensional representation space (the encoder part) and then reconstructs it back into the original high-dimensional data (the decoder part). The objective is to make the reconstruction as close to the original data as possible. The loss used to train such a network is called the reconstruction loss: it shows how different the reconstructed data is from the original data.
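To make the idea concrete, here is a deliberately tiny sketch of this encode, decode, reconstruction-loss loop in PyTorch. The fully connected layers, tensor shapes, and variable names are illustrative only, not our actual convolutional model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A tiny (hypothetical) autoencoder: compress an image into a small vector
# and reconstruct it back. Our real model is convolutional; this only
# illustrates the encode -> decode -> reconstruction-loss idea.
class AutoEncoder(nn.Module):
    def __init__(self, image_size=256 * 256, latent_size=50):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(image_size, latent_size))
        self.decoder = nn.Linear(latent_size, image_size)

    def forward(self, x):
        z = self.encoder(x)                 # low-dimensional latent representation
        return self.decoder(z).view_as(x)   # reconstruction

model = AutoEncoder()
x = torch.rand(8, 1, 256, 256)              # a batch of fake 256x256 slices
reconstruction_loss = F.mse_loss(model(x), x)
```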

Autoencoder. Source: vae-explained blog post

Low dimensional vectors (or latent representations) of autoencoders, however, don’t make that much sense. If you generate a random latent representation vector and reconstruct it into an image you would get a random image back.

However, if you use Variational Autoencoders (VAEs) to reconstruct a random latent representation vector, the reconstructed images would make sense. VAEs can be used to generate images!

The idea behind VAE is that the encoder produces the latent representation not as a single vector, but as a probability distribution over vectors. Then, we can sample latent representation from this distribution and reconstruct the image.

Variational Autoencoder (VAE). Source: vae-explained blog post

However, the sampling operation during training is not differentiable, so we can’t backpropagate the gradients through it. To overcome this limitation, the reparameterization trick is applied: the encoder produces two vectors, a mean vector μ and a log-variance vector Σ. Then a random vector ε is sampled from a unit Gaussian distribution, and the latent representation z becomes:

z = μ + exp(Σ/2) ⊙ ε (where ⊙ denotes element-wise multiplication)

This z is used to reconstruct the image, and backpropagation can then flow through μ and Σ to update the encoder weights.

To ensure that the probability distribution produced by the encoder is close to the unit Gaussian distribution, KL divergence between these two distributions is added to the reconstruction loss.
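As a rough sketch (assuming PyTorch, MSE as the reconstruction loss, and hypothetical function names), the reparameterization trick and the resulting VAE loss could look like this:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + exp(logvar / 2) * eps, with eps ~ N(0, I).
    # Sampling eps instead of z keeps the path to mu and logvar differentiable,
    # so gradients can flow back into the encoder.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

def vae_loss(x_hat, x, mu, logvar):
    # Reconstruction term: how different the reconstruction is from the input.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # KL divergence between N(mu, exp(logvar)) and the unit Gaussian N(0, I),
    # computed in closed form.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld
```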

Disentangled Variational Autoencoder

If we generate a random latent representation vector, change only one component, and reconstruct it into an image, we will see some changes. However, in most cases, these changes wouldn’t be interpretable. If we reconstruct images of hand-written digits, for example, we can see that size, rotation, and shape all change at the same time. It means that the latent representation is entangled and its components are not independent.

In VAE the loss consists of two parts: reconstruction loss that makes sure that the reconstructed image is close to the original image and KL-divergence that makes sure that the probability distribution that the encoder produces is close to unit Gaussian distribution.

To overcome the issue with entangled representation and to get more interpretable reconstructions the Disentangled VAE or β-VAE was proposed.

In β-VAE, the KL divergence term is up-weighted by a β parameter to enforce the independence of the latent representation components:

loss = reconstruction loss + β · KL( q(z|x) ‖ N(0, I) )

The bigger β is, the more disentangled the visual features should be.

Latent representation units are tweaked one by one. In the case of β-VAE we can see that either size or rotation or angle is changing. In the case of VAE multiple features are changing together. Source: β-VAE paper

Implementation details

To implement the VAE we used an encoder with 10 convolutional layers and a decoder with 7 convolutional layers, each followed by a transposed convolution. We also used batch normalization and the Leaky ReLU activation function. The implementation is described in detail in this Medium post.

To implement β-VAE we just added the β parameter to the KL divergence term in the loss function.
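In code that is essentially a one-line change. A sketch, under the same assumptions as the VAE loss sketch above:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x_hat, x, mu, logvar, beta=50.0):
    # Same two terms as the plain VAE loss, but the KL term is up-weighted by beta.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kld
```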

To train β-VAE we used CTA (Computed Tomography Angiography) brain volumes of 350 patients. Slices were taken from each CTA resulting in 9910 images. We downsampled each image from 512x512 to 256x256 to simplify the data.

We split the data into a training set (80%) and a validation set (20%) and trained for 14 epochs (which was enough for convergence).
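For illustration, the preprocessing could be sketched roughly as below; the file name, array layout, and normalization are assumptions for the sketch, not our exact pipeline.

```python
import numpy as np
import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset, random_split

# Slices are assumed to be stacked in one float array of shape
# (num_slices, 512, 512), already normalized; the file name is made up.
slices = np.load("cta_slices.npy")
slices = torch.from_numpy(slices).unsqueeze(1).float()   # (N, 1, 512, 512)

# Downsample every 512x512 slice to 256x256 to simplify the data.
slices = F.interpolate(slices, size=(256, 256), mode="bilinear", align_corners=False)

# 80% / 20% train / validation split.
dataset = TensorDataset(slices)
n_train = int(0.8 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
```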

The examples of CTA brain slices

The latent representation size parameter

In the first experiment, we can see the impact of the latent representation size on the validation loss. The latent vector sizes in this experiment were [10, 20, 50, 100, 500]. The β value was 1 (a traditional VAE).

The worst performance is reached with the latent representation size of 500 (too many units for the little training data we have) and of 10 (the representation is not powerful enough to represent our data).

The latent vectors of 20, 50, and 100 units showed the same performance.

In the plots below the first number shows the latent vector size and the second number shows β value. Don’t be surprised by the funny experiment names — they were generated automatically (using Weights and Biases).

Below, we can see brain CT slices generated using β-VAE with latent representation sizes of 10, 50, and 500. With too few latent representation units (10) the generated images are too round and simple. With too many units (500) the generated images are torn apart. Images generated from 50 units look better: less round and not torn apart.

Brain CT slices generated by β-VAE. Latent representation size = 10
Brain CT slices generated by β-VAE. Latent representation size = 50
Brain CT slices generated by β-VAE. Latent representation size = 500

β parameter

In the next experiment, we can see the impact of β on the validation loss. The latent vector size is 50 and the β parameter takes the values [1, 20, 50, 200]. The bigger β is, the stronger the regularization and the higher the loss, as expected.
Though the reconstruction quality gets worse with a bigger β (it correlates with the validation loss), the disentanglement and interpretability of the latent representation variables should become better (see the next experiment).

Visualizations of disentanglement

The β parameter in this experiment was [1, 50, 200], and the latent representation vector size was 50.

Images were downsampled to fit in the grid.

The middle column in the pictures below represents the reconstruction of the original brain slice image. If we decrease/increase the first unit of the latent representation by a small amount we will see some change in the reconstruction (first row, right/left of the middle column). The other rows represent the other four latent representation units. We could show all 50 units, of course, but the grid would be too big and difficult to interpret.

Changes in the reconstructed image due to changes in the first five latent representation units. β = 1 (traditional VAE). We can see only small, entangled changes in the image due to changes in the latent representation units. This makes sense: with β = 1 there is no additional regularization on the KL divergence, so we are implementing a traditional VAE here.
Changes in the reconstructed image due to changes in the first five latent representation units. β = 50. With the increased β parameter the changes are more visible and more independent. For example, units 1 and 5 mostly represent rotation (although intensity changes as well).
Changes in the reconstructed image due to changes in the first five latent representation units. β = 200. The changes are now even more visible. Unit 1 mostly shows rotation, unit 2 shows scrolling through slices. However, the bigger β is, the worse the reconstruction quality gets. We can see it very well here: images are torn apart and don’t look real. So there is a tradeoff between disentanglement and reconstruction quality.
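For reference, a traversal grid like the ones above can be produced with a sketch along these lines; `model.encode` and `model.decode` are assumed method names of a trained β-VAE, not our exact API.

```python
import torch

@torch.no_grad()
def traverse_unit(model, x, unit, deltas=(-3.0, -1.5, 0.0, 1.5, 3.0)):
    # Encode one slice, nudge a single latent unit by each delta, decode each result.
    # model.encode is assumed to return (mu, logvar); model.decode maps a latent
    # vector back to an image. One call produces one row of the traversal grid.
    mu, _ = model.encode(x)
    images = []
    for delta in deltas:
        z = mu.clone()
        z[:, unit] += delta            # change only this one latent unit
        images.append(model.decode(z))
    return torch.cat(images, dim=0)
```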

Visualizations of interpolation

In this experiment, we will see how we can interpolate between brain slices from different people by interpolating between the latent representations of these slices.

For the images below, a model with β = 50 and a latent representation size of 50 was used.

Left to right: The first image is the reconstruction of the brain slice image of one person, the last image is the reconstruction of the brain slice image of another person. Images in between are coming from interpolation using β-VAE.
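The interpolation itself is just a linear walk in latent space. A sketch, again with assumed `encode`/`decode` method names:

```python
import torch

@torch.no_grad()
def interpolate(model, x_a, x_b, steps=8):
    # Encode two brain slices (from two different people) and walk linearly
    # between their latent means, decoding an image at every step.
    mu_a, _ = model.encode(x_a)
    mu_b, _ = model.encode(x_b)
    images = []
    for t in torch.linspace(0, 1, steps):
        z = (1 - t) * mu_a + t * mu_b
        images.append(model.decode(z))
    return torch.cat(images, dim=0)
```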

Future work

We had to downsample the images from 512x512 to 256x256 pixels to simplify the data. However, if we use more training data, it makes sense to keep the original image size. Maybe, in this case, the reconstructed images will be less blurry.

It would also be interesting to use a more objective metric to estimate the “importance” of a particular latent representation unit instead of manual visual inspection. The “disentanglement metric” proposed in the β-VAE paper can be used for that purpose.

Conclusion

The generated images produced by VAE and β-VAE are too blurry to be used (for data augmentation, for example). Was it all useless then? Not quite.

The fact that you can encode a brain slice in just 10 (or 50) numbers is fascinating. The fact that you can modify these numbers to change separate properties of the brain and interpret the changes is fascinating too.

Moreover, it was so much fun to run those experiments!

VAE original paper:

Kingma, Diederik P., et al. “Auto-encoding variational bayes.” arXiv preprint arXiv:1312.6114 (2013). https://arxiv.org/pdf/1312.6114.pdf

β-VAE paper:

Higgins, Irina, et al. “β-VAE: Learning basic visual concepts with a constrained variational framework.” (2016). https://openreview.net/references/pdf?id=Sy2fzU9gl

VAE explained blog post:

https://keitakurita.wordpress.com/2017/12/19/an-intuitive-explanation-of-variational-autoencoders/

Abdominal image synthesis with VAE tutorial:

https://medium.com/miccai-educational-initiative/tutorial-abdominal-ct-image-synthesis-with-variational-autoencoders-using-pytorch-933c29bb1c90


I work as a Machine Learning developer for NICO.LAB, a healthcare company in Amsterdam.