Conversion of RGB Images to Hyperspectral Using Deep Learning

Rishabh Karmakar
Published in Analytics Vidhya · 8 min read · Jun 16, 2020

Finding a practical, cost-effective way to produce and use hyperspectral images.

Authors: Abhiruchi Bhattacharya, Rishabh Karmakar, Soham Hans, Munagala Naga Govardhan.

Mentored By: Dr. Nidhi Chahal, Dr. Manoj Sharma

RGB to Hyperspectral images

Introduction

Hyperspectral imaging is a method of capturing a scene at many narrow wavelength bands of the electromagnetic spectrum. The goal of hyperspectral imaging is to obtain the spectrum for each pixel in the image of a scene. Different forms of matter have different spectral properties, and capturing these variations gives more information about the nature and qualities of the objects in the image.

Example of multi-channel Hyperspectral image. Image Source: gisresources.com
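
For intuition, a hyperspectral image is just a data cube: where an RGB image stores 3 values per pixel, a 31-band hyperspectral image stores 31. A minimal NumPy illustration with synthetic data (shapes only):

```python
import numpy as np

# A 512x512 RGB image: 3 values (red, green, blue) per pixel.
rgb = np.random.rand(512, 512, 3).astype(np.float32)

# The same scene as a 31-band hyperspectral cube: one reflectance
# value per pixel per sampled wavelength band.
hsi = np.random.rand(512, 512, 31).astype(np.float32)

# The spectrum of a single pixel is a 31-element vector.
pixel_spectrum = hsi[256, 256, :]
print(rgb.shape, hsi.shape, pixel_spectrum.shape)
# (512, 512, 3) (512, 512, 31) (31,)
```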

Hyperspectral images can find applications in various computer vision domains including recognition [1][2][3], tracking[4][5], document analysis, and pedestrian detection[6][7].

Hyperspectral IR Camera. Image Source: telops.com

The hardware needed for hyperspectral imaging is very expensive, so only a limited number of hyperspectral images are available. Capturing high spectral resolution also requires long exposure times, which makes it unsuitable for moving objects.

Proposal

We aim to use deep learning methods to convert normal images captured with RGB cameras directly into hyperspectral images, with high speed and low cost. Because RGB images are so easy to acquire, this makes it feasible to produce hyperspectral images for any kind of image analysis.

We develop various deep learning models for the direct conversion of RGB images to hyperspectral. These include Convolutional Neural Networks, autoencoder models, and GAN models. We also explain the characteristics and performance of each of these models.

Conversion of RGB images to hyperspectral images

Models

Pix2HS

This is a GAN (Generative Adversarial Network) model based on the Pix2Pix[8] model. The model consists of two parts: the generator and the discriminator.

The generator is a ResNet[9]-based model that takes a 512x512 RGB image as input and converts it into a hyperspectral image with the same spatial dimensions and 31 channels.

Each channel represents the spectral image at a particular wavelength.

ResNet-based generator
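
For concreteness, here is a minimal PyTorch sketch of what a ResNet-style generator for this mapping could look like. The layer counts, widths, and tanh output are illustrative assumptions, not the exact published architecture:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection, as in ResNet."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class ResNetGenerator(nn.Module):
    """Maps a 3-channel RGB image to a 31-channel hyperspectral cube,
    preserving spatial dimensions (3x3 kernels, padding 1)."""
    def __init__(self, width=64, n_blocks=6):  # width/depth are assumed
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, width, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(width, 31, 3, padding=1)  # one channel per wavelength

    def forward(self, x):
        # tanh assumes data normalized to [-1, 1]
        return torch.tanh(self.tail(self.blocks(self.head(x))))

g = ResNetGenerator()
fake_hs = g(torch.randn(1, 3, 512, 512))  # -> torch.Size([1, 31, 512, 512])
```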

The discriminator is derived from the PatchGAN[8] model. It takes the hyperspectral image predicted by the generator together with the original RGB image as input. The PatchGAN model splits the input into patches and predicts, for each patch, whether it comes from a real hyperspectral image or a generated one.

Discriminator architecture
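
Below is a hedged sketch of a PatchGAN-style discriminator. It conditions on the RGB input by channel-wise concatenation with the (real or generated) hyperspectral cube; the depth and widths here are assumptions:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator: instead of one real/fake score per
    image, it outputs a grid of scores, each covering one receptive-field
    'patch' of the input."""
    def __init__(self, in_channels=3 + 31, width=64):
        super().__init__()
        layers, c = [], in_channels
        for w in (width, width * 2, width * 4):
            # stride-2 convolutions progressively shrink the score map
            layers += [nn.Conv2d(c, w, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            c = w
        layers += [nn.Conv2d(c, 1, 4, padding=1)]  # one logit per patch
        self.net = nn.Sequential(*layers)

    def forward(self, rgb, hs):
        # condition on the RGB image by concatenating it with the HS cube
        return self.net(torch.cat([rgb, hs], dim=1))

d = PatchDiscriminator()
scores = d(torch.randn(1, 3, 512, 512), torch.randn(1, 31, 512, 512))
# scores is a grid of per-patch real/fake logits, e.g. [1, 1, 63, 63]
```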

The job of the generator is to fool the discriminator into accepting its images as real, and the job of the discriminator is to correctly differentiate between real and generated images. The two models are trained together in this adversarial game.

Whole model architecture
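
A minimal training step in the pix2pix spirit, reusing the `g` and `d` sketches above (`g_opt`/`d_opt` are hypothetical optimizer names), might look like the following. The BCE adversarial loss and the L1 weight of 100 follow the original pix2pix recipe and are assumptions about this model's exact objective:

```python
import torch
import torch.nn.functional as F

def train_step(g, d, g_opt, d_opt, rgb, real_hs, lambda_l1=100.0):
    # --- discriminator: push real pairs toward 1, generated pairs toward 0 ---
    fake_hs = g(rgb).detach()  # detach: don't update g on the d step
    d_real = d(rgb, real_hs)
    d_fake = d(rgb, fake_hs)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- generator: fool the discriminator and match the ground truth ---
    fake_hs = g(rgb)
    pred = d(rgb, fake_hs)
    g_adv = F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred))
    g_loss = g_adv + lambda_l1 * F.l1_loss(fake_hs, real_hs)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```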

CycleR GAN

The Cycle Generative Adversarial Network[10][11], or CycleGAN, is an approach to training a deep convolutional neural network for image-to-image translation tasks.

The model architecture consists of two generator models: one generator (Generator-A)[12][13] generates images for the first domain (RGB), and the second generator (Generator-B) generates images for the second domain (hyperspectral):

  • Generator-A to RGB
  • Generator-B to Hyperspectral

U-Net generator

The generators used for this model are based on the U-Net[13] model (a minimal sketch follows the list below). The generator models perform image translation, meaning that image generation is conditioned on an input image, specifically an image from the other domain. Generator-A takes a hyperspectral image as input and Generator-B takes an RGB image as input:

  • Hyperspectral to Generator-A to RGB
  • RGB to Generator-B to Hyperspectral
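
As a rough illustration (not the exact architecture), a tiny U-Net-style generator with a single downsampling stage and one skip connection could be sketched as follows; the channel widths and depth are assumptions:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net: downsample, upsample, with a skip connection that
    carries fine spatial detail from encoder to decoder. The real
    generators are deeper; widths here are illustrative."""
    def __init__(self, in_ch, out_ch, width=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, width, 3, padding=1),
                                  nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(width, width * 2, 3, stride=2, padding=1),
                                  nn.ReLU(True))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1),
                                  nn.ReLU(True))
        self.out = nn.Conv2d(width * 2, out_ch, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)   # full-resolution features
        e2 = self.enc2(e1)  # half resolution
        d1 = self.dec1(e2)  # back to full resolution
        # concatenate the skip connection before the output layer
        return torch.tanh(self.out(torch.cat([d1, e1], dim=1)))

gen_a = TinyUNet(in_ch=31, out_ch=3)   # Generator-A: hyperspectral -> RGB
gen_b = TinyUNet(in_ch=3, out_ch=31)   # Generator-B: RGB -> hyperspectral
```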

Each generator has a corresponding discriminator model. The first discriminator model (Discriminator-A) takes real images from RGB and generated images from Generator-A and predicts whether they are real or fake. The second discriminator model (Discriminator-B) takes real images from Hyperspectral and generated images from Generator-B and predicts whether they are real or fake.

  • RGB to Discriminator-A to [Real/Fake]
  • Hyperspectral to Generator-A to Discriminator-A to [Real/Fake]
  • Hyperspectral to Discriminator-B to [Real/Fake]
  • RGB to Generator-B to Discriminator-B to [Real/Fake]

Together, each pair of generator models is trained to better reproduce the source image, a property referred to as cycle consistency (a loss sketch follows the figure below):

  • Hyperspectral to Generator-A to RGB to Generator-B to Hyperspectral
  • RGB to Generator-B to Hyperspectral to Generator-A to RGB
Image Source: TensorFlow/cyclegan documentation
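
The cycle-consistency term can be sketched as an L1 penalty on the round trip, reusing `gen_a` and `gen_b` from above; the weight of 10 follows the original CycleGAN paper and is an assumption here:

```python
import torch.nn.functional as F

def cycle_loss(gen_a, gen_b, rgb, hs, lambda_cyc=10.0):
    # RGB -> Hyperspectral -> RGB should recover the original RGB image
    rgb_cycled = gen_a(gen_b(rgb))
    # Hyperspectral -> RGB -> Hyperspectral should recover the original cube
    hs_cycled = gen_b(gen_a(hs))
    return lambda_cyc * (F.l1_loss(rgb_cycled, rgb) + F.l1_loss(hs_cycled, hs))
```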

HyperCNN

Convolutional neural networks find widespread application in image processing and computer vision, and CNNs are effective for hyperspectral recovery[20][27]. Hence, we first consider a five-layer CNN model. The first two layers have 32 feature maps each, the next two have 64, and the final layer produces the 31-channel output image. ReLU activation is used after each layer.

Preprocessing and Hyperparameters: The dataset images were resized to 128x128 and normalized to the range -1 to 1. The kernel size for all layers is 3. The Adam optimizer was used for training, with an initial learning rate of 0.0001. The model was trained for a total of 30 epochs with batch size 4.

5-layer CNN
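
Since the text fully specifies this model, a PyTorch sketch is straightforward. The one assumption is omitting ReLU on the final layer, so that the [-1, 1]-normalized targets remain representable:

```python
import torch
import torch.nn as nn

class HyperCNN(nn.Module):
    """Five-layer CNN per the text: 3x3 kernels throughout, 32 feature
    maps for the first two layers, 64 for the next two, and a
    31-channel output layer."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(64, 31, 3, padding=1),  # no ReLU here (assumption)
        )

    def forward(self, x):
        return self.net(x)

model = HyperCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr as in the text
out = model(torch.randn(4, 3, 128, 128))  # batch size 4, 128x128 inputs
```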

CA-Net

Convolutional autoencoders are networks that typically find applications in noise removal, supersampling, and unsupervised feature extraction[28].

Autoencoder networks work by extracting (i.e. encoding) a compact representation of the input data and recreating (decoding) the output using that compact representation.

We define two autoencoder-based models, namely CA-Net 5 and CA-Net 10. CA-Net 5 consists of five convolutional and five deconvolutional layers, with ReLU activation after each layer. CA-Net 10 is a deeper version of the former, with ten convolutional and ten deconvolutional layers.

Pooling and unpooling operations are omitted to preserve detail in the hyperspectral image. In both cases, the final layer is modified to return the generated hyperspectral image with 31 channels.

Autoencoder
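
A sketch of CA-Net 5 under these constraints; the intermediate channel widths are assumptions, and ReLU is omitted on the final layer for the same reason as in HyperCNN:

```python
import torch
import torch.nn as nn

class CANet5(nn.Module):
    """CA-Net 5 sketch: five convolutional (encoder) and five
    deconvolutional (decoder) layers, no pooling or unpooling, with a
    31-channel output. Intermediate widths are illustrative."""
    def __init__(self):
        super().__init__()
        enc_widths = [3, 32, 64, 128, 128, 128]
        dec_widths = [128, 128, 128, 64, 32, 31]
        layers = []
        for c_in, c_out in zip(enc_widths[:-1], enc_widths[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(True)]
        dec_pairs = list(zip(dec_widths[:-1], dec_widths[1:]))
        for i, (c_in, c_out) in enumerate(dec_pairs):
            # kernel 3, stride 1, padding 1 keeps spatial size (no unpooling)
            layers += [nn.ConvTranspose2d(c_in, c_out, 3, padding=1)]
            if i < len(dec_pairs) - 1:  # no ReLU on the final layer (assumption)
                layers += [nn.ReLU(True)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

out = CANet5()(torch.randn(1, 3, 128, 128))  # -> [1, 31, 128, 128]
```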

Results

MAE on all models
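
All figures below are mean absolute error (MAE): the average absolute per-pixel, per-channel difference between the predicted and ground-truth cubes, on normalized data. A one-line illustration:

```python
import torch

def mae(pred_hs, real_hs):
    """Mean absolute error, averaged over all pixels and all 31 channels."""
    return (pred_hs - real_hs).abs().mean()
```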

The Pix2HS GAN achieves a validation-set MAE of 0.014. The model is also computationally less expensive, since its generator has fewer layers than a traditional CNN model.

The CycleR GAN model achieved a training MAE of 0.024 and a validation-set MAE of 0.038. The lower accuracy can be attributed to the mismatched shapes of the two domains: 3 channels for RGB versus 31 for hyperspectral.

The HyperCNN model achieved a training loss of 0.015 and a validation loss of 0.021. The training and validation losses stabilized after approximately 20 epochs. The images produced from the validation set retained most details and contained minimal distortion.

CA-Net 5 achieved a training loss of 0.0164 and a validation loss of 0.018, performing the best among the convolutional models considered. Overall, the model tends to produce slightly darker images for the higher channels, and slight aberrations can be seen along some edges.

CA-Net 10 obtained a training loss of 0.023 and a validation loss of 0.029.

Outputs

Conclusion

We have presented multiple models that convert 3-channel RGB inputs into 31 spectral channels with high accuracy. We have shown that our models can accurately and efficiently produce hyperspectral images from RGB data, with low runtime and computational cost. We have also shown a variety of approaches and the pros and cons of each.

Future work can further improve the accuracy of hyperspectral reconstruction so that these methods can truly replace hyperspectral cameras and revolutionize the field of computer vision and image analysis. Converting RGB images to IR is another domain where this research could be useful.

Some outputs

References

[1]: R. M. Nguyen, D. K. Prasad, and M. S. Brown. Training-based spectral reconstruction from a single RGB image. In European Conference on Computer Vision, pages 186–201. Springer, 2014.
[2]: M. Uzair, A. Mahmood, and A. S. Mian. Hyperspectral face recognition using 3D-DCT and partial least squares. In BMVC, 2013.
[3]: M. Uzair, A. Mahmood, and A. Mian. Hyperspectral face recognition with spatiospectral information fusion and PLS regression. IEEE Transactions on Image Processing, 24(3):1127–1137, 2015.
[4]: D. Zhang, W. Zuo, and F. Yue. A comparative study of palmprint recognition algorithms. ACM Computing Surveys (CSUR), 44(1):2, 2012.
[5]: H. Van Nguyen, A. Banerjee, and R. Chellappa. Tracking via object reflectance using a hyperspectral video camera. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, pages 44–51. IEEE, 2010.
[6]: S. J. Kim, F. Deng, and M. S. Brown. Visual enhancement of old documents with hyperspectral imaging. Pattern Recognition, 44(7):1461–1469, 2011.
[7]: S. Hwang, J. Park, N. Kim, Y. Choi, and I. S. Kweon. Multispectral pedestrian detection: Benchmark dataset and baseline. Integrated Computer-Aided Engineering, 20:347–360, 2013.
[8]: P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. arXiv:1611.07004.
[9]: K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. arXiv:1512.03385.
[10]: I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative Adversarial Networks. CoRR, 2014.
[11]: J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv:1703.10593.
[12]: T. Stiebel, S. Koppers, P. Seltsam, and D. Merhof. Reconstructing spectral images from RGB-images using a convolutional neural network. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018.
[13]: O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. LNCS, 9351:234–241, 2015. doi:10.1007/978-3-319-24574-4_28.

