PCA for Hyperspectral Imaging

Carlo Valensise
Polimi Data Scientists
7 min read · May 14, 2020

Introduction

All of us have experience with digital photos. We shoot them every day by simply tapping on our smartphone screens, and we immediately get a stunning, colorful image, ready to be shared or posted online. The process by which a phone or a camera captures a color picture is anything but trivial, and it is beyond the scope of this article. But... have you ever wondered how color is encoded and reproduced on a screen? How can the camera sensor recognize colors? What are colors, anyway? What we call color is the electromagnetic field in one of its infinitely many configurations, reaching our eye (and, most importantly, our brain, which is what actually sees). In fact, the electromagnetic field can manifest itself with infinitely many different wavelengths, and the wavelength determines its properties.

The sound we hear on the radio while driving comes from an electromagnetic wave whose wavelength is tens of meters long; the radio transduces the wave into sound that reaches our ears. The microwave oven that is now heating up my lunch box exploits radiation with a wavelength of about twelve centimeters. The red t-shirt of my colleague reflects light at around 700 nanometers, while the image of a broken bone is realized with very short wavelengths: X-rays.

In a word, the electromagnetic field is everywhere. And colors? Well, in a sense colors may not exist at all, since they are exactly the same physical phenomenon as radio waves or microwaves. Nevertheless, we see them, so in the end they are “produced” by our brain. But let us avoid diving too deep into philosophy. The point that matters for our discussion is the correspondence between colors and wavelengths.

What our cameras do is take three different images of the same scene. Each image carries information about a specific color, and it is produced by removing the unwanted wavelengths with suitable optical filters. In this way, only the energy related to a specific color excites the sensor pixels, each of which records the intensity of the light at a given point of the scene we are looking at. The next question is: which three colors does the sensor select? They are red, green, and blue. Sounds familiar? Have you ever met RGB? Without getting too technical, the point is the following: red, green, and blue define a color space that approximates the human color space well and is efficient in terms of digital representation. It is not the only possible choice, of course.

Let’s now get closer to the main topic of this article. Light bouncing off or passing through an object carries important structural information about the sample’s properties. However, conventional photography, due to the RGB reconstruction we have discussed so far, is insensitive to most of this information. Why? Because of the filters: they let through only the wavelengths selected for visualization, removing all the others.

There are many applications that exploit chemical information from samples or objects, especially when it is not possible (or not advisable) to touch them. This is the field of remote sensing, where you gather structural information about an object while staying far away from it. Hyperspectral imaging is an optical technique for taking pictures with a much higher number of colors than RGB images. The game is to produce a picture for each wavelength.

What does a hyperspectral image look like? From a digital standpoint it is exactly like a conventional picture, only bigger. RGB images are composed of three two-dimensional arrays, one per color channel. In a hyperspectral image the number of color channels is higher. How much higher? It depends on the technique used to acquire the image. Consider this example: a camera whose sensor has 1280x1024 pixels and a dynamic range of 10 bits, with each spectrum sampled at 500 points. The total information content of the whole hypercube is then 6,553,600,000 bits, corresponding to roughly 800 MB. This value grows further if a higher spectral resolution is required (i.e., if the spectral points are sampled with smaller spacing between them).
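For the skeptical reader, here is that arithmetic spelled out in a few lines of Python (the numbers are exactly the ones from the example above):

```python
# Back-of-the-envelope size of the hypercube in the example above.
height, width = 1024, 1280   # spatial pixels of the sensor
bands = 500                  # sampled wavelengths per pixel
bit_depth = 10               # dynamic range in bits

total_bits = height * width * bands * bit_depth
print(total_bits, "bits")                 # 6553600000 bits
print(total_bits / 8 / 1e6, "MB")         # ~819 MB, i.e. roughly 800 MB
```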

PCA for dimensionality reduction

Here is where Data Science and Machine Learning enter our discussion. The goal of analyzing a hyperspectral image is to reduce the large amount of data collected during a measurement to a handful of components that describe the spectral and spatial characteristics of the imaged sample sufficiently well, in an unbiased and comprehensive manner.

Principal Component Analysis (PCA) is a basic unsupervised technique that aims to obtain a mapping from a higher-dimensional space to a lower-dimensional one. Let’s start by considering the space where the data live. In the case of hyperspectral imaging, each datapoint is the spectrum of one pixel, sampled at, say, 500 wavelengths; each datapoint is therefore composed of 500 numbers. Since each wavelength is independent of all the others, our dataset lives in a 500-dimensional space, quite a large one indeed. Now, the question is: can we find a more convenient description for our data? Convenient in what sense? A good starting point is, for instance, a description that requires fewer than 500 values per datapoint. Can we represent our points in a smaller space? The trivial answer would be to sample fewer wavelengths, but then we would fall back into the RGB-like case. No, we want to compress our dataset while limiting the information loss as much as possible.
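As a minimal sketch of how this looks in code (assuming the hypercube is stored as a NumPy array of shape (height, width, bands); the file name here is purely hypothetical), scikit-learn’s PCA can do the heavy lifting:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical hypercube: (height, width, bands), e.g. (1024, 1280, 500).
cube = np.load("hypercube.npy")
h, w, bands = cube.shape

# Each pixel spectrum is one datapoint: flatten the spatial dimensions
# so that every row of X is one 500-number spectrum.
X = cube.reshape(h * w, bands)

pca = PCA(n_components=3)        # keep only the first three directions
scores = pca.fit_transform(X)    # shape: (h * w, 3)

# Fraction of the total variance captured by each component.
print(pca.explained_variance_ratio_)
```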

Let us consider a simple example in which we have only two colors, x and y, so that each point is represented by the pair (x, y). Looking at the image, we see that there is a direction v (red line) such that, by projecting each yellow point onto it, we can represent the data in terms of v using a single value, the coordinate along this direction, instead of two. The key point is that we are not selecting x or y (that would correspond to sampling fewer wavelengths), but a combination of the two. And not just any combination, but the best one: in this case we take “a bit more” of x and “a bit less” of y. The combination is encoded in the direction of v (bottom line).

This is the aim of PCA: finding the best mixing of features such that the original information is preserved as much as possible. In other words, PCA computes the directions along which the data express the largest variation. The direction is not unique: PCA returns more than one vector v, sorted by descending importance, where by importance we mean the capability of describing the differences among datapoints. Mathematically, the vectors v are the eigenvectors of the covariance matrix of the data, sorted by decreasing eigenvalue.
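To make this concrete, here is the two-color example done “by hand” with NumPy on synthetic data (the data are made up purely for illustration): the direction v is the eigenvector of the covariance matrix with the largest eigenvalue, and projecting onto it replaces each (x, y) pair with a single coordinate.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic 2D "two-color" data, correlated so that one direction
# carries most of the variation.
x = rng.normal(size=500)
y = 0.6 * x + 0.2 * rng.normal(size=500)
data = np.column_stack([x, y])            # shape: (500, 2)

# Center the data and take the eigenvectors of its covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)      # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order

v = eigvecs[:, -1]                        # direction of largest variance
coords = centered @ v                     # one number per point instead of two

print("principal direction v:", v)
print("variance along v:", eigvals[-1])
```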

Does it really work?

Let us now see an example based on a hyperspectral image of a stained-glass window of a church in Milan. On the left you can see (a) the reconstructed RGB image, as well as (b), (c) and (d) the first, second and third components detected by PCA. Each component is made up of a specific combination of weighted wavelengths, reported below. These curves do not represent any physical object, nor do they have any relation to chemistry (as usually happens with spectra in optics). They are just directions in the spectral space along which the data express the largest variation.
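Sticking with the earlier sketch (the variables pca, scores, h, w and bands carry over from that snippet, and the 400–1000 nm wavelength axis is an assumption for illustration), the weighted-wavelength curves are simply the rows of pca.components_, and each component image is a column of scores folded back onto the spatial grid:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed wavelength axis matching the 500 sampled bands.
wavelengths = np.linspace(400, 1000, bands)

# Loadings: how strongly each wavelength contributes to each component.
for i in range(3):
    plt.plot(wavelengths, pca.components_[i], label=f"PC{i + 1}")
plt.xlabel("wavelength (nm)")
plt.legend()
plt.show()

# Component image: fold the scores back onto the h x w pixel grid.
pc2_image = scores[:, 1].reshape(h, w)
plt.imshow(pc2_image, cmap="gray")
plt.title("second principal component")
plt.show()
```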

The first component is roughly related to the whole spectrum and reports on the overall variation of light intensity. The second component is more informative. Look at the blue glass pieces in the RGB image: almost all of them show high intensity (they appear more whitish), with one exception, highlighted by the yellow circle. Even though we see it as blue, its nature is indeed different from that of all the other blue pieces, which is why it shows a lower intensity along the second component direction. Probably this piece was replaced at some point.

By looking at the real spectra of the replaced square and of the one above it, we can see that they are indeed different, reinforcing the hypothesis that the two pieces are made of different materials.

Finally, which spectrum belongs to which piece of glass? You can see that the second component gives most of its weight to the wavelengths around 800 nm. So pixels with high spectral intensity at 800 nm also score high along the second component. Therefore, the spectrum that is flat in this region (yellow line) must belong to the glass that was probably replaced.

Conclusions

Hyperspectral imaging is a rapidly growing field in Optics, in which many Machine Learning procedures can be applied to compress and classify the data.

We showed a real use case in which Principal Component Analysis enabled the detection of an anomaly in a group of apparently similar objects, and we had a glimpse of the versatility of PCA, which can be used for data compression, clustering, and visualization.

This blogpost is published by the PoliMi Data Scientists association. We are a community of students of Politecnico di Milano that organizes events and writes resources on Data Science and Machine Learning topics.

If you have suggestions or want to get in touch with us, you can write to us on our Facebook page.
