Spectral image classification with Python

Antón Garcia
Abraia
7 min read · Mar 29, 2021


In this short article, we’ll see how to easily train an image classifier and apply it to segment a hyperspectral image, without installing any software. All the steps can be performed with this simple Python notebook.

In a previous post, we saw how to efficiently perform hyperspectral image analysis, using a Python notebook to manipulate and visualize HSI cubes stored in the cloud. Now, we’ll see how to use annotated data to create a prediction model by training a classifier.

Dataset

As an example, we’ll use the Indian Pines (IP) Hyperspectral Image Dataset, widely used as a benchmark by research studies in the field. It was gathered using the AVIRIS sensor over the Indian Pines test site in North-western Indiana and it consists of 145x145 pixels and 200 spectral bands. The next figure shows some random bands.

images of six random spectral bands

The dataset comes with annotations corresponding to 16 different categories related to the use of the land. The next figure shows an image with areas of these categories depicted in different colors.

ground truth mask showing the areas of the annotated categories

The following table shows the names and number of samples of each category.

categories of the Indian Pines dataset

As we can see, the number of samples is not balanced across categories: some have many samples while others have very few. This class imbalance is a frequent issue with real-world datasets.

We add one more label for unannotated pixels that don’t belong to any category (black area in the image).

Here we’ll consider the dataset as an array of samples X, where each sample represents a pixel by a vector of 200 components, corresponding to the spectral responses at that point. We’ll also have an array of annotations y with a scalar value from 0 to 16, giving the category of each pixel as defined in the ground truth.
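As a sketch of this step, assuming the cube and ground truth are available as the standard Indian_pines_corrected.mat and Indian_pines_gt.mat files (file names and keys follow the usual distribution of the dataset; the original notebook loads the data from cloud storage instead):

```python
import numpy as np
from scipy.io import loadmat

# Load the HSI cube (145 x 145 x 200) and the ground truth (145 x 145).
cube = loadmat('Indian_pines_corrected.mat')['indian_pines_corrected']
gt = loadmat('Indian_pines_gt.mat')['indian_pines_gt']

# Flatten the cube into an array of samples X (one 200-band spectrum per
# pixel) and the ground truth into an array of labels y (0 to 16, where
# 0 stands for unannotated pixels).
X = cube.reshape(-1, cube.shape[-1]).astype(float)
y = gt.ravel()
```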

Redundancy analysis with PCA

Most surfaces show reflective properties that vary smoothly with wavelength. With 200 spectral bands densely sampled within the visible and near-infrared range, we should expect the IP dataset to show strong correlations across bands that are close in the spectrum, that is, with similar wavelength values.

So a good starting point is to visualize the redundancy in the available data (our X array) using a PCA decomposition.
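A minimal way to produce this curve with scikit-learn, reusing the X array built above:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Fit a full PCA decomposition on the spectra and accumulate the
# explained variance ratios.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

plt.plot(np.arange(1, len(cumulative) + 1), cumulative)
plt.xlabel('Number of principal components')
plt.ylabel('Cumulative explained variance')
plt.show()
```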

curve of cumulative variance plotted against the number of components extracted

The curve shows that practically all the variance in the IP dataset is explained with the first 40 principal components. The rest of them are likely to contain mostly noise.

We may also visualize the first principal components derived from the original spectral bands.

images of the first eight principal components

As we would expect, components accounting for lower and lower variance appear increasingly noisy.
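A sketch of how these component images can be produced, reusing the full PCA fitted above (the 2×4 grid layout is an illustrative choice):

```python
import matplotlib.pyplot as plt

# Project the spectra onto the principal components and reshape the
# first eight back into 145 x 145 images.
components = pca.transform(X)
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for i, ax in enumerate(axes.ravel()):
    ax.imshow(components[:, i].reshape(cube.shape[:2]), cmap='gray')
    ax.set_title(f'PC {i + 1}')
    ax.axis('off')
plt.show()
```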

Model training and testing

Preparing data and preprocessing

To train the model, we should split the available data into a training set and a validation set. This is a must to get a reliable assessment of the performance of our classifier and to spot any overfitting issues. Splitting the dataset is as easy as:
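Here is a minimal sketch with scikit-learn’s train_test_split; the 30% test size is an illustrative choice:

```python
from sklearn.model_selection import train_test_split

# Hold out 30% of the pixels for validation (illustrative split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```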

The variable random_state sets the seed of the random split. While random sampling minimises the risk of bias, being able to set the seed means that we can repeat the experiment with the same training and testing sets. This facilitates reproducibility.

Here, we’ll use a classic SVM as the classifier. While SVMs yield lower accuracy and sensitivity than state-of-the-art CNNs (in the case of the IP dataset, a drop of about 10%), they’re great for prospective exploration. SVMs are simple and light, and still provide valuable insights into the feasibility of automating a classification problem, even with modest amounts of annotated data.

When we use a classic classifier like SVM, a good practice once we’ve split the data is to decorrelate our features. In our case, the features are the spectral bands.

Moreover, as most of the variance is explained by the first components, we may learn a PCA decomposition that keeps a reduced set of spectrally decorrelated components, for instance the first half of the principal components. We should learn this decomposition early on, using only the training set (before training our classifier). This operation is quite simple:
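A minimal sketch with scikit-learn, keeping the first half (100) of the 200 components:

```python
from sklearn.decomposition import PCA

# Learn the decorrelating transform on the training set only, keeping
# the first half of the components.
pca = PCA(n_components=100)
X_train_pca = pca.fit_transform(X_train)

# The fitted pca object is the stored transformation, to be reapplied
# as a preprocessing step at prediction time.
X_test_pca = pca.transform(X_test)
```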

We should also store the fitted transformation so that it can later be applied as a preprocessing step in the prediction phase.

By reducing redundancy, we get rid of noise and irrelevant information that won’t improve the performance of our classifier.

But even if we don’t want to reduce the number of components, decorrelating the bands before feeding the classifier is a good practice when we use a classic approach like SVM. It may easily improve classification performance by about 10%, compared to directly passing the raw, strongly correlated bands.

Training our SVM model

Once the data and preprocessing are prepared, training an SVM is really simple. We only need to feed the model with our conveniently preprocessed training data:
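With scikit-learn this boils down to the following sketch (default SVC hyperparameters, an illustrative choice):

```python
from sklearn.svm import SVC

# Train a support vector classifier on the decorrelated training data.
clf = SVC()
clf.fit(X_train_pca, y_train)
```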

Evaluating the classifier

Once trained, we may run the model on the validation set, which contains data that were not provided to the classifier in the training stage.

We can assess performance with well-established tools and metrics. A nice example is the confusion matrix.

It shows how many samples of each category have been correctly classified and how many have been wrongly attributed to one of the other categories.
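A sketch of both steps, prediction and confusion matrix, with scikit-learn:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Run the trained model on the validation set.
y_pred = clf.predict(X_test_pca)

# Compute and plot the confusion matrix.
cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm).plot()
plt.show()
```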

confusion matrix

In our case, we can see that most of the errors arise from confusion between unannotated pixels and woods, with 99 woods pixels undetected and 63 unannotated pixels wrongly classified as woods. This is easy to understand, as the woods and unannotated categories both represent a large share of the pixels. But we can also see that, for the buildings-grass-trees-drives category, most of the pixels go undetected. At the opposite end, wheat and corn-notill pixels are correctly classified in a large proportion.

We’d also appreciate a numeric metric that tells us how the model performs. In this regard, precision, recall, and f-score are really handy. Precision tells us the proportion of samples classified under a given category that actually belong to it. Recall probes the sensitivity of our model, that is, the proportion of instances of a given category that have been detected. Finally, the f-score provides a balanced estimate of both precision and recall.

We may get a classification report with these metrics calculated for each of the categories, as well as their average values over the whole set, simply by calling the classification_report function:
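Reusing the predictions from above:

```python
from sklearn.metrics import classification_report

# Per-category precision, recall, and f-score, plus averages.
report = classification_report(y_test, y_pred)
print(report)
```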

This returns the following text, which we may print or store.

table showing precision, recall, and f-score values for each of the categories as well as the average values

These metrics tell us that our model yields both its worst and its best classification scores for categories with the lowest support, which leave almost no samples for validation. This points to a lack of statistical significance in these cases.

Among categories with more than 20 validation samples, buildings-grass-trees-drives gets the lowest classification performance while wheat gets the highest. This agrees with what we could expect from inspection of the confusion matrix. A possible explanation is that, at the scale considered, wheat surfaces are more regular and spectrally distinct than wooded ones, so they’re easier for the classifier to learn.

We may also get a more graphic idea of performance by comparing the image segmented by the model with the human-annotated ground truth.
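One way to build this comparison, assuming the whole cube is preprocessed with the stored PCA transform and classified pixel by pixel:

```python
import matplotlib.pyplot as plt

# Preprocess and classify every pixel of the cube, then reshape the
# predictions back to the 145 x 145 image grid.
prediction = clf.predict(pca.transform(X)).reshape(gt.shape)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.imshow(gt)
ax1.set_title('Ground truth')
ax2.imshow(prediction)
ax2.set_title('SVM prediction')
for ax in (ax1, ax2):
    ax.axis('off')
plt.show()
```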

comparison of ground truth image with human annotations and the image automatically annotated using the classifier

As we can see, a simple machine learning approach, using an SVM classifier trained on the spectral composition of each pixel, brings acceptable predictive capability for the categories proposed. Looking at these initial results, it’s reasonable to expect that this hyperspectral imaging use case is ripe for automated classification. We can also expect to achieve better performance by also modelling texture, rather than just the spectral features used here.

After a previous post on basic hyperspectral image analysis, we have dived a bit deeper into the problem. We have seen how to easily evaluate a hyperspectral image classification scenario using a Python notebook, combined with cloud storage and management of our HSI data.
