Finding an optimal number of “K” classes for unsupervised classification on Remote Sensing Data

Toby Zaw
Jan 2, 2022

We have all learnt that there are two types of classification: supervised and unsupervised. In remote sensing, land cover types vary across geography, so it is always hard to predict how many land cover classes exist within an area.

To perform classification of either type, we are expected to know the number of classes. For instance, in supervised classification we not only know the classes, but can also assign a particular combination of satellite band values to each one. Likewise, in unsupervised classification we tend to assume a number of classes as we scan through the whole scene. In both cases, we have only a slight chance of knowing the true “number of classes”, or “K”.

To find the optimal value of “K” for my dataset, I used the elbow method. It is described as follows:

“A fundamental step for any unsupervised algorithm is to determine the optimal number of clusters into which the data may be clustered. The Elbow Method is one of the most popular methods to determine this optimal value of k.”

I am using a random satellite image I found on the internet to identify its clusters, as shown below.

I then opened the image with the PIL library and read it as a NumPy array. Once the image was read, I used the pyrsgis module to rearrange its axes so that each pixel becomes one row of RGB values.
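A minimal sketch of this step is given here. It makes two assumptions: the file name `satellite.jpg` is hypothetical, and a plain NumPy reshape is used in place of the pyrsgis helper, which should give the same kind of table (one row per pixel, one column per band). A small synthetic array stands in for the image so the snippet runs without a file.

```python
import numpy as np

# With a real file (hypothetical name), the image would be loaded like this:
#   from PIL import Image
#   image = np.array(Image.open("satellite.jpg").convert("RGB"))
# A synthetic 4 x 5 RGB array stands in here so the sketch runs on its own.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

rows, cols, bands = image.shape

# Flatten the two spatial axes: each row is one pixel, each column one band.
pixels = image.reshape(rows * cols, bands)

print(image.shape)   # (4, 5, 3)
print(pixels.shape)  # (20, 3)
```

Keeping `rows` and `cols` around is useful later, because the cluster labels have to be folded back into the same 2-D layout to view the classified image.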

I then set the list of K values to test for my area of interest, from 1 to 15. The distortion and inertia values were calculated iteratively for each K using the sklearn module.
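The loop over K could look roughly like the sketch below. It is not the article's exact code. Distortion is taken here as the mean Euclidean distance from each pixel to its nearest cluster centre (one common convention, assumed rather than confirmed by the article), and inertia is scikit-learn's `inertia_` attribute, the sum of squared distances to the nearest centre. The pixel table is synthetic, with three artificial colour groups.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the (n_pixels, 3) RGB table: three colour groups.
rng = np.random.default_rng(0)
centers = np.array([[30, 90, 40], [120, 110, 90], [200, 200, 210]], dtype=float)
pixels = np.vstack([c + rng.normal(0, 8, size=(200, 3)) for c in centers])

k_values = range(1, 16)
distortions = []  # mean Euclidean distance from each pixel to its nearest centre
inertias = []     # sum of squared distances to the nearest centre

for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # transform() gives the distance from every sample to every cluster centre;
    # the minimum along each row is the distance to the nearest centre.
    nearest = model.transform(pixels).min(axis=1)
    distortions.append(nearest.mean())
    inertias.append(model.inertia_)

for k, d, i in zip(k_values, distortions, inertias):
    print(f"K={k:2d}  distortion={d:8.2f}  inertia={i:12.1f}")
```

Plotting `distortions` and `inertias` against `k_values` gives the two curves used to look for the elbow.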

Distortion and inertia values for each value of K

As shown in the figure, to determine the optimal number of clusters we select the value of K at the “elbow”, i.e. the point after which the distortion/inertia starts decreasing in a roughly linear fashion. For the given data, we therefore conclude that the optimal number of clusters is 3.
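In the article the elbow is picked by eye. If an automatic check is wanted, one simple heuristic (an addition here, not the article's method) is to take the point that lies farthest from the straight line joining the first and last points of the curve. The function name `find_elbow` and the curve values below are made up for illustration.

```python
import numpy as np

def find_elbow(k_values, scores):
    """Return the K whose point lies farthest from the straight line joining
    the first and last points of the curve (both axes scaled to [0, 1])."""
    k = np.asarray(k_values, dtype=float)
    s = np.asarray(scores, dtype=float)
    x = (k - k.min()) / (k.max() - k.min())
    y = (s - s.min()) / (s.max() - s.min())
    # Perpendicular distance from each (x, y) to the chord between the end points.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dx * (y - y[0]) - dy * (x - x[0])) / np.hypot(dx, dy)
    return int(k[np.argmax(dist)])

# Made-up curve for illustration only (not the article's measurements):
# a steep drop up to K=3, then a slow, linear decline.
k_values = list(range(1, 16))
scores = [1000.0, 400.0, 100.0] + [100.0 - 5.0 * i for i in range(1, 13)]

print(find_elbow(k_values, scores))  # 3
```

A heuristic like this is only a sanity check; on real imagery the curve is often smoother and the visual judgement still matters.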

Once the optimal number was selected, I used KMeans from sklearn to perform the unsupervised classification; the results can be seen below.
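A sketch of this final step, under the same assumptions as before: the scene is synthetic (three coloured strips with noise) rather than the article's image, and the variable names are illustrative. The key idea is to cluster the pixel table and then fold the labels back into the image's rows and columns.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 20 x 30 RGB scene standing in for the satellite image:
# three vertical strips of different colours plus a little noise.
rng = np.random.default_rng(0)
base = np.zeros((20, 30, 3))
base[:, :10] = [30, 90, 40]
base[:, 10:20] = [120, 110, 90]
base[:, 20:] = [200, 200, 210]
image = base + rng.normal(0, 5, size=base.shape)

rows, cols, bands = image.shape
pixels = image.reshape(rows * cols, bands)

# K = 3 is the value chosen from the elbow plot in the article.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)

# One cluster label per pixel, folded back into the image's 2-D layout.
classified = model.labels_.reshape(rows, cols)

print(classified.shape)       # (20, 30)
print(np.unique(classified))  # [0 1 2]
```

The `classified` array can then be displayed as an image, for example with matplotlib's `imshow`, to compare the clusters against the original scene.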

Reference : https://predictivehacks.com/k-means-elbow-method-code-for-python/

The code can be found here

For any questions, reach me by email or LinkedIn!

Toby Zaw

Remote Sensing Scientist / Geo-data science devotee / MSc Student