AI Application to Demonstrate K-Means Clustering Using H2O Wave

Shamildilshan
4 min read · Feb 20, 2022


In this blog post, I want to show how cool H2O Wave is by demonstrating my application, “K means App”, which I built using Wave 0.20.0. It is a simple application that demonstrates an unsupervised learning method called K-Means Clustering.

As an introduction, here is a brief overview of K-Means Clustering. A cluster is a collection of data points grouped together because of certain similarities. The K value defines the number of centroids, and therefore the number of clusters. The algorithm assigns every data point to the cluster with the nearest centroid, and it tries to minimize the within-cluster sum of squares. To do that, it randomly selects K centroids within the range of the data points and repeatedly updates them until the centroids stabilize or the defined number of iterations is reached.
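
To make the loop described above concrete, here is a minimal sketch in plain Python. It is not the code used in the app; the function names (`kmeans`, `sq_dist`, `within_cluster_ss`) are made up for illustration, and picking the initial centroids from the data points themselves is just one common choice that I am assuming here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative K-Means loop: assign points, move centroids, repeat."""
    rng = random.Random(seed)
    # Assumption: start from K randomly chosen data points.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # centroids have stabilized
        centroids = new_centroids
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """The quantity K-Means tries to reduce."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels, within_cluster_ss(points, centroids, labels))
```

The loop stops either when the centroids no longer move or when `max_iter` is reached, which matches the two stopping conditions mentioned above.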

In my application, the home page gives the user a quick introduction to the K-Means Clustering algorithm, as shown in the image below.

Home Page

The features in a data set vary from scenario to scenario, and that is the beauty of this unsupervised learning method. For a given scenario, the user can check how the data points are distributed, see how the clustering happens, and define a meaning for each cluster. That helps in understanding which cluster a new data entry will be assigned to. To give the user this experience, the application lets them upload their own data set in CSV format, so data sets from many different scenarios can be explored.

Load Data (Before)
Load Data (After)

After the upload, the application keeps only the numerical columns, which can be used to check the correlation between features, and drops the missing values in the data set. At this stage, only the first 200 rows of the data set are used for clustering. Before running the clustering, the user can view a summary of the data set to get an understanding of the available data and features. The “Show Data” button does this, as shown in the image below.
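
The preparation steps above could look roughly like this with pandas. This is only a sketch under my own assumptions: the function name `prepare_for_clustering` is hypothetical, the tiny CSV is made-up sample data, and I am assuming that missing values are dropped row by row before the first 200 rows are taken.

```python
import io

import pandas as pd


def prepare_for_clustering(csv_source, max_rows=200):
    """Keep numeric columns, drop rows with missing values, limit the row count."""
    df = pd.read_csv(csv_source)
    numeric = df.select_dtypes(include="number")  # numerical columns only
    cleaned = numeric.dropna()                    # assumption: drop incomplete rows
    return cleaned.head(max_rows)                 # first 200 rows by default


# Made-up sample data standing in for an uploaded CSV file.
sample = io.StringIO(
    "name,height,weight\n"
    "a,1.0,2.0\n"
    "b,,3.0\n"
    "c,4.0,5.0\n"
)
data = prepare_for_clustering(sample)
print(data.describe())  # a summary similar in spirit to "Show Data"
```

Here `describe()` stands in for the summary view; the app itself may present the summary differently.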

Show Data

Once the data is ready for clustering, the user can fill out the form in the application as needed. When the user clicks the “Clusters” button, the application loads a form to select the K value and the features to show on the X and Y axes. A nice touch in this application is that the dropdown values change according to the uploaded data set. After filling out the form and clicking the “Run Clusters” button, the user sees a plot of the data points in different colors. These colors represent the clusters identified by the K-Means clustering algorithm. Here I have used KMeans from the scikit-learn library. You can also use H2OKMeansEstimator, which is available in the H2O estimators. The plot helps the user see how the clusters are derived within the data set and make further decisions based on those observations.
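
As a rough idea of what happens behind the “Run Clusters” button, here is a small sketch using scikit-learn's `KMeans`. The helper name `run_clusters`, the column names, and the toy data are made up for illustration, and clustering on just the two selected features is my assumption rather than a description of the app's exact code.

```python
import pandas as pd
from sklearn.cluster import KMeans


def run_clusters(df, x_col, y_col, k, seed=0):
    """Fit K-Means on the two selected columns and attach a cluster label per row."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(df[[x_col, y_col]])
    return df.assign(cluster=labels), model


# Toy data standing in for the cleaned upload.
df = pd.DataFrame({"x": [0, 0, 1, 10, 10, 11], "y": [0, 1, 0, 10, 11, 10]})
clustered, model = run_clusters(df, "x", "y", k=2)
print(clustered)
print(model.cluster_centers_)  # one centroid per cluster
print(model.inertia_)          # within-cluster sum of squares
```

The `cluster` column is what a plot can use to color the points, and `model.predict(...)` on a new row would tell which cluster a new data entry falls into.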

Input Form
Cluster Output

You can check out the demo video of the application from the link.

Here I would like to highlight a point about H2O Wave. I have used only H2O Wave 0.20.0 to create this end-to-end application. Without writing any HTML or CSS, we can create attractive AI applications with H2O Wave. The reason I am explaining H2O Wave through my application is that it shows how Wave can be used to communicate our data science concepts to others. Rather than explaining the K-Means clustering algorithm to someone in words, it is more valuable to give them this kind of application to work with. These applications are very easy to build, and it does not take much time to give someone a clear idea. Within this application, I have used sidebar, footer, header, and wide article preview cards to organize the UI, and form cards and plot cards to visualize the data and contents.

With each version upgrade, H2O Wave adds new features that benefit both the developers who build applications and the clients who use them to understand the developers’ approach.
