Origin of wine part 4

Nelson Punch
Software-Dev-Explore
2 min read · Nov 2, 2023
Photo by Guillermo Ferla on Unsplash

Introduction

Labels are important for training a machine learning model. The dataset I have does not provide any, so I have no way to tell how well a trained model performs.

One solution that came to mind is to use unsupervised learning to generate labels for the dataset. KMeans, which is provided by Scikit-Learn, can do this job.

KMeans clusters the data and gives each cluster a unique index, which can serve as the label for the dataset.

Code

Notebook with code

KMeans

It is straightforward to use KMeans to cluster the data.

Simply specify the number of clusters KMeans should produce, then fit the data. The labels_ attribute of the fitted KMeans gives the index of the cluster each sample belongs to.
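A minimal sketch of this step might look like the following. It assumes synthetic data from make_blobs as a stand-in for the wine dataset, and the choice of 3 clusters, n_init=10, and random_state=42 are illustrative values rather than the settings used in the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in data (assumption): 150 samples, 4 features, 3 groups.
X, _ = make_blobs(n_samples=150, n_features=4, centers=3, random_state=42)

# n_clusters is the number of clusters KMeans will produce.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)

# labels_ holds one cluster index per sample, in the same order as the rows of X.
labels = kmeans.labels_
print(labels[:10])
```

The cluster indices are arbitrary: index 0 has no meaning beyond "the samples grouped together under 0", so they work as labels only in that sense.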

I can inspect the number of samples in each cluster by the cluster's index (label).
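One way to do this count, sketched here with NumPy's unique function, is shown below. The data is again a synthetic stand-in (an assumption), so the actual counts for the wine dataset will differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in data (assumption): 150 samples, 4 features, 3 groups.
X, _ = make_blobs(n_samples=150, n_features=4, centers=3, random_state=42)

# fit_predict fits the model and returns the cluster index of every sample.
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Count how many samples fall into each cluster index.
cluster_ids, counts = np.unique(labels, return_counts=True)
for cluster_id, count in zip(cluster_ids, counts):
    print(f"cluster {cluster_id}: {count} samples")
```

A very uneven split between clusters can be a hint that the chosen number of clusters does not suit the data.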

Finally, I store the labels back in the dataframe.
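As a rough sketch, storing the labels can be a single column assignment. The dataframe, its column names (f0 to f3), and the new column name "label" are all hypothetical here, chosen only for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in dataframe (assumption): 150 rows and 4 hypothetical feature columns.
X, _ = make_blobs(n_samples=150, n_features=4, centers=3, random_state=42)
df = pd.DataFrame(X, columns=["f0", "f1", "f2", "f3"])

# Remember the feature columns before adding the label column,
# so the label never leaks into the features used for clustering.
feature_columns = list(df.columns)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(df[feature_columns])

# labels_ is aligned with the row order of the data that was fitted.
df["label"] = kmeans.labels_
print(df.head())
```

Because labels_ follows the row order of the fitted data, the assignment lines up with the dataframe as long as the rows were not shuffled or filtered in between.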

Conclusion

Unsupervised learning can help me create labels when they do not come with the dataset.

KMeans is one such unsupervised learning model, and it is easy to use for creating labels for the dataset.

Next

Selecting an appropriate model for a machine learning problem is an important step before training. Model selection techniques can help me achieve it.

part 5
