Origin of wine part 4
Introduction
Labels are important for training a machine learning model. The dataset I have in hand does not provide any labels, so I have no way to measure how well a trained model performs.
One solution that came to mind is to use unsupervised learning to generate labels for the dataset. KMeans, which is provided by Scikit-Learn, can do this job.
KMeans clusters the data and gives each cluster a unique index, which can serve as the label for the dataset.
Code
KMeans
It is straightforward to use KMeans to cluster the data.
Simply specify the number of clusters KMeans should produce, then fit the data. After fitting, the labels_ attribute of KMeans holds the index of the cluster each sample belongs to.
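A minimal sketch of this step is below. The data is a synthetic stand-in (three well-separated blobs), not the wine dataset, and the choice of 3 clusters, `n_init=10`, and `random_state=0` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the wine features: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Specify the number of clusters, then fit the data.
# n_clusters=3 is an assumption; pick the value that suits the real dataset.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

# labels_ holds one cluster index per sample, in the same order as X.
labels = kmeans.labels_
print(labels[:10])
```

The cluster indices are arbitrary identifiers: which cluster ends up as 0, 1, or 2 depends on the initialization, so they carry no meaning beyond group membership.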
I can inspect the number of samples in each cluster by the cluster's index (label).
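One way to do this inspection is with `np.unique` and `return_counts=True`, as sketched here. The blob data and the group sizes (60, 50, 40) are made up for the example and do not come from the wine dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three well-separated blobs of different sizes.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
sizes = [60, 50, 40]
X = np.vstack([c + rng.normal(scale=0.5, size=(n, 2))
               for c, n in zip(centers, sizes)])

# fit_predict fits the model and returns the same values as labels_.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Count how many samples fall into each cluster index.
cluster_ids, counts = np.unique(labels, return_counts=True)
for cluster_id, count in zip(cluster_ids, counts):
    print(f"cluster {cluster_id}: {count} samples")
```

A very uneven distribution of counts can be a hint that the chosen number of clusters does not fit the data well.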
Finally, I store the labels back in the dataframe.
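Storing the labels can be as simple as assigning them to a new column. In this sketch the dataframe, the column names `feature_a` and `feature_b`, and the new column name `label` are all hypothetical placeholders for whatever the real dataset uses.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical dataframe standing in for the wine data.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
df = pd.DataFrame(X, columns=["feature_a", "feature_b"])

# Fit on the feature columns only, so the label column never leaks into clustering.
feature_cols = ["feature_a", "feature_b"]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df[feature_cols])

# labels_ is ordered like the rows of df, so it can be assigned directly.
df["label"] = kmeans.labels_
print(df.head())
```

Keeping the list of feature columns separate makes it safe to refit later without accidentally including the generated `label` column as an input.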
Conclusion
Unsupervised learning can help me create labels when they do not come with the dataset.
KMeans is one such unsupervised learning model, and it is easy to use for creating labels for the dataset.
Next
Selecting an appropriate model for a machine learning problem is an important step before training. Model selection techniques can help me with that.