KMeans With Wine Qualities
Data and Collection
This project clusters red and white wine datasets by alcohol percentage and fixed acidity. From a business perspective, a seller could take a popular wine, find its cluster, and advertise other wines from the same cluster in the hope of selling more to consumers. I found the dataset on Kaggle, which provides a separate CSV file for red and for white wine, and the corresponding code can be found in a Jupyter Notebook. In the notebook I used Pandas, Matplotlib, and Scikit-Learn’s StandardScaler and KMeans in Python to reach these insights.
Data Cleaning
The dataset had already been cleaned, so my main preprocessing step was to scale the values with Scikit-Learn’s standard scaler. I applied the scaling to the features I selected for visualization.
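As a rough illustration of the scaling step, here is a minimal sketch. It uses synthetic data in place of the Kaggle CSV files, and the column names `alcohol` and `fixed acidity` are assumptions, so it may differ from what the notebook actually does.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one wine table; values and column names are assumed.
rng = np.random.default_rng(0)
wine = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, size=200),
    "fixed acidity": rng.normal(7.0, 1.5, size=200),
})

# Scale only the two features used for clustering.
features = ["alcohol", "fixed acidity"]
scaler = StandardScaler()
scaled = scaler.fit_transform(wine[features])  # NumPy array of shape (200, 2)
```

After scaling, each column has mean 0 and standard deviation 1, so neither feature dominates the distance calculation in KMeans just because of its units.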
Analysis
Before visualizing the clusters, I needed to choose the number of clusters, so I used the elbow method; for both datasets it pointed to 5 clusters. I then used Scikit-Learn to fit the clusters on the red and white wine data separately and added a column to each dataset with the corresponding cluster id. From there I plotted the clusters with Matplotlib, giving each cluster a different color.
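The elbow method and the final fit could look roughly like the sketch below. It is not the notebook's code: the data is synthetic, and the column names, the range of k values, `n_init=10`, and `random_state=0` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one wine table; values and column names are assumed.
rng = np.random.default_rng(0)
wine = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, size=200),
    "fixed acidity": rng.normal(7.0, 1.5, size=200),
})
scaled = StandardScaler().fit_transform(wine[["alcohol", "fixed acidity"]])

# Elbow method: record the inertia (within-cluster sum of squares) for each k.
inertias = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(scaled)
    inertias.append(model.inertia_)

# Fit the final model with the chosen k and store each row's cluster id.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
wine["cluster"] = kmeans.fit_predict(scaled)
```

Plotting `inertias` against k and looking for the point where the curve flattens gives the elbow. The `cluster` column can then be passed as the `c` argument of Matplotlib's `scatter` so that each cluster gets its own color.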
Limitations
There are limitations in choosing these two attributes for clustering, as a number of other attributes could have been chosen. Wines in the same cluster won’t necessarily taste the same or be similar products; the clusters only tell us that they are similar in acidity and alcohol percentage. I also find the visuals quite crowded, and it may have been better to plot a sample rather than all of the data.
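One way the sampling idea might be tried is to draw a random subset of rows before plotting. The sketch below uses synthetic data, and the sample size of 200 and the column names are assumptions rather than anything from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a clustered wine table; values and names are assumed.
rng = np.random.default_rng(0)
wine = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, size=1000),
    "fixed acidity": rng.normal(7.0, 1.5, size=1000),
    "cluster": rng.integers(0, 5, size=1000),
})

# Draw a reproducible random sample of rows (without replacement) for plotting.
plot_sample = wine.sample(n=200, random_state=0)
```

The clusters are still fitted on the full data; only the plot uses the smaller `plot_sample` table, which should make individual points easier to see.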