Profiting from The Periodic Table of Elements & Machine Learning
Gaëtan Rickter
5286

While PCA and K-Means may be the “standard” algorithms for this sort of analysis, they are far from the best choices. If you want dimension reduction that can expose cluster structure, then something like t-SNE or LargeVis will do far better than PCA. Similarly, if you actually want to find interesting groupings in the data, then HDBSCAN* will do far better than K-Means: for a start you don’t have to specify the number of clusters at the outset, and it also handles noise and arbitrarily shaped clusters. I would love to see this same analysis with t-SNE and HDBSCAN* swapped in for PCA and K-Means. t-SNE is in scikit-learn and hdbscan is in scikit-learn-contrib, so it really is a simple drop-in change to the code.