Profiting from Python & Machine Learning in the Financial Markets
Gaëtan Rickter

While PCA and K-Means may be the “standard” algorithms for this sort of analysis, they are far from the best. If you want dimension reduction that can expose cluster structure, something like t-SNE or LargeVis will be far better. Similarly, if you actually want to find interesting groupings in the data, HDBSCAN* will be far better than K-Means: for a start, you don’t have to specify the number of clusters at the outset, and it also handles noise and arbitrarily shaped clusters. I would love to see this same analysis with t-SNE and HDBSCAN* swapped in for PCA and K-Means. t-SNE is in scikit-learn and hdbscan is in scikit-learn-contrib, so it really is a simple drop-in change to the code.
