good stuff!
Brian Kirkby

I would say it changes from problem to problem. Something I should have mentioned in the article is that I go through the “rinse and repeat” steps with a random sample of the data. This gives me a feel for what works best before firing off the clustering algorithm on the entire data set, which would be infeasible to run repeatedly while trying to choose a methodology. Steps 1–10 can take anywhere from a day to three weeks or even a month.
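
To make the sample-first idea concrete, here is a minimal Python sketch, not the exact code behind the article. The names `random_sample`, `choose_methodology`, `candidates`, and `score` are hypothetical and chosen for illustration. It assumes each candidate methodology is a function that takes the data and returns cluster labels, and that `score` returns a number where higher is better.

```python
import random


def random_sample(data, fraction, seed=42):
    """Draw a reproducible random subset of the data, without replacement."""
    rng = random.Random(seed)
    size = max(1, round(len(data) * fraction))
    return rng.sample(data, size)


def choose_methodology(candidates, sample, score):
    """Run every candidate on the small sample and return the best one.

    candidates: dict mapping a name to a function(data) -> labels (assumed shape)
    score:      function(data, labels) -> number, higher is better (assumed)
    """
    results = {name: score(sample, fn(sample)) for name, fn in candidates.items()}
    best = max(results, key=results.get)
    return best, results
```

The cheap loop runs only on the sample, for example `sample = random_sample(full_data, 0.05)` followed by `best, _ = choose_methodology(candidates, sample, score)`. The expensive run on the entire data set, `candidates[best](full_data)`, then happens once at the end.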

