Neil Liberman
Jul 23, 2017 · 1 min read

Hi Jeru! A common approach is what’s called the elbow method. I’ll edit the article to reflect this in the future. Essentially, you plot the number of clusters (K) on the x-axis and the error on the y-axis (the error metric you use will be specific to your problem). The elbow is the point where the error begins to flatten out, so adding more clusters (increasing K) no longer reduces it by much. Keep in mind that the more clusters you add, the lower your error will be, so error alone will always favor a larger K. However, adding more and more clusters eventually diminishes the value of your model. For example, if every data point were in its own cluster, there would be no error, but you also wouldn’t have grouped anything, so the model would serve no purpose.
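To make that concrete, here is a rough sketch in plain Python. Everything in it is my own assumption for illustration and not from the article: the toy `points`, the helper names, using the sum of squared distances to the nearest center as the error, and a simple deterministic "farthest point" initialization so the numbers are repeatable.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Hypothetical deterministic init: start at the first point, then
    repeatedly add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_sse(points, k, max_iters=100):
    """Run a basic k-means and return the error, assumed here to be the
    sum of squared distances from each point to its nearest center."""
    centers = farthest_first_centers(points, k)
    for _ in range(max_iters):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep it if empty).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Toy data (made up): three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

# Error for every K from 1 up to "one cluster per point".
errors = {k: kmeans_sse(points, k) for k in range(1, len(points) + 1)}

for k, err in errors.items():
    print(f"K={k:2d}  error={err:.2f}")
```

Plotting K against these errors gives the elbow curve. On this toy data the error falls sharply until K = 3, where it is 6.0, and barely moves after that, so the elbow is at K = 3. At K = 12 every point is its own cluster and the error is 0, which is the "no error, but nothing grouped" case.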

Hope this helps. I’ll try to edit the article for a slightly more comprehensive explanation.
