# [ Archived Post ] Spectral clustering

Please note that this post is for my own educational purpose.

Compactness → measure similarity and cluster the data points → the more similar points are, the more tightly they get grouped together.

Connectivity → data points that are close to each other → belong to the same cluster → and are connected.

Each data point → treated as a node of a graph → so clustering becomes a graph-partition problem → rather than a Euclidean-distance problem.

Very cool method → build a graph → embed it in a low dimension → then create clusters. Very closely related to graph theory and the math behind graphs. (Multiple methods of creating the graph exist → such as the KNN graph or the ε-neighborhood graph.)
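The two graph-construction methods mentioned above can be sketched with plain numpy (function names and the dense distance computation are my own; real implementations would use spatial indexing):

```python
import numpy as np

def eps_neighborhood_graph(X, eps):
    """Adjacency matrix: connect points whose Euclidean distance is below eps."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    A = (dist < eps).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor adjacency matrix."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(X)
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]  # skip the point itself
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)  # symmetrize: keep an edge if either side picked it
```

Both produce an n×n adjacency matrix; ε-neighborhood graphs can disconnect sparse regions, while KNN graphs guarantee every node has neighbors.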

There is also a fully connected graph → does not seem so scalable.

How the projection happens → from the graph to a table-like (matrix) format.

The calculation can happen like above → and there is solid math behind these methods.
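The graph-to-matrix projection can be made concrete: form the degree matrix D from the adjacency matrix A, take the (unnormalized) Laplacian L = D − A, and eigendecompose it. A small sketch (the toy graph is my own example):

```python
import numpy as np

def unnormalized_laplacian(A):
    """L = D - A, where D is the diagonal degree matrix."""
    D = np.diag(A.sum(axis=1))
    return D - A

# Two disjoint triangles: the graph has two connected components.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0

L = unnormalized_laplacian(A)
eigvals, eigvecs = np.linalg.eigh(L)  # eigh: ascending eigenvalues
# The multiplicity of eigenvalue 0 equals the number of connected components.
print(np.sum(np.isclose(eigvals, 0.0)))  # → 2
```

This zero-eigenvalue count is the basic fact the whole method leans on: cluster structure in the data shows up as (near-)zero eigenvalues of the Laplacian.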

Better and cleaner clustering → good to use the second eigenvector (the one for the second-smallest eigenvalue) → very interesting.
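The second-eigenvector remark can be demonstrated: the eigenvector of the Laplacian for the second-smallest eigenvalue (the Fiedler vector) gives a clean two-way cut when split by sign. A sketch on a toy graph of my own (two triangles joined by a single bridge edge), not the paper's exact algorithm:

```python
import numpy as np

# Two dense triangles {0,1,2} and {3,4,5}, weakly joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues
fiedler = eigvecs[:, 1]                # second-smallest eigenvalue's vector
labels = (fiedler > 0).astype(int)     # sign split = two-way partition
```

Splitting by the sign of the Fiedler vector recovers the two triangles as the two clusters.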

However, as with many different methods → there are upsides and downsides to each of them.

Cluster points using eigenvectors → however, there are unresolved issues → different algorithms use the eigenvectors in different ways, and little proof exists for why they work (matrix perturbation theory is one tool for analyzing this) → yet they still achieve good results.

Clustering has been studied for a long time → many existing methods → make strong assumptions → and can also get stuck in local optima.

Spectral methods → a promising approach to clustering → using eigenvectors. → A very special and interesting method → but many researchers do not agree with one another on the details.

This can be seen as a relaxation of an NP-hard graph-partitioning problem → and the second eigenvector → gives a much better result.

Given a set of points → need to cluster them → create a distance-based affinity matrix → eigenvalue decomposition → choose the top eigenvectors → cluster the rows using k-means or any other algorithm. (But there are multiple solutions to this problem.)
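The pipeline above can be sketched end to end. This is a minimal reading of the Ng–Jordan–Weiss steps (Gaussian affinity, symmetric normalization, top-k eigenvectors, row normalization, k-means); the tiny farthest-first/Lloyd k-means and all parameter defaults are my own simplifications, not the paper's exact procedure:

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, iters=50):
    """Sketch of a spectral clustering pipeline on points X (n x d)."""
    # 1. Gaussian affinity matrix A with a zero diagonal.
    diff = X[:, None, :] - X[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    A = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # 2. Symmetrically normalized matrix D^{-1/2} A D^{-1/2}.
    d = A.sum(axis=1)
    dinv = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = dinv[:, None] * A * dinv[None, :]
    # 3. Stack the top-k eigenvectors as columns; renormalize each row.
    _, eigvecs = np.linalg.eigh(M)   # eigh: eigenvalues in ascending order
    Y = eigvecs[:, -k:]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # 4. k-means on the rows of Y (farthest-first init, then Lloyd updates).
    centers = Y[[0]]
    for _ in range(1, k):
        d2 = ((Y[:, None] - centers[None]) ** 2).sum(-1).min(axis=1)
        centers = np.vstack([centers, Y[np.argmax(d2)]])
    for _ in range(iters):
        labels = ((Y[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels
```

On well-separated data the rows of Y collapse to k tight points, so even this crude k-means recovers the clusters.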

Understand it in the ideal case → points in different clusters are far apart → so when the affinity matrix is created → it is easy to separate the data points from one another → the eigenvalue decomposition comes out clean → and when running k-means to get each cluster → the final classification is a clean cut.

But in the general case → the between-cluster entries of A are not zero → and it gets harder to separate. (Spectral graph theory comes into play → to discuss how well the data points are connected → it also relates to the mixing time of a random walk → sexy math.) (Many assumptions are made → to solve the general case → which relates to different areas of study.)
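The random-walk connection can be written out. With degree matrix $D$ and affinity matrix $A$ (standard notation, not quoted from the paper):

```latex
P = D^{-1} A \quad \text{(row-stochastic transition matrix of the random walk on the graph)}

L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2},
\qquad
L_{\mathrm{rw}} = I - P = D^{-1/2} \, L_{\mathrm{sym}} \, D^{1/2}
```

Since $L_{\mathrm{rw}}$ and $L_{\mathrm{sym}}$ are similar matrices, they share eigenvalues, and the spectral gap $1 - \lambda_2(P)$ controls the walk's mixing time: a well-connected cluster mixes fast internally, while a small gap signals a bottleneck between clusters.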

Seven clustering problems → were used to test the algorithm → as expected → it does a much better job at clustering the different data sets. (But the colors are not shown.)

Very good results for such a simple algorithm → I wonder how it will perform on larger-scale data.

There is a connection between kernel PCA and spectral clustering → this is a very interesting area of study.

## References

Spectral clustering — Towards Data Science. (2019). Retrieved 8 February 2019, from https://towardsdatascience.com/spectral-clustering-82d3cff3d3b7

Ng, A. Y., Jordan, M. I., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. Retrieved 8 February 2019, from https://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf