DBSCAN Clustering

Balaji C
4 min read · Nov 11, 2023

DBSCAN stands for density-based spatial clustering of applications with noise.

Unlike K-Means or K-Medoids, DBSCAN does not take the desired number of clusters (K) as input; instead, it determines dense clusters directly from the data points.

The main aim of DBSCAN is to form clusters that satisfy a minimum size and density. Density is defined as the minimum number of points that must lie within a certain distance of each other. DBSCAN handles outliers easily and efficiently: outliers lie in sparse regions, so they cannot form clusters and are labeled as noise.

Concepts of Min.points and ε (the threshold value, Eps).

DBSCAN is built on a few main internal concepts: Core Point, Noise Point, Border Point, Center Point, and ε.

ε: It defines the neighborhood around a data point, i.e., if the distance between two points is less than or equal to ε, they are considered neighbors. If ε is chosen too small, a large part of the data will be treated as outliers. If ε is too large, clusters will merge and the majority of data points will end up in the same cluster. One way to choose ε is from a k-distance graph.
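A k-distance graph plots, in sorted order, each point's distance to its k-th nearest neighbor; a sharp rise (the "elbow") in that curve is a common candidate for ε. The following is a minimal sketch of computing those sorted distances using only the Python standard library. The function name `k_distances`, the sample points, and the choice k = 2 are illustrative assumptions, not something the algorithm prescribes.

```python
import math

def k_distances(points, k):
    """Return every point's distance to its k-th nearest neighbor, sorted ascending.

    Hypothetical helper for illustration; brute force, so only suited to small data.
    """
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point (the point itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Four tightly packed points and one far-away point (made-up sample data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kd = k_distances(points, k=2)
print(kd)  # the last value is much larger than the rest
```

Plotting `kd` against its index gives the k-distance graph; here the jump between the first four values and the last one suggests picking ε somewhere just above the small values.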

Min.points: The minimum number of neighbors (data points) within radius ε. As a rule of thumb, the larger the dataset, the larger the value of Min.points should be.

Core point: A point is a core point if it has at least Min.points points within ε (in the usual convention, the point itself is counted).

Border point: A point that has fewer than Min.points points within ε but lies in the neighborhood of a core point.
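To make these definitions concrete, here is a small sketch that labels each point as core, border, or noise. It assumes the common convention that a point counts toward its own neighborhood and that a core point needs at least Min.points points within ε; a noise point is one that is neither core nor border. The function name `classify_points` and the sample data are hypothetical, and this covers only the labeling step, not the full clustering procedure.

```python
import math

def classify_points(points, eps, min_points):
    """Label each point "core", "border", or "noise" (illustrative helper)."""
    # ε-neighborhood of every point; each point is included in its own neighborhood.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_points for n in neighbors]

    labels = []
    for i, n in enumerate(neighbors):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in n):
            # Not dense enough itself, but within ε of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Made-up sample: a unit square, one point hanging off it, and one far outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (10.0, 10.0)]
labels = classify_points(points, eps=1.0, min_points=3)
print(labels)
```

With these values the four corners of the square are core points, (2.0, 1.0) is a border point because its only neighbor (1.0, 1.0) is a core point, and (10.0, 10.0) is noise.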
