Some Notes on DBSCAN Algorithm

Salil Jain
Analytics Vidhya
Published in
3 min readJul 22, 2020

--

Illustration of the DBSCAN cluster model (source: https://dl.acm.org/doi/pdf/10.1145/3068335)

In this post, I would like to discuss some of the insights on DBSCAN algorithm. Usually, when I look at an algorithm I try to see what each line of pseudocode does and why?

ALGORITHM 1: Pseudocode of Original Sequential DBSCAN Algorithm (source: https://dl.acm.org/doi/pdf/10.1145/3068335)

line 12: is q core or base point?

At any point, a point would have one of three types of labels: undefined, Noise, or a cluster label. I was wondering if q’s label is not undefined then it would be a cluster label. If it’s part of a cluster then either it is a core point or border point and it is already processed.

If it is a core point and as its already processed (a RangeQuery is already executed), p would have been part of the Neighbors returned from that. At that point, we can conclude that since p would have been already processed if q was a core point, q is not a core point.

If it is a border point and again as it is already processed, p would be returned as Neighbors of the same but not processed. That is because the number of neighbors returned would be less than minPts.

Definitely line 12 is needed as otherwise we would process q even though its already processed. Though this is a good example that a border point could be…

--

--