Measures of Proximity in Data Mining & Machine Learning
Transforming data during analysis
In one of my previous posts, I talked about Assessing the Quality of Data for Data Mining & Machine Learning Algorithms. This post continues from that one, so if you haven’t read it yet, read it here first to get a proper grasp of the topics and concepts I am going to talk about in this article.
The proximity between two objects is a function of the proximity between their corresponding attributes. Proximity measures refer to measures of similarity and dissimilarity. These measures are important because a number of data mining techniques rely on them, such as clustering, nearest neighbour classification, and anomaly detection.
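To make the idea concrete, here is a minimal Python sketch of how a proximity between two objects can be built from the proximities of their attributes. The function names are hypothetical, and the choices made here (absolute difference per attribute, Euclidean combination, and the `1 / (1 + d)` mapping to a similarity) are assumptions picked for illustration, not the only options.

```python
import math


def attribute_dissimilarity(a, b):
    """Dissimilarity of two numeric attribute values (assumed: absolute difference)."""
    return abs(a - b)


def object_dissimilarity(x, y):
    """Combine per-attribute dissimilarities into one value (assumed: Euclidean)."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return math.sqrt(
        sum(attribute_dissimilarity(a, b) ** 2 for a, b in zip(x, y))
    )


def object_similarity(x, y):
    """Map a dissimilarity d >= 0 to a similarity in (0, 1] via 1 / (1 + d)."""
    return 1.0 / (1.0 + object_dissimilarity(x, y))


# Two objects, each described by two numeric attributes.
p = (1.0, 2.0)
q = (4.0, 6.0)
print(object_dissimilarity(p, q))  # 5.0
print(object_similarity(p, p))     # 1.0 (identical objects are maximally similar)
```

Note that the two quantities move in opposite directions: the more alike the objects are, the lower the dissimilarity and the higher the similarity.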
We will start the discussion with high-level definitions and explore how they are related. Then, we will talk about proximity between two data objects with one simple attribute before moving on to objects with multiple attributes.
Please bear with me for the conceptual part. I know it can be a bit boring, but if you have strong fundamentals, nothing can stop you…