Week 3 - Breast Cancer Detection
Hello everyone!
This is our third blog post about our Machine Learning course project on Breast Cancer Detection. This week we applied the k-Nearest Neighbors (k-NN) classifier algorithm to our data set.
What is the k-Nearest Neighbors (k-NN) classifier algorithm?
Brief description: The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space.
In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small).
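To make the plurality vote concrete, here is a minimal, self-contained sketch of k-NN classification on toy 2-D data. The function name `knn_predict` and the toy points are our own illustration, not part of the project code:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by a plurality vote of its k nearest neighbors."""
    # Distance from the query to every training point
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Take the k closest neighbors and let them vote
    k_nearest = sorted(dists)[:k]
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# Toy data: class 0 clustered near the origin, class 1 near (5, 5)
train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # → 0
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))  # → 1
```

A query near the origin is outvoted by class-0 neighbors, and one near (5, 5) by class-1 neighbors.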
Getting the data set ready:
We need to prepare the data before we can use the k-Nearest Neighbors (k-NN) classifier algorithm.
We convert the diagnosis value of M and B to a numerical value as follows:
M (Malignant) = 1
B (Benign) = 0
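This encoding step can be sketched with a plain dictionary lookup; the sample label list below is hypothetical, standing in for the diagnosis column of the real data set:

```python
# Hypothetical diagnosis labels as they appear in the raw data set
diagnoses = ["M", "B", "B", "M", "B"]

# Map M (Malignant) -> 1 and B (Benign) -> 0
label_map = {"M": 1, "B": 0}
encoded = [label_map[d] for d in diagnoses]

print(encoded)  # → [1, 0, 0, 1, 0]
```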
Applying the k-Nearest Neighbors (k-NN) classifier
We split our data set into 70% training data and 30% test data.
We implemented the k-NN algorithm, ran it for a range of k values, and selected the k that gives the highest accuracy.
The optimal number of neighbors turned out to be 17.
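The whole workflow can be sketched with scikit-learn. We use scikit-learn's built-in copy of the Wisconsin breast cancer data as a stand-in (our actual data set and the exact k range we searched may differ, so the best k found here need not be 17; note also that scikit-learn encodes benign as 1, the reverse of our mapping above):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# scikit-learn's copy of the Wisconsin breast cancer data;
# labels are already numeric (0 = malignant, 1 = benign here)
X, y = load_breast_cancer(return_X_y=True)

# 70% training data, 30% test data, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

# Try a range of k values and keep the one with the best test accuracy
best_k, best_acc = None, 0.0
for k in range(1, 31):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_test, y_test)
    if acc > best_acc:
        best_k, best_acc = k, acc

print("best k:", best_k, "accuracy:", round(best_acc, 3))
```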
See you next week…