Week 3 - Breast Cancer Detection

Yahya Koçak
Published in bbm406f19
Dec 21, 2019

Hello everyone!

This is our third blog post about our Machine Learning course project on Breast Cancer Detection. This week we applied the k-Nearest Neighbors (k-NN) classifier algorithm to our data set.

What is the k-Nearest Neighbors (k-NN) classifier algorithm?

Brief description: The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space.
In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small).
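To make the idea concrete, here is a minimal NumPy sketch of the plurality-vote classification described above. This is an illustrative example, not our exact project code; the function name, toy data, and default k are assumptions.

```python
# Minimal k-NN classification sketch (illustrative, not our project code).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    # Euclidean distance from the query point to every training example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(distances)[:k]
    # Plurality vote among the labels of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two features, two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 9.0], [9.0, 8.5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([8.5, 9.0]), k=3))  # -> 1
```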

Getting the data set ready:

We need to get the data ready before we can apply the k-Nearest Neighbors (k-NN) classifier algorithm.

We convert the diagnosis values M and B to numerical values as follows:
M (Malignant) = 1
B (Benign) = 0
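This conversion is a one-liner with pandas. The sketch below assumes the data is loaded from a CSV file and that the label column is named diagnosis; the file name and column name are assumptions and may differ in your copy of the data set.

```python
# Label-encoding sketch: map M (Malignant) -> 1, B (Benign) -> 0.
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical file name for the data set
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})
```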

Applying the k-Nearest Neighbors (k-NN) classifier

We split our data set into 30% test data and 70% training data.
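One common way to do this split is scikit-learn's train_test_split. The sketch below continues from the DataFrame in the previous example and assumes any non-feature columns (such as an id column) have already been dropped; the random seed is illustrative.

```python
# 70/30 train/test split sketch, continuing from the DataFrame df above.
from sklearn.model_selection import train_test_split

X = df.drop(columns=["diagnosis"]).values   # feature matrix
y = df["diagnosis"].values                  # 0/1 diagnosis labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42   # 30% test, 70% train
)
```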

We implemented the k-NN algorithm. Then we ran it and computed the value of k that gives the highest accuracy.
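A simple way to search for that k is to evaluate accuracy on the test split for a range of candidate values and keep the best one. The sketch below reuses the knn_predict function and the train/test arrays from the earlier snippets; the range of k values tried is an assumption.

```python
# k search sketch: try several values of k and keep the most accurate one.
import numpy as np

def accuracy_for_k(k):
    # Predict every test point with the given k and compare to the true labels
    preds = [knn_predict(X_train, y_train, x, k=k) for x in X_test]
    return np.mean(np.array(preds) == y_test)

accuracies = {k: accuracy_for_k(k) for k in range(1, 31)}
best_k = max(accuracies, key=accuracies.get)
print("Optimal number of neighbors:", best_k)
```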

The optimal number of neighbors is 17

See you next week…

References

https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
