K-NN Classification in C++

Furkan Çağlayan
Hacettepe AI Club
Jul 7, 2019

K-Nearest Neighbors (K-NN) classification is a simple algorithm based on distance functions. It takes a point as input, finds the ‘K’ closest points in the dataset, and assigns the point the majority label among those neighbors.

Creating a Dataset

In most cases you would use a classification method on real features, such as SIFT descriptors, but for the sake of simplicity let’s create a random 2-D dataset. Create a class named ‘Dataset’ that takes 2 inputs: ’n’, the size of the dataset, and ‘l’, the number of labels.

The generated coordinates are between 0 and 1, and each point is randomly assigned a label.

Now that we have our randomized dataset, let’s visualize it to see the distribution of points.

How to improve it?

Turn the Dataset class into a modular class that supports an arbitrary number of dimensions.

Creating KNN Class

Now let’s create a class that takes 3 inputs: a dataset (we already created that), a ‘K’ value, and the 2-D vector we want to predict a class for.

How to improve it?

Try more distance functions besides Euclidean; Manhattan and Cosine distances come to mind first.

The implementation works like this: for each point in the dataset, compute the distance between it and the input vector, then take the labels of the ‘K’ closest points and pick the majority label. Let’s see the output for the dataset above when the input is (0.6, 0.6) and K is 3. (X is our input)

Class=2

Luckily for us, all 3 neighbors agree, so we can confidently say that our class is blue, i.e. 2. But what about different values of K?

Choosing the ‘K’

The K value is crucial in this classification. For example, when you expect noise in your data, 1-NN is not the best option, so you may want to increase K. But when you have sparse data, 1-NN or other low values of K might work just fine.

Also, you must have noticed we never talk about even values of K. That’s because when you do, say, a binary classification, you’d always want to choose odd values of K; otherwise the vote between the two classes can end in a tie. Now let’s see how we call our Knn class and do some more experiments.

Now let’s call the Knn class with k=5 and the input (0.8, 0.3) and see the result.

Class=0

What happens when we call the same function, but with k=7?

Class=1

Our prediction changed, so be prepared for that: the result can be sensitive to the choice of K.

To sum up, K-NN is a simple and robust classification algorithm, but sometimes it may not give the desired output.
