Machine Learning Crash Course: K-Nearest Neighbors (KNN)

Code Primer
3 min read · Jan 12, 2023

K-Nearest Neighbors (KNN) is a non-parametric, instance-based learning algorithm that can be used for both classification and regression problems. It is one of the simplest and most intuitive machine learning algorithms, and it’s also widely used in industry for a variety of applications.

The basic idea behind KNN is that an instance can be classified (or its value predicted) from its nearest neighbors: the majority class among them for classification, or their average value for regression. The number of neighbors considered is the parameter k. The algorithm works by storing all available training instances and comparing each new instance to them using a similarity measure (e.g., a distance function such as Euclidean distance).
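To make the "distance function" part concrete, here is a small sketch (with made-up feature vectors) of the two most common distance measures used in KNN:

```python
import numpy as np

# Two toy feature vectors, standing in for a stored instance and a new one
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Euclidean distance: square root of the summed squared differences
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan distance: sum of the absolute differences
manhattan = np.sum(np.abs(a - b))

print(euclidean)  # sqrt(14) ≈ 3.742
print(manhattan)  # 6.0
```

Scikit-learn's KNN estimators let you pick between such metrics via their `metric` parameter.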

An example of when we might use KNN is customer segmentation for a retail company. Given a set of customers already labeled with a segment (based on demographics and purchase history), KNN can assign each new customer to the segment of the most similar existing customers. This would allow the company to target specific groups of customers with tailored marketing campaigns, leading to increased sales and customer retention. KNN is a good choice for this problem because it makes predictions directly from feature similarity, and those predictions are easy to interpret: a customer lands in a segment because they resemble the customers already in it.

For classification, the k nearest neighbors of a new instance are found by calculating the distance between the new instance and all stored instances. The majority class among the k-nearest neighbors is then chosen as the predicted class for the new instance.
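As a sketch of how the classification case works under the hood (this is an illustrative from-scratch version with toy data, not scikit-learn's actual implementation):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k):
    """Predict the class of x_new by majority vote among its k nearest neighbors."""
    # Euclidean distance from x_new to every stored instance
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority class among those k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters with classes 0 and 1
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_classify(X_train, y_train, np.array([1.5, 1.5]), k=3))  # 0
```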

For regression, the k nearest neighbors of a new instance are found by calculating the distance between the new instance and all stored instances. The predicted value for the new instance is then computed as the average of the k-nearest neighbors’ values.
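The regression case differs only in the final step: averaging the neighbors' values instead of voting. A minimal sketch with toy data:

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k):
    """Predict the value of x_new as the mean of its k nearest neighbors' values."""
    # Euclidean distance from x_new to every stored instance
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Average the target values of those k neighbors
    return y_train[nearest].mean()

# Toy 1-D data where the target equals the feature
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([1.0, 2.0, 3.0, 10.0])

print(knn_regress(X_train, y_train, np.array([2.0]), k=3))  # 2.0
```

Scikit-learn provides this variant as `KNeighborsRegressor`, with the same interface as the classifier used below.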

An illustration of the K-nearest neighbor model (source: https://www.researchgate.net/figure/An-illustration-of-K-nearest-neighbor-model_fig6_321751429)

We can use Python and the popular library scikit-learn to train a KNN model:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load a sample dataset and split it into training and testing sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an instance of the KNeighborsClassifier class with k neighbors
k = 5
knn = KNeighborsClassifier(n_neighbors=k)

# Fit the model using the training data
knn.fit(X_train, y_train)

# Make predictions using the testing data
y_pred = knn.predict(X_test)

# Evaluate the model
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

The above code creates an instance of the KNeighborsClassifier class, fits it on the training data, makes predictions on the test data, and evaluates those predictions with an accuracy score and a confusion matrix.

It’s important to note that the performance of the KNN algorithm depends on the choice of distance metric, the value of k, and the scale of the features. Because KNN compares raw distances, features with large numeric ranges can dominate the computation, so it’s usually worth standardizing the data first. The choice of k is also crucial: a small k makes the model sensitive to noise, while a large k smooths the decision boundary and can wash out subtle patterns in the data.
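One common way to handle both concerns at once is to standardize the features in a pipeline and compare several values of k with cross-validation. A sketch (the candidate k values and the iris dataset are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several values of k, standardizing features inside each fold so that
# no single feature dominates the distance computation
best_k, best_score = None, 0.0
for k in [1, 3, 5, 7, 9, 11]:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    score = cross_val_score(model, X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print("best k:", best_k)
```

The same search could be done more concisely with scikit-learn's `GridSearchCV`; the explicit loop just makes the idea visible.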

In conclusion, KNN is a simple and powerful algorithm that can be used for both classification and regression problems. It’s a great starting point for your Machine Learning journey and it’s also widely used in industry for a variety of applications.

