Different Types of Classification — Machine Learning Basics

Venkatesha Prasad S

Published in

Analytics Vidhya

4 min read · May 31, 2020

According to Andrew Ng, “Artificial Intelligence is the new electricity.”

The growth in the fields of Artificial Intelligence and Machine Learning over the last two years has been remarkable.

In this fast-moving world, the need to improve in order to keep up with competitors never goes away, and improvements to a product or service cannot be neglected. ML (Machine Learning) can provide that improvement if used correctly and efficiently.

In this blog, I am going to explain the types of classification and how to implement them using the scikit-learn library in Python.

Types of Classification in ML :

Binary Classification
Multiclass Classification

1. BINARY CLASSIFICATION:

In binary classification, the output (label) takes one of two values, such as True or False, 0 or 1, Object A or Object B. Each prediction is exactly one of the two classes, never both.

Binary classification is used extensively in weather prediction, medical testing, spam detection, credit card fraud detection, and so on.

Many algorithms can be used to build a binary classifier, including:

Logistic regression
Decision Trees
Random forests
Neural Networks
Support Vector Machine
SGDClassifier

I am going to train a model using an SVM (Support Vector Machine) classifier to solve a cancer prediction problem: deciding whether a breast tumor is malignant or benign.

A Support Vector Machine is a supervised learning algorithm that predicts a label from feature values. For classification, it looks for the hyperplane that separates the classes with the widest possible margin.

Hyperplane separating the two classes (source: Wikipedia)
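To make the idea of a separating hyperplane concrete, here is a minimal sketch on a tiny hand-made 2-D dataset (the points are hypothetical and chosen only so the two classes are clearly separable). With a linear kernel, scikit-learn exposes the hyperplane as `coef_` (the vector w) and `intercept_` (the offset b), and the sign of w · x + b decides the class.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny hand-made dataset (hypothetical values, for illustration only)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

# For a linear kernel, the separating hyperplane is w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]

# decision_function returns w . x + b for each sample;
# a positive score means class 1, a negative score means class 0
scores = clf.decision_function(X)
manual_scores = X @ w + b

print("w =", w, "b =", b)
print("scores:", scores)
print("predictions:", clf.predict(X))
```

The manually computed scores match `decision_function`, which is a quick way to convince yourself that the model really is just a line (in 2-D) splitting the plane in two.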

Implementing SVM using Python:

Importing the necessary libraries

import pandas as pd
import numpy as np
import sklearn
from sklearn import datasets

Getting the data

cancer = datasets.load_breast_cancer()

Understanding the dataset

dir(cancer)
cancer.feature_names
cancer.target
cancer.target_names
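As a small addition to the exploration above, the sketch below prints the size of the dataset and how many samples fall into each class. The variable names `n_samples`, `n_features` and `counts` are my own; everything else comes from the object returned by `load_breast_cancer()`.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# Rows are samples (tumors), columns are features (measurements)
n_samples, n_features = cancer.data.shape
print(n_samples, n_features)  # 569 samples, 30 features

# target holds 0/1 labels; target_names maps them to readable names
print(cancer.target_names)    # label 0 is 'malignant', label 1 is 'benign'

# Count how many samples belong to each class
counts = dict(zip(cancer.target_names, np.bincount(cancer.target)))
print(counts)
```

Checking the class balance early is useful because accuracy can look deceptively high when one class dominates the data.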

Splitting the data into Training and Testing set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.33, random_state=0)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Importing the Classifier

from sklearn.svm import SVC

Training the data

model = SVC(kernel='linear')     # The kernel can be changed, e.g. 'rbf' or 'poly'
model.fit(X_train, y_train)

To check how well the model performs, we predict labels for X_test and compare them with the true values (y_test). The metric used here is accuracy.

ypred = model.predict(X_test)    # Predicting the values
from sklearn import metrics
print('Accuracy:', metrics.accuracy_score(y_test, ypred))
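Accuracy is simply the fraction of predictions that match the true labels. The sketch below uses a small set of made-up labels (hypothetical, not taken from the cancer dataset) to show that computing it by hand gives the same number as `accuracy_score`, and adds a confusion matrix to show where the mistakes fall.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels, made up for illustration
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy = number of correct predictions / total predictions
manual_accuracy = np.mean(y_true == y_pred)
library_accuracy = accuracy_score(y_true, y_pred)
print(manual_accuracy, library_accuracy)  # 0.75 0.75

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [1 3]]
```

For a problem like cancer prediction, the confusion matrix matters because the two kinds of error (missing a malignant tumor versus flagging a benign one) are not equally costly, and accuracy alone hides that difference.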

The model can be fine-tuned by changing the hyperparameters, by trying different models, and so on. For now, we have implemented a simple binary classification model.
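One possible way to tune hyperparameters is a grid search with cross-validation. This is only a sketch under my own assumptions: the feature scaling step, the parameter grid values, and the 5-fold setting are illustrative choices, not something the example above used.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.33, random_state=0)

# Scale the features, then fit an SVM (scaling is an added assumption;
# SVMs are usually sensitive to feature scale)
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid: make_pipeline names the SVC step "svc"
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
}

# Try every combination with 5-fold cross-validation on the training set
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)

# Evaluate the best model once on the held-out test set
test_accuracy = search.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

The test set is only touched at the very end, so the reported test accuracy is not influenced by the choice of hyperparameters.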

2. MULTICLASS CLASSIFICATION:

Unlike binary classification, multiclass classification has more than two labels/outputs. Each sample belongs to exactly one of the N different output classes.

This type of classification is used in image recognition (e.g. recognizing Barack Obama in a picture of former American presidents, or recognizing a lion among a group of animals in a forest).

Algorithms that can be used for Multiclass Classification problems are:

K-Nearest Neighbors
Logistic regression
Decision Trees
Random Forest Classifier
Neural Network
Naive Bayes
SVM

In the binary classification example, an SVM was used. Here, logistic regression is used to classify iris flowers into species using the iris dataset.

**Sigmoid ( σ ) Function** *( source : Wikipedia )*
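The sigmoid function squashes any real number into the range (0, 1), which is why logistic regression can interpret its output as a probability. Below is a minimal NumPy sketch; the function name `sigmoid` and the sample inputs are my own. Note that with more than two classes, as in the iris example, the model relies on a multiclass generalization of this idea (such as softmax) rather than a single sigmoid.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to values between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Sample scores chosen for illustration
z = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(z)
print(probs)

# A score of 0 sits exactly on the decision boundary: probability 0.5
print(sigmoid(0.0))  # 0.5
```

Large negative scores give probabilities near 0, large positive scores give probabilities near 1, and the curve is symmetric around 0.5.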

Implementing Logistic Regression using Python:

Importing the necessary Libraries

import pandas as pd
import numpy as np
import sklearn
from sklearn import datasets

Reading the dataset

from sklearn.datasets import load_iris
iris=load_iris()

Understanding the data

type(iris)
dir(iris)
iris.data
iris.target_names
iris.feature_names

Splitting the dataset

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=1)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Training the model

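from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=200)    # Higher than the default of 100, which may trigger a convergence warning on this data
model.fit(X_train, y_train)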

Estimating the model efficiency using the accuracy metric

ypred = model.predict(X_test)    # Predicting the values
from sklearn import metrics
print('Accuracy:', metrics.accuracy_score(y_test, ypred))

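Thus, a simple logistic regression model was implemented for the iris dataset, and an accuracy in the high 90s is obtained. With only 30 test samples, each misclassification costs about 3.3 percentage points, so the exact figure depends on the split.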
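Beyond a single predicted label, a multiclass logistic regression model can report one probability per class. The sketch below retrains the same kind of model and inspects the first three test samples; `max_iter=200` and the variable names `proba` and `predicted_names` are my own choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1)

# max_iter raised above the default as a precaution against convergence warnings
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1
proba = model.predict_proba(X_test[:3])
print(np.round(proba, 3))

# predict() returns the class with the highest probability
predicted = model.predict(X_test[:3])
predicted_names = iris.target_names[predicted]
print(predicted_names)
```

Looking at the probabilities shows how confident the model is, which is often more informative than the predicted label alone.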

This blog is purely for beginners who have just stepped into the field of Machine Learning. There is a lot of math behind each of these topics, and I would recommend reading various ML books to learn more. To learn about the math behind ML, I recommend you start with MIT’s course on linear algebra. There are also a lot of good YouTube channels, such as 3Blue1Brown, Data School, and codebasics.

This is my first blog. Feel free to reach out to me in the comments; feedback is very much appreciated! Thank you for reading this article. I hope it's helpful to you all! If you enjoyed this article, please leave some claps to show your appreciation.
