Different Types of Classification — Machine Learning Basics

Venkatesha Prasad S
Published in Analytics Vidhya
4 min read · May 31, 2020
Photo by Chris Liverani

According to Andrew Ng, “Artificial Intelligence is the new electricity.”

The growth of Artificial Intelligence and Machine Learning over the last two years has been remarkable.

In this fast-moving world, businesses constantly need to improve their products and services to stay ahead of competitors. Machine Learning (ML) can deliver that improvement when it is used correctly and efficiently.

In this blog, I explain the types of classification and show how to implement them using the scikit-learn library in Python.

Types of Classification in ML :

  • Binary Classification
  • Multiclass Classification

1. BINARY CLASSIFICATION :

In binary classification, the output (label) takes one of two possible values, such as True or False, 0 or 1, or Object A or Object B. Each prediction is exactly one of the two, never both.

Binary classification is used extensively in weather prediction, medical testing, spam detection, credit card fraud detection, etc.

Many algorithms can be used to build a binary classifier, such as:

  • Logistic regression
  • Decision Trees
  • Random forests
  • Neural Networks
  • Support Vector Machine
  • SGDClassifier

I am going to train a model using the SVM (Support Vector Machine) classifier on scikit-learn's breast cancer dataset, to predict whether a tumor is malignant or benign.

A Support Vector Machine is a supervised learning algorithm that predicts a label from the feature values. It does so by finding the hyperplane that separates the classes with the widest possible margin.
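To make the idea of a separating hyperplane concrete, here is a minimal sketch on a tiny made-up dataset (the points and the names `new_points` and `manual_labels` are assumptions for illustration, not part of the cancer example). With a linear kernel, scikit-learn exposes the learned weights as `coef_` and the offset as `intercept_`, so the prediction can be reproduced by checking which side of the hyperplane a point falls on.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (made up for illustration): two features, two well-separated classes
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

# For a linear kernel the learned hyperplane is w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]

# A point is assigned to class 1 when w . x + b > 0, otherwise to class 0
new_points = np.array([[2.0, 2.0], [6.0, 7.0]])
scores = new_points @ w + b
manual_labels = (scores > 0).astype(int)

print("scores:", scores)
print("manual labels:", manual_labels)
print("clf.predict:  ", clf.predict(new_points))
```

The manual labels and the output of `clf.predict` should agree, since both apply the same rule.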

Hyperplane separating the two different classes
Source : Wikipedia

Implementing SVM using Python:

  • Importing the necessary libraries
import pandas as pd
import numpy as np
import sklearn
from sklearn import datasets
  • Getting the data
cancer = datasets.load_breast_cancer()
  • Understanding the dataset
dir(cancer)
cancer.feature_names
cancer.target
cancer.target_names
  • Splitting the data into Training and Testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.33, random_state=0)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
  • Importing the Classifier
from sklearn.svm import SVC
  • Training the data
model = SVC(kernel='linear')     # The kernel can be changed (e.g. 'rbf' or 'poly')
model.fit(X_train, y_train)
  • To check the model's performance, we predict on X_test and compare the predictions with the original values (y_test). The metric used here is accuracy.
from sklearn import metrics
ypred = model.predict(X_test)    # Predicting the values
print('Accuracy:', metrics.accuracy_score(y_test, ypred))
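Accuracy gives a single number, but it can also help to see how the predictions compare with the original values class by class. The sketch below repeats the steps above and adds a confusion matrix using `metrics.confusion_matrix`; this extra step is my addition for illustration and is not required for the model to work.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.33, random_state=0)

model = SVC(kernel="linear")
model.fit(X_train, y_train)
ypred = model.predict(X_test)

# Rows are the true classes, columns are the predicted classes
cm = metrics.confusion_matrix(y_test, ypred)
print(cancer.target_names)
print(cm)

# Accuracy is the diagonal (correct predictions) divided by the total
print("Accuracy from matrix:", cm.trace() / cm.sum())
```

The off-diagonal entries count the mistakes, which matters in a medical setting where the two kinds of error are not equally costly.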

The model can be fine-tuned by changing the hyperparameters or by trying different models. For now, we have implemented a simple binary classification model.
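As a rough sketch of what hyperparameter tuning could look like, the example below uses scikit-learn's `GridSearchCV` to try a few SVM settings with cross-validation. The candidate values in `param_grid` are arbitrary choices for illustration, and the `StandardScaler` step is an addition of mine (SVMs are generally sensitive to feature scale), not something the model above used.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.33, random_state=0)

# Scale the features first, then fit the SVM (scaling is an assumed extra step)
pipeline = make_pipeline(StandardScaler(), SVC())

# Candidate values are arbitrary choices for illustration
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
}

# 5-fold cross-validation on the training set only
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
print("Test accuracy:", search.score(X_test, y_test))
```

The test set is only used once at the end, so the reported test accuracy is not influenced by the search itself.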

2. MULTICLASS CLASSIFICATION :

Unlike binary classification, multiclass classification has more than two possible labels/outputs. Each sample belongs to exactly one of the N different classes.

This type of classification is used in image recognition (e.g. recognizing Barack Obama in a set of pictures of former American presidents, or recognizing a lion among a group of animals in a forest).

Algorithms that can be used for Multiclass Classification problems are:

  • K-Nearest Neighbors
  • Logistic regression
  • Decision Trees
  • Random Forest Classifier
  • Neural Network
  • Naive Bayes
  • SVM

SVM was used for binary classification. Here, logistic regression is used to classify iris flowers by species using the iris dataset.

Sigmoid (σ) function (Source: Wikipedia)
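For readers who want to see the sigmoid function in code, here is a minimal NumPy sketch. The helper name `sigmoid` and the sample inputs are my own choices for illustration. It squashes any real number into the range (0, 1), which is why logistic regression can interpret its output as a probability.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1
z = np.array([-5.0, 0.0, 5.0])
probs = sigmoid(z)
print(probs)

# An input of exactly 0 gives a probability of 0.5
print(sigmoid(0.0))
```

With more than two classes, scikit-learn's `LogisticRegression` extends this idea to produce one probability per class; the exact scheme depends on the library version and settings.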

Implementing Logistic Regression using Python:

  • Importing the necessary Libraries
import pandas as pd
import numpy as np
import sklearn
from sklearn import datasets
  • Reading the dataset
from sklearn.datasets import load_iris
iris=load_iris()
  • Understanding the data
type(iris)
dir(iris)
iris.data
iris.target_names
iris.feature_names
  • Splitting the dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=1)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
  • Training the model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
  • Estimating the model efficiency using the accuracy metric
from sklearn import metrics
ypred = model.predict(X_test)    # Predicting the values
print('Accuracy:', metrics.accuracy_score(y_test, ypred))

Thus, a simple logistic regression model was implemented for the iris classification dataset, and an accuracy of around 98% was obtained. The exact figure may vary with the scikit-learn version and the train/test split.
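To see what makes this a multiclass model, it can help to look at the per-class probabilities rather than only the final label. The sketch below repeats the training steps and calls `predict_proba` on the first three test samples. Setting `max_iter=200` is an assumption of mine to give the solver more room to converge, and the variable names `proba` and `predicted` are just for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1)

# max_iter=200 is an assumed setting to give the solver room to converge
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1
proba = model.predict_proba(X_test[:3])
print(iris.target_names)
print(proba.round(3))

# The predicted class is the column with the highest probability
predicted = model.classes_[np.argmax(proba, axis=1)]
print(iris.target_names[predicted])
```

Each sample gets three probabilities, one per iris species, and the model picks the largest.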

This blog is intended for beginners who have just stepped into the field of Machine Learning. There is a lot of math behind each of these techniques, and I recommend reading books on ML to learn more about it. To learn more about the math behind ML, I recommend starting with MIT’s course on linear algebra. There are also a lot of good YouTube channels, such as 3Blue1Brown, Data School, and codebasics.

This is my first blog. Feel free to reach out to me in the comments; feedback is very much appreciated! Thank you for reading this article. I hope it’s helpful to you all! If you enjoyed this article, please leave some claps to show your appreciation.
