7 types of Multi-Classification
Inside AI
A full guide to KNN, logistic regression, support vector machines, kernel SVM, naive Bayes, decision tree classification, random forest, deep learning, and even Grid Search for multi-class classification.
As usual,
Hi! How are you doing? I hope it's going great.
Today let's understand and apply the most common types of classification to a multi-class target variable.
Let's get started. We will use a dataset that has 7 types/categories of glass (type 4 has no samples, so only 6 actually appear in the data). The dataset is available at UCI https://archive.ics.uci.edu/ml/datasets/Glass+Identification
Number of Attributes: 10 (including an Id#) plus the class attribute. All attributes are continuously valued.
Attribute Information:
1. Id number: 1 to 214
2. RI: refractive index
3. Na: Sodium (unit measurement: weight percent in corresponding oxide, as are attributes 4-10)
4. Mg: Magnesium
5. Al: Aluminum
6. Si: Silicon
7. K: Potassium
8. Ca: Calcium
9. Ba: Barium
10. Fe: Iron
11. Type of glass: (class attribute)
-- 1 building_windows_float_processed
-- 2 building_windows_non_float_processed
-- 3 vehicle_windows_float_processed
-- 4 vehicle_windows_non_float_processed (none in this database)
-- 5 containers
-- 6 tableware
-- 7 headlamps
Let's get started with the commonly used classification methods:
1.) Logistic Regression
2.) Knn
3.) Support Vector Machine
4.) Kernel SVM
5.) Naive Bayes
6.) Decision Tree Classification
7.) Random Forest Classification
Any other classifiers? Let me know in the comments below.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset (index_col = 0 turns the Id column into the row index)
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
Up to here it's the same as before: load the data, then split it into X and y, where y is the dependent/target variable at column position 9 (the glass categories) and X holds the independent variables at positions 0 to 8 (the slice 0:9 stops before 9). Because we passed index_col = 0, the Id column became the row index and is not counted.
Note: in Python, the index positions of the columns start from 0, not from 1.
Next, before we split the data into train and test sets, we need to check for class imbalance. If one category has far fewer samples than the rest, it may be better to remove it, because it doesn't have enough data for the model to learn its cause-effect relationship. As a rule of thumb, I make sure each category has at least 5–10% of the total samples.
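A minimal sketch of the imbalance check described above, assuming the labels sit in a 1-D array. The helper name flag_rare_classes and the toy labels are hypothetical; the 5% default is the lower end of the rule of thumb above, not a fixed standard.

```python
import numpy as np
import pandas as pd

def flag_rare_classes(labels, min_share=0.05):
    """Return the share of each class and the list of classes below min_share.

    min_share=0.05 reflects the 5-10% rule of thumb from the text; it is a
    heuristic, not a fixed standard.
    """
    shares = pd.Series(labels).value_counts(normalize=True).sort_index()
    rare = shares[shares < min_share].index.tolist()
    return shares, rare

# Hypothetical labels: 20 samples of class 1, 15 of class 2, 1 of class 3
labels = np.array([1] * 20 + [2] * 15 + [3] * 1)
shares, rare = flag_rare_classes(labels)
print(shares.round(3).to_dict())
print("Classes below 5%:", rare)
```

If you decide to keep a rare class, passing stratify=y to train_test_split keeps the class proportions roughly the same in the train and test sets.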
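After this step, we will transform all the independent variables (the feature columns) to one standard range. Standardization subtracts each column's mean and divides by its standard deviation, which reduces the spread and magnitude of the data points without losing the original meaning of the data.
This helps distance- and gradient-based algorithms such as KNN, SVM, and logistic regression train faster, and it keeps features measured on large scales (such as Si) from dominating features measured on small scales (such as Fe).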
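To make the scaling step concrete, here is a small sketch on made-up numbers showing that StandardScaler computes z = (x - mean) / std for each column, with the mean and standard deviation learned from the training set only and then reused for the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: column 0 has a large scale, column 1 a small one
X_train = np.array([[70.0, 0.1], [72.0, 0.3], [74.0, 0.2]])
X_test = np.array([[71.0, 0.25]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learns mean and std from the training set
X_test_scaled = sc.transform(X_test)        # reuses the training mean and std

# The same computation by hand: population standard deviation (ddof=0)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
manual_train = (X_train - mean) / std
manual_test = (X_test - mean) / std

print(X_train_scaled)
print(np.allclose(X_train_scaled, manual_train), np.allclose(X_test_scaled, manual_test))
```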
# Check for class imbalance
dataset.groupby(y).size()
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling: fit on the training set only, then apply to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Now it's time to fit a logistic regression model to the training data and predict the test set results.
# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
DONE! Super easy, isn't it?
Let's compare the predicted results with the actual test labels.
# We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df
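flatten() returns a copy of the array collapsed into one dimension, like a list (here y_test and y_pred are already 1-D, so it changes nothing).
Next comes the confusion matrix (computed with sklearn.metrics.confusion_matrix, as in the full template below). Its rows are the actual classes and its columns the predicted classes, both in sorted label order, so row 0 is glass type 1, row 1 is glass type 2, and so on. Reading it: 11 data points of the first class (row 0) were correctly predicted, while 3 data points of the first class were predicted as the second class. In the second row, 10 data points that actually belong to the second class were predicted as the first class, and 12 were correctly predicted as the second class, and the list goes on.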
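Because the confusion matrix is easy to misread, here is a small sketch with hypothetical labels (not the glass results) showing the layout scikit-learn uses: rows are actual classes, columns are predicted classes, both in sorted label order, so row 0 is the smallest label rather than a class called 0.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted glass types for 8 test samples
y_true = np.array([1, 1, 1, 2, 2, 2, 7, 7])
y_pred = np.array([1, 1, 2, 1, 2, 2, 7, 2])

labels = np.unique(np.concatenate([y_true, y_pred]))  # sorted labels: [1, 2, 7]
cm = confusion_matrix(y_true, y_pred, labels=labels)
# Row i = actual class labels[i], column j = predicted class labels[j]
print(labels)
print(cm)
```

The diagonal holds the correct predictions; everything off the diagonal is a mix-up between two specific classes.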
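Alright, another way to evaluate model performance is metrics.accuracy_score, along with metrics.balanced_accuracy_score, which averages the recall of each class and is therefore more informative when the classes are imbalanced.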
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
Output
Accuracy Score: 0.5925925925925926
Balanced Accuracy Score: 0.5476190476190476
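Why do the two scores differ? Balanced accuracy is the unweighted average of each class's recall, so a model that mostly predicts the majority class is penalized. A sketch with hypothetical, deliberately imbalanced labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Hypothetical imbalanced test labels: class 1 dominates
y_true = np.array([1, 1, 1, 1, 1, 1, 2, 2, 7, 7])
y_pred = np.array([1, 1, 1, 1, 1, 1, 1, 2, 1, 1])

acc = accuracy_score(y_true, y_pred)
bal = balanced_accuracy_score(y_true, y_pred)
# Balanced accuracy equals the mean of the per-class recalls
per_class_recall = recall_score(y_true, y_pred, average=None)
print(acc, bal, per_class_recall)
```

Here plain accuracy (0.7) looks respectable, while balanced accuracy (0.5) reveals that class 7 is never predicted correctly.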
Well, our classifier didn't perform well, but no worries! We will try other, more powerful classifiers and see if they improve. Before that, let me put all the pieces together in case you wish to use them as a template.
# Logistic Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
# Predict on unseen data: the model expects 9 raw feature values
# (RI, Na, Mg, Al, Si, K, Ca, Ba, Fe), scaled with the same scaler as the training data.
# The values below are illustrative only.
new_sample = [[22, 33, 44, 11, 55, 11, 44, 55, 0]]
classifier.predict(sc.transform(new_sample))
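One pitfall when predicting on unseen data: the model was trained on standardized features, so new raw values must pass through the same fitted scaler and must contain one value per feature (9 for the glass data). Below is a sketch of one way to make that automatic with scikit-learn's Pipeline; the iris dataset stands in for glass.data so the example runs on its own, and the sample values are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris stands in for the glass data so the sketch runs without glass.data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The pipeline fits the scaler on the training data and reapplies it at predict time
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# A "new" raw (unscaled) sample with one value per feature (4 for iris)
new_sample = [[5.1, 3.5, 1.4, 0.2]]
print(model.predict(new_sample))
print(model.score(X_test, y_test))
```

The same pattern works for every classifier in this post: swap LogisticRegression for the model you want.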
Next is KNN.
What is KNN?
K-Nearest Neighbors (KNN) is one of the simplest algorithms used in Machine Learning for regression and classification. KNN algorithms classify new data points based on similarity measures (e.g. Euclidean distance function).
Classification is done by a majority vote among the K nearest neighbors: a new point is assigned the class that is most common among them.
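A from-scratch sketch of that majority vote, using Euclidean distance on made-up 2-D points. The function knn_predict is hypothetical and for illustration only; KNeighborsClassifier with its default weights='uniform' makes the same kind of vote, with a faster neighbor search.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points (Euclidean)."""
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]   # indices of the k closest points
    votes = Counter(y_train[nearest])     # count the labels among those neighbors
    return votes.most_common(1)[0][0]     # label with the most votes

# Hypothetical 2-D training data with two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["A", "A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # A
print(knn_predict(X_train, y_train, np.array([4.8, 5.1]), k=3))  # B (2 votes to 1)
```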
Let's see how to apply KNN to a multi-class classification problem.
# K-Nearest Neighbors (K-NN)
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#------------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Up to here it's the same as before: load the data, define X and y, split the data, and then scale the independent variables.
NOW we will fit KNN to our training set with K = 9 nearest neighbors and metric = 'minkowski'. The Minkowski distance is a general distance formula that works in any number of dimensions (here, the 9 feature dimensions), and p is its power parameter. When p = 1, this is equivalent to using manhattan_distance (l1), and when p = 2 it is euclidean_distance (l2), which is what we use.
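A short sketch of the Minkowski distance, d(a, b) = (Σ |a_i − b_i|^p)^(1/p), checking that p = 1 and p = 2 reproduce the Manhattan and Euclidean distances. The helper minkowski is hypothetical; the formula works the same way in the 9 feature dimensions of the glass data.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def minkowski(a, b, p):
    """Minkowski distance: (sum |a_i - b_i|^p)^(1/p)."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([0.0, 0.0, 0.0])
b = np.array([3.0, 4.0, 0.0])
print(minkowski(a, b, p=1))  # 7.0, Manhattan (L1)
print(minkowski(a, b, p=2))  # 5.0, Euclidean (L2)

# Cross-check against scikit-learn's built-in metrics
print(pairwise_distances([a], [b], metric="manhattan")[0, 0])
print(pairwise_distances([a], [b], metric="euclidean")[0, 0])
```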
# Fitting K-NN to the Training set
from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors = 9, metric = 'minkowski', p = 2)
knn_model.fit(X_train, y_train)
That’s it!
Let’s check the model accuracy
# Predicting the Test set results
y_pred = knn_model.predict(X_test)
# Model Evaluation------------------------------------------------------------
# We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
Output
Accuracy Score: 0.5925925925925926
Balanced Accuracy Score: 0.3948412698412698
Not great!!! Accuracy is the same as with logistic regression, and balanced accuracy is lower. Still, we have learned how to apply KNN to a multi-class classification problem.
Here are all the pieces of KNN together
# K-Nearest Neighbors (K-NN)
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#-------------------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting K-NN to the Training set
from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors = 9, metric = 'minkowski', p = 2)
knn_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = knn_model.predict(X_test)
# Model Evaluation--------------------------------------------
# We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
#--------------------------------------
# Predict on unseen data: 9 raw feature values (RI, Na, Mg, Al, Si, K, Ca, Ba, Fe),
# scaled with the training scaler; the values are illustrative only
new_sample = [[22, 33, 44, 11, 55, 11, 44, 55, 0]]
knn_model.predict(sc.transform(new_sample))
Next is SVM, another powerful classifier.
SUPPORT VECTOR MACHINE
What is SVM?
SVM is a supervised machine learning algorithm that can be used for classification or regression problems.
In brief, SVM finds the hyperplane that separates the classes with the largest possible margin, where the margin is the distance between the hyperplane and the nearest data points of either class (these nearest points are called support vectors).
SVM is popular because it often achieves good accuracy on small and medium-sized datasets such as this one, although its training time grows quickly with the number of samples.
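To illustrate the margin idea, here is a sketch on six made-up, linearly separable points. With a linear kernel the separating hyperplane is w·x + b = 0, and the width of the margin between the two classes is 2 / ||w||. A very large C is used here (an assumption for the demo) so that the soft-margin SVC behaves like a hard-margin SVM. For more than two classes, SVC trains one-vs-one binary classifiers internally.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable 2-D data: class 0 near the origin, class 1 further out
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin (no points allowed inside the margin)
svm = SVC(kernel="linear", C=1e6)
svm.fit(X, y)

w = svm.coef_[0]                      # normal vector of the hyperplane w.x + b = 0
margin_width = 2 / np.linalg.norm(w)  # distance between the two margin boundaries
print("support vectors:", svm.support_vectors_)
print("margin width:", margin_width)
```

For these points the nearest class-0 edge lies on x + y = 1 and the nearest class-1 point is (3, 3) on x + y = 6, so the margin should come out close to 5 / √2 ≈ 3.54.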
Let's understand this with the help of an example.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#----------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Well, up to here it's the same as the others. First, we import the data and define X and y, split the data into train and test sets, and scale the independent variables to reduce the magnitude and spread of the data points without losing their original meaning.
It's time to fit the SVM to the training set.
# Fitting SVM to the Training set
from sklearn.svm import SVC
svm_model = SVC(kernel = 'linear', random_state = 0)
svm_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = svm_model.predict(X_test)
# We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
Output
Accuracy Score: 0.6111111111111112
Balanced Accuracy Score: 0.5496031746031745
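Okay, we improved a little. I suspect the data is not linearly separable, so let's try a more advanced version of SVM called Kernel SVM.
What is Kernel SVM?
A linear SVM can only separate classes with a flat hyperplane, which is not enough when the classes are not linearly separable. In simple words, Kernel SVM with the 'rbf' (radial basis function) kernel implicitly maps the data into a higher-dimensional feature space, where the classes may become separable. The "kernel trick" computes similarities in that space without ever building its coordinates explicitly.
Linear (and low-degree polynomial) kernels are usually faster to train, while rbf (Gaussian) kernels are more flexible and often, though not always, more accurate on non-linear data.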
So, the rule of thumb is: use linear SVMs (or logistic regression) for linear problems, and nonlinear kernels such as the Radial Basis Function kernel for non-linear problems.
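A sketch of both ideas: the RBF kernel value K(x, z) = exp(−gamma · ||x − z||²) computed by hand and with sklearn.metrics.pairwise.rbf_kernel, and a linear versus rbf SVC on two concentric rings generated by make_circles. The ring data is synthetic, and the scores are training accuracies meant only to show that the shape of the boundary matters, not a benchmark.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# RBF kernel by hand: K(x, z) = exp(-gamma * ||x - z||^2)
x = np.array([[1.0, 2.0]])
z = np.array([[2.0, 0.0]])
gamma = 0.5
manual = np.exp(-gamma * np.sum((x - z) ** 2))  # ||x - z||^2 = 5, so exp(-2.5)
print(manual, rbf_kernel(x, z, gamma=gamma)[0, 0])

# Two concentric rings: no straight line separates them in the original 2-D space
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)
print(f"linear: {linear_acc:.2f}, rbf: {rbf_acc:.2f}")
```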
Let's compare linear SVM with the radial basis function (rbf) kernel SVM.
# Importing the libraries
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#-----------------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# feature scale
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Well, up to here it's the same everywhere: load the data, define X and y, split the data, and transform the features to a standard range to reduce their magnitude without losing their original meaning.
Now we will fit the data with both a linear and a kernel 'rbf' SVM to compare them.
# Fitting SVM to the Training set
from sklearn.svm import SVC
svm_model = SVC(kernel = 'linear', random_state = 0)
svm_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = svm_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
#------------------------------------------------------------------
# Fitting Kernel SVM to the Training set
from sklearn.svm import SVC
Ksvm_model = SVC(kernel = 'rbf', random_state = 0)
Ksvm_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = Ksvm_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
The Kernel SVM's confusion matrix shows that it identifies true positives and true negatives better than the Linear SVM.
The accuracy score of our Kernel SVM model is also higher than that of the Linear SVM.
Hence, on this data, Kernel SVM performs better than Linear SVM.
Well, that's not enough; we have more powerful classifiers to try.
Let me put all the code together for Kernel SVM.
# Kernel SVM
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#-----------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# feature scale
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#-----------------------------------------------------------------
# Fitting SVM to the Training set
from sklearn.svm import SVC
svm_model = SVC(kernel = 'linear', random_state = 0)
svm_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = svm_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
#-----------------------------------------------------------------
# Fitting Kernel SVM to the Training set
from sklearn.svm import SVC
Ksvm_model = SVC(kernel = 'rbf', random_state = 0)
Ksvm_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = Ksvm_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
Next, we have Naïve Bayes.
What is Naive Bayes in short?
Naïve Bayes classifiers are a family of simple “probabilistic classifiers” based on applying Bayes’ theorem:
P(c|x) = P(x|c) × P(c) / P(x)
- P(c|x) is the posterior probability of class (target) given predictor (attribute).
- P(c) is the prior probability of class.
- P(x|c) is the likelihood, which is the probability of predictor given class.
- P(x) is the prior probability of predictor.
Likelihood: How probable is the evidence, given that our hypothesis is true?
Prior: How probable was our hypothesis before observing the evidence?
Posterior: How probable is our hypothesis given the observed evidence?
Marginal: How probable is the new evidence under all possible hypotheses?
How Naive Bayes works is a long chapter in itself; if you are interested in going further in depth, you can visit my other site.
In short, Naive Bayes computes the posterior probability of each class and predicts the most probable one. It is called “naive” because it assumes the features are independent of each other given the class, so P(x|c) becomes the product of the per-feature likelihoods.
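A from-scratch sketch of Gaussian Naive Bayes, the variant used below, to show where each term comes from: the log prior log P(c), plus the sum of per-feature Gaussian log likelihoods (the naive independence assumption), while P(x) is dropped because it is the same for every class. The function gaussian_nb_predict is hypothetical; its small variance smoothing mirrors GaussianNB's default var_smoothing, and iris stands in for the glass data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

def gaussian_nb_predict(X_train, y_train, X_new, var_smoothing=1e-9):
    """Pick argmax_c [log P(c) + sum_i log N(x_i | mean_ci, var_ci)].

    P(x) is omitted because it is identical for every class.
    """
    classes = np.unique(y_train)
    eps = var_smoothing * X_train.var(axis=0).max()  # mirrors GaussianNB's smoothing
    log_posts = []
    for c in classes:
        Xc = X_train[y_train == c]
        mean, var = Xc.mean(axis=0), Xc.var(axis=0) + eps
        log_prior = np.log(len(Xc) / len(X_train))
        # Gaussian log density summed over features: the "naive" independence step
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X_new - mean) ** 2 / var, axis=1)
        log_posts.append(log_prior + log_lik)
    return classes[np.argmax(np.column_stack(log_posts), axis=1)]

X, y = load_iris(return_X_y=True)
manual = gaussian_nb_predict(X, y, X)
library = GaussianNB().fit(X, y).predict(X)
print("agreement with GaussianNB:", np.mean(manual == library))
```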
Let's see how we can apply Naïve Bayes to multi-class classification.
# Naive Bayes
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#---------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# feature scaling/standardization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
NB_model = GaussianNB()
NB_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred2 = NB_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2))
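Naïve Bayes didn't perform well on this data. That makes some sense: it assumes the features are independent of each other given the class (and GaussianNB also assumes each feature is normally distributed within a class), which is unlikely to hold for these chemical measurements. Naïve Bayes is usually at its best on textual data.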
Well, we have 2 more powerful algorithms to go.
Let me put all the Naive Bayes codes together
# Naive Bayes
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#-----------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# feature scaling/standardization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
NB_model = GaussianNB()
NB_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred2 = NB_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2))
Next are Decision Trees, a rule-based classifier.
What are Decision Trees?
Decision Trees are a non-parametric supervised learning method used for both classification and regression tasks. The goal is to create a model that predicts the value of a target variable by learning simple decision rules derived from the data features.
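The decision rules generally take the form of if-then-else statements. The deeper the tree, the more complex the rules and the more closely the model fits the training data (which also raises the risk of overfitting).
A decision tree gives its output as a tree-like graph of nodes: each internal node tests one feature, each branch is an outcome of that test, and each leaf assigns a class. Take this graph as an example, beautifully explained.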
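The model below uses criterion = 'entropy', so at each node the tree picks the rule with the highest information gain: the parent's entropy minus the size-weighted entropy of the child nodes. A sketch of that computation on hypothetical labels (the helper names and the example rule are illustrative):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy in bits: -sum p_k * log2(p_k) over the classes present."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

# Hypothetical node with 4 samples of class 1 and 4 of class 2
parent = np.array([1, 1, 1, 1, 2, 2, 2, 2])
# A candidate rule (e.g. "if Mg <= t") that separates the classes perfectly
left, right = parent[:4], parent[4:]
print(entropy(parent))                        # 1.0
print(information_gain(parent, left, right))  # 1.0
```

A less clean split, such as [1, 1, 1, 2] versus [1, 2, 2, 2], gains only about 0.19 bits, so the tree would prefer the perfect split.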
Let's get some hands-on experience with decision trees.
# Decision Tree Classification
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#----------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# feature scaling/standardization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting Decision Tree Classification to the Training set
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
dt_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = dt_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
So what did we get?
The Decision Tree's confusion matrix shows that it identifies true positives better than Naïve Bayes.
The accuracy score of our Decision Tree model is also better than that of Naïve Bayes.
Hence, the Decision Tree performs better on this non-linearly separable data.
Wait! Since decision trees are rule-based classifiers, we can generate the rules. Let's visualize the tree and see what we got.
from sklearn.tree import export_graphviz
# export the decision tree to a tree.dot file
# for visualizing the plot easily anywhere
export_graphviz(dt_model, out_file ='e:/multi-class_tree.dot',
                feature_names =['RI','Na','Mg','Al','Si','K','Ca','Ba','Fe'])
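The tree is finally exported, and we can visualize it at http://www.webgraphviz.com/ by pasting in the contents of the ‘multi-class_tree.dot’ file. Note that because the tree was trained on standardized features, the split thresholds shown are in standardized units, not raw weight percent.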
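If you would rather not paste the .dot file into a website, scikit-learn's export_text prints the same if-then-else rules as plain text. A sketch on iris with a shallow tree so the output stays short (max_depth=2 is an assumption for readability; the glass model is unrestricted). For the glass model you would pass feature_names=['RI','Na','Mg','Al','Si','K','Ca','Ba','Fe'].

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# max_depth=2 keeps the printed rule set short
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each "|---" line is one if-then-else condition; leaves show the predicted class
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```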
Well, it's a very large tree, too big to show everything here. Still, our decision tree performed better than the previous classifier.
Next, we have another classifier called Random Forest, an upgraded version of the Decision Tree Classifier.
Let me put all of the Decision Tree Classifier codes together.
# Decision Tree Classification
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# feature scaling/standardization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting Decision Tree Classification to the Training set
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
dt_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = dt_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
# if we wish to enter a sample manually: 9 raw feature values
# (RI, Na, Mg, Al, Si, K, Ca, Ba, Fe), scaled like the training data; values are illustrative
new_sample = [[22, 33, 44, 11, 55, 11, 44, 55, 0]]
dt_model.predict(sc.transform(new_sample))
#----------------------------------------------
from sklearn.tree import export_graphviz
# export the decision tree to a tree.dot file
# for visualizing the plot easily anywhere
export_graphviz(dt_model, out_file ='e:/multi-class_tree.dot',
                feature_names =['RI','Na','Mg','Al','Si','K','Ca','Ba','Fe'])
"""
The tree is finally exported and we can visualize it using
http://www.webgraphviz.com/ by copying the data from the 'multi-class_tree.dot' file."""
Next RANDOM FOREST
What is a random forest?
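Random Forest is an upgraded version of decision trees. As the name suggests, it consists of a large number of individual decision trees that operate as an ensemble: each tree is trained on a random bootstrap sample of the rows and considers a random subset of the features at each split, and the forest combines the trees' predictions. Combining the predictive power of many decision trees usually gives better accuracy than a single tree.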
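A simplified sketch of that idea: train many decision trees, each on a bootstrap sample of the rows and with a random subset of features considered at each split (max_features='sqrt'), then take a majority vote. The hard vote is a simplification for illustration; scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities instead. Iris stands in for the glass data.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

rng = np.random.default_rng(0)
trees = []
for i in range(25):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt": each split looks at a random subset of the features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Majority vote across the trees for each test sample
all_preds = np.array([t.predict(X_test) for t in trees])  # shape (n_trees, n_samples)
votes = [Counter(col).most_common(1)[0][0] for col in all_preds.T]
ensemble_accuracy = np.mean(np.array(votes) == y_test)
print("ensemble accuracy:", ensemble_accuracy)
```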
Let’s get started with the help of an example
# Random Forest Classification
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#---------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# feature scaling/standardization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Up to here it's the same basic data pre-processing: loading the data, defining X and y, splitting the data into train and test sets, and scaling the features to reduce the magnitude and spread of the data points.
Now we will fit a random forest to the training set. We will also fit a decision tree so that we can compare their performance.
# Fitting Decision Tree Classification to the Training set
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
dt_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = dt_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
#------------------------------------
# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators = 800, criterion = 'entropy', random_state = 0)
rf_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred2 = rf_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2))
Well, we increased the Balanced Accuracy by 4% with these random forest settings (800 trees, entropy criterion), without any tuning.
To find the best settings, trying every setting one by one is tedious, time-consuming, and not very productive. Instead, there is an automated process called GRID SEARCH that finds the optimal settings for each classifier. But before we move ahead with Grid Search, let me put all the pieces of Random Forest together so that you can use it as a template later.
# Random Forest Classification
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#---------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# feature scaling/standardization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#---------------------------------------------------------
# Fitting Decision Tree Classification to the Training set
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
dt_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = dt_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
#---------------------------------------------------------
# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
# 800 trees, matching the run reported above
rf_model = RandomForestClassifier(n_estimators = 800, criterion = 'entropy', random_state = 0)
rf_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred2 = rf_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
# evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2))
Next Grid Search
Grid Search
Finding the best parameters by manual tuning is tedious and time-consuming, because there are so many parameter combinations to test over and over again. To overcome this, we will use a method called 'GRID SEARCH', which automates the task of finding the best model parameters for us.
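To make the "grid" concrete: Grid Search tries every combination of the listed values and cross-validates each one. The sketch below uses scikit-learn's ParameterGrid to enumerate the combinations; the extra 'criterion' dimension is a hypothetical addition, only there to show how quickly the number of model fits grows.

```python
from sklearn.model_selection import ParameterGrid

# The same grid used below for the Random Forest
parameters = {'n_estimators': [500, 1000, 1500, 2000]}
# A hypothetical second dimension, only to show how the grid grows
bigger = {'n_estimators': [500, 1000, 1500, 2000],
          'criterion': ['gini', 'entropy']}

cv = 7  # number of cross-validation folds used later
for grid in (parameters, bigger):
    combos = list(ParameterGrid(grid))
    print(len(combos), 'combinations ->', len(combos) * cv, 'model fits')

# Each combination is just a dict of keyword arguments for the model
print(list(ParameterGrid(bigger))[:2])
```

With refit=True (GridSearchCV's default), one extra fit is done at the end on the whole training set using the best combination.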
We will divide this into 2 sections: a) Grid Search for finding the best hyperparameters for our machine learning model, and b) Grid Search for Deep Learning models.
Let’s start with a) Grid Search for machine learning models
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# Drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
Then we will split the data into training and test sets and scale it before we fit our model. For this example, we will use the Random Forest Classifier (RF), which so far has the highest accuracy score with default parameters.
Now let's automate the search for the best parameters of our Random Forest model.
# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators = 500, criterion = 'entropy', random_state = 1)
rf_model.fit(X_train, y_train)
# Applying Grid Search to find the best model and the best parameters
from sklearn.model_selection import GridSearchCV
parameters = [{'n_estimators': [500, 1000, 1500, 2000]}]
grid_search = GridSearchCV(estimator = rf_model,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 7)
grid_search = grid_search.fit(X_train, y_train)
best_accuracy = grid_search.best_score_
best_parameters = grid_search.best_params_
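And what do we get? The best value for n_estimators is 1500.
Okay, let's retrain with a larger number of trees and see if it improves our model. Note that the code below uses n_estimators = 1700, slightly above the grid's best value of 1500; you can also plug in best_parameters['n_estimators'] directly.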
# Let's retry our model with the new parameters
# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators = 1700, criterion = 'entropy', random_state = 1)
rf_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred2 = rf_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
# Evaluation metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2))
Nice! It did improve, from 0.70 to 0.72. You can use this code as a template with a few modifications, such as a different list of parameters for other types of classifiers. To see which parameters a classifier accepts, select the classifier name (e.g. SVC) and press 'Ctrl' + 'I' to open its help in the Spyder IDE, or call get_params() on the model.
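As a sketch of reusing the template for another classifier, here is the same Grid Search applied to an SVM. Synthetic data from make_classification stands in for glass.data, and the parameter values are illustrative assumptions, not tuned recommendations. The scaler is placed inside a Pipeline so that each cross-validation fold is scaled using only its own training part, and scoring uses 'balanced_accuracy' since that is the metric compared throughout this article.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class data standing in for glass.data (assumption)
X, y = make_classification(n_samples=300, n_features=9, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scaling inside the pipeline: each CV fold is scaled with statistics
# computed from that fold's training part only
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# List the tunable parameter names (an alternative to the IDE help shortcut)
print(sorted(pipe.get_params().keys()))

# Parameters of a pipeline step are addressed as '<step name>__<parameter>'
parameters = {'svc__C': [0.1, 1, 10],
              'svc__kernel': ['linear', 'rbf'],
              'svc__gamma': ['scale', 0.1]}
grid_search = GridSearchCV(estimator=pipe, param_grid=parameters,
                           scoring='balanced_accuracy', cv=5)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_, grid_search.best_score_)

# best_estimator_ has already been refit on the whole training set
test_score = grid_search.best_estimator_.score(X_test, y_test)
print('Test accuracy:', test_score)
```

Using best_estimator_ avoids retyping the winning values by hand when you rebuild the final model.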
Let me put all of the pieces together.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# Drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
#-----------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Applying Grid Search to find the best model and the best parameters
# (the model must be defined before it is passed to GridSearchCV)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
rf_model = RandomForestClassifier(criterion = 'entropy', random_state = 1)
parameters = [{'n_estimators': [500, 1000, 1500, 2000]}]
grid_search = GridSearchCV(estimator = rf_model, param_grid = parameters, scoring = 'accuracy', cv = 7)
grid_search = grid_search.fit(X_train, y_train)
best_accuracy = grid_search.best_score_
best_parameters = grid_search.best_params_
############################################################
# Let's retry our model with the new parameters
# Fitting Random Forest Classification to the Training set
rf_model = RandomForestClassifier(n_estimators = 1700, criterion = 'entropy', random_state = 1)
rf_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred2 = rf_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
# Evaluation metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2))
Still not satisfied with the accuracy level, are you? Have you tried the next-generation machine learning technique, Deep Learning?
So what are we waiting for? Let's try it!
I believe you are already aware of how neural networks work. If not, don't worry: there are plenty of resources available on the web to get started with. However, I will also walk you through, in brief, what a neural network is and how it learns.
In this diagram/photo, the dendrites are the receivers of the neuron, while the axon is the transmitter of the neuron's signal.
What is a neuron?
In Artificial Intelligence, a neuron is a mathematical function that models the functioning of a biological neuron. Typically, a neuron computes the weighted sum of its inputs (plus a bias term), and this sum is passed through a nonlinear function, also called an activation function, such as the sigmoid or ReLU.
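A minimal NumPy sketch of a single artificial neuron as described above: a weighted sum plus a bias, passed through an activation. The inputs, weights and bias are arbitrary illustrative numbers.

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # keeps positive values, replaces negative values with 0
    return np.maximum(0.0, z)

# Hypothetical inputs, weights and bias for one neuron
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

z = np.dot(w, x) + b        # weighted sum of the inputs plus the bias
print(z, sigmoid(z), relu(z))
```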
Now if we put this in a flow diagram it will look something like this
In practice, of course, we are going to have larger and more complex neural networks.
How does it learn?
The network processes data back and forth: a forward pass produces predictions and the error (loss), and a backward pass, known as backpropagation, works out how much each weight contributed to that error. The weights are then adjusted over and over again to reduce the loss. Once further iterations no longer improve on the previous result, the optimized parameter settings are saved as the weights. There are different methods to minimize the loss; one of them is Gradient Descent.
Gradient Descent is an optimization algorithm often used for finding these weights: it repeatedly moves each weight a small step in the direction that reduces the loss.
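A minimal sketch of the idea on a one-parameter toy loss, L(w) = (w - 3)^2, whose minimum is at w = 3. The learning rate and number of steps are illustrative assumptions; a real network does the same for every weight, with the gradients supplied by backpropagation.

```python
def gradient_descent(grad, w0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(n_steps):
        w = w - learning_rate * grad(w)
    return w

# Toy loss L(w) = (w - 3)^2 and its derivative 2 * (w - 3)
loss = lambda w: (w - 3) ** 2
grad = lambda w: 2 * (w - 3)

w_best = gradient_descent(grad, w0=0.0)
print(w_best, loss(w_best))
```

Each step here shrinks the distance to the minimum by a factor of 0.8, so after 100 steps w is practically 3. A learning rate that is too large (above 1.0 for this loss) would make the steps overshoot and diverge instead.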
Types of Gradient Descent
1. Batch Gradient Descent: it calculates the error for each example in the training dataset but only updates the model after all training examples have been evaluated. In other words, it uses the whole dataset for every weight update, repeating this over many iterations.
Pros:
a) Fewer updates to the model means this variant of gradient descent is more computationally efficient than stochastic gradient descent.
b) The decreased update frequency results in a more stable error gradient, which may result in more stable convergence.
Cons:
a.) However, a stable error gradient may result in premature convergence of the model to a less optimal set of parameters.
b.) It usually requires the entire training set to be in memory and available to the algorithm, so training can become slow on large datasets.
2. Stochastic Gradient Descent calculates the error and updates the model for each example in the training dataset.
In other words: one row at a time, adjust the weights over many iterations. The noisy updates can help the model escape local minima on its way to the global minimum, and it can be faster.
Pros:
a.) This variant is simpler to understand and implement for beginners
b.) The frequent updates immediately give an insight into the performance of the model and the rate of improvement.
c.) The increased model update frequency one row at a time can result in faster learning on some problems.
Cons:
a.) However, updating the model so frequently is more computationally expensive than the other variants of gradient descent, especially when training models on large datasets.
b.) But the frequent updates can result in a noisy gradient signal which may cause the model parameters and in turn the model error to jump around.
3. Mini-Batch Gradient Descent is a variation of the gradient descent algorithm that splits the training set into small batches, which are used to calculate the model error and update the model coefficients.
Mini-batch gradient descent seeks to find a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.
Pros:
a.) The model update frequency is higher than in batch gradient descent, which allows for more robust convergence and helps avoid local minima.
b.) The batch updates provide a computationally more efficient process than stochastic gradient descent.
c.) Batching is efficient both because not all the training data has to be held in memory at once and because each batch can be processed efficiently by the algorithm implementation.
Cons:
a.) Mini-batch requires the configuration of an additional ‘mini-batch’ size hyperparameter for the learning algorithm.
b.) Error information must be accumulated across the training examples of each mini-batch, as in batch gradient descent, which adds computation to every update.
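The three variants differ only in how many rows feed each weight update. The sketch below is a plain NumPy linear regression on synthetic data rather than the glass network (an assumption for brevity); a single batch_size argument switches between batch, stochastic and mini-batch gradient descent.

```python
import numpy as np

def train_linear(X, y, batch_size, learning_rate=0.05, epochs=50, seed=0):
    """Fit y ~ X @ w + b by gradient descent on the mean squared error.

    batch_size == len(X)    -> batch gradient descent (one update per epoch)
    batch_size == 1         -> stochastic gradient descent (one update per row)
    1 < batch_size < len(X) -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n)                 # shuffle the rows every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # rows in this batch
            error = X[idx] @ w + b - y[idx]        # prediction error on the batch
            w -= learning_rate * 2 * X[idx].T @ error / len(idx)
            b -= learning_rate * 2 * error.mean()
            updates += 1
    return w, b, updates

# Synthetic data (assumption): y = 2*x1 - 1*x2 + 0.5 plus a little noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(scale=0.1, size=200)

for name, bs in [('batch', 200), ('stochastic', 1), ('mini-batch', 32)]:
    w, b, updates = train_linear(X, y, batch_size=bs)
    print(f'{name:10s} w={np.round(w, 2)} b={b:.2f} updates={updates}')
```

All three recover weights close to 2 and -1, but batch makes 50 updates (one per epoch), stochastic 10,000 (one per row per epoch) and mini-batch 350 (7 batches of up to 32 rows per epoch), which is exactly the trade-off listed above.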
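ONE OF THE MOST COMMONLY USED OPTIMIZERS IN DEEP LEARNING is ADAM, another optimization algorithm. It builds on gradient descent by keeping running averages of the gradients and adapting the step size for each weight.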
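A minimal NumPy sketch of the standard Adam update: a running average of the gradient (m), a running average of its square (v), bias correction for both, and a per-weight step of lr * m_hat / sqrt(v_hat). The betas and epsilon are the commonly used values; lr=0.1 is an assumption chosen for this toy problem (Keras' Adam defaults to 0.001). The toy loss has very different curvature per axis, which is where per-weight step sizes help.

```python
import numpy as np

def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=500):
    """Minimize a function given its gradient, using the Adam update rule."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, n_steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # running average of gradients
        v = beta2 * v + (1 - beta2) * g ** 2     # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction (m, v start at 0)
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy loss L(w) = w0^2 + 100 * (w1 - 1)^2, minimum at (0, 1)
grad = lambda w: np.array([2 * w[0], 200 * (w[1] - 1)])
result = adam_minimize(grad, [3.0, -2.0])
print(result)
```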
Now that we have an idea of how neural networks work, let's get started with a real-life example.
First, we will import the libraries and the data as usual.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# Drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
Then, as usual, we define X and y. I have also added groupby(y).size() to check for any class imbalance.
Now for the interesting and most important part of performing multi-class classification in deep learning: encoding the target variable (y), which converts each category into a dummy variable (one-hot encoding) so that the network can output a score for each category. That's it!
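# Encode class values as integers (0 to 5 for the six glass types present)
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
encoder.fit(y)
encoded_Y = encoder.transform(y)
# Convert to dummy variables (i.e. one-hot encoded)
from keras.utils import to_categorical
# Note: this one-hot encodes the raw labels 1 to 7, giving 8 columns (0 to 7);
# encoded_Y is not used below
dummy_y = to_categorical(y)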
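To see what this encoding produces, here is a small sketch that uses np.eye as a stand-in for keras to_categorical (so it runs without Keras) on a handful of made-up labels. It shows why encoding the raw labels 1 to 7 gives 8 columns, while encoding the LabelEncoder output gives one column per glass type actually present.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# A few made-up glass labels; type 4 never occurs, as in the real data
y = np.array([1, 2, 3, 5, 6, 7, 1, 2])

# Like to_categorical(y): one column per integer from 0 to max(y) = 7 -> 8 columns
raw_one_hot = np.eye(y.max() + 1)[y]

# One-hot of the LabelEncoder output: 6 distinct labels -> 6 columns
encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(y)
compact_one_hot = np.eye(len(encoder.classes_))[encoded_Y]

print(raw_one_hot.shape, compact_one_hot.shape)   # (8, 8) (8, 6)
print(encoder.classes_)                           # [1 2 3 5 6 7]
# Back from one-hot rows to glass types
print(raw_one_hot.argmax(axis=1))                 # [1 2 3 5 6 7 1 2]
print(encoder.inverse_transform(compact_one_hot.argmax(axis=1)))  # [1 2 3 5 6 7 1 2]
```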
We will split the data into training and test sets as usual, so that the ANN can learn from one part and be tested on the other. Then we apply feature scaling to bring all the features into a similar, small range, which makes training easier for the ANN without compromising the original meaning of the data.
Scaling neither adds noise nor loses the original meaning of the data.
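A small sketch of what StandardScaler does: z = (x - mean) / std, with the mean and standard deviation learned on the training set only and reused for the test set. The two feature columns are synthetic stand-ins loosely resembling RI and Si, not the real data. The last check shows the transform is exactly reversible, which is why no information is lost.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features on very different scales (RI-like near 1.52, Si-like near 72.6)
X_train = np.column_stack([rng.normal(1.52, 0.003, 100), rng.normal(72.6, 0.8, 100)])
X_test = np.column_stack([rng.normal(1.52, 0.003, 30), rng.normal(72.6, 0.8, 30)])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)   # learn mean/std from training data only
X_test_scaled = sc.transform(X_test)         # reuse the training statistics

# Same thing by hand: z = (x - mean_train) / std_train
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
manual_test = (X_test - mean) / std
print(np.allclose(X_test_scaled, manual_test))                    # True

# Scaling is reversible, so the original values can always be recovered
print(np.allclose(sc.inverse_transform(X_train_scaled), X_train))  # True
```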
# Fix random seed for reproducibility (used here for the train/test split)
seed = 7
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.33, random_state=seed)
# Feature scaling/Normalization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# AND WE ARE DONE WITH THE DATA PREPARATION !!!!!!!!!
#LET’s START THE FUN PART- CREATING A NEURAL NETWORK!!!
#Create ANN
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
A small note on Keras and TensorFlow, the buzzwords we hear all the time:
TensorFlow is an end-to-end open-source platform. It’s a comprehensive and flexible ecosystem of tools, libraries, and other resources that provide workflows with high-level APIs.
Keras, on the other hand, is a high-level neural networks library that runs on top of a backend such as TensorFlow, CNTK, or Theano. Using Keras in deep learning allows developers to easily build neural networks without worrying much about the mathematical aspects of tensor algebra, numerical techniques, and optimization methods. Keras was developed with the objective of allowing people to write their own scripts without having to learn the backend in detail.
Let’s Get Back To The Track!
# Define baseline model
def baseline_model():
    # Create model: 9 input features -> 30 hidden ReLU units -> 8 softmax outputs
    model = Sequential()
    model.add(Dense(30, input_dim=9, activation='relu'))
    model.add(Dense(8, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
We add and connect layers using .add() and Dense, with units = 30. Hmm..! What does that 30 mean?
30 is the number of nodes (neurons) in the hidden layer. There is no fixed rule for choosing it; a common starting point is a value related to the number of columns (variables) in the dataset, and it can then be tuned, for example with Grid Search.
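Dense layers also accept a kernel_initializer argument (for example kernel_initializer = 'uniform'), which sets how the initial weights are drawn before an optimizer such as Stochastic Gradient Descent or 'ADAM' starts updating them. Our baseline model does not set it, so Keras uses its default, 'glorot_uniform'. What is an optimizer? We will get to that part in a few seconds.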
activation = 'relu' stands for the rectified linear unit, the rectifier function that introduces non-linearity into the network.
ReLU is linear for all positive values and zero for all negative values. The downside of being zero for all negative values is a problem called "dying ReLU": a ReLU neuron is "dead" if it is stuck on the negative side and always outputs 0. The dying problem is likely to occur when the learning rate is too high or there is a large negative bias. 'Leaky ReLU' and 'ELU' are good alternatives to try. Other variants include ReLU-6, Concatenated ReLU (CReLU), Exponential Linear (ELU, SELU) and Parametric ReLU.
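A NumPy sketch of ReLU and the two alternatives mentioned, Leaky ReLU and ELU. The alpha values are illustrative; Keras and other libraries use their own defaults.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # small slope alpha for negative inputs instead of a flat zero
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth exponential curve for negative inputs, saturating at -alpha
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))

z = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(z))        # [0. 0. 0. 2.]  negatives become 0, and so does their gradient
print(leaky_relu(z))  # negatives are scaled down but not zeroed
print(elu(z))
```

Because Leaky ReLU and ELU still have a non-zero slope for negative inputs, a neuron stuck on the negative side keeps receiving gradient updates and can recover, which is what avoids the "dying ReLU" problem.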
The last one, 'input_dim', simply refers to the number of columns (input dimensions), i.e. the 9 feature columns of X.
Next, we add the output layer the same way as above. The only difference is that we don't need to add "input_dim", because Keras infers it from the previous layer, whose output size is 30.
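As a quick sanity check on the layer sizes: each Dense layer has one weight per input-unit pair plus one bias per unit. Assuming the 9 feature columns of X (dataset.iloc[:, 0:9]), 30 hidden units and 8 output units, the count works out as below; these are the per-layer numbers model.summary() reports for such layers.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 9                          # RI, Na, Mg, Al, Si, K, Ca, Ba, Fe
hidden = dense_params(n_features, 30)   # first Dense layer: 9 -> 30
output = dense_params(30, 8)            # output Dense layer: 30 -> 8
print(hidden, output, hidden + output)  # 300 248 548
```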
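The output layer's activation depends on the task: a linear activation (i.e. no activation) is typical for regression output, 'sigmoid' for binary classification, and 'softmax' for multi-class classification output. Dense(8) gives 8 output units because to_categorical on the raw labels 1 to 7 produces 8 columns (0 to 7); the columns for labels 0 and 4 simply never appear in the targets.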
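A small NumPy sketch of softmax turning 8 raw output scores into probabilities that sum to 1. The logits are made-up numbers for one sample; the predicted class is the index with the highest probability.

```python
import numpy as np

def softmax(logits):
    # subtracting the max avoids overflow and does not change the result
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw outputs (logits) of the 8 output units for one glass sample
logits = np.array([0.1, 2.5, 0.3, 1.0, -1.0, 0.2, 0.0, 0.4])
probs = softmax(logits)
print(probs.round(3), probs.sum())
print('predicted class index:', probs.argmax())   # 1
```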
#compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
This function configures the model for training: it sets the optimizer, the loss function and the metrics. The weights (settings) themselves are calculated later, when we fit the model.
optimizer = 'adam', just like Stochastic Gradient Descent (SGD), updates the weights step by step to find the optimal set of weights in the neural network, starting from the values set by the layer's kernel initializer.
loss = 'binary_crossentropy' is the function used to calculate the loss for binary classification problems; for regression it is typically the mean squared error (MSE, whose square root is RMSE); and for multi-class classification with one-hot targets we use loss = 'categorical_crossentropy'.
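A worked sketch of categorical cross-entropy on toy one-hot targets: the loss is the negative log of the probability the model gives to the true class, averaged over samples. The clipping constant is an illustrative safeguard against log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)) for one-hot y_true."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

# Two samples, three classes (toy numbers)
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])
confident_right = np.array([[0.05, 0.90, 0.05],
                            [0.80, 0.10, 0.10]])
confident_wrong = np.array([[0.90, 0.05, 0.05],
                            [0.10, 0.80, 0.10]])
print(categorical_crossentropy(y_true, confident_right))
print(categorical_crossentropy(y_true, confident_wrong))
```

The confident, correct predictions give a loss of about 0.16, the confident, wrong ones about 2.65, so the optimizer is pushed hard away from confident mistakes.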
metrics = ['accuracy'] tells Keras to calculate and display the accuracy of the model during training.
estimator = KerasClassifier(build_fn=baseline_model, epochs=350, batch_size=1, verbose=1)
Finally, we wrap our model function in the wrapper class 'KerasClassifier', where 'build_fn' refers to the model function; epochs = 350 is the number of passes over the training data; batch_size = 1 is the number of samples used for each weight update (one row at a time, i.e. stochastic gradient descent); and verbose = 1 simply displays the progress while training our model.
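To get a feel for what epochs and batch_size mean in practice, this sketch counts the weight updates, assuming glass.data keeps all 214 rows after dropna() and using the same 0.33 test split and seed as above. A dummy array of zeros is used only to obtain the split sizes.

```python
import math
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 214                      # assumption: all 214 rows survive dropna()
X_dummy = np.zeros((n_rows, 9))   # placeholder with the same shape as X
X_tr, X_te = train_test_split(X_dummy, test_size=0.33, random_state=7)
print(len(X_tr), len(X_te))       # 143 71

epochs = 350
for batch_size in (1, 32, len(X_tr)):
    updates_per_epoch = math.ceil(len(X_tr) / batch_size)
    print(batch_size, updates_per_epoch, epochs * updates_per_epoch)
# batch_size=1: 143 updates per epoch, 50050 in total
# batch_size=32: 5 per epoch, 1750 in total; full batch: 1 per epoch, 350 in total
```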
It's time to fit our model with X_train and Y_train. Done!
estimator.fit(X_train,Y_train)
Wow, the accuracy reported during training now reaches 99%, the highest of all. Keep in mind that this number is measured on the training data, so confirm it on the test set (or with the cross-validation below) before comparing it with the test scores of the earlier models. Deep learning is very popular for non-linearly separable data. However, tuning deep learning models can be a bit difficult, as they have lots of parameters to tune, but we can also use Grid Search to automate finding the optimal parameters for each scenario. Search for Grid Search on my profile if you wish to see in detail how to use Grid Search for Deep Learning.
Our model is ready to predict new data.
predictions = estimator.predict(X_test)
print(predictions)
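The printed values from 1 to 7 are the predicted glass types. Because the raw labels were one-hot encoded, the predicted column index equals the glass type number.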
Done! We can also use KFold to cross-validate our model:
kfold = KFold(n_splits=20, shuffle=True)
results = cross_val_score(estimator, X_train, Y_train, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
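A small sketch of how KFold(n_splits=20) divides the training rows, assuming the 143 training rows that a 0.33 test split of 214 rows leaves. random_state=7 is added only to make the sketch reproducible. Keep in mind that cross_val_score retrains the network from scratch in every fold, so 20 folds at 350 epochs each is slow.

```python
import numpy as np
from sklearn.model_selection import KFold

# 143 rows: the training-set size a 0.33 test split of 214 rows leaves (assumption)
X_dummy = np.zeros((143, 9))
kfold = KFold(n_splits=20, shuffle=True, random_state=7)

fold_sizes = [len(val_idx) for _, val_idx in kfold.split(X_dummy)]
print(fold_sizes)        # three folds of 8 rows, then seventeen of 7
print(sum(fold_sizes))   # 143: every row is used for validation exactly once

epochs = 350
print(len(fold_sizes) * epochs, 'training epochs in total')  # 7000
```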
The whole code will look something like this.
# ANN
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('glass.data', header = None, index_col = 0)
# Drop the missing values
dataset = dataset.dropna()
# Statistical summary of the variables
dataset.describe()
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9].values
# Check for class imbalance
dataset.groupby(y).size()
# Encode class values as integers
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
encoder.fit(y)
encoded_Y = encoder.transform(y)
# Convert the raw labels 1 to 7 to dummy variables (i.e. one-hot encoded, 8 columns)
from keras.utils import to_categorical
dummy_y = to_categorical(y)
# Fix random seed for reproducibility
seed = 7
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, dummy_y, test_size=0.33, random_state=seed)
# Feature scaling/Normalization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Create ANN
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
# Define baseline model
def baseline_model():
    # Create model: 9 input features -> 30 hidden ReLU units -> 8 softmax outputs
    model = Sequential()
    model.add(Dense(30, input_dim=9, activation='relu'))
    model.add(Dense(8, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
estimator = KerasClassifier(build_fn=baseline_model, epochs=350, batch_size=1, verbose=1)
estimator.fit(X_train, Y_train)
predictions = estimator.predict(X_test)
print(predictions)
#-----------------------------------------------------
kfold = KFold(n_splits=20, shuffle=True)
results = cross_val_score(estimator, X_train, Y_train, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Congratulations! We have completed them all. It's a long blog; I tried to keep it as short as possible while keeping the important concepts intact.
I hope you enjoyed it.
Feel Free to ask because “Curiosity Leads To Perfection”
Some of my alternative internet presences are Facebook, Instagram, Udemy, Blogger, Issuu, and more.
Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy
Stay tuned for more updates! Have a good day…
~ Be Happy and Enjoy!
Don’t forget to follow The Lean Programmer Publication for more such articles, and subscribe to our newsletter tinyletter.com/TheLeanProgrammer