Rock vs. Mine Prediction Using ML Models

Harsh Vardhan
5 min read · Feb 11, 2022


This was my first machine learning project (the "Hello World" of ML :)).

The dataset was used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network. The task is to train a model to discriminate between sonar signals bounced off a mine and those bounced off a rock. Although the dataset is small, it has 60 features (attributes). This is a good project to start with because:

  • It is a classification problem, allowing you to practice with perhaps an easier type of supervised learning algorithm.
  • It is a binary classification problem (rock vs. mine), so it does not need the specialized handling that multi-class problems may require.
  • It only has 60 attributes and 208 rows, meaning it is small and easily fits into memory.
  • All of the numeric attributes are in the same units and the same scale, not requiring any special scaling or transforms to get started.
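The properties listed above are easy to check once the data is loaded. The following is a minimal sketch, not part of the original notebook: `summarize` is a hypothetical helper, and the small random frame only imitates the assumed layout of the sonar file (60 numeric columns followed by a label column).

```python
import numpy as np
import pandas as pd

def summarize(df: pd.DataFrame, label_col) -> dict:
    """Return basic facts about a labelled feature table (hypothetical helper)."""
    features = df.drop(columns=label_col)
    return {
        "n_rows": len(df),
        "n_features": features.shape[1],
        "class_counts": df[label_col].value_counts().to_dict(),
        "min_value": float(features.min().min()),
        "max_value": float(features.max().max()),
    }

# Synthetic stand-in with the assumed layout of the sonar file:
# 60 numeric columns in [0, 1) followed by an 'R'/'M' label in column 60.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((8, 60)))
demo[60] = ["R", "M", "R", "M", "M", "R", "M", "M"]

summary = summarize(demo, label_col=60)
print(summary)
```

Running the same helper on the real DataFrame would show the row count, the number of features, how balanced the two classes are, and whether all values share one scale.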

Here is an overview of what I will cover in this small end-to-end machine learning project:

1. Importing useful libraries
2. Collection of Data
3. Data Preprocessing
4. Split the data into Test Data and Train Data
5. Make Trained ML Models
6. Feed Test Data into our Trained ML Models to predict

#Importing the libraries
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

The `as` keyword gives each library a short alias, so the next time you want to use it you can simply type the alias. For instance, to use pandas you can type pd, and these aliases can be set to whatever you want. NumPy is useful for constructing arrays and matrices, pandas is for data cleansing and exploration, and seaborn and matplotlib are for visualization.

After importing those libraries, you need to load the Sonar dataset into your notebook using the following pandas function:

df=pd.read_csv('sonar data.csv',header=None)
df.head()
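The `header=None` argument matters because the file has no header row. As a small illustration (the three-row CSV text below is made up and is not the sonar data), leaving it out makes pandas treat the first data row as column names, so one sample is silently lost:

```python
import io
import pandas as pd

# Three data rows and no header row (hypothetical values, not the real file).
raw = "0.02,0.03,R\n0.04,0.05,M\n0.01,0.07,R\n"

with_header_none = pd.read_csv(io.StringIO(raw), header=None)
default_header = pd.read_csv(io.StringIO(raw))  # first row is consumed as column names

print(with_header_none.shape)  # all three rows are kept
print(default_header.shape)    # one row fewer
```

With `header=None` the columns are numbered 0, 1, 2, ..., which is why the label column of the sonar data is later referred to as column 60.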

Separating Data and Labels

a=df.drop(columns=60)  # all feature columns; column 60 holds the label
b=df[60]
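To see what this separation does, here is a minimal sketch on a tiny made-up frame with two feature columns and one label column. The names `toy`, `X` and `y` are mine and are not used elsewhere in this post.

```python
import pandas as pd

# Tiny stand-in frame: two feature columns (0, 1) and a label column (2).
toy = pd.DataFrame([[0.1, 0.2, "R"], [0.3, 0.4, "M"], [0.5, 0.6, "M"]])

X = toy.drop(columns=2)  # every column except the label
y = toy[2]               # the label column on its own

print(X.shape)
print(list(y))
```

`drop` returns a new DataFrame, so the original frame still contains the label column afterwards.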

Splitting the data into training data and test data using ‘train_test_split’, with 10% of the total data held out as the test sample. Passing stratify=b keeps the ratio of rocks to mines the same in both sets.

a_train,a_test,b_train,b_test=train_test_split(a,b,test_size=0.1,stratify=b,random_state=1)
# Print the shapes so that all of them are shown, not only the last one
print(a_train.shape,a_test.shape)
print(b_train.shape,b_test.shape)
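The effect of `stratify` can be checked by comparing class shares before and after the split. This is a sketch on synthetic data. The 208-row size and the 111/97 class counts are assumptions chosen to resemble the sonar data, and the variable names are mine.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 208 rows, 60 features, 111 'M' and 97 'R' labels (assumed counts).
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.random((208, 60)))
y = pd.Series(["M"] * 111 + ["R"] * 97)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=1
)

overall_share = (y == "M").mean()
train_share = (y_train == "M").mean()
test_share = (y_test == "M").mean()

print(X_train.shape, X_test.shape)
print(round(overall_share, 3), round(train_share, 3), round(test_share, 3))
```

The three shares should be close to each other. Note how few samples end up in the test set, which is worth keeping in mind when reading the accuracy numbers later.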

Training the models

  1. Using Logistic Regression Model
model=LogisticRegression()
model.fit(a_train,b_train)

Now we will check the accuracy of the model

# accuracy_score expects the true labels first, then the predictions
a_train_prediction=model.predict(a_train)
a_train_accuracy=accuracy_score(b_train,a_train_prediction)
print('The accuracy of training data is :',a_train_accuracy)
a_test_prediction=model.predict(a_test)
a_test_accuracy=accuracy_score(b_test,a_test_prediction)
print('The accuracy of test data is :',a_test_accuracy)
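Because a 10% test split of such a small dataset holds only about 20 samples, a single accuracy figure can swing noticeably with the random split. One common way to get a steadier estimate is k-fold cross-validation. This was not part of my original notebook, so treat the following as a sketch: it runs on synthetic data from `make_classification`, and `max_iter=1000` is my own choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the sonar features (208 rows, 60 columns).
X, y = make_classification(n_samples=208, n_features=60, n_informative=10, random_state=1)

# Five stratified folds: each fold serves once as the test set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", scores.mean().round(3))
```

The spread between the fold accuracies gives a rough idea of how much a single train/test split can vary.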

The model shows about 76% accuracy on the test data, which is reasonable given how small the dataset is. Next, I built a predictive system that can tell a rock from a mine.

#input_data is the data we should provide to test our system.
input_data=(0.0392,0.0108,0.0267,0.0257,0.0410,0.0491,0.1053,0.1690,0.2105,0.2471,0.2680,0.3049,0.2863,0.2294,0.1165,0.2127,0.2062,0.2222,0.3241,0.4330,0.5071,0.5944,0.7078,0.7641,0.8878,0.9711,0.9880,0.9812,0.9464,0.8542,0.6457,0.3397,0.3828,0.3204,0.1331,0.0440,0.1234,0.2030,0.1652,0.1043,0.1066,0.2110,0.2417,0.1631,0.0769,0.0723,0.0912,0.0812,0.0496,0.0101,0.0089,0.0083,0.0080,0.0026,0.0079,0.0042,0.0071,0.0044,0.0022,0.0014)
input_data_Array=np.asarray(input_data)
input_data_reshaped=input_data_Array.reshape(1,-1)
predict=model.predict(input_data_reshaped)
if predict[0]=='R':
    print('Safe,Its just a Rock')
else:
    print('DANGER,its MINE')
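Since the same prediction steps are repeated for every model in this post, they could be wrapped in a small function. The sketch below is one way to do that. `describe_prediction` is a hypothetical helper, and the toy model is trained on random data only so that the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def describe_prediction(model, features) -> str:
    """Predict one sample and translate the 'R'/'M' label into a message."""
    # reshape(1, -1) turns the flat list into one row with as many columns as features
    sample = np.asarray(features, dtype=float).reshape(1, -1)
    label = model.predict(sample)[0]
    return "Safe, it's just a rock" if label == "R" else "DANGER, it's a mine"

# Toy model trained on random data, only to make the sketch runnable.
rng = np.random.default_rng(0)
X_demo = rng.random((40, 60))
y_demo = np.array(["R", "M"] * 20)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

message = describe_prediction(demo_model, rng.random(60))
print(message)
```

The reshape is needed because scikit-learn models expect a 2D array of shape (n_samples, n_features), even when predicting a single sample.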

The output for this input data is:

  2. Naive Bayes Model

from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
gnb = GaussianNB()
gnb.fit(a_train, b_train)

Now we will check the accuracy of the Naive Bayes model

# accuracy_score expects the true labels first, then the predictions
a_pred_train = gnb.predict(a_train)
print("Gaussian Naive Bayes model training data accuracy(in %):", metrics.accuracy_score(b_train, a_pred_train)*100)
a_pred_test = gnb.predict(a_test)
print("Gaussian Naive Bayes model test data accuracy(in %):", metrics.accuracy_score(b_test, a_pred_test)*100)
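Besides hard labels, `GaussianNB` can also report class probabilities through `predict_proba`, which helps to see how confident the model is about a sample. The following sketch uses two made-up Gaussian clusters, not the sonar data, and all names in it are illustrative.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two synthetic classes whose feature means differ.
X_rock = rng.normal(loc=0.3, scale=0.1, size=(50, 5))
X_mine = rng.normal(loc=0.5, scale=0.1, size=(50, 5))
X = np.vstack([X_rock, X_mine])
y = np.array(["R"] * 50 + ["M"] * 50)

gnb_demo = GaussianNB().fit(X, y)

sample = np.full((1, 5), 0.5)              # a point at the 'M' cluster centre
proba = gnb_demo.predict_proba(sample)[0]  # ordered like gnb_demo.classes_

for label, p in zip(gnb_demo.classes_, proba):
    print(label, round(float(p), 3))
```

The probabilities follow the order of `classes_`, which scikit-learn stores in sorted order.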

The accuracy of this model is lower than that of the logistic regression model. Here we will build the same predictive system and look at the output.

input_data=(0.0392,0.0108,0.0267,0.0257,0.0410,0.0491,0.1053,0.1690,0.2105,0.2471,0.2680,0.3049,0.2863,0.2294,0.1165,0.2127,0.2062,0.2222,0.3241,0.4330,0.5071,0.5944,0.7078,0.7641,0.8878,0.9711,0.9880,0.9812,0.9464,0.8542,0.6457,0.3397,0.3828,0.3204,0.1331,0.0440,0.1234,0.2030,0.1652,0.1043,0.1066,0.2110,0.2417,0.1631,0.0769,0.0723,0.0912,0.0812,0.0496,0.0101,0.0089,0.0083,0.0080,0.0026,0.0079,0.0042,0.0071,0.0044,0.0022,0.0014)
input_data_Array=np.asarray(input_data)
input_data_reshaped=input_data_Array.reshape(1,-1)
predict=gnb.predict(input_data_reshaped)
print(predict)
if predict[0]=='R':
    print('Safe,Its just a Rock')
else:
    print('DANGER,its MINE')

The output is different from that of the logistic regression model, so we will check more models before coming to a conclusion.

  3. KNN Model

from sklearn.neighbors import KNeighborsClassifier  
from sklearn import metrics
classifier= KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2 )
classifier.fit(a_train, b_train)
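The settings `metric='minkowski'` and `p=2` together mean that neighbours are found using the ordinary Euclidean distance, while `p=1` would give the Manhattan distance. A minimal sketch of the formula follows. The `minkowski` function is written here for illustration and is not scikit-learn's internal implementation.

```python
import numpy as np

def minkowski(u, v, p):
    """Minkowski distance of order p between two vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

u = [0.0, 0.0]
v = [3.0, 4.0]

print(minkowski(u, v, p=2))  # Euclidean distance
print(minkowski(u, v, p=1))  # Manhattan distance
```

For the 3-4-5 triangle above, the Euclidean distance is 5 and the Manhattan distance is 7.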

Checking the accuracy:

#Predicting the result  
a_pred_train= classifier.predict(a_train)
print("KNN model accuracy of training data is (in %):", metrics.accuracy_score(b_train, a_pred_train)*100)
a_pred_test= classifier.predict(a_test)
print("KNN model accuracy of test data is (in %):", metrics.accuracy_score(b_test, a_pred_test)*100)

Building the predictive system to see the outputs and compare with others:

input_data=(0.0392,0.0108,0.0267,0.0257,0.0410,0.0491,0.1053,0.1690,0.2105,0.2471,0.2680,0.3049,0.2863,0.2294,0.1165,0.2127,0.2062,0.2222,0.3241,0.4330,0.5071,0.5944,0.7078,0.7641,0.8878,0.9711,0.9880,0.9812,0.9464,0.8542,0.6457,0.3397,0.3828,0.3204,0.1331,0.0440,0.1234,0.2030,0.1652,0.1043,0.1066,0.2110,0.2417,0.1631,0.0769,0.0723,0.0912,0.0812,0.0496,0.0101,0.0089,0.0083,0.0080,0.0026,0.0079,0.0042,0.0071,0.0044,0.0022,0.0014)
input_data_Array=np.asarray(input_data)
input_data_reshaped=input_data_Array.reshape(1,-1)
predict=classifier.predict(input_data_reshaped)
print(predict)
if predict[0]=='R':
    print('Safe,Its just a Rock')
else:
    print('DANGER,its MINE')

The output is the same as that of the logistic regression model.

  4. Decision Tree

#Fitting Decision Tree classifier to the training set  
from sklearn.tree import DecisionTreeClassifier
classifier= DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(a_train, b_train)
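With `criterion='entropy'`, the tree chooses splits that reduce the Shannon entropy of the labels the most. As a small illustration of that quantity, here is a sketch where `entropy` is a helper written for this example and is not a scikit-learn function:

```python
import math
from collections import Counter

def entropy(labels) -> float:
    """Shannon entropy (in bits) of a collection of class labels."""
    counts = Counter(labels)
    total = len(labels)
    # sum of p * log2(1 / p) over the classes that are present
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(entropy(["R", "M"]))            # evenly mixed node
print(entropy(["M", "M", "M", "M"]))  # pure node
```

An evenly mixed node has the maximum entropy for two classes (1 bit), while a node containing only one class has entropy 0, which is what the tree is trying to reach in its leaves.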

Checking the accuracy of Decision Tree model of both test data and training data:

#Predicting the result  
a_pred_train= classifier.predict(a_train)
print("Decision tree model accuracy of training data is (in %):", metrics.accuracy_score(b_train, a_pred_train)*100)
a_pred_test= classifier.predict(a_test)
print("Decision tree model accuracy of test data is (in %):", metrics.accuracy_score(b_test, a_pred_test)*100)
input_data=(0.0392,0.0108,0.0267,0.0257,0.0410,0.0491,0.1053,0.1690,0.2105,0.2471,0.2680,0.3049,0.2863,0.2294,0.1165,0.2127,0.2062,0.2222,0.3241,0.4330,0.5071,0.5944,0.7078,0.7641,0.8878,0.9711,0.9880,0.9812,0.9464,0.8542,0.6457,0.3397,0.3828,0.3204,0.1331,0.0440,0.1234,0.2030,0.1652,0.1043,0.1066,0.2110,0.2417,0.1631,0.0769,0.0723,0.0912,0.0812,0.0496,0.0101,0.0089,0.0083,0.0080,0.0026,0.0079,0.0042,0.0071,0.0044,0.0022,0.0014)
input_data_Array=np.asarray(input_data)
input_data_reshaped=input_data_Array.reshape(1,-1)
predict=classifier.predict(input_data_reshaped)
print(predict)
if predict[0]=='R':
    print('Safe,Its just a Rock')
else:
    print('DANGER,its MINE')

All of the above models were given the same input data, and 3 out of 4 models predicted the expected result. KNN had the highest accuracy score, while Naive Bayes had the lowest and was the only model to give a different output.
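A comparison like this can also be done in one loop, so that every model sees exactly the same split. The sketch below runs on synthetic data from `make_classification`, so its numbers will not match the ones reported in this post. The `max_iter=1000` setting and the variable names are my own choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the sonar data (208 rows, 60 features).
X, y = make_classification(n_samples=208, n_features=60, n_informative=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=1
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
}

# Fit each model on the same training set and score it on the same test set.
results = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))

for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```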

Future Work:

I will try to use ensemble techniques (bagging and boosting) for better accuracy, and also try larger datasets that might challenge my data pre-processing techniques. I will keep posting updates about more data science projects.
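As a starting point for that future work, here is a rough sketch of what bagging and boosting could look like in scikit-learn. It is only an assumption of how I might approach it: it runs on synthetic data, uses `GradientBoostingClassifier` as one possible boosting method, and none of the settings are tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the sonar data (208 rows, 60 features).
X, y = make_classification(n_samples=208, n_features=60, n_informative=10, random_state=1)

ensembles = {
    # BaggingClassifier uses decision trees as its default base model
    "Bagging (decision trees)": BaggingClassifier(n_estimators=50, random_state=0),
    "Gradient boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each ensemble.
ensemble_scores = {
    name: cross_val_score(clf, X, y, cv=5).mean() for name, clf in ensembles.items()
}

for name, score in ensemble_scores.items():
    print(f"{name}: {score:.2%}")
```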

Feel free to share your feedback in the comments. Thanks!
