Which Model Is Best For Smoke Detection?

Prathamesh Gadekar
6 min read · Apr 2, 2023


Comparing a range of classification models to pick the best one.

Photo by Chris Karidis on Unsplash

Introduction:

Smoke detectors sense smoke and trigger an alarm to alert people nearby. They are typically found in offices, homes, factories, and similar spaces. Generally, smoke detectors fall into two categories (a sketch of the threshold logic they share follows the list):

  1. Photoelectric Smoke Detector - The device monitors light intensity inside a sensing chamber and raises an alarm when it falls below a set threshold, since smoke and dust particles reduce the light reaching the sensor.
  2. Ionization Smoke Detector - This type contains an electronic circuit that measures the current flowing between charged plates. Smoke and dust particles attach to the ions and impede their movement, so the current decreases; the alarm triggers when the current drops below a certain threshold.
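
Both designs reduce to the same control loop: read a sensor value, compare it against a calibrated threshold, and raise the alarm when the reading drops below it. A minimal sketch of that shared logic (the threshold value and the reading are hypothetical placeholders, not a real device API):

# Shared threshold logic of both detector types (illustrative only)
THRESHOLD = 0.5  # hypothetical calibrated value, set per device

def should_alarm(reading: float) -> bool:
    # Photoelectric: 'reading' is the light intensity, which smoke reduces.
    # Ionization: 'reading' is the circuit current, which smoke reduces.
    return reading < THRESHOLD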

Using the provided dataset, we aim to develop a model that can accurately raise an alarm when smoke is detected. Our objective is to compare several classification models, such as KNN and logistic regression, based on their accuracy, visualize the results, and select the best one.

The data is taken from here.

Importing Required Libraries:

#Importing all essential libraries
import numpy as np
import pandas as pd
import seaborn as sns
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt
import plotly.express as px
import missingno as msno

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

#Importing Models
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression,SGDClassifier
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier,AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.dummy import DummyClassifier
from sklearn.tree import ExtraTreeClassifier


from sklearn.metrics import accuracy_score
import time

import warnings
warnings.filterwarnings('ignore')

Data Exploration:

Feature Description:

  1. UTC - The time at which the experiment was performed.
  2. Temperature - Temperature of the surroundings, measured in degrees Celsius.
  3. Humidity - The air humidity during the experiment.
  4. TVOC - Total Volatile Organic Compounds, measured in ppb (parts per billion).
  5. eCO2 - CO2-equivalent concentration, measured in ppm (parts per million).
  6. Raw H2 - The amount of raw hydrogen present in the surroundings.
  7. Raw Ethanol - The amount of raw ethanol present in the surroundings.
  8. Pressure - Air pressure, measured in hPa.
  9. PM1.0 - Particulate matter with a diameter of less than 1.0 micrometer.
  10. PM2.5 - Particulate matter with a diameter of less than 2.5 micrometers.
  11. NC0.5 - Concentration of particulate matter with a diameter of less than 0.5 micrometers.
  12. NC1.0 - Concentration of particulate matter with a diameter of less than 1.0 micrometers.
  13. NC2.5 - Concentration of particulate matter with a diameter of less than 2.5 micrometers.
  14. CNT - A simple sample count.
  15. Fire Alarm - The ground truth: 1 if fire was present, 0 otherwise.

data = pd.read_csv('../input/smoke-detection-dataset/smoke_detection_iot.csv', index_col=False)
data.head()
First five rows of the data(Source: Author)
data.shape
data.describe().T.sort_values(ascending=False, by="mean") \
    .style.background_gradient(cmap="BuGn") \
    .bar(subset=["std"], color="red").bar(subset=["mean"], color="blue")
Describing Data (Source: Author)
# Getting all the unique values in each feature
features = data.columns
for feature in features:
    print(f"{feature} ---> {data[feature].nunique()}")
Unique Values for all variables (Source: Author)

Null Value Distribution:

data.isna().sum()
Null Value Count (Source: Author)
msno.matrix(data)
Null Value Visualization (Source: Author)

Data Cleaning:

There are no missing values in the dataset, which allows us to analyze the data and build accurate prediction models without any imputation.

If a dataset does contain missing values, the following resources cover data cleaning (a minimal imputation sketch also follows this list):

  1. Getting Started With Kaggle
  2. Geek for Geeks
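
For completeness, here is a minimal sketch of how missing values could be imputed with scikit-learn's SimpleImputer; it is illustrative only, since this dataset needs no imputation:

from sklearn.impute import SimpleImputer

# Illustrative only: this dataset has no missing values.
# Median imputation is a reasonable default given the outliers noted later.
imputer = SimpleImputer(strategy='median')
data_imputed = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)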

Some features, however, carry no predictive information and can even hamper our model:

  1. UTC - It merely indicates when the experiment was conducted, so it has no bearing on the result.
  2. Unnamed: 0 - It is just the row index.
  3. CNT - It is a running count (effectively another index).

Since these attributes are useless, we will drop them.

del_features = ['Unnamed: 0', 'UTC', 'CNT']
for feature in del_features:
    data = data.drop(feature, axis=1)
data.head()
Deleting Unwanted Features (Source: Author)

⭐ Important Observations:

  • There are a total of 62,360 rows and 16 columns in the data.
  • The data does not contain any missing values.
  • We drop the UTC, Unnamed: 0, and CNT attributes, as they are of no use to us.
  • After these modifications we are left with 13 attributes on which to perform EDA.
  • That gives a total of 810,680 (62,360 × 13) individual values; a quick programmatic check follows.
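
These counts are easy to verify programmatically:

print(data.shape)                     # (62360, 13) after dropping the three columns
print(data.isna().sum().sum())        # 0 -> no missing values
print(data.shape[0] * data.shape[1])  # 810680 individual values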

Exploratory Data Analysis:

Feature Analysis Using the Target Variable:

sns.set_style("whitegrid")
sns.histplot(data['Fire Alarm'])
Histogram of Frequency (Source: Author)
plt.figure(figsize=(6, 6))
sns.kdeplot(data=data, x='TVOC[ppb]')
Probability Density Function (Source: Author)

Heatmap:

plt.figure(figsize=(12, 12))
sns.heatmap(data.corr(), annot=True, cmap='GnBu')
Heatmap (Source: Author)

⭐ Important Observations:

  • Considering a correlation of >= 0.65 as high, we can say that Pressure and Humidity are highly correlated.
  • All the PM and NC features are highly correlated with one another.
  • For TVOC and the PM and NC features, the mean is far larger than the median, which tells us that many outliers are present.
  • These same features are likely very important for classification, because their values differ sharply between the two classes of the target variable; the snippet below quantifies both observations.
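
The following snippet, an illustrative addition rather than part of the original analysis, lists the feature pairs whose absolute correlation is at least 0.65 and each feature's mean-median gap:

# Feature pairs with |correlation| >= 0.65 (upper triangle only, no self-pairs)
corr = data.corr()
pairs = corr.where(corr.abs() >= 0.65).stack()
pairs = pairs[pairs.index.get_level_values(0) < pairs.index.get_level_values(1)]
print(pairs)

# Mean-median gap as a rough per-feature indicator of skew/outliers
print((data.mean() - data.median()).sort_values(ascending=False))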

Modeling:

Data Preprocessing:

X = data.copy()
X.drop('Fire Alarm', axis=1, inplace=True)
y = data['Fire Alarm']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)
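
Note that the scaler is fitted on the training split only and then applied to the test split via transform; fitting it on the full dataset would leak information about the test distribution into training.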

Model Implementation:

models = [KNeighborsClassifier(), SGDClassifier(), LogisticRegression(), RandomForestClassifier(),
          GradientBoostingClassifier(), AdaBoostClassifier(), BaggingClassifier(),
          SVC(), GaussianNB(), DummyClassifier(), ExtraTreeClassifier()]

Name = []
Accuracy = []
Time_Taken = []
for model in models:
    Name.append(type(model).__name__)
    begin = time.time()
    model.fit(X_train, y_train)
    prediction = model.predict(X_test)
    end = time.time()
    accuracyScore = accuracy_score(y_test, prediction)
    Accuracy.append(accuracyScore)
    Time_Taken.append(end - begin)

Dict = {'Name': Name, 'Accuracy': Accuracy, 'Time Taken': Time_Taken}
model_df = pd.DataFrame(Dict)
model_df
Accuracy and Time Taken (Source: Author)
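
A single train/test split can be noisy. As an optional sanity check, not part of the comparison above, k-fold cross-validation averages accuracy over several splits; putting the scaler inside a pipeline keeps each fold's scaling independent:

from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Optional: 5-fold cross-validated accuracy for one of the models
pipe = make_pipeline(StandardScaler(), ExtraTreeClassifier())
scores = cross_val_score(pipe, X, y, cv=5, scoring='accuracy')
print(scores.mean(), scores.std())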

Accuracy vs Model:

model_df.sort_values(by='Accuracy', ascending=False, inplace=True)
fig = px.line(model_df, x="Name", y="Accuracy", title='Accuracy VS Model')
fig.show()
Accuracy Vs Model (Source: Author)

Time Taken vs Model:

model_df.sort_values(by='Time Taken', ascending=False, inplace=True)
fig = px.line(model_df, x="Name", y="Time Taken", title='Time Taken VS Model')
fig.show()
Time Taken Vs Model (Source: Author)

Conclusion:

From the above analysis, we can see that ExtraTreeClassifier requires the least training and prediction time while also providing the highest accuracy, making it the best model for this task.
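
To put the winning model to use, here is a minimal sketch that retrains it and persists both the scaler and the model with joblib (the filename is an arbitrary choice):

import joblib

# Fit the chosen model and save it together with the scaler for later inference
final_model = ExtraTreeClassifier()
final_model.fit(X_train, y_train)
joblib.dump({'scaler': ss, 'model': final_model}, 'smoke_detector.joblib')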

👋 Greetings!

Thanks for sticking around to the end of the blog! I hope you had a great time!

I cover all kinds of Data Science & AI stuff…. and sometimes Programming.

To have stories sent directly to you, subscribe to my newsletter.
