Forest Fire Prediction with Artificial Neural Network (Part 1)

Published in

Brandon Lammey Intro to AI

9 min read · Apr 3, 2019

Retrieved from: http://www.intechopen.com/source/html/39067/media/image1.png

Building off one of my previous posts, A simple look into Deep Learning: Setting up an Artificial Neural Network, I decided to create and test an ANN on both a regression problem and a classification problem. It can be a little tricky to tell the two apart and to decide which kind of algorithm to use for prediction. The most fundamental difference is that classification is used to predict a label and regression is used to predict a quantity. In other words, a classification model generally predicts a discrete label, often along with a probability for each class, while a regression model predicts a continuous value.

Data

UCI Forest Fire Dataset

Sample of Dataset

Classification

Output is classified into one of two or more classes.
Input variables are real-valued or discrete
Common to predict a continuous value as the probability of a given value belonging to each output class.
Skill of a classification predictive model is most commonly calculated using classification accuracy, the percentage of predictions that are correct
- Common method: Confusion matrix
- More accurate: Cumulative Accuracy Profile (CAP)
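
To make the accuracy measure concrete, here is a minimal sketch of a confusion matrix and the accuracy derived from it. The helper names and the six labels are made up for illustration and are not taken from the forest fire data.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(matrix):
    """Fraction of predictions that fall on the diagonal (correct class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels for six fires, using size classes 1 to 3
actual = [1, 1, 2, 2, 3, 1]
predicted = [1, 2, 2, 2, 3, 1]

cm = confusion_matrix(actual, predicted, labels=[1, 2, 3])
print(cm)            # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
print(accuracy(cm))  # 5 of 6 correct
```

The off-diagonal entry in the first row shows one class 1 fire that was predicted as class 2, which is the kind of detail a single accuracy number hides.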

Regression

Output is the prediction of a quantity.
Input variables are real-valued or discrete
Skill of the model must be reported as an error in those predictions
- Common method: root mean squared error
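
As a small illustration of that error measure, the sketch below computes root mean squared error by hand. The function name and the burned-area values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical burned areas (illustrative values only)
actual = [0.0, 2.0, 10.0, 4.0]
predicted = [1.0, 2.0, 7.0, 6.0]

# Squared errors are 1, 0, 9 and 4, so the result is sqrt(14 / 4)
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a single large miss (the 3-unit error here) dominates the result.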

Steps

For this project I will use a data set from the UCI Machine Learning Repository to address both a classification problem and a regression problem. The following steps will be used for both:

Data Cleaning and Processing
Neural Net Creation
Train the ANN
Test the ANN
Methods of Improvement will be discussed in Part 2

Classification

For classification we attempt to group things by shared features. For instance, given a set of animals we want to classify mammals and reptiles. One feature we might look at is the existence of hair or fur versus the existence of scales. If an animal has fur we might classify it as a mammal. I speak more in depth on specific algorithms in my article Machine Learning Classification Models (Part I). The goal of this specific model is to predict the severity of a fire. The dataset does not support classifying whether a fire will occur or not, since every data point describes a fire and would fall into the fire category.

Data Cleaning and Processing

There were no blank or invalid cells in the data set I selected, but if there were, a script could be written to remove those rows. The following steps were used to preprocess the data:

Define independent and dependent variables
- Ensure dependent variables are categorical using a fire classification metric
Encode categorical features and create dummy variables (avoid dummy trap)
Split Data
Scale Features to Optimize

'''
    Data Cleaning and Preprocessing
'''
# Importing the libraries
import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('forestfires.csv')

# Getting independent and dependent features
X = dataset.iloc[:, 0:12].values  # independent variables
y = dataset.iloc[:, 12].values    # dependent variable (burned area in hectares)

'''
Classification
Convert to acres, then classify by size
Class 1.A - one acre or less;
Class 2.B - more than one acre, but less than 10 acres;
Class 3.C - 10 acres or more, but less than 100 acres;
Class 4.D - 100 acres or more, but less than 300 acres;
Class 5.E - 300 acres or more, but less than 1,000 acres;
Class 6.F - 1,000 acres or more, but less than 5,000 acres;
Class 7.G - 5,000 acres or more
'''
for i in range(0, len(y)):
    y[i] = (y[i]*2.47)  # hectares to acres
    if y[i] < 1.0:
        y[i] = 1
    elif y[i] < 10.0:
        y[i] = 2
    elif y[i] < 100.0:
        y[i] = 3
    elif y[i] < 300.0:
        y[i] = 4
    elif y[i] < 1000.0:
        y[i] = 5
    elif y[i] < 5000.0:
        y[i] = 6
    else:
        y[i] = 7

# Encoding categorical data for independent variables
# Note: the categorical_features argument was removed in scikit-learn 0.22,
# so this encoding code needs an older release to run as written
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 2] = labelencoder_X_1.fit_transform(X[:, 2])  # for month
labelencoder_X_2 = LabelEncoder()
X[:, 3] = labelencoder_X_2.fit_transform(X[:, 3])  # for weekday
onehotencoder = OneHotEncoder(categorical_features = [2])  # dummy variables for month
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]  # avoid dummy variable trap
onehotencoder = OneHotEncoder(categorical_features = [13])  # dummy variables for weekday
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]  # avoid dummy variable trap

'''Encoding for classification'''
from keras.utils import to_categorical
y = to_categorical(y)  # one-hot encode the size classes

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

# Feature scaling to optimize
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit the scaler on the training set only
X_test = sc.transform(X_test)
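
The size-class loop in the listing above can also be written without an explicit loop. The following is a minimal sketch, assuming the same 2.47 conversion factor and the same class boundaries; `size_class`, `BOUNDS` and `HECTARES_TO_ACRES` are names introduced here for illustration, and the sample areas are hypothetical.

```python
import numpy as np

HECTARES_TO_ACRES = 2.47  # same conversion factor as the loop in the post
# Upper bounds in acres for classes 1 to 6; 5,000 acres and above is class 7
BOUNDS = np.array([1.0, 10.0, 100.0, 300.0, 1000.0, 5000.0])

def size_class(area_hectares):
    """Map burned area in hectares to a size class from 1 to 7."""
    acres = np.asarray(area_hectares, dtype=float) * HECTARES_TO_ACRES
    # digitize counts how many bounds are <= each value (0 to 6), so add 1
    return np.digitize(acres, BOUNDS) + 1

# Hypothetical burned areas in hectares
areas = [0.0, 0.36, 6.0, 50.0, 200.0, 1090.84, 3000.0]
print(size_class(areas))  # [1 1 3 4 5 6 7]
```

A value that sits exactly on a boundary (for example 10 acres) goes to the higher class, which matches the strict `<` comparisons in the loop.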

Neural Net Creation

I created a sequence of layers to define the neural net, and for each layer I initialized the weights, chose an activation function, and selected the number of nodes. I set the number of nodes per hidden layer with the calculation (independent + dependent) / 2, which here is (27 + 7) / 2 = 17. The output layer produces a probability for each of the seven categories, and a split at 0.5 turns each probability into a TRUE or FALSE value. A sigmoid output activation would suit a single binary value; softmax was used here to cover the multiple dependent categories. The model was then compiled.
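
To make the two calculations in this paragraph concrete, here is a small sketch of the hidden-node rule of thumb and of what a softmax output does. The `softmax` helper and the raw scores are illustrative assumptions, not values from the trained model.

```python
import numpy as np

n_inputs, n_outputs = 27, 7            # feature columns and size classes
hidden_units = (n_inputs + n_outputs) // 2
print(hidden_units)                    # 17

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw scores from the output layer for one fire
probs = softmax(np.array([2.0, 1.0, 0.1, -1.0, -2.0, -3.0, -3.0]))
print(probs.sum())                     # 1.0 up to floating point rounding
print(int(np.argmax(probs)))           # index of the most likely class
```

Because softmax probabilities sum to 1, at most one class can exceed 0.5, so taking the highest-probability class is a common alternative to a fixed 0.5 split.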

'''
    Creating the ANN
'''
# Importing the Keras libraries and packages to use the TensorFlow backend
import keras
from keras.models import Sequential  # for initializing the ANN
from keras.layers import Dense  # for the layers of the ANN

# Initializing the ANN with a sequence of layers (could use a graph)
# Classifier model
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 17, kernel_initializer = 'uniform', activation = 'relu', input_dim = 27))

# Adding the hidden layers
classifier.add(Dense(units = 17, kernel_initializer = 'uniform', activation = 'relu'))
classifier.add(Dense(units = 17, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
# Probability for the outcome
'''Classification'''
classifier.add(Dense(units = 7, kernel_initializer = 'uniform', activation = 'softmax'))

# Compiling the ANN
'''Classification'''
# Another option: categorical_crossentropy, the usual pairing for a softmax
# output with one-hot labels
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

Train the ANN

I chose the number of epochs to be large enough to see improvements occurring but not so large as to result in a long training time. I used mini-batch gradient descent, a combination of batch and stochastic gradient descent, by selecting a batch size that is a small portion of the training set but larger than a single sample. The accuracy can be seen in the training printout.
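
The batching idea can be sketched outside of Keras. The example below is a toy illustration, not the network from this post: it fits a single weight to synthetic data with batches of 5, and every name and value in it is an assumption made for the demonstration.

```python
import numpy as np

def minibatch_indices(n_samples, batch_size, rng):
    """Yield shuffled index batches that cover every sample once per epoch."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
# Toy data: y = 3x + noise (hypothetical, not the forest fire data)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

w, lr = 0.0, 0.05
for epoch in range(20):
    for idx in minibatch_indices(len(X), batch_size=5, rng=rng):
        error = w * X[idx, 0] - y[idx]
        grad = 2.0 * np.mean(error * X[idx, 0])  # d(MSE)/dw on this batch
        w -= lr * grad                           # one update per batch

print(round(float(w), 2))  # should land close to the true slope of 3
```

With 100 samples and a batch size of 5, each epoch performs 20 weight updates, compared with 1 for full-batch and 100 for purely stochastic gradient descent.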

# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 5, epochs = 100)

Test the ANN

I used the model on the test set, compared the predicted values from the model to the actual values, and output the results in a confusion matrix.

Final Result:

Model: 91%
Confusion Matrix: 71%
K Fold Cross Validate: 86%
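
The post does not show how the K-fold figure was produced. As a rough sketch of the idea only, the helper below (a hypothetical name) splits sample indices into k folds so that every sample is used for validation exactly once; a model would be trained and scored once per fold and the scores averaged.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, validation) index pairs."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds

folds = k_fold_indices(10, 3)
for train, val in folds:
    print(len(train), val)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

In practice the indices are usually shuffled first, so that the folds do not depend on the order of rows in the file.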

There are some inconsistencies between the accuracy reported by the model, the cross-validation score, and the test results. All are at a fairly acceptable level of accuracy, but there may be some issues with the model or the data set size that I will look at in Part 2.
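
One possible contributor to the gap, offered as an assumption rather than a diagnosis: when a Keras model is compiled with a `binary_crossentropy` loss and `metrics = ['accuracy']`, the reported accuracy is typically binary accuracy, which scores each output unit separately, while a confusion matrix scores one predicted class per sample. The sketch below reimplements both measures in NumPy on made-up labels and probabilities to show how far apart they can be.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Score every output unit separately after thresholding at 0.5."""
    return float(np.mean((y_prob > threshold) == (y_true > 0.5)))

def categorical_accuracy(y_true, y_prob):
    """Score one prediction per sample: the highest-probability class."""
    return float(np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_true, axis=1)))

# Hypothetical one-hot labels and softmax outputs: 4 samples, 4 classes
y_true = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1]], dtype=float)
y_prob = np.array([[0.7, 0.1, 0.1, 0.1],   # correct class
                   [0.6, 0.2, 0.1, 0.1],   # wrong class
                   [0.3, 0.3, 0.2, 0.2],   # wrong class, nothing above 0.5
                   [0.1, 0.1, 0.2, 0.6]])  # correct class

print(binary_accuracy(y_true, y_prob))       # 0.8125
print(categorical_accuracy(y_true, y_prob))  # 0.5
```

Most entries of a one-hot label are zeros, so the per-unit score gets credit for every zero it predicts correctly, even on samples where the predicted class is wrong.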

Regression

A regression problem attempts to predict a continuous variable. This could mean predicting the amount of water used in a city, the foot traffic at a mall, or the profits of a company. For the selected dataset, I will be predicting the size of a forest fire based on features such as geospatial data, wind, temperature, and humidity.

Data Cleaning and Processing

Using the same data set, the process for this stage is largely the same for a regression model.

Define independent and dependent variables
- Ensure dependent variables are continuous
Encode categorical features and create dummy variables (avoid dummy trap)
Split Data
Scale Features to Optimize

'''
    Data Cleaning and Preprocessing
'''
# Importing the libraries
import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('forestfires.csv')

# Getting independent and dependent features
X = dataset.iloc[:, 0:12].values  # independent variables
y = dataset.iloc[:, 12].values    # dependent variable (burned area, kept continuous)

# Encoding categorical data for independent variables
# Note: the categorical_features argument was removed in scikit-learn 0.22,
# so this encoding code needs an older release to run as written
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 2] = labelencoder_X_1.fit_transform(X[:, 2])  # for month
labelencoder_X_2 = LabelEncoder()
X[:, 3] = labelencoder_X_2.fit_transform(X[:, 3])  # for weekday
onehotencoder = OneHotEncoder(categorical_features = [2])  # dummy variables for month
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]  # avoid dummy variable trap
onehotencoder = OneHotEncoder(categorical_features = [13])  # dummy variables for weekday
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]  # avoid dummy variable trap

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

# Feature scaling to optimize
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit the scaler on the training set only
X_test = sc.transform(X_test)
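
The label-encode-then-one-hot steps above rely on the `categorical_features` argument, which newer scikit-learn releases no longer accept. A minimal sketch of one alternative, assuming a release with `ColumnTransformer` and `OneHotEncoder(drop=...)` (0.21 or later), is shown below on a tiny made-up table; the rows and the reduced set of columns are hypothetical stand-ins for forestfires.csv.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Tiny stand-in for forestfires.csv (hypothetical rows, subset of columns)
df = pd.DataFrame({
    "month": ["mar", "oct", "aug", "mar"],
    "day": ["fri", "tue", "sat", "sun"],
    "temp": [8.2, 18.0, 14.6, 8.3],
    "wind": [6.7, 0.9, 1.3, 4.0],
})

encoder = ColumnTransformer(
    # drop="first" removes one dummy column per feature (dummy variable trap)
    transformers=[("onehot", OneHotEncoder(drop="first"), ["month", "day"])],
    remainder="passthrough",  # keep the numeric columns unchanged
    sparse_threshold=0,       # always return a dense array
)
X = encoder.fit_transform(df)

# 3 months -> 2 columns, 4 days -> 3 columns, plus 2 numeric columns
print(X.shape)  # (4, 7)
```

On the full data set the same pattern would give 11 month columns, 6 weekday columns and 10 numeric columns, which matches the `input_dim = 27` used for the network.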

Neural Net Creation

The sequence of layers is created in a similar way, but the model should output a continuous value rather than a class, since the actual size of a fire is a continuous variable. Because of this, the dependent variable does not need to be split into multiple columns, and the activation function for the output is linear. The skill of a regression model is also measured differently.

# Importing the Keras libraries and packages to use the TensorFlow backend
import keras
from keras.models import Sequential  # for initializing the ANN
from keras.layers import Dense  # for the layers of the ANN

# Initializing the ANN with a sequence of layers (could use a graph)
# Regression model (the variable name is kept from the classification example)
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 14, kernel_initializer = 'uniform', activation = 'relu', input_dim = 27))

# Adding the hidden layers
classifier.add(Dense(units = 14, kernel_initializer = 'uniform', activation = 'relu'))
classifier.add(Dense(units = 14, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
'''Regression'''
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'linear'))

# Compiling the ANN
# (the compile step is not shown in the original listing; the optimizer, loss
# and metric here are assumptions chosen to match the MSE and MAE reported)
classifier.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics = ['mae'])

Train the ANN

Training starts the same way in the regression case, except that the model reports a Mean Squared Error instead of an accuracy percentage.

# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 5, epochs = 500)

Test the ANN

Testing a regression model differs from testing a classification model, since the error is not measured by counting correct and incorrect class predictions. Instead, I can graph the test set's predicted and real results for the size of the forest fire and compare them using Mean Squared Error, Mean Absolute Error, Root Mean Squared Error, and/or R² values.

In addition to the graph, I was able to output the model error, MAE = 7.08 and MSE = 1430, and then calculate R² for the model as R² = -0.33.

From the combination of these results it is apparent that the model is an extremely poor predictor of fire size. This will be addressed in Part 2 by parameter tuning, selecting variables using a correlation matrix, and randomly dropping nodes in the network to avoid overfitting.
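
A negative R² can look surprising, so here is a small sketch of how it arises. The function name and the numbers are made up for illustration and are unrelated to the results above: R² compares the model's squared error with that of always predicting the mean, and drops below zero when the model does worse than that baseline.

```python
def r_squared(actual, predicted):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [0.0, 1.0, 2.0, 9.0]   # hypothetical values with mean 3.0
always_mean = [3.0] * 4         # baseline: predict the mean every time
poor_model = [5.0] * 4          # a model that is further off than the baseline

print(r_squared(actual, always_mean))  # 0.0
print(r_squared(actual, poor_model))   # about -0.32
```

A skewed target, where most values are small and a few are very large, makes it easy for a model to end up in this range.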

Future Work

In Part 2, I plan to use parameter tuning to determine the best parameters for the model, including batch size, number of epochs, and optimizer. I will also look at the structure of the neural net, feature correlation, and a method of preventing overfitting.