Week #5- Modeling Earthquake Damage

Mert Cokelek
Published in bbm406f19
Dec 30, 2019
Image from https://www.quantamagazine.org

Previously on our Project

Last time, we examined the performance of Logistic Regression and the Naive Bayes Classifier on our dataset.
They did not perform well compared to our first approach, the kNN classifier, because they assume that the features are independent and come from a normal distribution. That assumption does not hold for our dataset: the features are dependent and do not follow a normal distribution. So we decided to move away from these approaches.

This week on Modeling Earthquake Damages

As mentioned, our best approach so far has been the kNN classifier, with 71% accuracy. However, we think we can do better with neural networks, which provide more complex models, and our dataset is large enough (in both the number of samples and the feature dimensionality) to support that complexity.

Usage of Neural Networks

We plan to begin with a simple neural network model with 1 hidden layer.
The first checkpoint is to choose the number of units in the hidden layer. A common rule of thumb helps us in this step:

N_h = N_s / (α · (N_i + N_o))

where N_h is the number of hidden units, N_s the number of training samples, N_i the number of input features, N_o the number of output classes, and α a scaling factor.
According to this formula, with α chosen as 5 we get approximately 1014 hidden units, and since powers of two are the conventional choice for layer sizes in neural networks, we decided to use 1024 units in our single hidden layer.
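To make the arithmetic concrete, here is a quick sketch of the calculation; the sample and feature counts below are rough assumptions about our data rather than exact figures:

```python
# Rule of thumb for the hidden-layer size: N_h = N_s / (alpha * (N_i + N_o)).
# The counts below are rough assumptions, not exact figures from our dataset.
n_samples = 208_000  # approximate number of training samples
n_inputs = 38        # approximate number of input features
n_outputs = 3        # damage grades 1, 2, 3
alpha = 5            # the scaling factor we chose

n_hidden = n_samples / (alpha * (n_inputs + n_outputs))
print(int(n_hidden))                    # roughly 1014
print(2 ** int(n_hidden).bit_length())  # next power of two: 1024
```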

Experiment 1- Use a Neural Network with 1 Hidden Layer

In this model, we set the batch size to 128 and the learning rate to 0.001.
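Here is a minimal sketch of that setup in PyTorch; the framework, the optimizer, and the 38-feature / 3-class dimensions are illustrative assumptions rather than details stated above:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Assumed dimensions: 38 input features, 3 damage-grade classes.
N_FEATURES, N_CLASSES, N_HIDDEN = 38, 3, 1024

# One hidden layer with 1024 units, as chosen above.
model = nn.Sequential(
    nn.Linear(N_FEATURES, N_HIDDEN),
    nn.ReLU(),
    nn.Linear(N_HIDDEN, N_CLASSES),  # logits for the 3 damage grades
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate 0.001

def train_one_epoch(X: torch.Tensor, y: torch.Tensor) -> float:
    """Run one pass over the training set with batch size 128."""
    loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)
    total_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * len(xb)
    return total_loss / len(X)
```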
Here are the corresponding loss plots for the train and test sets.

Conclusion:
As seen, the loss is very high in the first epoch since the weights are randomly initialized. With the second epoch, the loss drops drastically, as expected; however, between epochs 3 and 10 the change in loss is not satisfying. One explanation is that the model is not complex enough, but there is another issue here: the input features are not normalized. Some features range in the hundreds while others take much smaller values. So our next step is to normalize the features and run the model again.

Experiment 2- Use Normalized Features

In this experiment, we normalized the features and ran the previous model on the normalized dataset.
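As a sketch, the normalization step could look like the following; the column names are only examples, and the real dataset contains many more columns, some of them categorical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Example numeric columns; the actual dataset has many more features.
NUMERIC_COLS = ["age", "area_percentage", "height_percentage"]

def normalize(train_df: pd.DataFrame, test_df: pd.DataFrame):
    """Standardize the numeric columns to zero mean and unit variance.

    The scaler is fit on the training split only and then applied to both
    splits, so no test-set statistics leak into training.
    """
    train_df, test_df = train_df.copy(), test_df.copy()
    scaler = StandardScaler()
    train_df[NUMERIC_COLS] = scaler.fit_transform(train_df[NUMERIC_COLS])
    test_df[NUMERIC_COLS] = scaler.transform(test_df[NUMERIC_COLS])
    return train_df, test_df
```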
Here are the corresponding plots:

Conclusion:
We can see that the loss values improved in general, dropping from the tens to around 0.70, but the plot still does not look good, because some of the features are categorical and scaling them does not mean much to a neural network.

The loss curves show that our model has underfitted, which means we need to increase the complexity of our model.

Experiment 3- Try a More Complex Network

In the next step, we ran a model with 4 hidden layers containing 1024, 512, 256, and 128 units, respectively.
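A sketch of that deeper architecture, in the same style as before (the 38-feature input and 3-class output are again assumptions):

```python
import torch.nn as nn

# Four hidden layers with 1024, 512, 256, and 128 units, as described above.
model = nn.Sequential(
    nn.Linear(38, 1024), nn.ReLU(),
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 3),  # logits for the 3 damage grades
)
```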

Conclusion:
This model was much better. We can clearly see the change in the train and validation losses, and at some point (epoch 7) the model starts to overfit. We are happy to see such a plot because it shows we are on the right track: the model is now able to approach the optimum.

Experiment 4- Reduce Dimensionality

We wanted to investigate the effect of feature dimensionality, so we reduced the number of features.

Conclusion:

We simplified the dataset by choosing fewer features: we picked the 10 most important features according to the Extra Trees Classifier that we used in our last blog post.
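A quick sketch of how those top features can be selected; the function and variable names here are hypothetical:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def top_k_features(X_train, y_train, feature_names, k=10):
    """Rank features by Extra Trees importance and keep the k strongest."""
    forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)
    order = np.argsort(forest.feature_importances_)[::-1][:k]
    return [feature_names[i] for i in order]
```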

However, this did not improve the performance; it caused overfitting.

Here are some main reasons for overfitting:
1- Too complex a model
2- Too few samples
3- Curse of dimensionality
4- Noisy data

Which one should we play with?
Not 1: a neural network with fewer layers did not perform well in our experiments.
Not 2: there are already about 200,000 training samples.
Curse of dimensionality: we saw that we should keep all the features and use them in our model.
Noisy data: this will be addressed in next week's blog.

General Conclusion:

Finally, in Experiment 3 the train and validation losses decrease as expected; the next goal is to achieve faster convergence.

Next week on Modeling Earthquake Damages:

From now on, our main challenge is to optimize the hyperparameters, such as the learning rate, the batch size, and the optimizer (Adam, SGD, etc.).

As we mentioned before, the distribution of the labels shows a huge class imbalance problem. To address this issue, we will try different loss functions and some post-processing techniques.
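One option we plan to try is weighting the loss by inverse class frequency; here is a minimal PyTorch sketch (the exact weighting scheme is still open, and the labels are assumed to be encoded as 0, 1, 2):

```python
import torch
import torch.nn as nn

def weighted_cross_entropy(labels: torch.Tensor) -> nn.CrossEntropyLoss:
    """Build a cross-entropy loss whose class weights are inversely
    proportional to how often each damage grade appears in the labels."""
    counts = torch.bincount(labels, minlength=3).float()
    weights = counts.sum() / (counts * len(counts))  # rarer class -> larger weight
    return nn.CrossEntropyLoss(weight=weights)
```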

Authors: Mustafa Sercan AMAC | Beyza CEVIK | Mert COKELEK
