WEEK #3 — Modeling Earthquake Damage

Beyza Cevik

Published in bbm406f19

2 min read · Dec 15, 2019

Naive Bayes & Logistic Regression

We are Sercan Amac, Mert Cokelek, and Beyza Cevik. This is our third blog post about modeling earthquake damage. In previous weeks, we analyzed the dataset and applied the K-NN algorithm to it.

Photo by Owen Lystrup on Unsplash

What has been discovered so far?

We made several important findings during our experiments. The most important one is that the dataset is skewed toward the medium damage level class, which means we have a class imbalance problem.
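A quick way to quantify this kind of imbalance is to count the labels in each class and compute the accuracy of always predicting the most frequent class; any real model should beat that baseline. The sketch below uses a small hypothetical label array (grades 1 to 3, with 2 standing in for medium damage), not our actual dataset.

```python
import numpy as np

# Hypothetical damage-grade labels (1 = low, 2 = medium, 3 = high); NOT the real dataset.
y = np.array([2, 2, 3, 2, 1, 2, 3, 2, 2, 1])

# Count how many samples fall into each class.
classes, counts = np.unique(y, return_counts=True)
proportions = counts / counts.sum()

for c, p in zip(classes, proportions):
    print(f"grade {c}: {p:.0%}")

# Accuracy of always predicting the most frequent class.
majority_class = classes[np.argmax(counts)]
baseline_accuracy = np.mean(y == majority_class)
print(f"majority-class baseline accuracy: {baseline_accuracy:.0%}")
```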

What is new this week?

To address the class imbalance problem, we could try different sampling strategies and different loss functions, but this week we focus on discriminative vs. generative models. To do that, we compare two algorithms: logistic regression (discriminative) and Gaussian Naive Bayes (generative).
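As a minimal sketch of the sampling-strategy route, random oversampling duplicates randomly chosen minority-class rows until every class is as large as the biggest one. The helper `random_oversample` and the toy `X` and `y` arrays below are hypothetical, for illustration only.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate random minority-class rows until
    every class has as many samples as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = np.flatnonzero(y == c)  # row indices of class c
            extra = rng.choice(idx, size=target - n, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Toy data: 10 rows, 2 features, imbalanced labels.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([2, 2, 3, 2, 1, 2, 3, 2, 2, 1])

X_res, y_res = random_oversample(X, y)
print(np.unique(y_res, return_counts=True))
```

Oversampling should be applied to the training split only, so duplicated rows never leak into the test set. For the loss-function route, scikit-learn's `LogisticRegression` accepts `class_weight="balanced"`, which weights each class inversely to its frequency.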

First, we split the data into training and test sets using scikit-learn's train_test_split function. Then we trained both algorithms using their scikit-learn implementations. The test accuracy is 58% for logistic regression and 42% for Gaussian Naive Bayes.
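The workflow above can be sketched as follows. Because the real dataset is not reproduced here, this assumes a synthetic, imbalanced 3-class stand-in built with `make_classification`; the 80/20 split, `stratify=y`, and `max_iter=1000` are also our assumptions, so the printed numbers will not match the ones reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced 3-class stand-in for the earthquake data (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, n_classes=3,
    weights=[0.1, 0.6, 0.3], random_state=42,
)

# Hold out 20% of the rows; stratify keeps class ratios equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),  # discriminative
    "Gaussian Naive Bayes": GaussianNB(),                       # generative
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```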

Analysis of Results

Both algorithms performed worse than K-NN. The main reason behind this is the class imbalance problem. For logistic regression, finding a linear decision boundary that separates the classes is very hard, and the same applies to Naive Bayes.
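Overall accuracy can hide how poorly the minority classes are handled. One way to check this, sketched here on the same kind of synthetic imbalanced data (an assumption, not our dataset), is to inspect the confusion matrix and the recall of each class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced 3-class data (assumption; stands in for the real features).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, n_classes=3,
    weights=[0.1, 0.6, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Fraction of each true class that the model recovered.
per_class_recall = recall_score(y_test, y_pred, average=None)
print(per_class_recall)
```

A low recall on the rare classes next to a high recall on the majority class is the typical signature of a model pulled toward the dominant class.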

Why did logistic regression perform better than Gaussian Naive Bayes?

The first reason is that Gaussian Naive Bayes assumes every feature follows a normal distribution within each class. However, some of our features are binary, so they follow a Bernoulli distribution rather than a Gaussian. The second reason is that Naive Bayes assumes the features are conditionally independent given the class, yet most of our features depend on each other: they all describe a building, and almost every part of a building depends on another.

You may ask, "Why is logistic regression not affected by the independence assumption while Naive Bayes is?" The functional form of logistic regression can be derived from the same independence assumption, but training does not rely on it: logistic regression fits its weights directly by maximizing the conditional likelihood P(y | x) instead of estimating a distribution for each feature. So logistic regression is not tied to the independence assumption. For this problem, the winner is DISCRIMINATIVE MODELS!
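The link between the two models can be made concrete for the binary case. This sketch assumes binary y with prior π = P(y = 1), features conditionally independent given y, and x_i | y = k following N(μ_ik, σ_i²) with a variance shared across both classes (the shared-variance condition is an extra assumption, not something stated above). Applying Bayes' rule and the independence assumption gives:

```latex
\begin{aligned}
\ln\frac{P(y=1\mid\mathbf{x})}{P(y=0\mid\mathbf{x})}
  &= \ln\frac{\pi}{1-\pi}
     + \sum_{i=1}^{d}\ln\frac{\mathcal{N}(x_i;\mu_{i1},\sigma_i^2)}{\mathcal{N}(x_i;\mu_{i0},\sigma_i^2)} \\
  &= \ln\frac{\pi}{1-\pi}
     + \sum_{i=1}^{d}\left(\frac{\mu_{i1}-\mu_{i0}}{\sigma_i^2}\,x_i
     + \frac{\mu_{i0}^2-\mu_{i1}^2}{2\sigma_i^2}\right) \\
  &= w_0 + \sum_{i=1}^{d} w_i x_i,
  \qquad
  w_i = \frac{\mu_{i1}-\mu_{i0}}{\sigma_i^2},
  \qquad
  w_0 = \ln\frac{\pi}{1-\pi} + \sum_{i=1}^{d}\frac{\mu_{i0}^2-\mu_{i1}^2}{2\sigma_i^2}
\end{aligned}
\;\Longrightarrow\;
P(y=1\mid\mathbf{x}) = \frac{1}{1+\exp\!\left(-\left(w_0+\sum_{i=1}^{d} w_i x_i\right)\right)}
```

Gaussian Naive Bayes obtains w by estimating π, μ and σ and plugging them in, so any violated assumption carries straight into the weights. Logistic regression keeps only the final sigmoid form and fits w_0, ..., w_d directly from the data, which is why it does not need those assumptions to hold.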

Next week we will train a neural net!

Previous Blog Posts:

Week #1 — Modeling Earthquake Damage

Week #2 — Modeling Earthquake Damage
