Week 2 — Exploratory Data Analysis

Öner İnce
Published in bbm406f19, Dec 11, 2019
National Geographic — https://www.nationalgeographic.com/news/2015/04/150425-nepal-earthquake-faults-geology-science

Last week, we briefly introduced our dataset and outlined our road map. This week, we will dive deeper into the data and draw some inferences from it.

INTO THE DATASET

As we mentioned last week, we will use the 2015 Nepal Earthquake data provided by Nepal’s National Planning Commission. The dataset contains the following:

Dataset info — https://eq2015.npc.gov.np/

As seen, this is a huge dataset that contains not only building information but also individual-level information. For this project, however, we will only use the building structure information and ignore the district-level and individual-level information.

Our goal in this project is to predict the ‘damage_grade’ label of each building from its structural information. As a first step, let’s look at the distribution of ‘damage_grade’ in our dataset:
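A minimal pandas sketch of how such a distribution can be tabulated. It assumes the labels sit in a single ‘damage_grade’ column; the helper name `damage_grade_distribution` and the toy labels are hypothetical and do not come from the real data.

```python
import pandas as pd


def damage_grade_distribution(labels: pd.Series) -> pd.DataFrame:
    """Count the buildings in each damage grade and their share in percent."""
    counts = labels.value_counts().sort_index()        # one row per grade, in grade order
    percent = (counts / counts.sum() * 100).round(2)   # share of all buildings
    return pd.DataFrame({"count": counts, "percent": percent})


# Hypothetical toy labels for illustration only.
toy_labels = pd.Series([1, 2, 2, 3, 3, 3, 4, 5, 5, 5], name="damage_grade")
distribution = damage_grade_distribution(toy_labels)
print(distribution)
```

With matplotlib installed, `distribution["count"].plot.bar()` turns this table into a bar chart.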

For each building, we have 42 features to use when predicting its damage grade. However, most of them are categorical, so we have to convert them to numeric variables before we can use them in our machine learning models. To do this, we applied one-hot encoding with the pandas.get_dummies function, which turns each categorical variable into a set of binary dummy variables. After this step, we created a correlation matrix to see the relationships between features, so that we can decide which features to use in our prediction models:

Correlation matrix — seaborn.heatmap
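The encoding and correlation steps described above could look roughly like the sketch below. The toy table, its column names (`count_floors`, `age_years`, `foundation`, `roof`) and its categories are hypothetical stand-ins for the real building-structure columns, and the heatmap settings (masked upper triangle, `coolwarm` colormap centered at 0) are our own choices, not necessarily the ones used for the figure.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy rows standing in for the building-structure table.
# Column names and category labels are illustrative assumptions, not the real schema.
buildings = pd.DataFrame({
    "count_floors": [1, 2, 3, 2, 1, 3],
    "age_years": [5, 20, 40, 15, 60, 10],
    "foundation": ["mud_stone", "rc", "timber", "rc", "mud_stone", "cement_stone"],
    "roof": ["light", "rc", "heavy", "rc", "heavy", "light"],
    "damage_grade": [5, 1, 4, 2, 5, 3],
})

# 1) One-hot encode the categorical columns: each category becomes its own 0/1
#    column named "<column>_<category>"; numeric columns pass through unchanged.
categorical_cols = ["foundation", "roof"]
encoded = pd.get_dummies(buildings, columns=categorical_cols, dtype=int)

# 2) Pairwise Pearson correlations between all (now numeric) columns.
corr = encoded.corr()

# 3) Heatmap of the lower triangle; the matrix is symmetric, so the upper
#    triangle (and the all-ones diagonal) is masked out.
mask = np.triu(np.ones_like(corr, dtype=bool))
fig, ax = plt.subplots(figsize=(9, 7))
sns.heatmap(corr, mask=mask, cmap="coolwarm", vmin=-1, vmax=1, center=0,
            square=True, linewidths=0.5, ax=ax)
fig.tight_layout()
```

On the real data, `categorical_cols` would list the actual categorical columns, and `fig.savefig(...)` writes the heatmap to disk.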

This matrix helps us understand the patterns and relationships between the features of the data. More importantly, it let us select the features that are related to our target variable, ‘damage_grade’. These relationships are easier to see in a Diverging Texts graph:

Diverging Texts — matplotlib.pyplot.text
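One way to turn the correlation matrix into a feature list and a Diverging Texts graph is sketched below. The `0.1` cutoff on the absolute correlation with ‘damage_grade’, the helper names and the toy correlation values are all assumptions for illustration, since the exact selection rule is not stated here. The plot uses `Axes.text`, the object-oriented counterpart of `matplotlib.pyplot.text`.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def features_correlated_with(corr: pd.DataFrame, target: str = "damage_grade",
                             threshold: float = 0.1) -> pd.Series:
    """Return features whose |Pearson r| with `target` is at least `threshold` (assumed cutoff), strongest first."""
    target_corr = corr[target].drop(labels=target)             # drop the trivial self-correlation
    selected = target_corr[target_corr.abs() >= threshold]     # keep only non-negligible correlations
    order = selected.abs().sort_values(ascending=False).index  # rank by strength, ignoring sign
    return selected.reindex(order)


def plot_diverging_texts(target_corr: pd.Series, target: str = "damage_grade"):
    """Draw each feature's correlation with the target as a signed, colored text label."""
    values = target_corr.sort_values()  # most negative at the bottom, most positive at the top
    ys = list(range(len(values)))
    fig, ax = plt.subplots(figsize=(8, 0.5 * len(values) + 1.5))
    # Thin guide line from 0 to each value so sign and size read at a glance.
    ax.hlines(y=ys, xmin=0, xmax=values.to_numpy(), color="lightgray", linewidth=1)
    for y, value in zip(ys, values):
        negative = value < 0
        ax.text(value, y, f"{value:.2f}",
                ha="right" if negative else "left",  # label extends away from zero
                va="center",
                color="tab:red" if negative else "tab:green",
                fontsize=10)
    ax.axvline(0, color="black", linewidth=0.8)
    ax.set_yticks(ys)
    ax.set_yticklabels(values.index)
    limit = max(values.abs().max(), 0.1) * 1.4  # symmetric x-range with room for the labels
    ax.set_xlim(-limit, limit)
    ax.set_xlabel(f"Pearson correlation with {target}")
    fig.tight_layout()
    return fig


# Hypothetical target column of a correlation matrix; the values are made up.
toy_corr = pd.DataFrame(
    {"damage_grade": [0.45, -0.05, -0.30, 0.12, -0.18, 1.0]},
    index=["feat_a", "feat_b", "feat_c", "feat_d", "feat_e", "damage_grade"],
)
selected_features = features_correlated_with(toy_corr)
diverging_fig = plot_diverging_texts(selected_features)
```

Because `DataFrame.corr()` defaults to Pearson correlation, this screen only picks up roughly linear relationships with ‘damage_grade’; a feature with a small value may still help a non-linear classifier.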

Next week, we will start classifying the buildings using these features and will hopefully get some meaningful results!

Contributors

To be continued…
