Do Earthquakes Follow A Pattern? (Part 1)

Kamil DİLBAZ
Published in Datajarlabs
Nov 18, 2019
(Photo by Stefan Hinman / Matanuska-Susitna Borough)

Have you ever thought that earthquakes might somehow be related to each other? Is one quake triggered by another? We all know about small aftershock waves, but I mean larger events with a magnitude of 4 or more. I have been wondering about this question for a while, and it is the reason for this study.

Predicting earthquakes is extremely hard. Many parameters must be taken into account, and huge volumes of data from an enormous number of sensors must be processed.

Most earthquake prediction efforts focus on the next few seconds, in order to warn people. Many instantly measured parameters are used, and on any change of state the system tries to predict whether an earthquake is about to occur.

Up to September 2019, the accuracy of such predictions was 58%; since then, it has been 84%. All of these efforts target the very near future, such as 6–9 seconds ahead, so that people can be alerted just before the shaking starts.

In this study, I am going to try to predict earthquakes, to get my predictions as close as possible to the real values, and to find out whether earthquakes are related to earlier ones or follow any pattern.

In my country, Turkey, several institutes register earthquake data. One of them is DEMP (Disaster and Emergency Management Presidency), from whose website I downloaded my dataset. For those who are interested, here is the link: Earthquake Catalog

My dataset includes all quakes in Turkey since 1900 with magnitude ≥4. That gives me 6574 data points, which is sufficient for this study. Here is the map of my data:

Earthquakes in Turkey with magnitude ≥ 4 since 1900.
Figure 1: Earthquake occurrences in Turkey

And the dataset table looks like this:

As you can see, the table contains many unnecessary or convertible columns. Some values are also missing, but only in unnecessary columns, so this is not a problem. In this state, the dataset is not suitable for analysis. To prepare it, I do the following:

1. Date and time are stored as strings. I split them apart.

  • From the date, I calculate the time gap between sequential quakes and the number of days passed since 1 Jan 1900. I also extract the month and use it as a new feature. This produces three new features: ‘Days’, ‘Time Gap’, and ‘Month’.
  • From the time, I calculate the fraction of the day that has passed (minutes elapsed divided by 1440).

2. The data points come sorted in descending order. I re-sort them ascending by ‘Days’.

3. Constant depth is a categorical feature stored as ‘ - ’ and ‘ * ’.

  • ‘ - ’ means ‘no’.
  • ‘ * ’ means ‘yes’.

I convert them to zeros and ones so they can be used more efficiently.

4. There are some unnecessary or useless columns (no, time, reference, source, explanation, type, and placement). I drop them.
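The four preparation steps above can be sketched in pandas. The column names and date format here are assumptions based on the description, not the catalog’s actual headers, and the toy rows stand in for real catalog entries:

```python
import pandas as pd

# Toy rows mimicking the catalog; real column names/formats may differ.
df = pd.DataFrame({
    "Date": ["1900.03.02", "1900.01.15", "1900.02.14"],
    "Time": ["12:00:00", "04:30:00", "18:15:00"],
    "Constant Depth": [" * ", " - ", " * "],
    "Reference": ["a", "b", "c"],  # an example of a column to drop
})

# Step 1: split the date/time strings into numeric features.
dt = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y.%m.%d %H:%M:%S")
df["Days"] = (dt - pd.Timestamp("1900-01-01")).dt.days   # days since 1 Jan 1900
df["Month"] = dt.dt.month
df["Minute Ratio"] = (dt.dt.hour * 60 + dt.dt.minute) / 1440  # fraction of day

# Step 2: sort ascending by 'Days' (the catalog comes sorted descending),
# then compute the gap between sequential quakes.
df = df.sort_values("Days").reset_index(drop=True)
df["Time Gap"] = df["Days"].diff().fillna(0)

# Step 3: map the ' - ' / ' * ' flags to 0 / 1.
df["Constant Depth"] = df["Constant Depth"].str.strip().map({"-": 0, "*": 1})

# Step 4: drop columns with no predictive value.
df = df.drop(columns=["Reference"])
```

The same sequence, applied to the full 6574-row catalog, yields the engineered features used below.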

While looking through the website for more information, I find this risk map:

Turkey’s Earthquake Risk Map by DEMP
Figure 3: Turkey’s Earthquake Risk Map

On the scale, shown as a yellow-to-red gradient, quake risk increases toward the right. When I compare the two maps, quakes with a magnitude greater than 6 mostly overlap with the highest risk grades. This leads me to look for a way to use the map in my model. To achieve this, I follow these steps:

  1. Cropping unnecessary parts of the map,
  2. Filling lakes and the Marmara Sea with neighboring colors. The earthquake risk still remains there, but these areas are not colored as risky because no one lives on them.
  3. Reducing noise such as city names and borders. At this step, a frame of increasing size passes over the map several times, equalizing all values inside it to the frame minimum/maximum.
  4. Finding threshold values for the risk grades and replacing every pixel value on the map with a grade number. The thresholds are found experimentally.

The map after each of these steps looks like this:

Risk map conversion steps.
Figure 4: Risk map conversion steps
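The grading in step 4 can be sketched with NumPy’s `digitize`. The pixel values and threshold numbers below are purely illustrative, since the article finds the real thresholds experimentally:

```python
import numpy as np

# Toy grayscale "risk map" patch: higher value = redder = riskier.
pixels = np.array([[ 30,  90, 150],
                   [200, 250,  60]])

# Illustrative grade boundaries; the real thresholds are found experimentally.
thresholds = [64, 128, 192]

# Each pixel gets the index of the bin it falls into, i.e. its risk grade 0-3.
grades = np.digitize(pixels, thresholds)
```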

A function converts coordinate values to pixel values, so the risk grade can be read from the map for any given coordinates.
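Such a conversion function could look like this minimal sketch. The geographic bounds and image size are my assumptions (Turkey spans roughly 26–45°E and 36–42°N), not the article’s actual values:

```python
# A minimal sketch of the coordinate-to-pixel lookup. The geographic bounds
# and image size below are assumptions, not the article's actual values.
LON_MIN, LON_MAX = 26.0, 45.0   # Turkey spans roughly 26-45 degrees east
LAT_MIN, LAT_MAX = 36.0, 42.0   # and roughly 36-42 degrees north
WIDTH, HEIGHT = 1900, 600       # hypothetical map dimensions in pixels

def coord_to_pixel(lat, lon):
    """Map (lat, lon) to a (row, col) position on the risk-map image."""
    col = int((lon - LON_MIN) / (LON_MAX - LON_MIN) * (WIDTH - 1))
    # Image rows grow downward, so latitude is flipped.
    row = int((LAT_MAX - lat) / (LAT_MAX - LAT_MIN) * (HEIGHT - 1))
    return row, col
```

Indexing the converted map with the returned `(row, col)` pair (e.g. `risk_map[row, col]`, where `risk_map` is a hypothetical array of grade numbers) then gives the risk grade at a quake’s epicenter.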

At the end of feature engineering, I have 10 features in my dataset.

Final state of the dataset
Figure 5: Final state of the dataset

Occasionally, features need an extra transformation, such as taking the logarithm or square root, which can make them fit the target better. So I take the logarithms of all features and examine the relations between pairs. I draw 171 scatter plots for this, plus a heatmap of all features. I find only weak correlations, but I intend to use them anyway. To understand the correlations better, I prepare the graph below.

Figure 6: Feature relations
  • Red lines represent inverse correlations,
  • Green lines represent direct correlations,
  • Blue lines represent mixed correlations (first direct, then turning inverse).
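The log transform and pairwise correlation check described above can be sketched as follows; the three feature names are illustrative stand-ins for the dataset’s ten engineered features:

```python
import numpy as np
import pandas as pd

# Three toy features standing in for the dataset's ten; names are illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Magnitude": rng.uniform(4.0, 7.5, 100),
    "Depth": rng.uniform(1.0, 60.0, 100),
    "Time Gap": rng.uniform(0.1, 30.0, 100),
})

# Log-transform every (strictly positive) feature, then inspect pairwise
# Pearson correlations; the scatter plots and heatmap visualize this matrix.
log_df = np.log(df)
corr = log_df.corr()
```

Plotting `corr` as a heatmap (e.g. with `matplotlib` or `seaborn`) gives the kind of overview described above.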

So far, I have explained my dataset and the feature engineering I have done. In the next part, I will explain my deep learning models and the results I obtained.

Keep reading with Part 2

For further information and code, you can review my GitHub repository: Earthquake modeling

If you like this post, please hit the clap button on the left as many times as you can. You can also share it.



Computer Scientist | Data Scientist | AI Enthusiast