Predicting Earthquakes using Machine Learning

Gustavo Martins
Published in Marionete · 10 min read · Sep 1, 2021


An initial study on ML applications

Update: This PoC was deployed here following instructions from here

Earthquake prediction using Machine Learning. Image by author.

Table of contents

  • Earthquakes basic concepts
  • Related studies
  • Data
  • Modelling
    º Problem statement
    º Preprocessing and feature engineering
    º Model selection
  • Results and discussion
  • Next steps
  • References

Earthquakes basic concepts

Earthquakes are well-studied events with extensive academic coverage, so only the basic concepts will be described here.

The majority of seismic activity results from the movement of lithospheric plates (a.k.a. tectonic plates). This movement accumulates energy in the form of rock stress, which is then suddenly released.

Once a quake has happened, its location (longitude, latitude, and depth), time, and magnitude can be determined. Magnitude measures the physical size of the earthquake, and the energy released can be roughly estimated by converting the moment magnitude [1].

Earthquakes can cause destruction and loss of lives. Not only by the ground shaking event but also by secondary effects such as landslides, fissures, avalanches, fires and tsunamis [2].

Between 1998–2017, earthquakes caused nearly 750,000 deaths globally, more than half of all deaths related to natural disasters. More than 125 million people were affected by earthquakes during this time period, meaning they were injured, made homeless, displaced or evacuated during the emergency phase of the disaster.

- World Health Organization

Building a pre-emptive warning system can greatly increase the effectiveness of risk management. Being able to prepare for these rare events would help minimise the harm caused, through actions such as alerting local communities and government provisioning.

Related studies

To the best of my knowledge, two studies have tried to predict when the next earthquake will happen with machine learning [3][4]. Both conclude that it is very difficult to predict the next occurrence, given the apparent randomness of events and the difficulty of proving that earthquakes follow a specific pattern.

It is important to note that both studies use the table of recorded earthquakes to build the machine learning model. Please refer to the Problem statement sub-section for further discussion.

Other ML applications have also been explored:

  • A study achieved promising results predicting aftershock events, which follow larger quakes and are an important subject since aftershocks also cause considerable damage [5]. Some discussion has arisen about the data science methodology used [6][7][8].
  • Laboratory earthquakes experiments are studied through the application of ML trying to predict time to failure [9][10].
  • Another paper found patterns in energy signals from low-amplitude seismic waves to the timing of slow slip events [11].
  • Lateral spreading prediction has been explored [12], and a competition for modelling earthquake damage has been held [13].
  • Earthquake detection and phase picking automation [14].

Data

The raw data was sourced from The United States Geological Survey (USGS) earthquake catalog [15]. All worldwide earthquakes from the beginning of records until the end of 2018 were downloaded and later filtered as described below.

Earthquakes recorded data sample. Source [15]. Image by author.

The Nazca-South American plate boundary area was selected (latitude between -47º and 7º, and longitude between -85º and -60º).

Area selected. Source [15].

Years between 1973 and 2018 were selected. Comparing the histograms for the number of earthquakes across dates, there is a clear increase in the number of recorded events, mostly for lower magnitudes. This is most likely due to an increase in the number of seismometers, not an actual increase in the number of quakes.

Histograms of earthquakes. Image by author.

Please refer to the Next steps section for more information.

All the downloaded data used for this study can be found here.
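The selection described above can be sketched with pandas. The column names follow the USGS catalog CSV export, but the sample rows here are hypothetical:

```python
import pandas as pd

# Hypothetical rows with the USGS catalog columns used for filtering.
catalog = pd.DataFrame({
    "time": pd.to_datetime(["1970-05-31", "1985-03-03", "2010-02-27", "2019-01-01"]),
    "latitude": [-9.2, -33.2, -36.1, -20.0],
    "longitude": [-78.8, -71.9, -72.9, -40.0],
    "mag": [7.9, 7.9, 8.8, 4.0],
})

# Nazca-South American plate boundary box, years 1973-2018.
in_area = catalog["latitude"].between(-47, 7) & catalog["longitude"].between(-85, -60)
in_years = catalog["time"].between("1973-01-01", "2018-12-31")
selected = catalog[in_area & in_years]
```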

Modelling

Problem statement

Instead of using the table of recorded earthquakes to appraise the final model, a different approach was selected.

If the goal is to build a warning system capable of predicting earthquake risk for any time period in a specific area, then, in my opinion, a fair assessment of results is more complex: we need to replicate a real scenario in the dataset by adding the time periods with no seismic events. That ensures predictions are evaluated even when there is no earthquake.

The following steps were done to achieve that.

First, the selected area was divided into a 3D grid, with a spatial resolution of 10 degrees of latitude, 12 degrees of longitude, and 100 km of depth.

3D grid representation. Made with Excalidraw. Image by author.
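A minimal sketch of the gridding step; the function name and signature are illustrative, not taken from the project code:

```python
import math

def grid_index(lat, lon, depth_km,
               lat0=-47.0, lon0=-85.0,
               dlat=10.0, dlon=12.0, ddepth=100.0):
    """Map an event to its (x, y, z) cell, using the grid origin and
    cell sizes described in the text (10 deg lat, 12 deg lon, 100 km depth)."""
    x = math.floor((lat - lat0) / dlat)
    y = math.floor((lon - lon0) / dlon)
    z = math.floor(depth_km / ddepth)
    return x, y, z

# A 33 km-deep event near the Chilean coast:
cell = grid_index(-36.1, -72.9, 33.0)  # -> (1, 1, 0)
```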

Second, the data was grouped time-wise into two periods, yielding two final models: one with 7-day periods (weekly model) and another with 1-day periods (daily model). A warning range was also added: 2 periods for the weekly model and 3 for the daily model. For example, with the daily model, if an earthquake will happen on Friday, an alert should be raised not only on Thursday evening but also on Wednesday and Tuesday. With the weekly model, if an earthquake will happen on the third week of a given month, alerts should be raised on the first and second weeks.
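One plausible way to build these range-warning labels, shown for the daily model on a hypothetical toy series (assuming the alert window covers the 1 to 3 days before the event, not the event day itself):

```python
import pandas as pd

# 1 marks a day with an M >= 5.0 event in a given grid cell; day 5 is "Friday".
events = pd.Series([0, 0, 0, 0, 0, 1, 0])

# label[t] = 1 if an event happens within the next 1..3 days.
warning_range = 3
label = sum(events.shift(-k, fill_value=0)
            for k in range(1, warning_range + 1)).clip(upper=1)
# label is [0, 0, 1, 1, 1, 0, 0]: alerts on "Tuesday" to "Thursday".
```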

Lastly, for the actual type of alert, it was decided to warn for every quake with a magnitude (M) greater than or equal to 5.0. This level was chosen because such a quake can cause damage if it is close to a population centre (considering not only the latitude-longitude plane but also the depth, i.e. how close it is to the surface). Even lower-magnitude earthquakes have caused deaths, e.g. the 4.9 M Afghanistan 1997 quake killed 15 people [16], although that is uncommon. From that level upwards, quakes can become far more destructive, e.g. the 5.3 M Tajikistan 1989 quake killed 274 people [17].

In summary, the data was translated into a time-series binary classification problem for every point in the 3D grid (x-y-z-t).

Problem statement for the daily model. Earthquake usp0000yt6. Image by author.

Please refer to the Next steps section for a discussion on the parameters and type of problem.

Preprocessing and feature engineering

The core of this modelling is tracking energy dispersion. Hence all earthquakes, whether above the selected warning level or not, were converted into energy, which allows grouping different events at the same 3D grid point (that cannot be done with magnitude: two events of M = 3 are not equivalent to one of M = 6).

Moment magnitude to energy equation. Source [1]. Image by author.
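A sketch of the conversion, assuming the widely quoted Gutenberg-Richter form log10(E) = 1.5·M + 4.8 with E in joules (the article's exact constants come from the USGS reference [1]):

```python
def magnitude_to_energy(m):
    """Approximate radiated seismic energy in joules from moment magnitude,
    using log10(E) = 1.5*M + 4.8 (an assumed, commonly used relation)."""
    return 10 ** (1.5 * m + 4.8)

# One magnitude unit is ~31.6x the energy, so two M 3 events release
# vastly less energy than a single M 6 event:
ratio = magnitude_to_energy(6.0) / (2 * magnitude_to_energy(3.0))  # ~15800
```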

The next step is to reduce the data according to the problem statement, which yields the energy at each grid point for every time period, filling periods with no events with 0 energy.

With a well-formatted x-y-z-t dataset, feature engineering can begin. The features created are energy moving averages (windows of 30, 60, 90, 180, 330, and 360 periods), ratios between those moving averages, and the moving averages of the neighbouring 3D grid points. The last group tries to capture the relation of energy between close points.

Another feature is the number of days since the last event, an attempt to capture the frequency of events.
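A toy version of these features for a single grid cell, using much shorter rolling windows than the 30 to 360 periods above:

```python
import pandas as pd

# Hypothetical per-period energy for one grid cell.
energy = pd.Series([0.0, 1e9, 0.0, 0.0, 5e10, 0.0, 0.0, 0.0])

features = pd.DataFrame({"energy": energy})
features["ma_2"] = energy.rolling(2, min_periods=1).mean()
features["ma_4"] = energy.rolling(4, min_periods=1).mean()
features["ma_ratio"] = features["ma_2"] / features["ma_4"].replace(0.0, float("nan"))

# Periods since the last event, to capture event frequency.
had_event = energy.gt(0)
streak_id = had_event.astype(int).cumsum()
features["since_last"] = (~had_event).astype(int).groupby(streak_id).cumsum()
```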

The resulting datasets' characteristics are:

  • Weekly model: balance 7.10%, 106,265 records, 18 features
  • Daily model: balance 1.76%, 744,294 records, 18 features

Model selection

Since this is a highly imbalanced problem, a better metric is the F-score, the weighted harmonic mean of precision and recall. In one of my previous articles [18], I explain the difference between these metrics: precision is penalised by false alarms, and recall is penalised by missed events.
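With scikit-learn, the trade-off can be illustrated on a small hypothetical prediction set (the F0.5 variant, used in the cross-validation tables below, weights precision more than recall):

```python
from sklearn.metrics import fbeta_score, precision_score, recall_score

# Hypothetical predictions on an imbalanced set: mostly "no quake" periods.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0]

precision = precision_score(y_true, y_pred)  # hurt by the one false alarm
recall = recall_score(y_true, y_pred)        # hurt by the one missed event
f05 = fbeta_score(y_true, y_pred, beta=0.5)  # beta < 1 favours precision
```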

The data split was 90% train and 10% test.

Datasets split balances. Image by author.

Within the training data, a grid search was performed to find the best models, using 3-fold time-series cross-validation.

###########################
# linear models
###########################
lin_params = {
    'C': [0.01, 0.1, 1.0],
    'solver': ['lbfgs', 'newton-cg'],
}

###########################
# random forest models
###########################
rft_params = {
    'max_depth': [6, 7, 8],
    'n_estimators': [25, 50, 75, 100, 150],
}

###########################
# xgboost models
###########################
xgb_params = {
    'max_depth': [5, 6, 7],
    'n_estimators': [15, 25, 35],
}

###########################
# additional preprocessing:
# all features have NaNs filled and inf values clipped;
# for linear models, standard scaling is applied
###########################
additional = [None, iforest, kmeans]
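A minimal sketch of how one of these searches could be wired up with scikit-learn, using synthetic stand-ins for the engineered features and labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-ins for the feature matrix and binary alert labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.tile([0, 0, 0, 1], 15)  # both classes present in every fold

# Random forest grid from above; time-ordered folds keep validation
# data strictly after its training data.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"max_depth": [6, 7, 8], "n_estimators": [25, 50, 75, 100, 150]},
    scoring="f1",                    # models are ranked by F-score
    cv=TimeSeriesSplit(n_splits=3),  # 3-fold time-series cross-validation
)
search.fit(X, y)
```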

Cross-validation results:

Top 5 cross-validation results for the weekly model. Mean values. Overfit from F0.5 score. Image by author.
Top 5 cross-validation results for the daily model. Mean values. Overfit from F0.5 score. Image by author.

All the code can be found here.

Results and discussion

With the best model, the results in the test dataset were determined:

Final results for each model. Results are better than in cross-validation because there, not all of the 90% training data is used for fitting: one fold is always held out for validation. Image by author.

For a Proof of Concept, put in the perspective of the challenge (a highly imbalanced problem), the results are reasonable, especially the precision.

The precision and recall levels achieved for the weekly model are acceptable: perhaps not suited for alerting the general population of an area, but helpful for governments or high-risk installations (e.g. nuclear power plants) to plan and reach a better state of preparedness.

Confusion matrix and Shapley values are also presented:

Weekly model confusion matrix. Image by author.
Shapley values for the weekly model. Image by author.

Next steps

Because this is an initial study, there are a lot of areas for potential improvement and exploration.

Granularity

Both space and time can be discretised differently, but any change that further unbalances the modelling dataset (for example, increasing the spatial resolution) will most likely hurt performance. The same applies to the warning range.

The dataset balance is probably one of the most important factors.

Problem statement

The alert level of 5.0 can be changed, but setting it to higher magnitudes will decrease the dataset balance, hurting performance.

An alternative to a binary classification problem is regression, targeting the energy. Then, if the predicted energy reaches a certain level, an alert is produced.
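That thresholding could look like the following sketch, assuming the Gutenberg-Richter relation log10(E) = 1.5·M + 4.8 as the magnitude-to-energy conversion:

```python
# Energy (joules) corresponding to the M 5.0 alert level,
# via the assumed relation log10(E) = 1.5*M + 4.8.
alert_threshold = 10 ** (1.5 * 5.0 + 4.8)

# Hypothetical regressor output for three periods in one grid cell.
predicted_energy = [1e10, 3e12, 5e11]
alerts = [e >= alert_threshold for e in predicted_energy]  # [False, True, False]
```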

Energy

Currently, the energy calculation is an approximation; in reality it differs depending on the type and level of magnitude. Whether this impacts the model needs to be analysed.

Adding features

Moving standard deviations are the next batch of features to be tested.

Geomagnetic field

Adding more information, such as magnetic field data from IMOs, could perhaps improve the results. The data needs to be sourced, transformed and engineered to verify whether there is any hidden pattern that helps.

Regions

Other regions can be explored, and transfer learning is a possibility.

Models

Neural Networks and Autoencoders can be tested. 3D Convolutional Neural Networks are also an option, making the location (x-y-z) part of the architecture.

Hyperparameter search

More advanced and modern hyperparameter search strategies should be deployed, like Bayesian optimisation.

Explainability

XAI can be explored further, since only the basics were covered here. More conditional feature relations can be inspected, as in [19].

Sampling

Oversampling and undersampling can also be explored.
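A minimal random-oversampling sketch with scikit-learn's `resample`; the imbalanced-learn package (`RandomOverSampler`, `SMOTE`) would be the usual tool for a fuller experiment:

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced dataset: 8 negative periods, 2 positive ones.
df = pd.DataFrame({"feat": range(10), "y": [0] * 8 + [1] * 2})

# Resample the minority class with replacement up to the majority size.
minority = df[df["y"] == 1]
upsampled = resample(minority, replace=True, n_samples=8, random_state=0)
balanced = pd.concat([df[df["y"] == 0], upsampled])
```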

GCP pipeline

All steps can be automated, from data sourcing to hyperparameter search, and automated retraining.

References

[1] United States Geological Survey, Earthquake Magnitude, Energy Release, and Shaking Intensity, Earthquake Hazards.

[2] World Health Organization, Earthquakes, Health topics.

[3] Sameer, Earthquake History (1965–2016): Data Visualization and Model Development (2019), Medium.

[4] K. Dilbaz, Do Earthquakes Follow A Pattern? (Part 2) (2019), Medium.

[5] DeVries, P.M.R., Viégas, F., Wattenberg, M. et al. Deep learning of aftershock patterns following large earthquakes (2018), Nature 560, 632–634.

[6] Mignan, A., Broccardo, M. One neuron versus deep learning in aftershock prediction (2019), Nature 574, E1–E3.

[7] R. Shah, Stand Up for Best Practices (2019), Medium.

[8] Synced, Harvard & Google Seismic Paper Hit With Rebuttals: Is Deep Learning Suited to Aftershock Prediction? (2019), Medium.

[9] B. Rouet-Leduc, C. Hulbert, N. Lubbers, K. Barros, C. J. Humphreys, P. A. Johnson, Machine Learning Predicts Laboratory Earthquakes (2017), Geophysical Research Letters 44, 9276–9282.

[10] P. A. Johnson, B. Rouet-Leduc, L. J. Pyrak-Nolte, G. C. Beroza, et al., Laboratory earthquake forecasting: A machine learning competition (2021), Proceedings of the National Academy of Sciences, 118.

[11] C. Hulbert, B. Rouet-Leduc, P. A. Johnson, A Silent Build-up in Seismic Energy Precedes Slow Slip Failure in the Cascadia Subduction Zone (2019).

[12] M. G. Durante, E. M. Rathje, An exploration of the use of machine learning to predict lateral spreading (2021), SAGE Journal.

[13] DriveData, Richter’s Predictor: Modeling Earthquake Damage (2021).

[14] Mousavi, S.M., Ellsworth, W.L., Zhu, W. et al. Earthquake transformer — an attentive deep-learning model for simultaneous earthquake detection and phase picking (2020). Nat Commun 11, 3952.

[15] United States Geological Survey, Search Earthquake Catalog.

[16] United States Geological Survey, M 4.9 — Afghanistan.

[17] United States Geological Survey, M 5.3 — Tajikistan.

[18] Gustavo Bighellini Martins, Leveraging Open Banking through Data Science (2020), Medium.

[19] Catarina Freitas, Going beyond churn prediction to support customer retention (2021), Medium.
