Live Dashboard for Earthquake Prediction

Gustavo Martins · Published in Predict · Jul 28, 2022

A machine learning application

Live dashboard for earthquake prediction using Machine Learning. Made with Excalidraw. Image by author.

Summary

In my previous article [1], an initial Proof of Concept (PoC) on predicting earthquakes using machine learning was explored.

Here I discuss the implementation of that study, which was deployed here:

https://datastudio.google.com/s/pGV9c3LQqTQ

Google Data Studio Dashboard.

Core concepts recap

The previous study was based on a few core concepts, summarized here:

Problem statement

In order to have a more realistic scenario, we need to predict the location (latitude, longitude, and depth) and time of a quake, making this a time-series binary classification problem for every point in the analysed area.

Therefore the data was subdivided into a 3D grid (x-y-z).

3D grid representation. Made with Excalidraw. Image by author.
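As a minimal sketch, the gridding step can be expressed as floor division of the event coordinates. The column names are assumptions; the cell sizes are the ones given in the Target regions section below:

import numpy as np
import pandas as pd

# Cell sizes: 10° latitude, 12° longitude, 100 km depth (see "Target regions")
LAT_STEP, LON_STEP, DEPTH_STEP = 10.0, 12.0, 100.0

def assign_grid_cells(catalog: pd.DataFrame) -> pd.DataFrame:
    # Map each event to the integer (x, y, z) index of its grid cell
    catalog["x"] = np.floor(catalog["longitude"] / LON_STEP).astype(int)
    catalog["y"] = np.floor(catalog["latitude"] / LAT_STEP).astype(int)
    catalog["z"] = np.floor(catalog["depth"] / DEPTH_STEP).astype(int)
    return catalog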

Energy transformation

To enable the aggregation of different quakes in the same x-y-z-t grid point, and further manipulations (moving averages, ratios), an energy transformation was applied:

Moment magnitude to energy equation. Source [2]. Image by author.

Aggregation cannot be done with magnitude: two events of magnitude = 3 are not the same as one of magnitude = 6.
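To make that asymmetry concrete, here is a sketch using the standard Gutenberg-Richter energy relation, log10(E) = 1.5·M + 4.8 with E in joules. It is assumed here for illustration; the exact equation used is the one given in [2], and any relation of this exponential form gives the same picture:

import numpy as np

def magnitude_to_energy(m):
    # Gutenberg-Richter energy relation: log10(E) = 1.5*M + 4.8 (E in joules).
    # Assumed for illustration; the exact equation used is given in [2].
    return 10.0 ** (1.5 * np.asarray(m) + 4.8)

two_m3 = 2 * magnitude_to_energy(3.0)  # ~4.0e9 J
one_m6 = magnitude_to_energy(6.0)      # ~6.3e13 J, roughly 16,000x more energy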

Data

The data was sourced from the United States Geological Survey (USGS) earthquake catalog [3]. The downloaded and cleaned data can be found here:
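Separately, the raw catalog can be pulled straight from the USGS FDSN event web service; a minimal sketch, with a placeholder date window and bounding box:

import pandas as pd

# Query the USGS FDSN event service (https://earthquake.usgs.gov/fdsnws/event/1/)
url = (
    "https://earthquake.usgs.gov/fdsnws/event/1/query"
    "?format=csv"
    "&starttime=2020-01-01&endtime=2020-02-01"  # placeholder window
    "&minlatitude=-60&maxlatitude=15"           # placeholder bounding box
    "&minlongitude=-85&maxlongitude=-60"
)
catalog = pd.read_csv(url, parse_dates=["time"])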

ML models and cross-validation

After the data transformation, an XGBoost model was applied. The hyperparameter search was conducted over the grid below:

###########################
# xgboost models
###########################
xgb_params = {
'max_depth': [5, 6, 7],
'n_estimators': [15, 25, 35],
}
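A sketch of how this grid might be searched, assuming scikit-learn's GridSearchCV around an XGBClassifier; the time-ordered splitter is an assumption, chosen because this is a time-series problem:

from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBClassifier

search = GridSearchCV(
    estimator=XGBClassifier(eval_metric="logloss"),
    param_grid=xgb_params,           # the grid defined above
    scoring="f1",                    # the main metric (see "Metric" below)
    cv=TimeSeriesSplit(n_splits=5),  # assumption: ordered, leakage-free folds
)
# search.fit(X_train, y_train)       # X_train/y_train: the transformed grid features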

The top 5 grid search results are presented:

Top 5 cross-validation results (mean values); overfit measured from the F1 score. Image by author.

Updates

A few updates were made from the previous PoC, namely:

Magnitude target

The target was set to any earthquake with a magnitude greater than or equal to 1, instead of 5. This increases the dataset balance (there are far more small events), making the model more stable; balance is one of the most important factors here.

Granted, this has less direct application, assuming that the final objective is preemptive warning about events that can cause harm. But not only will it help to model and understand this phenomenon, it also makes sense from a physical perspective. Another model is being evaluated to predict the magnitude.

Please refer to the Next steps section for more information.

Time granularity

A new time granularity was selected: 3 days. This sits between the two former granularities, daily and weekly; it still provides a valuable warning horizon without diluting the dataset balance.
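As a sketch, resampling the per-cell energy into 3-day bins could look like this, assuming a catalog with the x/y/z cell indices and an energy column as in the earlier sketches:

import pandas as pd

# Sum released energy per grid cell over 3-day windows
binned = (
    catalog
    .groupby(["x", "y", "z", pd.Grouper(key="time", freq="3D")])["energy"]
    .sum()
    .reset_index()
)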

Metric

With a more stable model, the main metric was switched from the F0.5 score to the F1 score.

This weights precision and recall evenly. Since this is still a preliminary study, the weighting can be adjusted later.
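For reference, F_beta = (1 + beta^2)·P·R / (beta^2·P + R): beta = 0.5 weights precision more heavily, while beta = 1 is the plain harmonic mean of precision and recall. scikit-learn exposes both through one function (toy labels below):

from sklearn.metrics import fbeta_score

y_true = [0, 1, 1, 0, 1]  # toy labels
y_pred = [0, 1, 0, 0, 1]

f_half = fbeta_score(y_true, y_pred, beta=0.5)  # the former metric (F0.5)
f_one = fbeta_score(y_true, y_pred, beta=1.0)   # the new metric, identical to F1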

Please refer to the Discussion section for more information.

Target regions

In addition to the Nazca-South American plate boundary area, two other areas were selected: San Andreas Fault and Japan, with boundaries shown in the images below:

The selected area for South America. Source [3].
The selected area for the US West coast. Source [3].
The selected area for Japan. Source [3].

The spatial resolution used was the same as in the PoC (10° latitude, 12° longitude, and 100 km depth). Final balances are presented below:

* South America: balance 3.93%, 4,580,864 records
* US West coast: balance 19.37%, 483,138 records
* Japan: balance 7.04%, 1,288,080 records

Discussion

Final results. These are better than the cross-validation numbers because, during cross-validation, not all of the 90% training split is used for fitting: one fold is held out for testing. Image by author.
Expanded confusion matrix. Magnitude rounded down. Image by author.

The results achieved are satisfactory, but there is room for improvement.

Some recall could be traded for higher precision by adjusting the prediction threshold, as sketched below.
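A minimal sketch of that trade-off, assuming the fitted search object from the earlier sketch and a held-out feature matrix X_test (the 0.7 cutoff is only an example):

# Default behaviour is class 1 when P(quake) >= 0.5; raising the cutoff
# trades recall for precision.
proba = search.best_estimator_.predict_proba(X_test)[:, 1]
y_pred_strict = (proba >= 0.7).astype(int)  # example threshold, tune on validation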

Space granularity needs to be better analysed to understand what is most useful for preemptive warning applications without diluting the problem balance too much. For example, the US West coast area has greater seismicity, thus allowing a finer spatial resolution.

The prediction pipeline is deployed in Kaggle/GCP. It will be monitored to evaluate results and examine possible errors and improvements.

Next steps

With the current target being virtually all earthquakes, there is a need to further predict the magnitude.

Currently, two approaches are being reviewed:

  • a regression model to predict the magnitude
  • a binary classification with a target magnitude equal to or greater than 5

Both are being applied only to the predictions of the above study, and the results will be available shortly.
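In code, the two candidate targets differ only in how the label is derived (a sketch; "mag" is the magnitude column in the USGS CSV export):

# Option 1: regression, predicting the magnitude itself
y_reg = catalog["mag"]

# Option 2: binary classification, flagging magnitude >= 5
y_clf = (catalog["mag"] >= 5).astype(int)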
