Live Dashboard for Earthquake Prediction
A machine learning application
Summary
In my previous article [1], an initial Proof of Concept (PoC) on predicting earthquakes using machine learning was explored.
Here I discuss the implementation of that study, which was deployed here:
Core concepts recap
The previous study was based on a few core concepts, summarised here:
Problem statement
In order to have a more realistic scenario, we need to predict the location (latitude, longitude, and depth) and time of a quake, making this a time-series binary classification problem for every point in the analysed area.
Therefore the data was subdivided into a 3D grid (x-y-z).
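As a minimal sketch of this subdivision, the snippet below maps an event location to integer grid indices. The cell sizes follow the spatial resolution quoted in this article; the function name `grid_cell`, the grid origins, and the sample coordinates are assumptions made for illustration and are not taken from the original code.

```python
import math
from collections import Counter

# Cell sizes as quoted in the article's spatial resolution.
LAT_STEP_DEG = 10.0
LON_STEP_DEG = 12.0
DEPTH_STEP_KM = 100.0


def grid_cell(lat, lon, depth_km, lat0=-90.0, lon0=-180.0):
    """Map an event location to integer (x, y, z) grid indices.

    lat0 and lon0 are hypothetical grid origins, not values from the article.
    """
    x = math.floor((lon - lon0) / LON_STEP_DEG)
    y = math.floor((lat - lat0) / LAT_STEP_DEG)
    z = math.floor(depth_km / DEPTH_STEP_KM)
    return x, y, z


# Made-up events as (latitude, longitude, depth in km).
events = [(-33.4, -70.6, 35.0), (-31.0, -71.2, 60.0), (-20.5, -68.9, 250.0)]

# Count how many events fall into each grid cell.
events_per_cell = Counter(grid_cell(*event) for event in events)
```

Events that share a cell can then be aggregated per time step, which is what turns the catalog into one binary classification example per grid point and time window.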
Energy transformation
To enable the aggregation of different quakes in the same x-y-z-t grid point, and further manipulations (moving averages, ratios), an energy transformation was applied:
Aggregation cannot be done with magnitude: two events of magnitude = 3 are not the same as one of magnitude = 6.
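To make the point concrete, here is a small sketch of why energies can be summed while magnitudes cannot. It assumes the widely used Gutenberg-Richter relation log10(E) = 4.8 + 1.5 M (E in joules); the exact transformation used in the study may differ, and the function names are hypothetical.

```python
import math


def magnitude_to_energy(magnitude):
    """Radiated energy in joules, assuming log10(E) = 4.8 + 1.5 * M."""
    return 10 ** (4.8 + 1.5 * magnitude)


def energy_to_magnitude(energy_joules):
    """Inverse of magnitude_to_energy under the same assumed relation."""
    return (math.log10(energy_joules) - 4.8) / 1.5


# Two magnitude-3 events in the same grid point: sum the energies, not the magnitudes.
combined_energy = 2 * magnitude_to_energy(3.0)
equivalent_magnitude = energy_to_magnitude(combined_energy)  # about 3.2, far from 6

# How much more energy a single magnitude-6 event releases than a magnitude-3 one.
energy_ratio = magnitude_to_energy(6.0) / magnitude_to_energy(3.0)  # 10 ** 4.5
```

Under this assumed relation, two magnitude-3 events together are equivalent to a single event of roughly magnitude 3.2, while a magnitude-6 event releases about 31,600 times the energy of a magnitude-3 one.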
Data
The data was sourced from the United States Geological Survey (USGS) earthquake catalog [3]. The downloaded and cleaned data can be found here:
ML models and cross-validation
After the data transformation, an XGBoost model was applied. The hyperparameter search was conducted over the grid below:
###########################
# xgboost models
###########################
xgb_params = {
    'max_depth': [5, 6, 7],
    'n_estimators': [15, 25, 35],
}
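For reference, the grid above spans 3 × 3 = 9 candidate configurations. The sketch below enumerates them with the standard library only; `expand_grid` is a hypothetical helper and not part of the original code.

```python
from itertools import product

xgb_params = {
    'max_depth': [5, 6, 7],
    'n_estimators': [15, 25, 35],
}


def expand_grid(param_grid):
    """Return one dict per combination of the listed parameter values."""
    keys = list(param_grid)
    value_lists = [param_grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


candidates = expand_grid(xgb_params)
```

Each candidate could then be passed to a model constructor, for example `xgboost.XGBClassifier(**candidate)`, and scored with whichever cross-validation scheme is in use.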
The top 5 grid search results are presented:
Updates
A few updates were made from the previous PoC, namely:
Magnitude target
The target was set to any earthquake with a magnitude equal to or greater than 1, instead of 5. This improves the dataset balance (because there are more positive events) and makes the model more stable, since class balance is one of the most important factors.
Admittedly, this is less directly applicable if the final objective is preemptive warning about events that can cause harm. However, it helps to model and understand the phenomenon, and it also makes sense from a physical perspective. Another model is being evaluated to predict the magnitude.
Please refer to the Next steps section for more information.
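The effect of the magnitude threshold on balance can be sketched as follows. This assumes that "balance" means the fraction of positive grid-point records, and that each record carries the largest magnitude observed in it (or `None` when no event occurred). The helper names and sample values are made up for illustration.

```python
def binary_target(max_magnitudes, threshold=1.0):
    """Label a record 1 if its strongest event reached the threshold, else 0."""
    return [1 if m is not None and m >= threshold else 0 for m in max_magnitudes]


def balance(labels):
    """Fraction of positive records (assumed definition of dataset balance)."""
    return sum(labels) / len(labels)


# Made-up strongest magnitude per record; None means no event was recorded.
max_magnitudes = [None, 0.8, 1.0, 2.4, None, 5.1, None, 1.7]

balance_m1 = balance(binary_target(max_magnitudes, threshold=1.0))  # 4 of 8 records
balance_m5 = balance(binary_target(max_magnitudes, threshold=5.0))  # 1 of 8 records
```

On this toy data, lowering the threshold from 5 to 1 raises the share of positive records from 12.5% to 50%, which is the kind of shift the change of target aims for.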
Time granularity
A new time granularity was selected: 3 days. This is in between the two former granularities, daily and weekly, as it still provides a valuable warning horizon and does not dilute the dataset balance.
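A possible way to assign events to 3-day windows is shown below. The bucket origin (`EPOCH`) and the function name are assumptions; the original pipeline may align its windows differently.

```python
from datetime import datetime, timezone

# Hypothetical origin for bucket 0; any fixed reference date would work.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_bucket(event_time, days_per_bucket=3):
    """Index of the fixed-length window (in days) that contains event_time."""
    elapsed_days = (event_time - EPOCH).days
    return elapsed_days // days_per_bucket


first = time_bucket(datetime(1970, 1, 3, 23, 59, tzinfo=timezone.utc))  # still window 0
second = time_bucket(datetime(1970, 1, 4, tzinfo=timezone.utc))         # window 1
```

Setting `days_per_bucket` to 1 or 7 reproduces the former daily and weekly granularities with the same function.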
Metric
With a more stable model, the main metric was switched from F0.5 score to F1 score.
This will weight precision and recall evenly. Since this is still a preliminary study, later adjustments can be made.
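The difference between the two metrics follows from the general F-beta definition, F_beta = (1 + beta²) · P · R / (beta² · P + R). The sketch below implements it directly; the precision and recall values are made up to show the effect.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-beta score; beta < 1 favours precision, beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# A precise but low-recall model versus its mirror image (made-up numbers).
f1_precise = f_beta(0.8, 0.4, beta=1.0)
f1_sensitive = f_beta(0.4, 0.8, beta=1.0)
f05_precise = f_beta(0.8, 0.4, beta=0.5)
f05_sensitive = f_beta(0.4, 0.8, beta=0.5)
```

With F1 both models score the same (about 0.53), whereas F0.5 rewards the high-precision model (about 0.67 versus 0.44).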
Please refer to the Discussion section for more information.
Target regions
In addition to the Nazca-South American plate boundary area, two other areas were selected: San Andreas Fault and Japan, with boundaries shown in the images below:
The spatial resolution used was the same as the PoC (10° latitude, 12° longitude, and 100 km depth). Final balances are presented below:
* South America
  Balance: 3.93%
  Number of records: 4,580,864
* US West coast
  Balance: 19.37%
  Number of records: 483,138
* Japan
  Balance: 7.04%
  Number of records: 1,288,080
Discussion
The results achieved are satisfactory, but there is room for improvement.
Some recall can be traded for higher precision by adjusting the prediction threshold.
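The precision/recall trade-off can be illustrated with a small threshold sweep. The labels and predicted probabilities below are made up, and `precision_recall_at` is a hypothetical helper rather than part of the deployed pipeline.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when predicting 1 for probabilities >= threshold."""
    predicted = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up ground truth and model probabilities for eight grid-point records.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.35, 0.40, 0.55, 0.60, 0.70, 0.20, 0.90]

low = precision_recall_at(y_true, y_prob, threshold=0.30)
high = precision_recall_at(y_true, y_prob, threshold=0.58)
```

On this toy data, raising the threshold from 0.30 to 0.58 lifts precision from about 0.67 to 1.0 while recall drops from 1.0 to 0.75.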
Space granularity needs to be better analysed to understand what is most useful for preemptive warning applications without diluting the problem balance too much. For example, the US West coast area has greater seismicity, which allows a finer spatial resolution.
The prediction pipeline is deployed on Kaggle/GCP. It will be monitored to evaluate results and examine possible errors and improvements.
Next steps
With the current target being virtually all earthquakes, there is a need to further predict the magnitude.
Currently, two approaches are being reviewed:
- a regression model to predict the magnitude
- a binary classification with a target magnitude equal to or greater than 5
Both are being applied only to the predictions of the above study, and the results will be available shortly.
References, code, and data
[1] Gustavo Bighellini Martins, Predicting Earthquakes using Machine Learning (2021), Medium.
[2] United States Geological Survey, Earthquake Magnitude, Energy Release, and Shaking Intensity, Earthquake Hazards.
[3] United States Geological Survey, Search Earthquake Catalog.