Exploring the fascinating world of incomplete seismicity data, Part I/II: Bayesian inference
If you’re reading this, the title has presumably teased you: what is this fascinating world hiding in the incomplete data domain? If you work in Statistical Seismology, you may well wonder. A few studies have investigated incomplete seismicity data, but usually we simply discard everything below the so-called completeness magnitude threshold. In this article, I will explain why incomplete data matters and how to use Machine Learning to model its complicated behaviour.
Our journey will start with discussing the common biases associated with incomplete data. Then we will go over Bayes’ Theorem to relate completeness thresholds to data sensors (Part I, this article). Finally, we will model incomplete data using Mixture Modelling (Part II, upcoming article). Although we’ll focus on seismicity, we will also bridge with other data domains, from the history of violence to the modelling of flower types. In short, we’ll explore a broad range of data and meet on the way Pierre-Simon, Marquis de Laplace, Reverend Thomas Bayes, and other interesting figures.
Virtually all seismicity analyses must start with an estimation of the completeness magnitude mc. Above this threshold is the domain of the Gutenberg-Richter law, log10(N(≥m)) = a-bm [Gutenberg & Richter, 1944], from which we can infer the probability that at least one event of magnitude m or greater will occur in a given interval of time. This is modelled by the Poisson distribution where P(k≥1, ≥m) = 1-exp[-N(≥m)], which comes from P(k≥1, ≥m) = 1-P(k=0, ≥m). The core of seismic risk assessment is built upon this empirical relationship [Cornell, 1968: eq. 6]. Note that this assumes event independence, which is a strong assumption (see my other article ‘Earthquakes on steroids, more powerful than the fat tail’). From the animation ‘earthquake completeness magnitude 101’, we see that getting mc right is paramount: too low and the b-value is biased, too high and we get a lot of variance. This explains why we do not use incomplete data below mc. The Gutenberg-Richter law is not valid there. All inferences based on this law collapse below mc.
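These two relationships are easy to play with numerically. Here is a minimal Python sketch, using hypothetical a- and b-values not tied to any particular catalogue:

```python
import math

def annual_rate(m, a, b):
    """Gutenberg-Richter law: N(>=m) = 10^(a - b*m) events per unit time."""
    return 10 ** (a - b * m)

def prob_at_least_one(m, a, b):
    """Poisson probability of at least one event of magnitude >= m:
    P(k>=1, >=m) = 1 - P(k=0, >=m) = 1 - exp(-N(>=m))."""
    return 1.0 - math.exp(-annual_rate(m, a, b))

# With hypothetical regional values a = 4.5, b = 1:
# N(>=5) = 10^(4.5 - 5) ~ 0.32 events/yr, i.e. ~27% chance per year.
print(round(prob_at_least_one(5.0, a=4.5, b=1.0), 2))  # → 0.27
```

Note how the exceedance probability inherits the exponential magnitude scaling of the Gutenberg-Richter law, which is why a biased b-value propagates directly into risk estimates.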
Assuming that the data is complete when it is not (or simply underestimating the completeness threshold) will often lead to erroneous inferences. To illustrate this, let us talk about the history of violence. In ‘The Better Angels of our Nature’, Steven Pinker explains that violence has declined despite our impression of the contrary. He cites Matthew White, who had already suggested that “maybe the only reason it appears that so many were killed in the past 200 years is because we have more records from that period.” Here, I reproduce Pinker’s figure 5–3 using the list of deaths by wars given on Wikipedia, itself based on the work of White [necrometrics.com]. We clearly see that the apparent trend of increasing violence disappears when we use complete records. The same phenomenon is observed in earthquake catalogues.
So, using incomplete data is bad! But isn’t there any useful information below the completeness threshold? In an earthquake catalogue, about 90% of the reported seismicity is found below mc. Such a high ratio is not limited to earthquakes but is a common trait of natural phenomena described by a power-law. All those tiny events are gone. What is going on below the mc threshold? Can we use incomplete data instead of throwing it away? With seismicity a scarce data resource when it comes to finding robust signals in small space-time windows, any data, however incomplete, must contain some useful information.
Access to smaller events increases the spatial resolution of crustal processes. Aftershock locations, for example, provide a proxy to the mainshock rupture geometry. This is illustrated below for the 2011 Mw 6.2 Christchurch, New Zealand, earthquake [Bannister et al., 2011]. A meta-analysis of 37 articles on earthquake short-term foreshocks also showed that the predictive power of these precursors seems to increase with decreasing minimum magnitude threshold [Mignan, 2014]. So, clearly, what happens below mc is important and should be included in seismicity pattern-recognition studies. This being obvious, why do we continue to throw away 90% of the available data? The rest of Part I will try to answer this question in detail. In a few words, it is because incomplete data is considered unreliable and thus cannot be modelled, and therefore predicted, in a systematic fashion.
Completeness magnitude estimators
Before diving below the completeness threshold mc, let us estimate it, which is already a challenge in itself. I will only present four estimators; all apply to the frequency-magnitude distribution (hereafter FMD). For more model examples, I refer the reader to the review by Mignan & Woessner [2012]. Be aware that the R codes provided in the 2012 article are deprecated. Instead, we will use the 2018 rseismNet package, available on GitHub (see the full tutorial on mc estimators there).
For illustration purposes, I generate N = 10,000 magnitudes m from two different distributions. FMD models can be defined as the product of the Gutenberg-Richter law (our seismotectonic process) and a detection function q(m) (our signal detection process, which represents the probability of detecting m):
N(m) = λ exp(-βm) q(m)
where λ is the intensity rate and β = b log(10) (recall from ‘earthquake completeness magnitude 101’ that b is the slope of the Gutenberg-Richter law). We will consider the two following detection functions to simulate m: the cumulative Normal distribution
q(m) = Φ((m-μ)/σ)
[Ringdal, 1975; Ogata & Katsura, 1993] and a piecewise exponential function
q(m) = exp[κ(m-mc)] for m < mc, and q(m) = 1 for m ≥ mc
[Mignan, 2012]. Multiplied by the Gutenberg-Richter law, those q functions yield curved and angular FMD shapes, respectively, in log-linear scale (see next figure).
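To make the two detection functions and the resulting FMD models concrete, here is a minimal Python sketch (an illustration only, not the rseismNet implementation; the parameter values mirror the simulation settings used in the R code below):

```python
import math

def q_curved(m, mu=2.0, sigma=0.5):
    """Cumulative Normal detection function [Ringdal, 1975; Ogata & Katsura, 1993]."""
    return 0.5 * (1.0 + math.erf((m - mu) / (sigma * math.sqrt(2))))

def q_angular(m, kappa=3 * math.log(10), mc=2.0):
    """Piecewise exponential detection function [Mignan, 2012]."""
    return math.exp(kappa * (m - mc)) if m < mc else 1.0

def fmd(m, q, beta=math.log(10), lam=1.0):
    """Non-cumulative FMD: Gutenberg-Richter intensity times detection probability."""
    return lam * math.exp(-beta * m) * q(m)

# Well above mc, both q functions saturate at 1 and the two FMDs coincide:
print(fmd(4.0, q_curved), fmd(4.0, q_angular))  # both ≈ 1e-4
```

Below mc the two models diverge: the normal CDF bends the FMD smoothly, while the piecewise exponential produces a sharp kink at mc.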
The following rseismNet functions will generate (m_1, …, m_N) events for us for the two proposed q functions (using Thinning in the first case, the Inversion Method in the second one — see source codes for details). We then calculate mc with four estimators, as follows:
# simulate magnitudes
n <- 1e4; mbin <- .1
m_curved <- rseismNet::bfmd.sim(n,
list(beta = log(10), mu = 2, sigma = 0.5))
m_angular <- rseismNet::efmd.sim(n,
list(kappa = 3 * log(10), beta = log(10), mc = 2))
# mc estimators
mc_curv_mode <- rseismNet::mc.val(m_curved, 'mode')
mc_curv_mbass <- rseismNet::mc.val(m_curved, 'mbass')
mc_curv_gft <- rseismNet::mc.val(m_curved, 'gft')
mc_ang_mode <- rseismNet::mc.val(m_angular, 'mode')
mc_ang_mbass <- rseismNet::mc.val(m_angular, 'mbass')
mc_ang_gft <- rseismNet::mc.val(m_angular, 'gft')
# mc_*_KS not yet implemented in rseismNet, similar to gft but with a KS test
# frequency-magnitude distribution
fmd_curv <- rseismNet::fmd(m_curved)
fmd_ang <- rseismNet::fmd(m_angular)
What do we see? Four different mc estimators may give four different values. Here, the mode, as its name indicates, takes the mode of the distribution as mc value [Mignan, 2012]; MBASS (Median-Based Analysis of the Segment Slope) is another non-parametric estimator where the main change of slope in the FMD — estimated from a Wilcoxon rank-sum test — is defined as mc [Amorèse, 2007]; GFT (Goodness-of-Fit Technique) is a parametric estimator, which takes mc as the minimum magnitude above which the Gutenberg-Richter law fits the data within a fixed degree of confidence [Wiemer & Wyss, 2000] (see ‘earthquake completeness magnitude 101’). KS is conceptually similar to GFT but based on a Kolmogorov-Smirnov test [Clauset et al., 2009]. How to proceed? Taking the maximum value will give a more conservative estimate (although the true mc may still be higher). This is often enough to estimate the b-value. However, in our plan to better understand incomplete data, this is not acceptable; we need to find a better solution.
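To fix ideas, here is a minimal Python sketch of the simplest of the four, the mode estimator (the peak of the binned, non-cumulative FMD); this conveys the idea only and is not rseismNet’s mc.val implementation:

```python
from collections import Counter

def mc_mode(magnitudes, mbin=0.1):
    """Mode-based mc estimator [Mignan, 2012]: bin the magnitudes and return
    the bin with the highest count, i.e. the peak of the non-cumulative FMD."""
    bins = Counter(round(m / mbin) * mbin for m in magnitudes)
    return max(bins, key=bins.get)

# Toy catalogue: detection thins events below m = 2, so the FMD peaks there.
catalogue = [1.2, 1.8, 1.9, 2.0, 2.0, 2.0, 2.1, 2.1, 2.3, 2.8, 3.4]
print(mc_mode(catalogue))  # → 2.0
```

Without detection losses the FMD would increase exponentially towards small magnitudes, so the peak marks where incompleteness takes over.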
Let’s go back to our two different FMD shapes. There is a reason why I gave two possible q functions. The curved FMD is the most typical distribution observed in earthquake catalogues and this is the one anyone interested in incomplete seismicity data will model [Ringdal, 1975; Ogata & Katsura, 1993; Kijko & Smit, 2017; Martinsson & Jonsson, 2018]. However, if you dig into small space-time windows, you will likely start spotting angular FMDs once in a while [Mignan et al., 2011; Mignan, 2012; Mignan & Chen, 2016]. Here comes the fun part. It appears that the FMD shape depends on the spatial scale you choose (we won’t talk about the second-order temporal scale). And depending on the distribution shape, different estimators will give different mc estimates.
Long story short: since a sum of angular FMDs of different mc can lead to a curved FMD, while no sum of curved FMDs* can lead to an angular FMD, any FMD must be defined as a sum of angular FMDs. This will be the premise of Part II on mixture modelling (*tuning the curved FMD parameters to minimise curvature yields a hard cut at mc, which is not what is observed locally). Let’s consider one of the nice properties of the angular FMD or, in proper statistical terms, the asymmetric Laplace distribution [Kotz et al., 2001], whose probability density function is
f(m) = κβ/(κ+β) exp[κ(m-mc)] for m < mc, and f(m) = κβ/(κ+β) exp[-β(m-mc)] for m ≥ mc
[Mignan, 2012]. The completeness threshold is by definition the mode of the distribution, which is easy to calculate. Four events are all you need to get a reasonable estimate of the mode, while 100–200 events are a minimum for the other estimators [Mignan et al., 2011]. In theory, we could even go down to one event, since the mode is the value that is most likely to be sampled. What’s the catch? The angular FMD is only valid in datasets of constant mc, while mc is heterogeneous in space and time, which may explain why the angular FMD had not been observed before 2011 (so, no free lunch…).
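To see the mode property at work, here is a hedged Python sketch that draws samples from the asymmetric Laplace model by inverse-transform sampling and recovers mc as the peak of the binned sample (illustrative only; in the R workflow this role is played by rseismNet::efmd.sim):

```python
import math, random
from collections import Counter

def rasym_laplace(n, mc=2.0, kappa=3 * math.log(10), beta=math.log(10), seed=1):
    """Inverse-transform samples from the asymmetric Laplace (angular FMD) model:
    density ~ exp(kappa*(m - mc)) below mc, ~ exp(-beta*(m - mc)) above."""
    rng = random.Random(seed)
    p = beta / (kappa + beta)  # probability mass below the mode mc
    m = []
    for _ in range(n):
        u = rng.random()
        if u < p:
            m.append(mc + math.log(u / p) / kappa)     # incomplete branch
        else:
            m.append(mc - math.log((1 - u) / (1 - p)) / beta)  # GR branch
    return m

# The mode of the binned sample recovers the true mc = 2:
sample = rasym_laplace(10_000)
bins = Counter(round(x / 0.1) * 0.1 for x in sample)
print(max(bins, key=bins.get))
```

Note that only a fraction β/(κ+β) of the events falls below mc in this model, here 25% with the chosen κ and β.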
But we want to stick to the angular FMD for the two good reasons given above. One option, then, would be to consider infinitesimal (x,y) cells to minimise the spatial variation of mc. Most cells would however be empty (we do not consider depth z for the sake of simplicity, and assume that detection capabilities are stationary in time). The trade-off between spatial resolution and coverage is illustrated in the next image with a high-resolution mc map of Southern California (see rseismNet code below). We remain above event location uncertainty (0.1° binning here) but already obtain a limited mc coverage for the region. We could optimise the binning, but the trade-off would remain. For any area, we can now define mc_area = max(mc_1, …, mc_cell, …) based on the mode, but what is the value of mc in empty cells? What if the next earthquakes occur in such cells? We have a case of incomplete information, and this is where Bayesian inference comes to our rescue.
mc.grid <- rseismNet::mc.geogr(seism, 'mode', 'grid', dbin = 0.1,
n.bootstrap = 100)
The goal of Bayesian inference is to infer a hidden quantity, in this case mc, based on incomplete information and prior beliefs. We will define our incomplete information as mc_obs, which we have calculated already. Now we need to define a prior model. We know that mc depends, at first order, on the spatial density of the seismic network. Let’s define the distance d to the kth nearest seismic station as a proxy to network density (k might represent the number of triggering stations needed to declare an event). Without going into the details of the model parameterisation, we find that a power-law or a logarithmic function fits the data relatively well. The purist might object that the prior should be independent of the observations. The functional shape f(d(k)) of our prior was obtained from a different dataset, and it is assumed that this model applies universally. It is however calibrated to our specific region to account for regional attenuation conditions.
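As an illustration of what such a calibration step looks like, here is a sketch fitting the logarithmic variant mc_pred(d) = a + b ln(d) by ordinary least squares to hypothetical (d, mc) pairs (the numbers are made up for illustration; in practice rseismNet::bmc.prior performs this step):

```python
import math

def fit_log_prior(d, mc):
    """Least-squares fit of the logarithmic prior mc_pred(d) = a + b*ln(d),
    where d is the distance to the k-th nearest seismic station."""
    x = [math.log(di) for di in d]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(mc) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, mc)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    return a, b

# Hypothetical (d_kth in km, observed mc) pairs: mc degrades as coverage thins.
d = [5, 10, 20, 40, 80]
mc_obs = [1.0, 1.3, 1.6, 1.9, 2.2]
a, b = fit_log_prior(d, mc_obs)  # exact log data, so a ≈ 0.30, b ≈ 0.43
```

With this fitted curve, mc_pred can be evaluated anywhere the station geometry is known, including cells that contain no earthquakes at all.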
Following the basic concept of conditional probability, Bayes’ Theorem [Bayes, 1763] states that
P(A|B) = P(B|A) P(A) / P(B)
where A and B are events. In Bayesian inference, pioneered again by Laplace [1774], the probability measures a degree of belief in a proposition, or model, before and after accounting for some evidence, or data. To update a parameter, it follows that
P(θ|X) = P(X|θ) P(θ) / P(X)
where θ is the unknown parameter and X the observed data. We can reformulate it for our purpose as
P(mc|mc_obs) = P(mc_obs|mc) P(mc) / P(mc_obs), with the prior model mc_pred = f(d(k)) + ε
f(d(k)) being our model and ε an error distribution. This is the basis of the Bayesian Magnitude of Completeness (BMC) method introduced by Mignan et al. [2011].
We verified above that the model errors are normally distributed, giving us the prior
P(mc) = N(mc_pred, σ_pred)
We assume that the likelihood is also normally distributed based on the Central Limit Theorem, with
P(mc_obs|mc) = N(mc_obs, σ_obs)
where the mean mc_obs and standard error σ_obs are computed from n bootstrap samples of the magnitudes m. The marginal probability P(mc_obs) acts as a normalising constant and can be neglected. We finally define mc_post as the mean of the posterior distribution, which is simply obtained from the product of the two Gaussian probability density functions:
mc_post = (mc_pred σ_obs² + mc_obs σ_pred²) / (σ_pred² + σ_obs²)
This represents the average of the predicted and observed completeness magnitudes, weighted according to their respective uncertainties. Similarly, the posterior standard deviation is given by
σ_post = sqrt(σ_pred² σ_obs² / (σ_pred² + σ_obs²))
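This precision-weighted combination takes only a few lines of Python (a generic product-of-two-Gaussians sketch, not the rseismNet code):

```python
def bmc_posterior(mc_pred, sigma_pred, mc_obs, sigma_obs):
    """Combine a predicted and an observed completeness magnitude, each with a
    Gaussian uncertainty, into the posterior mean and standard deviation
    (product of two Gaussian densities)."""
    w_pred = sigma_obs ** 2  # weight of the prediction = variance of the observation
    w_obs = sigma_pred ** 2  # weight of the observation = variance of the prediction
    mc_post = (mc_pred * w_pred + mc_obs * w_obs) / (w_pred + w_obs)
    sigma_post = (w_pred * w_obs / (w_pred + w_obs)) ** 0.5
    return mc_post, sigma_post

# A precise observation (sigma = 0.1) dominates a vaguer prediction (sigma = 0.3):
mc_post, sigma_post = bmc_posterior(2.0, 0.3, 1.6, 0.1)
print(round(mc_post, 2), round(sigma_post, 3))  # → 1.64 0.095
```

Note that σ_post is always smaller than either input uncertainty: combining the two sources of information can only sharpen the estimate.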
The different steps of the BMC algorithm are represented in the figure below. We now have a robust estimate of mc and of its standard deviation, for any location. The BMC method has been successfully applied in various regions of the world, including Taiwan [Mignan et al., 2011], Mainland China [Mignan et al., 2013], Switzerland [Kraft et al., 2013], Lesser Antilles arc [Vorobieva et al., 2013], California [Tormann et al., 2014], Greece [Mignan & Chouliaras, 2014], Iceland [Panzera et al., 2017], South Africa [Brandt, 2019] and Venezuela [Vásquez & Bravo de Guenni, n.d.]. Thanks to the rseismNet R package, anyone interested in applying BMC to a given seismic region can do so quite easily, just with an earthquake catalogue and the coordinates of the seismic network’s stations as input. The procedure is as simple as the next lines; you may also do everything in one line with the wrapper function bmc().
mc.opt <- rseismNet::mc.geogr(seism, 'mode', 'circle.opt',
dbin = 0.1, stations = stations, n.bootstrap = 100)
prior <- rseismNet::bmc.prior(mc.opt, stations, kth = 5,
support = "calibrated")
mc.pred <- prior[]$c1 * prior[]$d.kth ^ prior[]$c2 + prior[]$c3  # mc.pred = c1 * d^c2 + c3
sigma.pred <- rep(prior[]$sigma, nrow(mc.grid))
bmc.grid <- rseismNet::bmc.bayes(mc.opt, mc.pred, sigma.pred)
To recap: by digging deep into local seismicity data, the angular FMD — or asymmetric Laplace distribution — was discovered. It has two nice properties: (1) mc can be defined as the mode of the magnitude sample, and (2) a sum of angular FMDs of different mc can lead to a curved FMD while a sum of curved FMDs cannot lead to an angular FMD (which will be the main topic of Part II). Unfortunately, the angular FMD is difficult to retrieve, as it applies only to samples of constant mc. As we move to smaller regions to minimise mc heterogeneities, information about mc becomes progressively more incomplete (yes, we are talking about incomplete information about the magnitude of completeness — two different completeness issues!). We used the power of Bayes’ Theorem to combine incomplete observations with a prior model relating mc to seismic network density. Finally, we obtained robust mc estimates for any location; for a region of any size, mc is simply the maximum value of the regional mc spatial distribution.
Part II: Modelling incomplete seismicity data with Mixture Modelling
As we saw, there is more to mc than meets the eye. We are now ready to take a deep dive below the mc surface, into the world of incomplete seismicity data. In the upcoming Part II, I will show that the magnitude distribution results from a scale-variant detection process, meaning that the FMD shape changes as we zoom in or out [Mignan & Chen, 2016]. I will present an FMD shape ontology [Mignan, 2012] and show how to model any conceivable FMD shape with an Asymmetric Laplace Mixture Model (ALMM) [Mignan, 2019]. Once again, we will discuss other data domains (such as the Iris flower dataset) and meet other important figures of Statistics and Machine Learning.
To be continued…
Mignan, A., Werner, M.J., Wiemer, S., Chen, C.-C., Wu, Y.-M. (2011), Bayesian Estimation of the Spatially Varying Completeness Magnitude of Earthquake Catalogs, Bull. Seismol. Soc. Am., 101(3), 1371–1385, doi: 10.1785/0120100223
Mignan, A. (2012), Functional shape of the earthquake frequency-magnitude distribution and completeness magnitude, J. Geophys. Res., 117, B08302, doi: 10.1029/2012JB009347
Mignan, A., Chen, C.-C. (2016), The Spatial Scale of Detected Seismicity, Pure Appl. Geophys., 173, 117–124, doi: 10.1007/s00024-015-1133-7
Mignan, A. (2019), Asymmetric Laplace Mixture Modelling of Incomplete Power-Law Distributions: Application to ‘Seismicity Vision’, Computer Vision Conference 2019, Advances in Intelligent Systems and Computing, in press
Amorèse, D. (2007), Applying a Change-Point Detection Method on Frequency-Magnitude Distributions, Bull. Seismol. Soc. Am., 97(5), 1742–1749, doi: 10.1785/0120060181
Bannister, S., Fry, B., Reyners, M., Ristau, J., Zhang, H. (2011), Fine-scale Relocation of Aftershocks of the 22 February Mw 6.2 Christchurch Earthquake Using Double-difference Tomography, Seismol. Res. Lett., 82(6), 839–845, doi: 10.1785/gssrl.82.6.839
Bayes, T. (1763), An essay towards solving a Problem in the Doctrine of Chances, By the late Rev. Mr. Bayes, F.R.S. communicated by Mr. Price, in a Letter to John Canton, A.M. F.R.S., Phil. Trans. Roy. Soc., 53, 370–418
Boilley, D., Lallouet, Y. (2016), 200 ans après Laplace, l’essor des méthodes bayésiennes d’analyse des données, Bulletin de l’Union des Physiciens, 110(981), 187–201
Brandt, M.B.C. (2019), Performance of the South African National Seismograph Network from October 2012 to February 2017: Spatially varying magnitude completeness, South African Journal of Geology, 122, xxx-xxx, doi: 10.2531/sajg.121.0035
Clauset, A., C.R. Shalizi and M.E.J. Newman (2009), Power-Law Distributions in Empirical Data, SIAM Review, 51(4), 661–703, doi: 10.1137/070710111
Cornell, C.A. (1968), Engineering Seismic Risk Analysis, Bull. Seismol. Soc. Am., 58(5), 1583–1606
Gutenberg, B., Richter, C.F. (1944), Frequency of earthquakes in California, Bull. Seismol. Soc. Am., 34, 184–188
Hauksson, E., Yang, W., Shearer, P.M. (2012), Waveform Relocated Earthquake Catalog of Southern California (1981 to June 2011), Bull. Seismol. Soc. Am., 102(5), 2239–2244, doi: 10.1785/0120120010
Kijko, A., Smit, A. (2017), Estimation of the Frequency-Magnitude Gutenberg-Richter b-Value without Making Assumptions on Levels of Completeness, Seismol. Res. Lett., 88(2A), 311–318, doi: 10.1785/0220160177
Kotz, S., Kozubowski, T.J., Podgórski, K. (2001), The Laplace Distribution and Generalizations, Birkhäuser, Boston, 349 pp.
Kraft, T., Mignan, A., Giardini, D. (2013), Optimization of a large-scale microseismic monitoring network in northern Switzerland, Geophys. J. Int., 195, 474–490, doi: 10.1093/gji/ggt225
Laplace, P.S. (1774), Mémoire sur la probabilité des causes par les événements, Mémoires de Mathématique et de Physique, Académie Royale des Sciences, 6, 621–656
Laplace, P.S. (1778), Mémoire sur les probabilités, Mémoires de l’Académie Royale des Sciences de Paris, 227–332
Martinsson, J., Jonsson, A. (2018), A New Model for the Distribution of Observable Earthquake Magnitudes and Applications to b-Value Estimation, IEEE Geoscience and Remote Sensing Letters, 15(6), 833–837, doi: 10.1109/LGRS.2018.2812770
Mignan, A., Woessner, J. (2012), Estimating the magnitude of completeness for earthquake catalogs, Community Online Resource for Statistical Seismicity Analysis, doi: 10.5078/corssa-00180805, available at http://www.corssa.org
Mignan, A., Jiang, C., Zechar, J.D., Wiemer, S., Wu, Z., Huang, Z. (2013), Completeness of the Mainland China Earthquake Catalog and Implications for the Setup of the China Earthquake Forecast Testing Center, Bull. Seismol. Soc. Am., 103(2A), 845–859, doi: 10.1785/0120120052
Mignan, A., Chouliaras, G. (2014), Fifty Years of Seismic Network Performance in Greece (1964–2013): Spatiotemporal Evolution of the Completeness Magnitude, Seismol. Res. Lett., 85(3), 657–667, doi: 10.1785/0220130209
Mignan, A. (2014), The debate on the prognostic value of earthquake foreshocks: A meta-analysis, Sci. Rep., 4, 4099, doi: 10.1038/srep04099
Ogata, Y., Katsura, K. (1993), Analysis of temporal and spatial heterogeneity of magnitude frequency distribution inferred from earthquake catalogues, Geophys. J. Int., 113, 727–738
Panzera, F., Mignan, A., Vogfjörð, K.S. (2017), Spatiotemporal evolution of the completeness magnitude of the Icelandic earthquake catalogue from 1991 to 2013, J. Seismol., 21, 615–630, doi: 10.1007/s10950-016-9623-3
Pinker, S. (2011), The Better Angels of our Nature, Viking Books, 832 pp., ISBN: 978-0-670-02295-3
Pitt, M.A., Myung, I.J., Zhang, S. (2002), Toward a Method of Selecting Among Computational Models of Cognition, Psychological Review, 109(3), 472–491, doi: 10.1037/0033-295X.109.3.472
Ringdal, F. (1975), On the estimation of seismic detection thresholds, Bull. Seismol. Soc. Am., 65(6), 1631–1642
Tormann, T., Wiemer, S., Mignan, A. (2014), Systematic survey of high-resolution b value imaging along Californian faults: inference on asperities, J. Geophys. Res. Solid Earth, 119, 2029–2054, doi: 10.1002/2013JB010867
Vásquez, R., Bravo de Guenni, L. (n.d.), Bayesian Estimation of the Spatial Variation of the Completeness Magnitude for the Venezuelan Seismic Catalogue, available at https://www.statistics.gov.hk/wsc/CPS204-P47-S.pdf
Vorobieva, I., Narteau, C., Shebalin, P., Beauducel, F., Nercessian, A., Clouard, V., Bouin, M.-P. (2013), Multiscale Mapping of Completeness Magnitude of Earthquake Catalogs, Bull. Seismol. Soc. Am., 103(4), 2188–2202, doi: 10.1785/0120120132
Wiemer, S., Wyss, M. (2000), Minimum Magnitude of Completeness in Earthquake Catalogs: Examples from Alaska, the Western United States, and Japan, Bull. Seismol. Soc. Am., 90(4), 859–869
Wilson, E.B. (1923), First and Second Laws of Error, J. Am. Stat. Assoc., 18(143), 841–851
This article was originally published on LinkedIn on Mar. 13, 2019, under the title “The fascinating world of incomplete seismicity data: Using Machine Learning for earthquake Data Mining (Part 1 of 2)”.