Bayesian Magnitude of Completeness (BMC) method

Exploring the fascinating world of incomplete seismicity data, Part I/II: Bayesian inference

If you’re reading this, I certainly teased you with a promising title. What is this fascinating world hiding in the incomplete data domain? If you work in Statistical Seismology, you may well wonder. A few studies have investigated incomplete seismicity data, but usually we just discard everything below the so-called completeness magnitude threshold. In this article, I will explain why incomplete data is important and how to use Machine Learning to model its complicated behaviour.

Our journey will start with discussing the common biases associated with incomplete data. Then we will go over Bayes’ Theorem to relate completeness thresholds to data sensors (Part I, this article). Finally, we will model incomplete data using Mixture Modelling (Part II, upcoming article). Although we’ll focus on seismicity, we will also bridge with other data domains, from the history of violence to the modelling of flower types. In short, we’ll explore a broad range of data and meet on the way Pierre-Simon, Marquis de Laplace, Reverend Thomas Bayes, and other interesting figures.

Earthquake completeness magnitude 101 | The completeness magnitude mc is the minimum magnitude bin above which the earthquake frequency-magnitude distribution follows the Gutenberg-Richter law with slope b [Gutenberg & Richter, 1944]. A number of statistical methods exist to estimate mc. Using the b-value profile as a function of the minimum magnitude cut-off is one obvious approach to estimate mc, the so-called Goodness-of-Fit Technique [Wiemer & Wyss, 2000]. Animation created with R magick & ggplot2.

Virtually all seismicity analyses must start with an estimation of the completeness magnitude mc. Above this threshold is the domain of the Gutenberg-Richter law, log10(N(≥m)) = a-bm [Gutenberg & Richter, 1944], from which we can infer the probability that at least one event of magnitude m or greater will occur in a given interval of time. This is modelled by the Poisson distribution where P(k≥1, ≥m) = 1-exp[-N(≥m)], which comes from P(k≥1, ≥m) = 1-P(k=0, ≥m). The core of seismic risk assessment is built upon this empirical relationship [Cornell, 1968: eq. 6]. Note that this assumes event independence, which is a strong assumption (see my other article ‘Earthquakes on steroids, more powerful than the fat tail’). From the animation ‘earthquake completeness magnitude 101’, we see that getting mc right is paramount: too low and the b-value is biased, too high and we get a lot of variance. This explains why we do not use incomplete data below mc. The Gutenberg-Richter law is not valid there. All inferences based on this law collapse below mc.
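To make these numbers concrete, here is a minimal sketch of the Gutenberg-Richter/Poisson calculation (the a and b values are illustrative, not tied to any particular catalogue):

```python
import math

def prob_at_least_one(a, b, m):
    """P(k>=1, >=m) = 1 - exp(-N(>=m)), with N(>=m) = 10**(a - b*m)
    the Gutenberg-Richter rate of events of magnitude m or above."""
    return 1.0 - math.exp(-10 ** (a - b * m))

# illustrative values: a = 4.5 (productivity per year), b = 1
p = prob_at_least_one(4.5, 1.0, 6.0)  # yearly P of at least one m >= 6 event
print(round(p, 3))  # → 0.031
```

A lower threshold magnitude naturally gives a higher probability, which is why the whole calculation hinges on where the Gutenberg-Richter law actually holds.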

Assuming that the data is complete when it is not (or simply underestimating the completeness threshold) will often lead to erroneous inferences. To illustrate this, let us talk about the history of violence. In ‘The Better Angels of our Nature’, Steven Pinker explains that violence has declined despite our impression of the contrary. He cites Matthew White who had already suggested that “maybe the only reason it appears that so many were killed in the past 200 years is because we have more records from that period.” Here, I reproduce Pinker’s figure 5–3 by using the list of deaths by wars given on Wikipedia, itself based on the work of White [necrometrics.com]. We clearly see that the apparent trend of increasing violence disappears when we use complete records. The same phenomenon is observed in earthquake catalogues.

A brief history of violence | Deaths in wars, normalised per annual worldwide population (fatality data from Wikipedia/necrometrics; population data from worldometers). The apparent increase in number of wars is only due to data incompleteness. We observe the same problem in earthquake catalogues. If we consider the full dataset (which includes incomplete data), the rate of earthquakes will appear to increase over time. This is not real and only reflects an improvement in the seismic network over time (confidence levels given for a random process).

So, using incomplete data is bad! But isn’t there any useful information below the completeness threshold? In an earthquake catalogue, about 90% of the reported seismicity is found below mc. Such a high ratio is not limited to earthquakes but is a common trait of natural phenomena described by a power-law. All those tiny events are gone. What’s going on below the mc threshold? Can we use incomplete data instead of throwing it away? With seismicity being a scarce data resource when it comes to finding robust signals in small space-time windows, any data, however incomplete it may be, must contain some useful information.
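The ~90% figure can be reproduced with a quick thinning simulation: Gutenberg-Richter magnitudes (b = 1) are kept with probability given by a cumulative Normal detection function. The μ and σ values, and the conservative choice mc = μ + 2σ, are illustrative assumptions of this sketch:

```python
import math
import random

random.seed(42)
beta = math.log(10)      # Gutenberg-Richter slope b = 1
mu, sigma = 2.0, 0.5     # detection-function parameters (illustrative)
mc = mu + 2 * sigma      # a conservative completeness threshold for this q(m)

def q(m):
    """Cumulative Normal detection function [Ringdal, 1975]."""
    return 0.5 * (1.0 + math.erf((m - mu) / (sigma * math.sqrt(2.0))))

# thinning: draw Gutenberg-Richter magnitudes, keep each with probability q(m)
catalog = []
while len(catalog) < 10_000:
    m = random.expovariate(beta)   # GR magnitudes above m_min = 0
    if random.random() < q(m):
        catalog.append(m)

frac_below = sum(1 for m in catalog if m < mc) / len(catalog)
# the vast majority of detected events fall below mc
```

With these settings, roughly nine detected events out of ten end up below the completeness threshold, and all of them are discarded in standard analyses.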

Percentage of discarded data in empirical distributions described by a power-law, after Clauset et al. [2009] | Blackouts, cities, surnames, terrorism, flares, protein interactions, US wars (native American in blue vs U.S. American in green), and words (from Moby-Dick). Data obtained from Clauset’s website and from the poweRlaw package. Note that xmin (vertical line) may represent a completeness threshold, a change in the underlying physical process, or simply that a power-law is not the best model choice. For seismicity, we have good evidence that the deviation from the power-law* is due to incompleteness. For all cases however, xmin, as its name suggests, is the minimum threshold above which data is kept for power-law statistics, and the percentage of lost data is huge (see given percentages). *The earthquake frequency-size distribution follows a power-law in the energy domain. The Gutenberg-Richter law, which is defined in the magnitude domain, is an exponential law. Unfortunately, you will often hear or read in the Seismological literature: ‘the Gutenberg-Richter power-law’ (disambiguation here in ‘The power-law tail’ section).

Access to smaller events increases the spatial resolution of crustal processes. Aftershock locations, for example, provide a proxy to the mainshock rupture geometry. This is illustrated below for the 2011 m = 6.2, New Zealand, Christchurch earthquake. A meta-analysis of 37 articles on earthquake short-term foreshocks also showed that the predictive power of these precursors seems to increase with decreasing minimum magnitude threshold [Mignan, 2014]. So, clearly, what happens below mc is important and should be included in seismicity pattern-recognition studies. This is obvious, so why do we still continue to throw away 90% of the available data? The rest of Part I will try to answer this question in detail. In a few words, it is because incomplete data is considered unreliable and so cannot be modelled, therefore predicted, in a systematic fashion.

Spatial resolution of crustal phenomena, complete vs. full (complete+incomplete) data | Aftershocks are often used to delineate the mainshock rupture trace. Yellow dots here represent the aftershocks of the 2011 m = 6.2 Christchurch earthquake in New Zealand (data from Bannister et al. [2011]). The map on the left hand side displays the seismicity above the completeness magnitude 3.1 whereas the map on the right hand side displays all the seismicity, including the incomplete part below 3.1. I showed the first map to a colleague of mine who’s an expert in seismic hazard assessment. The red rectangle represents the mainshock rupture that he inferred from the complete data. I then showed him the second map including both complete and incomplete data; his updated prediction suggests more fault segments in the area. Map produced with ggmap package and Google Map API.

Completeness magnitude estimators

Before diving below the completeness threshold mc, let us estimate it, which is already a challenge in itself. I will only present four estimators; all apply to the frequency-magnitude distribution (hereafter FMD). For more model examples, I refer the reader to the review by Mignan & Woessner [2012]. Please beware that the R codes provided in the 2012 article are deprecated. Instead, we will use the 2018 rseismNet package, available on GitHub (see full tutorial on mc estimators there).

For illustration purposes, I generate N = 10,000 magnitudes m from two different distributions. FMD models can be defined as the product of the Gutenberg-Richter law (our seismotectonic process) and a detection function q(m) (our signal detection process, which represents the probability of detecting m):

λ(m) ∝ exp(-βm) q(m)

where λ is the intensity rate and β = b log(10) (recall from ‘earthquake completeness magnitude 101’ that b is the slope of the Gutenberg-Richter law). We will consider the following two detection functions to simulate m: the cumulative Normal distribution

q(m) = Φ((m - μ)/σ)

[Ringdal, 1975; Ogata & Katsura, 1993] and a piecewise exponential function

q(m) = exp(κ(m - mc)) for m < mc, q(m) = 1 for m ≥ mc

[Mignan, 2012]. Multiplied by the Gutenberg-Richter law, those q functions yield curved and angular FMD shapes, respectively, in log-linear scale (see next figure).
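As a cross-check of the angular case, one can sample the resulting asymmetric Laplace density directly by inverse-transform sampling. In the sketch below (my own implementation, not the package’s), the left-hand rate κ − β follows from multiplying the Gutenberg-Richter law by the piecewise exponential q(m); the parameter values are illustrative:

```python
import math
import random

random.seed(1)
beta = math.log(10)        # GR slope b = 1
kappa = 3 * math.log(10)   # detection parameter of the piecewise exponential q(m)
mc = 2.0
left = kappa - beta        # net rate of the detected density below mc

def rfmd_angular():
    """Inverse-CDF sample from the angular FMD (asymmetric Laplace)."""
    p_below = beta / (left + beta)   # probability mass below mc
    u = random.random()
    if u < p_below:
        return mc + math.log(u / p_below) / left
    return mc - math.log((1.0 - u) / (1.0 - p_below)) / beta

m = [rfmd_angular() for _ in range(50_000)]
share_below = sum(1 for x in m if x < mc) / len(m)
# with kappa = 3*beta, one third of the detected events lie below mc
```

The sharp peak of the sample sits at mc, which is the key property exploited later by the mode estimator.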

The following rseismNet functions will generate (m_1, …, m_N) events for us for the two proposed q functions (using Thinning in the first case, the Inversion Method in the second one — see source codes for details). We then calculate mc with four estimators, as follows:

# simulate magnitudes
n <- 1e4; mbin <- .1
m_curved <- rseismNet::bfmd.sim(n,
  list(beta = log(10), mu = 2, sigma = 0.5))
m_angular <- rseismNet::efmd.sim(n,
  list(kappa = 3 * log(10), beta = log(10), mc = 2))

# mc estimators
mc_curv_mode <- rseismNet::mc.val(m_curved, 'mode')
mc_curv_mbass <- rseismNet::mc.val(m_curved, 'mbass')
mc_curv_gft <- rseismNet::mc.val(m_curved, 'gft')
mc_ang_mode <- rseismNet::mc.val(m_angular, 'mode')
mc_ang_mbass <- rseismNet::mc.val(m_angular, 'mbass')
mc_ang_gft <- rseismNet::mc.val(m_angular, 'gft')
# mc_*_KS not yet implemented in rseismNet; similar to gft but with a KS test

# frequency-magnitude distribution
fmd_curv <- rseismNet::fmd(m_curved)
fmd_ang <- rseismNet::fmd(m_angular)

# plotting
ggplot(...)

Different Frequency-Magnitude Distributions (FMDs) & mc estimators | Generated from the code block shown above.

What do we see? Four different mc estimators may give four different values. Here, the mode, as its name indicates, takes the mode of the distribution as mc value [Mignan, 2012]; MBASS (Median-Based Analysis of the Segment Slope) is another non-parametric estimator where the main change of slope in the FMD — estimated from a Wilcoxon rank-sum test — is defined as mc [Amorèse, 2007]; GFT (Goodness-of-Fit Technique) is a parametric estimator, which takes mc as the minimum magnitude above which the Gutenberg-Richter law fits the data within a fixed degree of confidence [Wiemer & Wyss, 2000] (see ‘earthquake completeness magnitude 101’). KS is conceptually similar to GFT but based on a Kolmogorov-Smirnov test [Clauset et al., 2009]. How to proceed? Taking the maximum value will give a more conservative estimate (although the true mc may still be higher). This is often enough to estimate the b-value. However, in our plan to better understand incomplete data, this is not acceptable; we need to find a better solution.
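For intuition, here are stripped-down Python versions of two of these estimators. The rseismNet implementations differ in detail; the 5% misfit target of my simplified GFT and the synthetic angular sample below are assumptions of this sketch:

```python
import bisect
import math
import random
from collections import Counter

def mc_mode(m, mbin=0.1):
    """mc as the mode of the binned magnitude sample [Mignan, 2012]."""
    counts = Counter(round(x / mbin) for x in m)
    return counts.most_common(1)[0][0] * mbin

def mc_gft(m, mbin=0.1, max_residual=5.0):
    """Simplified Goodness-of-Fit Technique [Wiemer & Wyss, 2000]: the lowest
    cut-off whose Gutenberg-Richter fit leaves < max_residual % absolute misfit."""
    ms = sorted(m)
    for k in range(int(ms[0] / mbin), int(ms[-1] / mbin)):
        mco = k * mbin
        sub = ms[bisect.bisect_left(ms, mco):]
        if len(sub) < 100:
            break
        b = math.log10(math.e) / (sum(sub) / len(sub) - mco)  # Aki (1965) MLE
        misfit, total, lv = 0.0, 0, mco
        while lv < sub[-1]:
            obs = len(sub) - bisect.bisect_left(sub, lv)      # cumulative count
            pred = len(sub) * 10 ** (-b * (lv - mco))         # GR prediction
            misfit += abs(obs - pred)
            total += obs
            lv += mbin
        if 100 * misfit / total <= max_residual:
            return mco
    return None

# synthetic angular FMD with mc_true = 2: GR above, steeper exponential below
random.seed(0)
mc_true, m = 2.0, []
for _ in range(10_000):
    if random.random() < 1 / 3:
        m.append(mc_true - random.expovariate(2 * math.log(10)))
    else:
        m.append(mc_true + random.expovariate(math.log(10)))
```

On this angular sample both estimators land close to mc_true = 2; on curved FMDs they can disagree substantially, which is the point made above.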

Let’s go back to our two different FMD shapes. There is a reason why I gave two possible q functions. The curved FMD is the most typical distribution observed in earthquake catalogues and this is the one anyone interested in incomplete seismicity data will model [Ringdal, 1975; Ogata & Katsura, 1993; Kijko & Smit, 2017; Martinsson & Jonsson, 2018]. However, if you dig into small space-time windows, you will likely start spotting angular FMDs once in a while [Mignan et al., 2011; Mignan, 2012; Mignan & Chen, 2016]. Here comes the fun part. It appears that the FMD shape depends on the spatial scale you choose (we won’t talk about the second-order temporal scale). And depending on the distribution shape, different estimators will give different mc estimates.

Spotting the angular frequency-magnitude distribution | Observed in regions where the completeness magnitude mc is about constant (i.e. may vary but below the selected binning). As the rate at which mc changes gets lower with distance from the seismic network (see prior definition below), the angular FMD gets observed in larger areas in outer regions. I’m glad to finally be able to show this plot, which didn’t make it to the final version of Mignan & Chen [2016] due to length restrictions.

Long story short: since a sum of angular FMDs of different mc can lead to a curved FMD, and no sum of curved FMDs* can lead to an angular FMD, any FMD must be defined as the sum of angular FMDs. This will be the premise of Part II on mixture modelling (* tuning the curved FMD parameters to minimise curvature yields a hard cut at mc, which is not what is observed locally). Let’s consider one of the nice properties of the angular FMD, or in proper statistical terms, the asymmetric Laplace distribution [Kotz et al., 2001], whose probability density function follows from the product of the Gutenberg-Richter law and the piecewise exponential q(m):

f(m) ∝ exp((κ - β)(m - mc)) for m < mc, f(m) ∝ exp(-β(m - mc)) for m ≥ mc

[Mignan, 2012]. The completeness threshold is by definition the mode of the distribution, which is easy to calculate. Four events is all you need to get a reasonable estimate of the mode, while 100–200 events is a minimum for the other estimators [Mignan et al., 2011]. In theory, we could even go down to one event since the mode is the value that is most likely to be sampled. What’s the catch? The angular FMD is only valid in datasets of constant mc, while mc is heterogeneous in space and time, which may explain why the angular FMD had not been observed before 2011 (so, no free lunch…).

But we want to stick to the angular FMD for the two good reasons given above. One option, then, would be to consider infinitesimal (x,y) cells to minimise the spatial variation of mc. Most cells would however be empty (we do not consider depth z for the sake of simplicity and assume that detection capabilities are stationary in time). The trade-off between spatial resolution and coverage is illustrated in the next image with a high-resolution mc map of Southern California (see rseismNet code below). We are still above event location uncertainty (0.1° binning here) but we already obtain a limited mc coverage for the region. We could optimise the binning but the trade-off would still remain. For any area, we can now define mc_area = max(mc_1, …, mc_cell, …) based on the mode, but what is the value of mc in empty cells? What if the next earthquakes occur in such cells? We’ve got ourselves a case of incomplete information, and this is where Bayesian inference comes to our rescue.
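The gridding logic can be sketched as follows. The synthetic catalogue, the toy Euclidean geometry, and the linear mc trend are all assumptions of this sketch, and only the complete part of the seismicity is simulated for brevity:

```python
import math
import random
from collections import Counter, defaultdict

random.seed(3)

def cell_mode(mags, mbin=0.1):
    """Mode-based mc estimate for one cell."""
    counts = Counter(round(x / mbin) for x in mags)
    return counts.most_common(1)[0][0] * mbin

# synthetic catalogue over a 1 x 1 degree region: the local completeness
# threshold is assumed to grow with longitude x (a toy stand-in for
# increasing distance to the seismic network)
events = []
for _ in range(5000):
    x, y = random.random(), random.random()
    mc_local = 1.0 + 2.0 * x
    events.append((x, y, mc_local + random.expovariate(math.log(10))))

# 0.1-degree binning; keep only cells with at least 4 events [Mignan et al., 2011]
cells = defaultdict(list)
for x, y, m in events:
    cells[(int(x / 0.1), int(y / 0.1))].append(m)
mc_map = {k: cell_mode(v) for k, v in cells.items() if len(v) >= 4}

# for any area, mc_area is the maximum over its non-empty cells
mc_area = max(mc_map.values())
```

The map recovers the imposed west-to-east increase of mc, but says nothing about cells without events, which is exactly the gap the Bayesian step fills.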

mc.grid <- rseismNet::mc.geogr(seism, 'mode', 'grid', dbin = 0.1,
  n.bootstrap = 100)

High-resolution mc map of Southern California | mc(lon, lat) = mode(M(lon, lat)) with M = (m1, m2, …) the magnitudes in each (lon, lat) geographical cell (here of width 0.1°). Note the limited spatial coverage despite the fact that only 4 events are requested for an mc estimation — Earthquake catalogue from Hauksson et al. [2012].

The “First and Second Laws of Error” of Laplace, by Wilson [1923] | It is Laplace who gave us the first two laws of error: His first law, published in 1774 [Laplace, 1774], stated that the frequency of an error could be expressed as an exponential function of the numerical magnitude of the error, disregarding sign. His second law was proposed four years later [Laplace, 1778] and stated that the frequency of the error is an exponential function of the square of the error. So, the Gaussian distribution is in fact the second Laplace distribution, a good example of Stigler’s law of eponymy (although ‘Gauss’ law could even predate Laplace, going back to de Moivre in 1738…). Wilson argued that far greater attention had been paid to the second law than to the first (which still applies today!) because “the second law involves the variable x² (with x the error), which is subject to all the laws of elementary mathematical analysis, whereas the first law, involving the absolute value of x is not an analytical function of its variable and therefore presents considerable mathematical difficulty in its manipulation”.

Bayesian inference

The goal of Bayesian inference is to infer a hidden quantity, in this case mc, based on incomplete information and prior beliefs. We will define our incomplete information as mc_obs, which we have calculated already. Now we need to define a prior model. We know that mc depends, at first order, on the spatial density of the seismic network. Let’s define the distance d to the kth nearest seismic station as a proxy for network density (k might represent the number of triggering stations needed to declare an event). Without going into the details of the model parameterisation, we find that a power-law or a logarithmic function can fit the data relatively well. The purist might mention that the original prior should be independent of the observations. The functional shape f(d(k)) of our prior was obtained from a different dataset, and it is assumed that this model universally applies. It is however calibrated to our specific region to account for regional attenuation conditions.
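A minimal sketch of the proxy computation, with hypothetical station coordinates, plain Euclidean distances instead of great-circle ones, and illustrative c1, c2, c3 coefficients:

```python
import math

def kth_station_distance(lon, lat, stations, k=5):
    """Distance to the k-th nearest seismic station, a proxy for local
    network density (plain Euclidean distance, for the sake of the sketch)."""
    d = sorted(math.hypot(lon - sx, lat - sy) for sx, sy in stations)
    return d[k - 1]

def mc_prior(d, c1=1.0, c2=0.5, c3=0.5):
    """Hypothetical power-law prior mc_pred = c1 * d^c2 + c3."""
    return c1 * d ** c2 + c3

# hypothetical station coordinates (degrees)
stations = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.0), (0.2, 0.8), (0.9, 0.9), (0.4, 0.5)]
d5 = kth_station_distance(0.5, 0.5, stations, k=5)
mc_pred = mc_prior(d5)  # the predicted completeness magnitude at (0.5, 0.5)
```

The prediction is defined everywhere, including in cells with no events at all, which is what makes it a useful prior.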

Prior model linking mc to seismic network density | I defined the prior as mc = c1 d^c2 + c3 (red curve), as in Mignan et al. [2011] (default values of c1 and c2 used, c3 calibrated to the Southern California data). Using a logarithmic model would be less complex, but for legacy reasons, the power-law prior remains the basis of the Bayesian Magnitude of Completeness (BMC) method — Fun fact, a power-law is more likely to win the maximum likelihood game than a logarithmic function for the same number of free parameters. This has been the basis of a debate on which model works best in Psychophysics research for instance. Only by using the Fisher information matrix can we prove that the power-law should be penalised for its higher complexity [Pitt et al., 2002] (more on this in a future article). Back to our case, the density histogram and Q-Q plot indicate that the model error can be considered normally distributed.

Following the basic concept of conditional probability, Bayes’ Theorem [Bayes, 1763] states that

P(A|B) = P(B|A) P(A) / P(B)

where A and B are events. In Bayesian inference, pioneered once again by Laplace [1774], the probability measures a degree of belief in a proposition, or model, before and after accounting for some evidence, or data. To update a parameter, it follows that

P(θ|X) = P(X|θ) P(θ) / P(X)

where θ is the unknown parameter and X the observed data. We can reformulate it for our purpose as

P(mc | mc_obs) = P(mc_obs | mc) P(mc) / P(mc_obs)

where the prior on mc derives from

mc = f(d(k)) + ε

f(d(k)) being our model and ε an error distribution. This is the basis of the Bayesian Magnitude of Completeness (BMC) method introduced by Mignan et al. [2011].

Excerpt from Laplace [1774:623] | “If an event can be produced by a number n of different causes, the probabilities of the existence of these causes taken from the event are between them as the probabilities of the event taken from these causes, and the probability of existence of each of them is equal to the probability of the event taken from this cause, divided by the sum of all probabilities of the event taken from each of these causes”. Bayes’ formula was presented posthumously by R. Price in 1763, but Laplace rediscovered it independently in 1774 and in a more general way. He was only 25 years old and was the first to apply Bayes’ theorem on experimental data [Boilley & Lallouet, 2016].

We verified above that the model errors are normally distributed, giving us the prior

P(mc) ~ N(mc_pred, sigma_pred²), with mc_pred = f(d(k))

We assume that the likelihood is also normally distributed, based on the Central Limit Theorem, with

P(mc_obs | mc) ~ N(mc_obs, sigma_obs²)

where the mean mc_obs and standard error sigma_obs are computed from n bootstrap samples of the magnitudes m. The marginal probability P(mc_obs) acts as a normalising constant and can be neglected. We finally define mc_post as the mean of the posterior distribution, which is simply obtained from the product of the two Gaussian probability density functions:

mc_post = (mc_pred sigma_obs² + mc_obs sigma_pred²) / (sigma_pred² + sigma_obs²)

This represents the average of the predicted and observed completeness magnitudes, weighted according to their respective uncertainties. Similarly, the posterior standard deviation is given by

sigma_post = sqrt(sigma_pred² sigma_obs² / (sigma_pred² + sigma_obs²))

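The whole Bayesian update reduces to a product of two Gaussians. A sketch with illustrative numbers (written with precision weights, which is algebraically the same weighted average):

```python
import math

def bmc_posterior(mc_pred, sigma_pred, mc_obs, sigma_obs):
    """Mean and standard deviation of the product of the Gaussian prior
    N(mc_pred, sigma_pred^2) and Gaussian likelihood N(mc_obs, sigma_obs^2)."""
    w_pred, w_obs = sigma_pred ** -2, sigma_obs ** -2
    mc_post = (w_pred * mc_pred + w_obs * mc_obs) / (w_pred + w_obs)
    sigma_post = math.sqrt(1.0 / (w_pred + w_obs))
    return mc_post, sigma_post

# a cell where a well-constrained prior meets a poorly-constrained observation
mc_post, sigma_post = bmc_posterior(mc_pred=2.0, sigma_pred=0.2,
                                    mc_obs=1.6, sigma_obs=0.4)
# the posterior leans toward the better-constrained prior: mc_post ≈ 1.92
```

Note that sigma_post is always smaller than both input uncertainties: combining the two sources of information never makes us less certain.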
The different steps of the BMC algorithm are represented in the figure below. We now have a robust estimate of mc and of its standard deviation, for any location. The BMC method has been successfully applied in various regions of the world, including Taiwan [Mignan et al., 2011], Mainland China [Mignan et al., 2013], Switzerland [Kraft et al., 2013], Lesser Antilles arc [Vorobieva et al., 2013], California [Tormann et al., 2014], Greece [Mignan & Chouliaras, 2014], Iceland [Panzera et al., 2017], South Africa [Brandt, 2019] and Venezuela [Vásquez & Bravo de Guenni, n.d.]. Thanks to the rseismNet R package, anyone interested in applying BMC to a given seismic region can do so quite easily, just with an earthquake catalogue and the coordinates of the seismic network’s stations as input. The procedure is as simple as the next lines; you may also do everything in one line with the wrapper function bmc().

mc.opt <- rseismNet::mc.geogr(seism, 'mode', 'circle.opt',
  dbin = 0.1, stations = stations, n.bootstrap = 100)

prior <- rseismNet::bmc.prior(mc.opt, stations, kth = 5,
  support = "calibrated")
mc.pred <- prior[[1]]$c1 * prior[[2]]$d.kth ^ prior[[1]]$c2 +
  prior[[1]]$c3
sigma.pred <- rep(prior[[1]]$sigma, nrow(mc.opt))

bmc.grid <- rseismNet::bmc.bayes(mc.opt, mc.pred, sigma.pred)

Bayesian Magnitude of Completeness (BMC) results for Southern California | Note that BMC uses the optimised mc_obs map instead of the high-resolution one in order to maximise the number of observations while minimising mc heterogeneities (we’ll come back to that when discussing the scale-variant properties of the FMD in Part 2). As noted earlier, a power-law prior is used in BMC (legacy model) although a logarithmic function may be preferred. Also, Vásquez & Bravo de Guenni [n.d.] proposed a different likelihood formulation yielding a normal inverse-gamma posterior distribution. In short, the BMC method can still be improved.

To recap: by digging deep into local seismicity data, the angular FMD — or asymmetric Laplace distribution — was discovered. It has two nice properties: (1) mc can be defined as the mode of the magnitude sample, and (2) the sum of angular FMDs of different mc can lead to a curved FMD while a sum of curved FMDs cannot lead to an angular FMD (which will be the main topic of Part 2). Unfortunately, the angular FMD is difficult to retrieve, as it applies only to samples of constant mc. As we move to smaller regions to minimise mc heterogeneities, information about mc becomes progressively more incomplete (yes, we are talking about incomplete information about the magnitude of completeness — two different completeness issues!). We used the power of Bayes’ Theorem to combine incomplete observations with a prior model relating mc to seismic network density. Finally, we obtained robust mc estimates for any location and, for a region of any size, mc is simply the maximum value of the regional mc spatial distribution.

Part II: Modelling incomplete seismicity data with Mixture Modelling

As we saw, there is more to mc than meets the eye. We are now ready to take a deep dive below the mc surface, into the world of incomplete seismicity data. In the upcoming Part II, I will show that the magnitude distribution results from a scale-variant detection process, which means that the FMD shape changes when we zoom or un-zoom [Mignan & Chen, 2016]. I will present an FMD shape ontology [Mignan, 2012] and show how to model any conceivable FMD shape with an Asymmetric Laplace Mixture Model (ALMM) [Mignan, 2019]. Once again, we will discuss other data domains (such as the Iris flower dataset) and meet other important figures of Statistics and Machine Learning.

To be continued…

Main references:

Mignan, A., Werner, M.J., Wiemer, S., Chen, C.-C., Wu, Y.-M. (2011), Bayesian Estimation of the Spatially Varying Completeness Magnitude of Earthquake Catalogs, Bull. Seismol. Soc. Am., 101(3), 1371–1385, doi: 10.1785/0120100223

Mignan, A. (2012), Functional shape of the earthquake frequency-magnitude distribution and completeness magnitude, J. Geophys. Res., 117, B08302, doi: 10.1029/2012JB009347

Mignan, A., Chen, C.-C. (2016), The Spatial Scale of Detected Seismicity, Pure Appl. Geophys., 173, 117–124, doi: 10.1007/s00024-015-1133-7

Mignan, A. (2019), Asymmetric Laplace Mixture Modelling of Incomplete Power-Law Distributions: Application to ‘Seismicity Vision’, Computer Vision Conference 2019, Advances in Intelligent Systems and Computing, in press

Other references:

Amorèse, D. (2007), Applying a Change-Point Detection Method on Frequency-Magnitude Distributions, Bull. Seismol. Soc. Am., 97(5), 1742–1749, doi: 10.1785/0120060181

Bannister, S., Fry, B., Reyners, M., Ristau, J., Zhang, H. (2011), Fine-scale Relocation of Aftershocks of the 22 February Mw 6.2 Christchurch Earthquake Using Double-difference Tomography, Seismol. Res. Lett., 82(6), 839–845, doi: 10.1785/gssrl.82.6.839

Bayes, T. (1763), An essay towards solving a Problem in the Doctrine of Chances, By the late Rev. Mr. Bayes, F.R.S. communicated by Mr. Price, in a Letter to John Canton, A.M. F.R.S., Phil. Trans. Roy. Soc., 53, 370–418

Boilley, D., Lallouet, Y. (2016), 200 ans après Laplace, l’essor des méthodes bayésiennes d’analyse des données, Bulletin de l’Union des Physiciens, 110(981), 187–201

Brandt, M.B.C. (2019), Performance of the South African National Seismograph Network from October 2012 to February 2017: Spatially varying magnitude completeness, South African Journal of Geology, 122, xxx-xxx, doi: 10.2531/sajg.121.0035

Clauset, A., C.R. Shalizi and M.E.J. Newman (2009), Power-Law Distributions in Empirical Data, SIAM Review, 51(4), 661–703, doi: 10.1137/070710111

Cornell, C.A. (1968), Engineering Seismic Risk Analysis, Bull. Seismol. Soc. Am., 58(5), 1583–1606

Gutenberg, B., Richter, C.F. (1944), Frequency of earthquakes in California, Bull. Seismol. Soc. Am., 34, 184–188

Hauksson, E., Yang, W., Shearer, P.M. (2012), Waveform Relocated Earthquake Catalog of Southern California (1981 to June 2011), Bull. Seismol. Soc. Am., 102(5), 2239–2244, doi: 10.1785/0120120010

Kijko, A., Smit, A. (2017), Estimation of the Frequency-Magnitude Gutenberg-Richter b-Value without Making Assumptions on Levels of Completeness, Seismol. Res. Lett., 88(2A), 311–318, doi: 10.1785/0220160177

Kotz, S., Kozubowski, T.J., Podgórski, K. (2001), The Laplace Distribution and Generalizations, Birkhäuser, Boston, 349 pp.

Kraft, T., Mignan, A., Giardini, D. (2013), Optimization of a large-scale microseismic monitoring network in northern Switzerland, Geophys. J. Int., 195, 474–490, doi: 10.1093/gji/ggt225

Laplace, P.S. (1774), Mémoire sur la probabilité des causes par les événements, Mémoires de Mathématique et de Physique, Académie Royale des Sciences, 6, 621–656

Laplace, P.S. (1778), Mémoire sur les probabilités, Mémoires de l’Académie Royale des Sciences de Paris, 227–332

Martinsson, J., Jonsson, A. (2018), A New Model for the Distribution of Observable Earthquake Magnitudes and Applications to b-Value Estimation, IEEE Geoscience and Remote Sensing Letters, 15(6), 833–837, doi: 10.1109/LGRS.2018.2812770

Mignan, A., Woessner, J. (2012), Estimating the magnitude of completeness for earthquake catalogs, Community Online Resource for Statistical Seismicity Analysis, doi: 10.5078/corssa-00180805, available at http://www.corssa.org

Mignan, A., Jiang, C., Zechar, J.D., Wiemer, S., Wu, Z., Huang, Z. (2013), Completeness of the Mainland China Earthquake Catalog and Implications for the Setup of the China Earthquake Forecast Testing Center, Bull. Seismol. Soc. Am., 103(2A), 845–859, doi: 10.1785/0120120052

Mignan, A., Chouliaras, G. (2014), Fifty Years of Seismic Network Performance in Greece (1964–2013): Spatiotemporal Evolution of the Completeness Magnitude, Seismol. Res. Lett., 85(3), 657–667, doi: 10.1785/0220130209

Mignan, A. (2014), The debate on the prognostic value of earthquake foreshocks: A meta-analysis, Sci. Rep., 4, 4099, doi: 10.1038/srep04099

Ogata, Y., Katsura, K. (1993), Analysis of temporal and spatial heterogeneity of magnitude frequency distribution inferred from earthquake catalogues, Geophys. J. Int., 113, 727–738

Panzera, F., Mignan, A., Vogfjörð, K.S. (2017), Spatiotemporal evolution of the completeness magnitude of the Icelandic earthquake catalogue from 1991 to 2013, J. Seismol., 21, 615–630, doi: 10.1007/s10950-016-9623-3

Pinker, S. (2011), The Better Angels of our Nature, Viking Books, 832 pp., ISBN: 978-0-670-02295-3

Pitt, M.A., Myung, I.J., Zhang, S. (2002), Toward a Method of Selecting Among Computational Models of Cognition, Psychological Review, 109(3), 472–491, doi: 10.1037/0033-295X.109.3.472

Ringdal, F. (1975), On the estimation of seismic detection thresholds, Bull. Seismol. Soc. Am., 65(6), 1631–1642

Tormann, T., Wiemer, S., Mignan, A. (2014), Systematic survey of high-resolution b value imaging along Californian faults: inference on asperities, J. Geophys. Res. Solid Earth, 119, 2029–2054, doi: 10.1002/2013JB010867

Vásquez, R., Bravo de Guenni, L. (n.d.), Bayesian Estimation of the Spatial Variation of the Completeness Magnitude for the Venezuelan Seismic Catalogue, available at https://www.statistics.gov.hk/wsc/CPS204-P47-S.pdf

Vorobieva, I., Narteau, C., Shebalin, P., Beauducel, F., Nercessian, A., Clouard, V., Bouin, M.-P. (2013), Multiscale Mapping of Completeness Magnitude of Earthquake Catalogs, Bull. Seismol. Soc. Am., 103(4), 2188–2202, doi: 10.1785/0120120132

Wiemer, S., Wyss, M. (2000), Minimum Magnitude of Completeness in Earthquake Catalogs: Examples from Alaska, the Western United States, and Japan, Bull. Seismol. Soc. Am., 90(4), 859–869

Wilson, E.B. (1923), First and Second Laws of Error, J. Am. Stat. Assoc., 18(143), 841–851

This article was originally published on LinkedIn on Mar. 13, 2019, under the title “The fascinating world of incomplete seismicity data​: Using Machine Learning for earthquake Data Mining (Part 1 of 2)”.