Using Poisson Distribution to Forecast the Next Earthquake

Naoko Suga
5 min read · Aug 7, 2018


Ever since I was little, I kept hearing or seeing statements like "the probability of the next big earthquake hitting Tokyo in X years is Y%." The number always seemed alarmingly high, and the idea of a big earthquake hitting Tokyo has always horrified me. When I was in 4th grade, there was a rumor that a huge earthquake would hit Tokyo the next day. I distinctly remember talking to my friends, promising that we'd survive the tragedy we were about to face and safely meet the next day. I was so afraid that I couldn't fall asleep that night.

Japan is located on top of four different tectonic plates.

Situated where four different tectonic plates meet (the North American, Pacific, Eurasian, and Philippine plates), Japan is very prone to earthquakes. It seems to be an unfortunate fate that Japan has to face, and to confront it, people have been trying to improve the accuracy of earthquake forecasting, since a precise earthquake forecast could be a huge benefit to people and society.

When you watch the news or read articles about earthquakes, you will often see the probability of an earthquake occurring within a certain span of time. This is one prominent way of forecasting earthquakes, and it uses a statistical distribution called the Poisson distribution.

The Poisson distribution is often used to describe events that occur very rarely. Like the binomial distribution, it is a discrete probability distribution, and it models the number of events occurring in a certain interval of time or space. The difference between the two is that for the Poisson distribution the number of trials, n, is very large and the probability, p, is very small.
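This limit relationship is easy to check numerically. The sketch below (an illustration, not from the original article) compares the binomial probability of seeing exactly 3 events with the matching Poisson probability, holding λ = n·p fixed while n grows:

```python
from math import comb, exp, factorial

# Poisson as the limit of the binomial: n large, p small, lam = n * p fixed
lam = 2.0
for n in (10, 100, 10_000):
    p = lam / n
    binom_pmf = comb(n, 3) * p**3 * (1 - p) ** (n - 3)  # P(X = 3), Binomial(n, p)
    pois_pmf = lam**3 * exp(-lam) / factorial(3)        # P(X = 3), Poisson(lam)
    print(f"n = {n:>6}: binomial = {binom_pmf:.5f}, poisson = {pois_pmf:.5f}")
```

As n increases, the binomial value converges to the Poisson value, which is why the Poisson distribution works well for rare events drawn from many opportunities.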

Poisson Distribution

The probability mass function (PMF) for the Poisson distribution is:

P(X = x) = (λ^x · e^(−λ)) / x!

where λ is the average number of events per interval, and x is the number of events in an interval.
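The PMF above is simple enough to write as a small function (a minimal sketch; the function name is my own, not from the article):

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with rate lam."""
    return lam**x * exp(-lam) / factorial(x)

# Sanity check: the probabilities over all x sum to 1
# (the first 50 terms are plenty for a small lam)
print(sum(poisson_pmf(x, 3.0) for x in range(50)))
```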

Additionally, the Poisson distribution relies on the following two assumptions:

  1. Independence: events must be independent of each other (one event occurring shouldn't affect whether other events occur)
  2. Homogeneity: the probability of occurrence of an event is constant over time

Since big earthquakes are rare events, and by assuming that they are independent and homogeneous, we can forecast earthquakes using the Poisson distribution. In fact, forecasts like the one from the beginning ("the probability of the next big earthquake hitting the city X in Y years is Z%") are formulated in this way. So how do we forecast earthquakes using the Poisson distribution?

What is the probability that the next earthquake with a seismic intensity over 7.0 will hit Tokyo within 1 year?

To answer this question, you first need to find the rate of the event, λ. Since 1884, when Japan started recording earthquake data, there have been five earthquakes with a seismic intensity over 7.0 in the Kanto region (where Tokyo is located). The intensity of 7.0 was chosen as the threshold because it is the strongest category defined.

Following the second assumption, we can calculate the rate of earthquakes simply by dividing the number of events (= 5) by the time span (1884–2018, i.e., 134 years):

λ = 5/134 ≈ 0.037

To find the probability that the next big earthquake will hit Tokyo within 1 year, you first find the probability that no earthquake occurs in a year, p(0) = e^(−λ) (the PMF with x = 0), and subtract it from 1.
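The calculation is one line of Python (a sketch of the arithmetic described above, using the rate estimated from the article's five events over 134 years):

```python
from math import exp

lam = 5 / 134            # earthquakes per year: 5 events over 134 years
p0 = exp(-lam)           # Poisson P(X = 0): no big earthquake in one year
p_at_least_one = 1 - p0  # probability of at least one earthquake in one year
print(f"{p_at_least_one:.2%}")  # → 3.66%
```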

So as you can see, the probability of an earthquake occurring within a year is roughly 3.66%. This looks oddly low compared to the alarming figures often quoted in the news.

How about in 10 years?

It goes up to 31.14%. How about in 30 or 50 years?

They are 67.35% and 84.52%, respectively.
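All of these follow from the same formula, P(at least one in t years) = 1 − e^(−λt), with the yearly rate scaled by the horizon t (a sketch of the calculation, using the article's λ = 5/134):

```python
from math import exp

lam = 5 / 134  # yearly earthquake rate: 5 events over 134 years
for years in (1, 10, 30, 50):
    # P(at least one quake within `years` years) under the Poisson model
    prob = 1 - exp(-lam * years)
    print(f"{years:>2} years: {prob:.2%}")
# → 3.66%, 31.14%, 67.35%, 84.52%
```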

But… Are these probabilities trustworthy?

Although this way of forecasting has been widely used, it is quite questionable whether these probabilities are trustworthy. To use the Poisson distribution, we assumed that the earthquake data follow this particular distribution and that the two assumptions are satisfied. But are they actually?

The first assumption states that events must be independent of each other. However, earthquakes, especially when they occur around the same fault line, seem to be somewhat related to each other. The following is the list of the five earthquakes that have hit the Kanto region in the past 134 years:

  1. 1894 (M 7.0)
  2. 1895 (M 7.2)
  3. 1921 (M 7.0)
  4. 1922 (M 6.8)
  5. 1987 (M 6.7)

You can see that the second earthquake occurred a year after the first one, and the fourth a year after the third. This suggests possible correlations between the events.

Similarly, we assumed that the probability of occurrence of an event is constant. Based on the above, it does not seem to be constant; hence it could be dangerous to assume the homogeneity of the events.
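One quick way to eyeball both assumptions is to look at the gaps between the five quakes listed above (an illustrative check, not part of the original article):

```python
# Years of the five Kanto earthquakes listed above
years = [1894, 1895, 1921, 1922, 1987]

# Gaps between consecutive events, in years
gaps = [b - a for a, b in zip(years, years[1:])]
print(gaps)  # → [1, 26, 1, 65]
```

Under a Poisson process the gaps should look like draws from an exponential distribution with mean 1/λ ≈ 27 years. Two gaps of just 1 year next to gaps of 26 and 65 years hint at clustering, which is exactly the concern raised above.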

Then why is this method still used?

You might wonder why, but this is simply because there is not really a better way. We've accumulated over 100 years' worth of data, and that seems like a lot. However, considering how long the Earth has been around, it is far too little data for a robust statistical analysis.


Naoko Suga

Data Scientist and Machine Learning Engineer with a background in Physics research and financial analysis