A little gaze into the earthquake dataset

Trần Hữu Nhật Huy
7 min read · Oct 19, 2023

This is part of the first project of Udacity’s Data Science course.

I. Questions to ask

I live in Tokyo. There have not been many earthquakes around here recently, and to be honest I kind of miss the thrill of feeling those funny vibrations through my body. Coincidentally, I am taking Udacity’s Data Scientist Nanodegree, so I would like to dive into earthquake data with these 3 questions:

  1. Which places are most vulnerable to earthquakes and seismic activity, around the world and in Japan?
  2. What are the natural characteristics of heavy earthquakes? Which factors determine a strong earthquake?
  3. Sometimes seismic activity is caused by a nuclear explosion. How can we detect a nuclear explosion from seismic data alone?

II. Data Gathering & Preparation

This dataset includes a record of the date, time, location, depth, magnitude, and source of every earthquake with a reported magnitude 5.5 or higher across the world, from 1965 to 2016.

The dataset is collected by the National Earthquake Information Center (NEIC), which serves as a foundation for scientific research through the operation of modern digital national and global seismograph networks and cooperative international agreements. The dataset can be found on Kaggle.

For the analysis, I picked 7 of the original 21 columns: Date, Time, Latitude, Longitude, Type, Depth, and Magnitude. After some data wrangling, the fun of analysis begins.
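To make the column selection concrete, here is a minimal sketch using only Python's standard library. It is not the notebook code from the project. The two sample rows are made up for illustration, the extra `Source` column merely stands in for the columns that get dropped, and `load_events` is a hypothetical helper name.

```python
import csv
import io

# Made-up rows in the general shape of the CSV; NOT real records.
SAMPLE_CSV = """Date,Time,Latitude,Longitude,Type,Depth,Magnitude,Source
03/14/1970,08:15:00,10.5,125.0,Earthquake,45.0,5.8,XX
07/21/1982,22:40:10,50.0,78.0,Nuclear Explosion,0.0,6.1,XX
"""

# The 7 columns kept for the analysis.
KEEP = ["Date", "Time", "Latitude", "Longitude", "Type", "Depth", "Magnitude"]
NUMERIC = {"Latitude", "Longitude", "Depth", "Magnitude"}


def load_events(text):
    """Parse CSV text, keep only the 7 analysis columns, cast numeric ones to float."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        event = {}
        for col in KEEP:
            event[col] = float(row[col]) if col in NUMERIC else row[col]
        events.append(event)
    return events


events = load_events(SAMPLE_CSV)
print(events[0])
```

With the real file, the same function could be fed the file contents instead of `SAMPLE_CSV`. Rows with missing or malformed numeric values would need extra handling that this sketch leaves out.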

III. Data Analysis

1. Data Overview

Fig. 1. Raw data numerical distribution overview.

Here is what the histograms of the 4 numerical columns show:

a. Latitude

Regarding latitude, earthquakes seem to occur primarily around the equator, where latitude = 0. Based on the data, over half (55.72%) of earthquakes occurred within the tropical zone.
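A percentage like this boils down to counting how many latitudes fall inside a band. The sketch below shows one way to do it. The cutoff of 23.44 degrees for the tropics is my assumption here (the exact threshold behind the 55.72% figure is not stated), `share_within` is a hypothetical helper, and the sample latitudes are made up.

```python
TROPIC_LIMIT = 23.44  # assumed latitude of the tropics, in degrees


def share_within(values, low, high):
    """Return the percentage of values v with low <= v <= high."""
    values = list(values)
    if not values:
        return 0.0
    inside = sum(1 for v in values if low <= v <= high)
    return 100.0 * inside / len(values)


# Made-up latitudes, only to show the call; not taken from the dataset.
latitudes = [-6.2, 3.5, 38.3, -33.0, 12.1]
tropical_share = share_within(latitudes, -TROPIC_LIMIT, TROPIC_LIMIT)
print(f"{tropical_share:.2f}% of the sample lies in the tropical zone")
```

The same helper works for any numeric column, for example a longitude band, as long as the band edges are chosen explicitly.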

b. Longitude

In contrast to latitude, the majority of earthquakes seem to happen at high longitudes. Based on the data, over one-third (34.12%) of earthquakes occurred at high longitudes.

The locations are investigated further in the first question.

c. Depth

A majority of the earthquakes occurred at shallow depths. Based on the data, an overwhelming share of earthquakes (84.06%) occurred no deeper than 100 km (the Depth column is recorded in kilometers).

However, a portion of earthquakes also occurs at around 600 km depth.
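The two clusters in the depth histogram can also be summarised by grouping depths into bands. This is a rough sketch: the band edges (100 and 500, in the units of the Depth column) are assumptions I picked to separate the shallow bulk from the deep cluster, and the sample depths are made up.

```python
from collections import Counter


def depth_band(depth, shallow_limit=100.0, deep_limit=500.0):
    """Label a depth (in the Depth column's units) as shallow, intermediate, or deep."""
    if depth <= shallow_limit:
        return "shallow"
    if depth < deep_limit:
        return "intermediate"
    return "deep"


def band_shares(depths):
    """Return {band: percentage of events} for a list of depths."""
    counts = Counter(depth_band(d) for d in depths)
    return {band: 100.0 * n / len(depths) for band, n in counts.items()}


# Made-up depths, only to show the call; not taken from the dataset.
depths = [10.0, 35.0, 80.0, 250.0, 610.0]
shares = band_shares(depths)
print(shares)
```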

d. Magnitude

Thankfully, based on the data, a majority of the earthquakes, almost 70%, are moderate ones, which cause damage of varying severity to poorly constructed buildings. Almost 30% are strong ones, which can tear down buildings with no earthquake-resistant structures. Only around 3% of earthquakes are truly nightmarish, wreaking true havoc and claiming massive numbers of lives.

2. First question: Earthquake geographical distribution

a. Globally

Fig. 2. Earthquake locations around the world (1965–2016)

We can clearly see that a staggering majority of earthquakes occur along the Earth’s major tectonic plate boundaries, which matches the scientific consensus about earthquakes.

We can also see that a lot of earthquakes happen around Indonesia and East Australia. This is because these regions sit on the boundaries of 3 tectonic plates: Indian-Australian, Eurasian, and Pacific. The latitudes and longitudes of these regions also help explain why there are a lot of earthquakes in the tropical zone and at high longitudes.

The image below shows the Earth’s actual tectonic plate boundaries, which bear a striking resemblance to our map above.

Fig. 3. Map of tectonic plate boundaries.

b. Japan

Fig. 4. Earthquake category comparison between Japan and the world.

Here we present 4 categories based on the Richter scale:

  • Moderate: magnitude below 6.0. These earthquakes are felt by everyone and can cause damage of varying severity to poorly constructed buildings, with zero to slight damage to all other buildings.
  • Strong: magnitude from 6.0 to 7.0. These earthquakes damage a moderate number of well-built structures in populated areas. Earthquake-resistant structures survive with slight to moderate damage.
  • Major: magnitude from 7.0 to 8.0. These earthquakes cause damage to most buildings, some of which partially or completely collapse or receive severe damage.
  • Great: magnitude over 8.0. These earthquakes inflict huge damage on buildings, and structures are likely to be destroyed, even seismic-resistant ones. These are legendary events that will be talked about for generations to come.
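The four categories above translate directly into a small lookup function. In this sketch I assume half-open intervals (6.0 counts as Strong, 7.0 as Major, 8.0 as Great), since the list does not say which side the boundary values fall on. `magnitude_category` and `category_shares` are hypothetical helper names, and the sample magnitudes are made up.

```python
from collections import Counter


def magnitude_category(magnitude):
    """Map a magnitude to one of the four categories used in this post."""
    if magnitude < 6.0:
        return "Moderate"
    if magnitude < 7.0:
        return "Strong"
    if magnitude < 8.0:
        return "Major"
    return "Great"


def category_shares(magnitudes):
    """Return {category: percentage of events} for a list of magnitudes."""
    counts = Counter(magnitude_category(m) for m in magnitudes)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Made-up magnitudes, only to show the call; not taken from the dataset.
sample = [5.6, 5.9, 6.3, 7.1, 8.2]
print(category_shares(sample))
```

Running `category_shares` once on the Japanese events and once on all events would give the two sets of percentages needed for a side-by-side comparison.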

From the figure, we can see that Japan generally has stronger earthquakes with greater magnitudes compared to the rest of the world. This is why Japan is famous both for being constantly ravaged by earthquakes and for its earthquake-resistant architecture, a testament to its people’s diligence.

Fig. 5. Earthquake location distribution of Japan (1965–2016).

We can see that in general earthquakes in Japan occur along the eastern coast, which is kinda self-explanatory since the tectonic plate boundary is there. A staggering number of earthquakes are concentrated in the sea off Tohoku, and the same region also has 3 great earthquakes, with the biggest one being the infamous 2011 Great Tohoku Earthquake and Tsunami.

So, moral of the story? Don’t settle near Tohoku coastlines lol.

(But not gonna lie, those places are gorgeous. I went there during the 2022 Tohoku Double Seventh Festival, one of the best trips of my life probably).

Fig. 6. The Aomori’s Sansa Odori festival, one of the signature annual festivals of Tohoku region.

3. Second question: Factors determining earthquake magnitude

After analyzing the distributions and correlations of the numerical features, there seems to be little to no correlation between them. The magnitude seems to be random.
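For reference, "correlation" between two numerical columns can be measured with the Pearson coefficient, which ranges from -1 to 1 and is close to 0 when there is no linear relationship. Below is a minimal standard-library sketch of that formula. I am assuming Pearson here since the exact correlation measure used in the analysis is not stated, and the depth and magnitude values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up depth and magnitude values, only to show the call.
depths = [10.0, 35.0, 80.0, 250.0, 610.0]
magnitudes = [6.1, 5.6, 7.0, 5.8, 6.4]
print(f"depth vs magnitude: r = {pearson(depths, magnitudes):.3f}")
```

Keep in mind that a coefficient near 0 only rules out a linear relationship; it does not by itself prove that magnitude is random.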

However, we can also look at their locations. The figure below shows locations of earthquakes greater than 6.0 (strong category and above) around Earth.

Fig. 7. Locations of earthquakes with magnitude greater than 6.0 around the world (1965–2016)

From the map above, we can see that, although the strong ones (6.0–7.0 magnitude) still scatter around the boundaries, the major ones tend to occur around:

  • The western coast of the Americas, where a lot of tectonic plates (North America, Caribbean, Cocos, Nazca, Juan de Fuca, Pacific, and South America plates) interact with each other.
  • The Arabian & Middle East region, along the intersections of the Eurasian, African, Arabian, and Indian plates.
  • The Indonesia region, at the intersection of two huge tectonic plates, the Eurasian and the Australian.
  • Eastern Australia, at the intersection of another two huge tectonic plates, the Australian and the Pacific.
  • The Japan region, as we know it.

So, we can conclude that earthquake magnitude, although it seems random, does correlate with location, especially locations with high tectonic activity at the intersections of multiple or major tectonic plates.

As tectonic plate boundaries and their correlation with earthquakes are widely known, I decided that a modeling part is not needed in this project.

4. Last question: How to detect a nuclear explosion from its seismic activity?

Fig. 8. Nuclear explosion locations around the world (1965–2016).
Fig. 9. Distribution of numerical features of nuclear explosion-based seismic events.

From the data above, we can have several insights:

a. Location and time

  • From the map above, we can see that a majority of the nuclear tests came from the US and the Soviet Union, and some came from the territory of the People’s Republic of China after 1995. Before 1995, the nuclear playground was primarily occupied by the two superpowers, the US and the Soviet Union.
  • Given that the 175 recorded tests span from 1966 to 1996, ending several years after the Soviet Union collapsed and the Cold War settled, we can clearly see that these nuclear explosions are nuke tests. During 1975–1985, there was an overwhelming number of recorded nuclear explosions compared to other periods. Historically speaking, this somewhat coincides with major geopolitical events of that era, such as the end of the Vietnam War (1975), the Sino-Soviet split between China and the Soviet Union, and so on.
  • The most important finding is that these nuclear tests are located nowhere near the tectonic plate boundaries, completely separated from the vast majority of natural earthquake events (like, seriously, do you wanna test a nuclear bomb at some unstable place full of earthquakes?). This is a clear feature for deciding whether a seismic event is likely a nuke test or just a natural earthquake.

b. Magnitude

The magnitudes of these nuke tests are typically not very high, usually lower than 7.0, which is only equivalent to a strong earthquake.

c. Depth

The depths of these nuke tests are typically at ground level or very shallow, no deeper than 30 km in this dataset, which is quite different from those of natural earthquakes.
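Putting the three observations together (far from plate boundaries, magnitude below 7.0, very shallow), one could sketch a crude rule-of-thumb check like the one below. This is only an illustration of the idea, not a validated detector and not something built in this project. The 500 km distance threshold, the `looks_like_nuclear_test` helper, the two boundary points, and the sample events are all hypothetical; a real check would need an actual plate boundary dataset.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def looks_like_nuclear_test(event, boundary_points,
                            min_boundary_km=500.0,  # hypothetical threshold
                            max_depth=30.0,         # in the Depth column's units
                            max_magnitude=7.0):
    """Heuristic: far from every boundary point, shallow, and below magnitude 7.0."""
    nearest = min(
        haversine_km(event["Latitude"], event["Longitude"], lat, lon)
        for lat, lon in boundary_points
    )
    return (nearest >= min_boundary_km
            and event["Depth"] <= max_depth
            and event["Magnitude"] < max_magnitude)


# Two illustrative points standing in for plate boundaries; NOT real boundary data.
boundary_points = [(38.0, 143.5), (-6.0, 105.0)]

# Made-up events: one deep inside a continent, one right next to a boundary point.
inland = {"Latitude": 50.0, "Longitude": 78.0, "Depth": 0.0, "Magnitude": 6.1}
offshore = {"Latitude": 38.3, "Longitude": 142.4, "Depth": 10.0, "Magnitude": 5.8}

print(looks_like_nuclear_test(inland, boundary_points))
print(looks_like_nuclear_test(offshore, boundary_points))
```

A rule like this would flag any shallow, moderate event far from a plate boundary, including natural intraplate earthquakes, so it should be treated as a first filter at best.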

IV. Summary

So, to answer those original 3 questions:

1. Earthquake distributions and characteristics, globally and in Japan

Globally:

  • Occur along tectonic plate boundaries.
  • Shallow depths, primarily no deeper than 100 km, but sometimes down to around 600 km.
  • Primarily moderate magnitudes below 6.0, sometimes strong ones up to 7.0, which is fortunate.

In Japan:

  • Occur along the eastern coast, concentrated in the sea off Tohoku.
  • Generally more intense than global average in terms of magnitude.
  • Greatest earthquakes in modern history occurred here.

2. Factors determining earthquake intensity

  • Generally, magnitude appears to be random with respect to the other numerical features.
  • Some specific tectonic plate boundaries are more likely to experience higher magnitudes.

3. Nuclear explosions vs natural earthquakes

Compared to natural earthquakes, nuclear explosions:

  • Were recorded in this dataset only from 1966 to 1996, mostly during the Cold War. We still have nuclear programs today (look at North Korea lol), but they tend to fly ICBMs over Japan and explode them in the upper layers of the atmosphere, so there are no seismic records of those here.
  • Did not occur at natural tectonic plate boundaries, but at designated nuke test sites.
  • Low magnitudes compared to natural earthquakes.
  • Shallow depths, usually on ground level.
