Celestial Bodies Database: Data Preparation (Part 2)

Sri Hartini
8 min readSep 16, 2022

--

Photo by Javier Miranda on Unsplash

In this article, I’ll demonstrate how I prepare a celestial bodies database using PostgreSQL for this freeCodeCamp project. Please read Celestial Bodies Database: Data Preparation (Part 1) if you haven’t already.

Now we will continue to create the next table: the planet table.

Let’s start by looking at a list of the planets’ names that corresponds to the moon table from earlier. Keep in mind that every moon ought to have a foreign key that refers to one of the planet table’s rows. Therefore, we must ensure that the planet table contains this list of planets.

must_have_planets = df_moon.planet.unique() 
print(must_have_planets)

Now, let’s use web scraping to obtain the planet table from this website.

There are several columns, but I’ll just use the columns that particularly interest me.

Now that I feel comfortable with the table, let’s see if it contains every planet on the must_have_planets list. If not, we will have to include the planets' information that is missing from the table.

for planet_name in must_have_planets: 
if planet_name not in df_planet.name.values:
print(planet_name)
## Output:
# Salacia
# Varda

You can see from the code above that the data for the planets Salacia and Varda is not yet included in the table. So, we’ll include it.

We can update the planet column in the moon table with the planet_id now that we have completed all the planet names and assigned each one a planet's id.

Creating the first table: moon table (final)

Creating constellation and star tables

Since each planet should have a foreign key that refers to one of the rows in the star table, we might want to start thinking about how to connect the dots between the planet table and the star table.

The best solution I could come up with is to use a constellation table. The constellation that the planet is “in” could be located. As the planets orbit the sun in a plane, they are much closer to us than the background stars and only appear to move through certain constellations (source), hence the double quotation marks. And when I say background stars, I’m referring to the constellations, which are meaningful star patterns in the sky. The brightest star is also found in each constellation. Therefore, we could add a column to the planet table that indicates the brightest star in the constellation the planet is in.

[1] The Sky Live, Major Solar System Objects, https://theskylive.com/planets

[2] The Sky Live, Asteroids, https://theskylive.com/asteroids-and-dwarf-planets

[3] 50000 Quaoar. https://usuaris.tinet.cat/klunn/quaoar.html

[4] Wikipedia, 225088 Gonggong, https://en.wikipedia.org/wiki/225088_Gonggong

Earth, Orcus, and Varda are the three planets for whom I am unable to locate any information on their constellation. However, according to this article, the stars closest to Earth are found in the constellation Centaurus, so I’ll put Centaurus in the table as the constellation of Earth. Orcus is also known as Plutino, so I think it probably wouldn’t be wrong to designate Sagittarius as its constellation. In addition, Varda is a binary trans-Neptunian planetoid, thus I give it the same constellation as Neptune, which is Aquarius.

We can use web scraping again on this Wikipedia page to obtain the constellation table.

Out of practicality, I would like to alter the star column’s use of Greek letters on their records to our alphabet. And if you check further, the star’s name also contains [8]s, so I’ll omit them as well.

df_constellation.star.values
df_constellation.star.values

The star names of each planet can now be obtained using the constellation table, and we can then attempt to make the constellation table as simple as possible by omitting the constellation names that we do not use. I decided to approach this project in that manner. You can, however, choose to keep the names of each constellation.

keep_constellation = df_planet_constellation.constellation.unique() 
keep_constellation
df_constellation = df_constellation[df_constellation.name.isin(keep_constellation)]df_constellation = df_constellation.reset_index(drop=True) 
df_constellation

Now that we have the constellation table, which includes star names, we can create a star table from it. That would indicate that the stars we will use are Sadalsuud, Hamal, Arcturus, Deneb Algedi, Rigil Kentaurus, Diphda, Beta Comae Berenices, Regulus, Rasalhague, Enif, Kaus Australis, Aldebaran, and Spica.

If you remember “Little Astronomy,” there are two links that we can use to construct a star table. Actually, it is made up of many star lists. And while I have made an effort to gather data from the website, doing so would require more work than simply manually gathering the data on each of our desired stars. Therefore, I will create a separate article about the web scraping process used to gather star tables from that website. For the time being, I’ll just create a dictionary that we can later turn into a star table.

We need also to remember that each star should have a foreign key that references one of the rows in the galaxy table. However, Little Astronomy doesn’t provide information about the galaxy of each star, so I have to look it up on this website. Don’t forget to include the star_id column as well.

We can add a star_id column to the planet table now that we have (almost) completed the star table. Later, we will also replace the galaxy column in this star table with the galaxy_id.

Creating the planet table (final)

Creating the constellation table (final)

The same procedures can also be applied to the constellation table to replace the star column with each star’s id.

Creating the last table: galaxy table (final)

We will now continue to use the same web scraping method as before to create the following table, the galaxy table.

The table contains a total of 37 galaxies. Additionally, it contains a constellation column. Therefore, in order to keep things simple, I only want to keep a few of the names of the galaxies from the table. I will only use the galaxy’s name whose constellation is listed in the previous constellation table in order to accomplish that.

I will use the details of galaxies that are visible to the naked eye from this website to create one more column. If a galaxy can be seen with the naked eye, it will be set to True; otherwise, it will be set to False. I do this because at least two columns must use the BOOLEAN data type in this database. To meet the requirement, let’s add a column to this galaxy table since we already had one in the planet table.

It is now time to add the galaxy’s id and replace the constellation column with its id number.

Creating the star table (final)

You might also want to go back and check the star table to make sure that each star on our table belongs to the Milky Way galaxy. Its id number is 11, according to the galaxy table. Therefore, we can simply delete the previous galaxy column and add a new column with the id of the Milky Way galaxy.

df_star['galaxy_id'] = [11] * len(df_star) 
df_star = df_star.drop(columns='galaxy')
df_star

I believe the job is done. Five tables about the moon, planets, stars, galaxies, and constellations are now available. We have complied with all requirements. Below are images of every table we constructed.

moon table (there are actually 50 rows)
planet table
star table
galaxy table
constellation table

Naturally, we can begin writing SQL commands since the main objective of doing all of this is to compile the table so that we can use PostgreSQL to build a celestial bodies database. However, I won’t write an article about it and will instead keep it on my GitHub. Therefore, feel free to look it up on my GitHub if you need any ideas for how to go about doing it.

Thanks for reading!

I hope you’ll find this article interesting and useful. Consider clapping, following, and subscribing to show your support for my writing. I’ll come back with more projects so we can all learn. See you!

--

--