Messier 61 (approx. 52 million light-years away) with the supernova SN2020jfo, photographed by an amateur astronomer in 2020 (Wikipedia)

My First Data Science Project (1/5)

Sep 16, 2020

This is the first in a series of posts describing my very first data science project, developed during my PhD in Astronomy, back in 2015.

The question to answer was:

Given the large number of photometric supernova observations (i.e. observations with low information content), is it possible to reproduce, with high confidence, the classification obtained from spectroscopy (i.e. observations with high information content)?

Now I need to explain a few things before diving into the description of the method I developed:
1. Supernovae: this is the name given to exploding stars; during its lifetime a star goes through different stages of evolution (much like… anything in nature) and eventually, depending on its mass, it explodes.
2. Photometry and spectroscopy: these are two methods we use to measure the light coming from astronomical objects; photometry measures the amount of light we receive from the supernova (in this case), much like collecting water in a bucket. Spectroscopy is a more sophisticated method that tells us which chemical elements were in the supernova. Spectroscopy, then, gives much more information about a supernova than photometry does.
3. Spectroscopic classification: I won’t describe this here, since it would be too long a detour; what we need to know is that this classification is based on a method, spectroscopy, which provides a lot of information about the supernova.
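
To make the bucket analogy in point 2 a bit more concrete, here is a minimal sketch in Python. Everything in it is a toy assumption made for illustration: the spectrum, the absorption line, and the four top-hat bands with their names and wavelength ranges are invented, and none of it comes from the data set or from a real instrument. The idea is only that photometry adds up all the light falling inside a band into a single number, while a spectrum keeps the detail at each wavelength.

```python
import math

def toy_spectrum(wavelength_nm):
    """Hypothetical spectrum: a smooth continuum with one narrow absorption line."""
    continuum = 1.0 + 0.5 * math.exp(-((wavelength_nm - 500.0) / 150.0) ** 2)
    line = 0.6 * math.exp(-((wavelength_nm - 615.0) / 5.0) ** 2)
    return continuum * (1.0 - line)

def band_flux(spectrum, lo_nm, hi_nm, n_steps=1000):
    """Add up the light inside a top-hat band (trapezoidal rule)."""
    step = (hi_nm - lo_nm) / n_steps
    total = 0.5 * (spectrum(lo_nm) + spectrum(hi_nm))
    for i in range(1, n_steps):
        total += spectrum(lo_nm + i * step)
    return total * step

# Four made-up bands: the names and ranges are assumptions, not real filters.
BANDS = {
    "g": (400.0, 550.0),
    "r": (550.0, 700.0),
    "i": (700.0, 850.0),
    "z": (850.0, 1000.0),
}

# Photometry: one number per band, the "bucket" of collected light.
photometry = {name: band_flux(toy_spectrum, lo, hi) for name, (lo, hi) in BANDS.items()}

for name, flux in photometry.items():
    print(f"{name}: {flux:.1f}")
```

The four printed numbers are all that photometry keeps in this toy setup; the narrow dip in the spectrum, the kind of feature that would point to a chemical element, can no longer be seen in them.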

The issue with spectroscopy is that it is very expensive and time-consuming, while photometry is much cheaper and faster; thus astronomers have said something like this:
Let’s observe supernovae with photometry: we can spot many more of them this way, for less money; then we will find a way to classify them without the need for spectroscopic confirmation and get good science out of it.

The data set I used to answer the question was a simulated one, consisting of 20K photometric supernova observations in 4 photometric bands; it came from the paper by Kessler et al. (2010), “Supernova photometric classification challenge”.

In future posts I will describe the techniques I used in the project:
1. Gaussian Processes for regression
2. Diffusion Maps to reduce the dimension of the parameter space
3. Random Forest to build the classification model

Next post (Data set, approach and pre-processing)…


Written by Marco De Pascale