Classifying Celestial Objects using Neural Networks
The Sloan Digital Sky Survey (SDSS) has created the most detailed three-dimensional maps of the Universe ever made, with deep multi-color images of one-third of the sky and spectra for more than three million astronomical objects. It lets you learn about and explore all phases and surveys of the SDSS: past, present, and future.
The dataset consists of 10,000 observations of space taken by the SDSS. Every observation is described by 17 feature columns and 1 class column that identifies it as a star, galaxy, or quasar. The dataset can be found here: https://www.kaggle.com/lucidlenn/sloan-digital-sky-survey
The dataset used in this project therefore has 10,000 rows and 18 columns, with each column capturing some information about the celestial object its row represents.
Problem Statement-
The SDSS is a large collection of detailed 3D data about the universe, collecting information about the billions of celestial objects in the sky. What makes exploration difficult is the fact that the amount of data collected is enormous. The task can be made simpler by identifying the celestial objects represented by each observation. Hence, the task is to classify the observations into different classes (Star, Galaxy, Quasar).
Approach-
Looking at the dataset, we can see that this is a multi-class classification problem: every observation belongs to exactly one of three classes. So, we will train an artificial neural network on the above dataset to perform this task.
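To make the multi-class setup concrete, here is a minimal sketch of how the class column could be turned into targets for a network with one output unit per class. The label strings (STAR, GALAXY, QSO) and the tiny example array are assumptions for illustration, not values read from the actual file.

```python
import numpy as np

# Hypothetical class labels for four observations (assumed spelling).
labels = np.array(["STAR", "GALAXY", "QSO", "STAR"])

# np.unique returns the sorted class names and, for each label,
# the index of its class in that sorted array.
class_names, y_index = np.unique(labels, return_inverse=True)

# One-hot encoding: one column per class, exactly one 1 per row,
# which is what distinguishes multi-class from multi-label targets.
one_hot = np.eye(len(class_names))[y_index]

print(class_names)
print(y_index)
print(one_hot)
```

Each row of `one_hot` contains a single 1, which matches a softmax output layer with three units.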
Data preprocessing-
1. The first step is to remove the unnecessary columns that do not contribute to the process of classification. After removing these columns we are left with the following columns.
2. Next, we will divide the dataset into the matrix of features and the dependent variable (the class label).
3. Now, we can split the dataset into the training set and the test/validation set.
4. And finally, we will scale the features to ease the computation done by the model.
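The four steps above can be sketched with plain NumPy as follows. This is only an illustration under assumptions: the table is random stand-in data, the column names and the choice of which column to drop are hypothetical, and the 80/20 split ratio is our own pick. Libraries such as pandas and scikit-learn provide ready-made equivalents for each step.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the SDSS table: 1,000 rows of random numbers.
# Column names and the dropped column are assumptions for illustration.
columns = ["objid", "ra", "dec", "u", "g", "r", "redshift"]
data = rng.normal(size=(1000, len(columns)))
labels = rng.integers(0, 3, size=1000)  # 0, 1, 2 stand for the three classes

# 1. Remove columns that do not help classification (e.g. identifiers).
drop = {"objid"}
keep_idx = [i for i, name in enumerate(columns) if name not in drop]

# 2. Separate the matrix of features X from the dependent variable y.
X = data[:, keep_idx]
y = labels

# 3. Shuffle and split into training (80%) and test/validation (20%) sets.
order = rng.permutation(len(X))
n_test = int(0.2 * len(X))
test_idx, train_idx = order[:n_test], order[n_test:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# 4. Standardise the features. The mean and standard deviation come from
#    the training set only, so no test information leaks into training.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.shape, X_test_scaled.shape)
```

Computing the scaling statistics on the training split alone keeps the test/validation set a fair estimate of performance on unseen data.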
Applying the model-
Now we can design the neural net, apply it to the dataset, and see its performance.
We will make a loop that will add hidden layers to a model after each iteration so that we can see the effect of adding hidden layers to our neural network.
The results are fascinating!
Each of the models has an average accuracy of 97% on the validation data and the model seems to be performing well.
Also, we can see that the model is actually underfitted, so we should increase the number of epochs to fit it properly. This is common when several Dropout layers are added to the neural network, because dropout switches off a random subset of units at every training step and so slows down fitting.
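The depth experiment described in this section could be sketched as below. Note the assumptions: this is a small NumPy network trained on synthetic three-class data, not the model or data used in this project. It has no Dropout layers, and the layer width (16), learning rate, and epoch count are arbitrary illustrative choices.

```python
import numpy as np

def init_params(layer_sizes, rng):
    """He-initialised weights and zero biases for consecutive layer sizes."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, X):
    """Return activations of every layer: ReLU on hidden layers, softmax on the output."""
    activations = [X]
    for i, (W, b) in enumerate(params):
        z = activations[-1] @ W + b
        if i < len(params) - 1:
            a = np.maximum(z, 0.0)
        else:
            z = z - z.max(axis=1, keepdims=True)  # numerically stable softmax
            e = np.exp(z)
            a = e / e.sum(axis=1, keepdims=True)
        activations.append(a)
    return activations

def train(params, X, y, n_classes, epochs=200, lr=0.1):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        acts = forward(params, X)
        delta = (acts[-1] - Y) / len(X)  # gradient w.r.t. the output logits
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                delta = (delta @ W.T) * (acts[i] > 0)  # backpropagate through ReLU
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params

def accuracy(params, X, y):
    return float((forward(params, X)[-1].argmax(axis=1) == y).mean())

# Synthetic stand-in data: three Gaussian blobs in five dimensions.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 200, 5, 3
centers = rng.normal(0.0, 3.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)
order = rng.permutation(len(X))
X, y = X[order], y[order]
X_train, y_train, X_val, y_val = X[:480], y[:480], X[480:], y[480:]

# Scale with training statistics only.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_val = (X_train - mean) / std, (X_val - mean) / std

# Add one more hidden layer on each iteration and record validation accuracy.
results = {}
for n_hidden in range(1, 4):
    sizes = [n_features] + [16] * n_hidden + [n_classes]
    params = train(init_params(sizes, rng), X_train, y_train, n_classes)
    results[n_hidden] = accuracy(params, X_val, y_val)
    print(f"{n_hidden} hidden layer(s): validation accuracy = {results[n_hidden]:.3f}")
```

The numbers this prints describe the synthetic blobs only and say nothing about the SDSS results reported above. The point is the loop structure: the same training routine is reused while the list of layer sizes grows.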
Finally, let’s make a confusion matrix to verify the results.
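As a rough illustration of what such a matrix contains, here is a minimal NumPy sketch. The class order and the six example predictions are made up for demonstration and are not this model's output.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Add 1 to cell (true, predicted) for every observation;
    # np.add.at handles repeated index pairs correctly.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical encoding: 0 = GALAXY, 1 = QSO, 2 = STAR.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
overall_accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal

print(cm)
print(overall_accuracy)
```

Off-diagonal cells show which classes get confused with each other, which a single accuracy figure cannot reveal.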
We have successfully completed the task of classifying the observations into their respective classes (i.e., Star, Galaxy, Quasar). This model can now be used to differentiate and prioritize different celestial objects in the night sky.