Classifying asteroids using ML: A beginner’s tale (Part 2)

Tarushi Pathak · Published in Analytics Vidhya · Jul 13, 2020 · 4 min read

Hi! This is the follow-up article to Classifying Asteroids Using ML, Part 1. In the first article, I covered some very basic stuff like importing libraries, uploading the file, performing label encoding, resampling, and dropping features for reasons other than their correlation with the target variable. If you do not know how to do all this, I suggest you visit the previous article here.

In this article, I will be covering the following:

  • Correlation Heatmap
  • Normalizing Variables
  • Implementing the ML algorithm

Let’s get started then!

Correlation Heatmap

After dropping the columns (here, the reason was not their contribution to the target variable) and doing other necessary analysis, we move on to the correlation heatmap. The correlation heatmap uses colour to show how strongly the variables are correlated with each other. The correlation is calculated using Pearson’s correlation coefficient. Values greater than zero indicate a positive correlation, which means both variables change in the same direction; a negative correlation means they change in opposite directions.
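Here is a minimal sketch of how such a heatmap can be produced, assuming the preprocessed DataFrame from Part 1 is named df (the name is my assumption):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Pearson correlation between every pair of numeric columns
corr = df.corr()

plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap='viridis')  # brighter cells = more positive correlation
plt.title('Correlation Heatmap')
plt.show()
```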

A correlation heatmap is generally used to drop features that have no correlation with the target variable. A rookie mistake when interpreting the heatmap is to drop the variables that show a negative correlation; a negative correlation does not mean no relationship at all. From the above correlation map, you can see that the correlations have been assigned colours: the more positive the correlation, the brighter the colour.

However, I sometimes find it difficult to read, especially when there are so many variables in it. So I tend to plot the correlations in a different format, which I find more intuitive.

Correlation with response variable
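A plot like this can be produced in a few lines, again assuming the DataFrame is df and the label-encoded target column is named 'Hazardous' (both names are my assumptions):

```python
import matplotlib.pyplot as plt

# Correlation of every feature with the target, sorted and drawn as bars
df.corr()['Hazardous'].drop('Hazardous').sort_values().plot(
    kind='barh', figsize=(8, 10))
plt.xlabel("Pearson correlation with 'Hazardous'")
plt.show()
```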

From the above pic, you can clearly see which features share a weaker correlation with the target variable. So, we will be dropping Miss Dist.(kilometers), Jupiter Tisserand Invariant, Epoch Osculation, Semi Major Axis, Inclination, Asc Node Longitude, Perihelion Arg, Orbital Period, Perihelion Time, Mean Anomaly and Mean Motion.
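Dropping them is a single call; the column names below follow the NASA asteroids dataset on Kaggle, so adjust them if your labels differ:

```python
# Features with weak correlation to the target, per the plot above
weak_features = [
    'Miss Dist.(kilometers)', 'Jupiter Tisserand Invariant',
    'Epoch Osculation', 'Semi Major Axis', 'Inclination',
    'Asc Node Longitude', 'Perihelion Arg', 'Orbital Period',
    'Perihelion Time', 'Mean Anomaly', 'Mean Motion',
]
df = df.drop(columns=weak_features)
```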

Now here’s the part where you see the beauty of Data Analytics and ML. You did not know anything about the 50 variables, and whatever variables you dropped were dropped simply because they showed no contribution to the target variable. If you had knowledge of astronomy, and of asteroids in particular, dropping these variables would probably have been obvious; but you do not, and still you reached the same conclusion. That’s the beauty of Data Analytics.

Normalizing Variables

If the values of the feature variables are too large, they end up contributing more to the predictions even if they are not strongly related to the target. That’s why we must normalize them before implementing the algorithm on them. The tool used to perform this is the StandardScaler class from scikit-learn.

You simply separate the features and the target variable, import StandardScaler from sklearn.preprocessing, and apply it to the feature variables.
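A minimal sketch, again assuming the DataFrame is df and the target column is 'Hazardous':

```python
from sklearn.preprocessing import StandardScaler

# Separate the features from the target variable
X = df.drop(columns=['Hazardous'])
y = df['Hazardous']

# Rescale each feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```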

Now our data is ready for prediction.

Implementing the Machine Learning Model

This is actually one of the easiest parts, thanks to the sklearn library. We will be using logistic regression, as it is one of the most widely used models for binary classification. First, we split the dataset into a training set and a test set using the train_test_split function.
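For example (the 80/20 split ratio and the random seed here are my assumptions, not values from the original article):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for evaluating the model
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)
```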

After this, we simply implement logistic regression.
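Something along these lines, with sklearn’s default hyperparameters:

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)      # learn from the training set
y_pred = model.predict(X_test)   # predict labels for the test set
```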

The fit method fits the model on the training set, and the predict method helps you make predictions on the test data. The predicted values are then compared with the original values to return the accuracy. The classification report is an sklearn metric that returns the F1 score, precision and recall, along with the accuracy. Together, they tell you how good your classification model is.
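Both can be computed from the predictions above:

```python
from sklearn.metrics import accuracy_score, classification_report

print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```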

Congratulations on successfully classifying asteroids!

Hope you learnt something! Leave some comments and claps, if you’d like!
