Classification in Astronomy: Galaxies vs Quasars

Carter Rhea · The Startup · Nov 1, 2020

Machine learning and astronomy go together beautifully. Many astronomical problems boil down to large classification tasks with a minimal amount of information per object. Today, we are going to explore an interesting problem: how to tell whether an object is a galaxy or a quasar (an active galactic nucleus accreting an extreme amount of material). Although these two objects are physically very different, they often show up as unresolved point sources in telescope images! Let's explore how we can distinguish between the two using colors. A color is the difference between the magnitudes measured in two photometric bands. We will be using colors from the Sloan Digital Sky Survey (https://www.sdss.org/). What are the bandpasses involved?

SDSS bandpasses (filters)

We are going to combine all five bandpasses noted above into the color vector [u-g, g-r, r-i, i-z]. Finally, we will be using a wonderful dataset from AstroML (https://www.astroml.org/).

from matplotlib import pyplot as plt
from astroML.datasets import fetch_sdss_galaxy_colors
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from sklearn.metrics import confusion_matrix
import seaborn as sns

Now that we have the imports, let's go ahead and download the data and set up our color vector.

# Download the SDSS galaxy/quasar color data
data = fetch_sdss_galaxy_colors()
data = data[::10]  # subsample every 10th object to keep the plots light

# Extract colors and spectral class: [u-g, g-r, r-i, i-z]
ug = data['u'] - data['g']
gr = data['g'] - data['r']
ri = data['r'] - data['i']
iz = data['i'] - data['z']
color_vec = np.array([ug, gr, ri, iz]).T
spec_class = data['specClass']

The spec_class vector contains the classification labels. We have two options: Galaxy or QSO.
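Before building a model, it is worth checking how many objects fall into each class. Here is a minimal sketch using only NumPy (already imported above); the exact counts depend on the subsampling we applied when loading the data.

# Count how many objects carry each spectral label
labels, counts = np.unique(spec_class, return_counts=True)
for label, count in zip(labels, counts):
    print(f"{label}: {count} objects")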

We can take a look at the colors (or at least a simple 2D projection).

# Let's take a quick look at some of the colors!
# Boolean masks selecting each spectral class
galaxies = (spec_class == 'GALAXY')
qsos = (spec_class == 'QSO')

with plt.xkcd():
    fig = plt.figure()
    ax = fig.add_subplot(111)

    ax.set_xlim(-0.5, 2.5)
    ax.set_ylim(-0.5, 1.5)

    ax.scatter(ug[galaxies], gr[galaxies], c='b', label='galaxies')
    ax.scatter(ug[qsos], gr[qsos], c='r', label='qsos')

    ax.legend(loc=2)

    ax.set_xlabel('$u-g$')
    ax.set_ylabel('$g-r$')

plt.show()

g-r VS u-g color plot

As we can see, galaxies and quasars naturally segregate themselves in color space :) This should make our classification algorithm's job easier! Let's see if it does!

We will first set up our training and test sets (including labels). Then we need to encode our labels as numbers. I have written out each step here, but normally we would use sklearn (see the sketch after the code below); I've done it by hand solely for pedagogical reasons :)

train_per = int(0.8*len(color_vec))
# Let's make our training and test sets (80/20 split)
X_train = color_vec[:train_per]
Y_train = spec_class[:train_per]
X_test = color_vec[train_per:]
Y_test = spec_class[train_per:]

# Encode the string labels as integers: GALAXY -> 0, QSO -> 1
Y_train_enc = []
for val in Y_train:
    if val == 'GALAXY':
        Y_train_enc.append(0)
    else:
        Y_train_enc.append(1)
Y_test_enc = []
for val in Y_test:
    if val == 'GALAXY':
        Y_test_enc.append(0)
    else:
        Y_test_enc.append(1)
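As a point of comparison, here is a minimal sketch of the same split and encoding done with sklearn's LabelEncoder and train_test_split. LabelEncoder assigns classes alphabetically (GALAXY -> 0, QSO -> 1, matching the manual encoding above), and train_test_split shuffles the rows by default, unlike the simple slicing above.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Encode 'GALAXY'/'QSO' as 0/1 (classes are sorted alphabetically)
encoder = LabelEncoder()
labels_enc = encoder.fit_transform(spec_class)

# 80/20 split; shuffled by default, with a fixed seed for reproducibility
X_train_sk, X_test_sk, Y_train_sk, Y_test_sk = train_test_split(
    color_vec, labels_enc, test_size=0.2, random_state=42)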

We are now going to build a basic neural network with two hidden layers containing 100 nodes each, with a dropout layer in between.

activation = 'relu'  # activation function
initializer = 'normal'  # kernel initializer
lr = 0.001  # initial learning rate
loss_function = 'sparse_categorical_crossentropy'
metrics_ = ['accuracy', 'mae', 'mse']

model = Sequential([
    Dense(units=100, kernel_initializer=initializer, activation=activation),
    Dropout(0.2),
    Dense(units=100, kernel_initializer=initializer, activation=activation),
    Dense(2, activation='softmax'),
])

optimizer = Adam(learning_rate=lr)
model.compile(optimizer=optimizer, loss=loss_function, metrics=metrics_)
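A quick note on this design choice: because the labels are integer-encoded (0/1) and the output layer is a two-unit softmax, sparse_categorical_crossentropy is the appropriate loss. An equivalent alternative for a two-class problem (just a sketch, not what the original notebook uses) is a single sigmoid output trained with binary cross-entropy:

# Equivalent binary formulation: one sigmoid unit + binary cross-entropy
model_bin = Sequential([
    Dense(units=100, kernel_initializer=initializer, activation=activation),
    Dropout(0.2),
    Dense(units=100, kernel_initializer=initializer, activation=activation),
    Dense(1, activation='sigmoid'),
])
model_bin.compile(optimizer=Adam(learning_rate=lr),
                  loss='binary_crossentropy', metrics=['accuracy'])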

Excellent! We can go ahead and fit our model :)

model.fit(X_train, np.array(Y_train_enc))
model.summary()
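Note that model.fit trains for a single epoch by default. In practice we would likely train longer and monitor a validation split; the epoch count and split fraction below are arbitrary choices for illustration, not values from the original notebook.

# Train for more epochs and hold out 10% of the training data for validation
history = model.fit(X_train, np.array(Y_train_enc),
                    epochs=20, validation_split=0.1, verbose=1)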

With our fitted model, we can immediately compute predictions on the test set.

test_predictions = model.predict(X_test)

And let's see how the predictions look :) model.predict returns a softmax probability for each of the two classes, so we take the argmax to recover the predicted label.

predictions = [np.argmax(pred) for pred in test_predictions]

matrix_conf = confusion_matrix(Y_test_enc, predictions)

# Normalize the confusion matrix so that each row sums to 100%
matrix_conf = matrix_conf.astype('float64')
norm_fac = np.sum(matrix_conf, axis=1)
for row in range(matrix_conf.shape[0]):
    matrix_conf[row, :] = np.round(matrix_conf[row, :]*(1/norm_fac[row]), 3)*100

# Plot the confusion matrix
sns_plot = sns.heatmap(matrix_conf, annot=True, cmap='GnBu',
                       xticklabels=['GALAXY', 'QSO'], yticklabels=['GALAXY', 'QSO'])
plt.ylabel('True', fontsize='x-large')
plt.xlabel('Predicted', fontsize='x-large')
sns_fig = sns_plot.get_figure()

Confusion Matrix

Not bad!!
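Incidentally, recent versions of scikit-learn (0.22 and later) can do the row normalization for us; here is a minimal sketch (normalize='true' returns fractions rather than percentages):

# Row-normalized confusion matrix straight from scikit-learn (fractions, not %)
matrix_conf_norm = confusion_matrix(Y_test_enc, predictions, normalize='true')
sns.heatmap(matrix_conf_norm, annot=True, cmap='GnBu',
            xticklabels=['GALAXY', 'QSO'], yticklabels=['GALAXY', 'QSO'])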

We've seen how easily machine learning techniques can be applied to astronomical classification problems. The classification could likely be improved by tuning the network architecture and training setup; one possible tweak is sketched below.
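For example, one common tweak is to train for many epochs while letting an early-stopping callback decide when to stop; this is a sketch of one possible configuration, not a tuned result.

from keras.callbacks import EarlyStopping

# Stop training once the validation loss has stopped improving for 5 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
model.fit(X_train, np.array(Y_train_enc),
          epochs=100, validation_split=0.1, callbacks=[early_stop])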

Hopefully, this has been instructional and will encourage you to find more machine learning and astronomy crossovers!

You can find the code here:
https://github.com/crhea93/Hacktober/blob/master/Day2/QuasarVsGalaxy.ipynb

Carter Rhea is a PhD student in astrophysics at the University of Montreal working on machine learning in astronomy and a co-founder of cadena.ca.