GSoC 2021 with ML4Sci | Equivariant Neural Networks for Classification of Dark Matter Substructure

Apoorva Vikram Singh
Aug 25, 2021


This blog briefly summarises my Google Summer of Code (GSoC) 2021 project under Machine Learning for Science (ML4Sci). GSoC 2021, the 17th edition of Google Summer of Code, gave 1,292 students the exciting opportunity to work closely with 199 renowned open-source organizations for 10 weeks.

Image Credits: Google Summer of Code

The DeepLense Project

Project Description

The study of dark matter substructure shows promise for resolving the open question of dark matter's identity, and strong gravitational lensing has proven to be a powerful probe of that substructure. DeepLense is a deep learning pipeline for particle dark matter searches with strong gravitational lensing and falls under the umbrella organization ML4Sci. This year, ML4Sci introduced three projects as part of DeepLense in GSoC 2021. I had the opportunity to work on the project Equivariant Neural Networks for Dark Matter Morphology with Strong Gravitational Lensing. The project builds on the work Deep Learning the Morphology of Dark Matter Substructure, where my mentors analyzed substructure in dark matter using deep learning methods (a ResNet model).

Image Credits: Freddie Pagani via Physics Today

Strong Gravitational Lensing

It was established by Einstein’s general theory of relativity that mass concentrations distort the space around them. The phenomenon of gravitational lensing occurs when a massive amount of matter (say a cluster of galaxies) generates a gravitational field that distorts and magnifies the light from distant galaxies that are behind it but in the same line of sight. The effect is like looking through a giant magnifying glass. It allows researchers to study the details of early galaxies too far away to be seen with current technology and telescopes. Please read this article for a clearer understanding of gravitational lensing.

Correlation of Strong Gravitational Lensing with Dark Matter: Dark matter is an invisible form of matter that contributes to most of the mass of the universe and is responsible for the underlying structure of the universe. It being invisible makes it particularly hard for the researchers to identify and study its nature. However, researchers can detect its influence by observing how the gravity of massive galaxy clusters, which contain dark matter, bends and distorts the light of more distant galaxies located behind the cluster. This phenomenon, strong gravitational lensing, promises to deliver some exciting results in terms of identifying dark matter and its research.

Strong gravitational lensing provides an effective way to study the substructure of dark matter, allowing us to better comprehend its underlying nature. Formulating the study of dark matter substructure as a classification problem has proven fruitful in past works employing machine learning models. However, there is a major drawback to using vanilla convolutional networks for this application: they are only capable of translational equivariance, so they fail to exploit a large group of symmetries (including rotations and reflections) present in the data. Equivariant neural networks offer a solution by guaranteeing a specified transformation behaviour of their feature space under transformations of the input. The central goal of this GSoC 2021 project is to develop an equivariant neural network to study the substructure in strong gravitational lensing images.

Repository

I have organized all my contributions in a GitHub repository. My GitHub repository, GSOC Equivariant Network, includes details on creating the virtual environment, running the code, and the results obtained. It also includes a link to download the weights of pre-trained models from my experiments.

About Me

I am Apoorva Vikram Singh, a Bachelor of Technology (B.Tech.) student in Electrical Engineering at the National Institute of Technology Silchar, Assam, India.

Why DeepLense?

I have been deeply interested in equivariant neural networks and their applications ever since I read the paper Group Equivariant Convolutional Networks. I wanted to work with equivariant networks and make them more effective and efficient. DeepLense seemed like a perfect opportunity to understand the dynamics of equivariant neural networks and how they can make a major contribution in an important domain like astrophysics.

Dataset

Simulated strong gravitational lensing sample images for Model F: no substructure (left), spherical substructure (middle), and vortex substructure (right).
Simulated strong gravitational lensing sample images for Model J: no substructure (left), spherical substructure (middle), and vortex substructure (right).

As discussed earlier, ours is a multiclass classification problem with three classes: strong lensing images with no substructure, spherical substructure, and vortex substructure. The Python package PyAutoLens was used to simulate the data.

We have worked with two different datasets for this project, namely Model F and Model J. Both the datasets have the same three classes:

  • No Substructure: gravitational lensing images simulated with no substructure
  • Spherical Substructure: gravitational lensing images simulated with cold dark matter
  • Vortex Substructure: gravitational lensing images simulated with superfluid dark matter

Both datasets contain 75,000 training images (25,000 per class) and 7,500 testing images (2,500 per class). The major difference between the two simulated datasets comes down to the redshift used for the simulation, which is fixed for Model F but allowed to float over a range of values for Model J. Also, the SNR for Model F is fixed at around 20, while Model J images range from SNR 10 to 30. In simple terms, Model J images can be considered harder to learn from than Model F images.

Equivariant Neural Network

A network is translation equivariant if shifting the image and then feeding it through a number of layers gives the same result as feeding the original image through the same layers and then shifting the resulting feature maps. Current convolutional neural networks are only capable of translational equivariance. However, in a number of applications (including ours), larger groups of symmetries, including rotations and reflections, are present in the data and need to be exploited. This gives rise to the notion of equivariant convolutional networks.
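The translation equivariance of convolution can be checked directly. A minimal sketch: using circular (FFT-based) convolution so the property holds exactly even at the boundaries, shifting the input and then convolving gives the same result as convolving and then shifting.

```python
import numpy as np

def circ_conv(img, kernel):
    # Circular 2-D convolution via the FFT; circular boundary handling makes
    # translation equivariance exact rather than approximate.
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel, img.shape)))

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
kernel = rng.standard_normal((3, 3))

# Shift-then-convolve equals convolve-then-shift:
shifted_then_conv = circ_conv(np.roll(img, (2, 3), axis=(0, 1)), kernel)
conv_then_shifted = np.roll(circ_conv(img, kernel), (2, 3), axis=(0, 1))
assert np.allclose(shifted_then_conv, conv_then_shifted)
```

Replacing `np.roll` with `np.rot90` breaks the equality for a generic kernel, which is exactly the gap that equivariant networks close.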

Image Credits: UCL Visual Computing. The image on the left shows variations with respect to the translation that conventional CNNs are able to accommodate with ease. The image on the right shows variation in terms of symmetry (rotation) that conventional CNNs fail to adjust to.

E2-CNNs address this by guaranteeing a specified transformation behaviour of their feature spaces under transformations of their input. E2-CNNs are equivariant under all isometries E(2) of the image plane, i.e., translations, rotations, and reflections.

We have used the e2cnn library to implement the equivariant neural network. This work mainly focuses on cyclic and dihedral symmetries. A cyclic symmetry group CN contains N discrete planar rotations; similarly, a dihedral symmetry group DN contains N discrete planar rotations together with reflections. In our project, we work with the C4, C8, D4 and D8 symmetries. For comparison, we have used the ResNet-18 model.
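The group structure above can be made concrete in a few lines of numpy (a toy sketch, independent of e2cnn): C4 is the orbit of an image under 90-degree rotations, D4 adds reflections, and pooling a filter response over the orbit yields a rotation-invariant feature.

```python
import numpy as np

def c4_orbit(img):
    # The cyclic group C4: the four 90-degree planar rotations of an image.
    return [np.rot90(img, k) for k in range(4)]

def d4_orbit(img):
    # The dihedral group D4: the C4 rotations plus their reflections (8 elements).
    rots = c4_orbit(img)
    return rots + [np.fliplr(r) for r in rots]

def c4_invariant(img, kernel):
    # Pooling a (toy, whole-image) filter response over the C4 orbit gives a
    # feature that does not change when the input is rotated by 90 degrees.
    return max(float(np.sum(r * kernel)) for r in c4_orbit(img))

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 3))
kernel = rng.standard_normal((3, 3))

assert len(c4_orbit(img)) == 4 and len(d4_orbit(img)) == 8
# Rotating the input permutes the orbit, so the pooled feature is unchanged:
assert c4_invariant(img, kernel) == c4_invariant(np.rot90(img), kernel)
```

An equivariant network does something richer than this max-pooling toy (it keeps the full orbit structure in its feature maps), but the invariance argument is the same.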

We preprocess the data as follows:

  • Crop images to 128 ✕ 128 from the original size of 150 ✕ 150.
  • Pad to 129 ✕ 129. This allows us to use odd-size filters with stride 2 when downsampling a feature map in the model.
  • Upsample each image by a factor of 3, rotate it, and downsample it again. This reduces interpolation artifacts (e.g., when testing the model on rotated images).
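The crop-and-pad steps above can be sketched as follows (the exact crop offsets and pad placement here are my assumptions; the repository contains the project's actual preprocessing):

```python
import numpy as np

def crop_and_pad(img, crop=128, out=129):
    # Center-crop the 150x150 image down to 128x128 ...
    h, w = img.shape
    top, left = (h - crop) // 2, (w - crop) // 2
    img = img[top:top + crop, left:left + crop]
    # ... then zero-pad one row and one column to reach the odd size 129x129,
    # so odd-size filters with stride 2 tile the feature map cleanly.
    pad = out - crop
    return np.pad(img, ((0, pad), (0, pad)))

x = np.zeros((150, 150))
print(crop_and_pad(x).shape)  # (129, 129)
```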

The code for the network architecture of the equivariant model used in our experiments is given below:

Visit the GitHub link for my project to train your data with our network. The repository lets you create a virtual Conda environment with all required dependencies installed and play around with the hyperparameters, and it contains the code for environment creation and for training the model.

Results

The results of the experiments show that the equivariant neural network is indeed more effective at capturing the symmetries. We have used the ROC AUC score (micro average and One-vs-Rest) as the evaluation metric for the experiments.
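For reference, a micro-averaged One-vs-Rest ROC AUC for a three-class problem can be computed with scikit-learn by binarizing the labels; the scores below are random stand-ins, not the project's outputs.

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

# Hypothetical stand-ins for model outputs: 3-class softmax scores with a
# bias toward the true class, so the toy AUC is comfortably above chance.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=200)
logits = rng.standard_normal((200, 3)) + 2.0 * np.eye(3)[y_true]
y_score = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# One-vs-Rest: binarize labels, score each class, then micro-average.
y_bin = label_binarize(y_true, classes=[0, 1, 2])
per_class = [roc_auc_score(y_bin[:, c], y_score[:, c]) for c in range(3)]
micro = roc_auc_score(y_bin, y_score, average="micro")
```

The per-class scores correspond to the One-vs-Rest tables below; the micro average flattens all class/score pairs into one binary problem before computing the AUC.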

ROC AUC micro average scores for different neural network models
ROC AUC plots for model C8 (best performance, left) and ResNet-18 (right) for Model F.
Confusion matrices for model C8 (best performance, left) and ResNet-18 (right) for Model F.
ROC AUC plots for model D8 (best performance, left) and ResNet-18 (right) for Model J.
Confusion matrices for model D8 (best performance, left) and ResNet-18 (right) for Model J.

ROC AUC (One vs Rest) results for Model F:

ROC AUC scores (One vs Rest) for Model F

ROC AUC (One vs Rest) results for Model J:

ROC AUC scores (One vs Rest) for Model J

Visualization of feature space

Equivariant networks are equivariant under transformations such as rotation and reflection, so their feature space is independent of the orientation of the image; the same does not hold for a conventional CNN. Here we visualize the feature spaces of the C4, C8, and CNN models to test this. The diagrams show that the CNN does not produce stable representations under rotation, while C4 and C8 do.

Visualization of feature space for images with spherical substructure in Model J. This visualization is for CNN.
Visualization of feature space for images with spherical substructure in Model J. This visualization is for C4.
Visualization of feature space for images with spherical substructure in Model J. This visualization is for C8.

Future Work and Final Thoughts

Even with these strong results, there is room for improvement in several areas. The first is computational memory consumption: although an equivariant network has fewer parameters than a ResNet or VGG-Net, it requires substantial computational resources to train. Another direction is designing networks that can use continuous symmetry groups (efficiently!) instead of discrete ones. Since our application suffers from real-world data scarcity, an idea like equivariance can prove very useful for generalizing over the data without overfitting. The research domain of equivariant neural networks is also still in its early years, so it is definitely worth looking out for newer ideas from the field.

I would like to thank Pranath Reddy, Michael Toomey, Sergei Gleyzer, Anna Parul, Sourav Raha, and the rest of the ML4SCI community for the guidance and support they have provided me over the past months.

Also, I would like to thank Google and all the organizers of Google Summer of Code who make this happen every year.
