# GSoC 2021 — Graph Neural Networks for Particle Momentum Estimation in the CMS Trigger System


This blog post summarizes my Google Summer of Code (GSoC) 2021 project under the Machine Learning for Science (ML4SCI) umbrella organization.

## Project Description

The Compact Muon Solenoid (CMS) is a detector at the Large Hadron Collider (LHC) located near Geneva, Switzerland. The CMS experiment detects the particles resulting from the collisions and measures their kinematics using various subdetectors working in concert. High momentum muons are significant objects for many physics analyses at CMS. Therefore, an accurate momentum assignment scheme differentiating low momentum muons (background) from high momentum muons (signal) is crucial to the Endcap Muon Track Finder (EMTF) trigger. The first algorithm implemented in the trigger system was a discretized boosted decision tree. In this work, we study the use of deep learning algorithms (Fully Connected Neural Networks (FCNNs), Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs)) at the trigger level, which requires highly optimized inference. We develop and benchmark GNNs for momentum regression in the trigger system.

## Repositories

I have compiled my work into a GitHub repo called GSoC-2021-GNN_for_Trigger, which contains all the code to train the models, along with their prediction files for muon momentum estimation. The models are trained using both the TensorFlow and PyTorch open-source AI frameworks.

## About Me

I am Emre Kurtoğlu, a third-year Ph.D. student in the ECE department at The University of Alabama (Tuscaloosa). I also work as a Graduate Research Assistant in the Laboratory of Computational Intelligence for Radar (CI4R) under the supervision of Dr. Sevgi Zübeyde Gürbüz.

# The Data

Our dataset consists of more than 3 million muon events generated using Pythia and can be downloaded from the following link. The MEx/x labels in the image show the locations of the detectors. In this dataset, we use 4 of them. For each detector we have 7 features, namely Phi coordinate, Theta coordinate, Bending angle, Time info, Ring number, Front/rear hit and Mask. In addition, we have 3 road variables: pattern straightness, zone and median theta. Thus, in total we have 7 × 4 + 3 = 31 features for each event.
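The feature layout described above can be sketched in plain Python. This is only an illustration: the ordering of the 31 values (the 28 per-detector values first, detector by detector, followed by the 3 road variables), the short feature names and the helper `split_event` are assumptions made for this example, not the layout of the actual dataset files.

```python
# Feature names follow the text above; their order inside an event is an assumption.
DETECTOR_FEATURES = ["phi", "theta", "bend", "time", "ring", "front_rear", "mask"]
ROAD_VARIABLES = ["pattern_straightness", "zone", "median_theta"]
N_DETECTORS = 4


def split_event(event):
    """Split a flat 31-value event into 4 per-detector rows and 3 road variables."""
    step = len(DETECTOR_FEATURES)                  # 7 features per detector
    n_detector_values = N_DETECTORS * step         # 4 * 7 = 28
    expected = n_detector_values + len(ROAD_VARIABLES)
    if len(event) != expected:
        raise ValueError(f"expected {expected} values, got {len(event)}")
    # One row of 7 features per detector.
    nodes = [event[i:i + step] for i in range(0, n_detector_values, step)]
    road = event[n_detector_values:]
    return nodes, road


# Toy event whose values are simply 0..30 so the slicing is easy to follow.
event = list(range(31))
nodes, road = split_event(event)
print(len(nodes), len(nodes[0]), len(road))
```

The four rows of seven values are what later become the node features of the graph, one node per detector.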

# Models

In this work, although our main focus is on GNNs, we also employed FCNNs and CNNs in order to compare different learning methods. Since the results for FCNNs and CNNs are already presented in the aforementioned GitHub repo, we will dive directly into GNNs.

## Graph Neural Networks

As a starting point, we employ the GNN architecture developed by Prateek Kumar Agnihotri, available in his GitHub repo, with slight modifications. We use the pytorch-geometric library, as it makes the implementation of GNNs easier and computationally more efficient.

We define our graph nodes as consecutive detectors as follows:

Hence, the graph connectivity, written as an edge list equivalent to the adjacency matrix, can be defined as:

`edge_index = [(0,1),(1,2),(2,3),(3,2),(2,1),(1,0)]`
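To make the relation between this edge list and an adjacency matrix explicit, here is a small sketch in plain Python. The helper name `to_adjacency` is hypothetical; the `sources`/`targets` split mirrors the `[2, num_edges]` layout that PyTorch Geometric uses for `edge_index`.

```python
# Edge list from the text: a chain 0 - 1 - 2 - 3 with both directions listed.
edge_index = [(0, 1), (1, 2), (2, 3), (3, 2), (2, 1), (1, 0)]
num_nodes = 4


def to_adjacency(edges, n):
    """Build a dense n x n adjacency matrix from (source, target) pairs."""
    adj = [[0] * n for _ in range(n)]
    for src, dst in edges:
        adj[src][dst] = 1
    return adj


adjacency = to_adjacency(edge_index, num_nodes)
for row in adjacency:
    print(row)

# Same edges as two parallel lists: one list of sources, one list of targets.
sources = [src for src, _ in edge_index]
targets = [dst for _, dst in edge_index]
```

Because every edge appears in both directions, the resulting matrix is symmetric, i.e. the chain of detectors is treated as an undirected graph.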

Our GNN architecture has the following structure:

    import torch
    import torch.nn.functional as F
    import torch_geometric.nn as gnn

    # `features` (the list of per-event input features) and `MPL` (the custom
    # message passing layer) are defined elsewhere in the repository.

    class MPNN(torch.nn.Module):
        def __init__(self):
            super(MPNN, self).__init__()
            # Four message passing layers; each node carries len(features)/4 inputs.
            self.conv1 = MPL(int(len(features) / 4), 128)
            self.conv2 = MPL(128, 64)
            self.conv3 = MPL(64, 64)
            self.conv4 = MPL(64, 64)
            # Regression head on the concatenated (64 + 64) pooled embeddings.
            self.lin1 = torch.nn.Linear(128, 128)
            self.lin2 = torch.nn.Linear(128, 16)
            self.lin3 = torch.nn.Linear(16, 16)
            self.lin4 = torch.nn.Linear(16, 1)
            # Second head; not used in this forward pass.
            self.lin5 = torch.nn.Linear(128, 128)
            self.lin6 = torch.nn.Linear(128, 16)
            self.lin7 = torch.nn.Linear(16, 16)
            self.lin8 = torch.nn.Linear(16, 1)
            # Attention-based graph readouts after the 2nd and 4th layers.
            self.global_att_pool1 = gnn.GlobalAttention(torch.nn.Sequential(torch.nn.Linear(64, 1)))
            self.global_att_pool2 = gnn.GlobalAttention(torch.nn.Sequential(torch.nn.Linear(64, 1)))

        def forward(self, data):
            x, edge_index, batch = data.x, data.edge_index, data.batch
            x = F.relu(self.conv1(x, edge_index))
            x = F.relu(self.conv2(x, edge_index))
            x1 = self.global_att_pool1(x, batch)
            x = F.relu(self.conv3(x, edge_index))
            x = F.relu(self.conv4(x, edge_index))
            x2 = self.global_att_pool2(x, batch)
            # One 128-dimensional embedding per event.
            x_out = torch.cat([x1, x2], dim=1)
            x = F.relu(self.lin1(x_out))
            x = F.relu(self.lin2(x))
            x = self.lin3(x)
            x = self.lin4(x).squeeze(1)
            return x

where `MPL` is a custom graph convolutional layer. We observed that it outperforms pytorch-geometric's built-in graph convolution methods (e.g. GATv2Conv, SuperGATConv, EGConv).
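The two `GlobalAttention` readouts in the model collapse the node embeddings of each event into a single vector: a gate network produces one score per node, the scores are normalized with a softmax over the nodes of the graph, and the node embeddings are summed with those weights. The sketch below reproduces that computation in plain Python for a single graph. In the model the scores come from the learned `Linear(64, 1)` gate; here they are passed in directly, and the function name and toy values are assumptions for illustration.

```python
import math


def global_attention_pool(node_embeddings, gate_scores):
    """Softmax the per-node gate scores, then return the weighted sum of embeddings."""
    max_score = max(gate_scores)                     # subtract max for numerical stability
    exps = [math.exp(s - max_score) for s in gate_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_embeddings[0])
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, node_embeddings))
        for d in range(dim)
    ]
    return pooled, weights


# Toy graph: 4 nodes (detectors) with 2-dimensional embeddings instead of 64.
node_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
gate_scores = [0.0, 0.0, 0.0, 0.0]   # equal scores reduce the readout to a plain mean
pooled, weights = global_attention_pool(node_embeddings, gate_scores)
print(weights, pooled)
```

In the architecture above, the two pooled 64-dimensional vectors are concatenated, which is why `lin1` takes 128 inputs.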

In this study, our aim is to accurately estimate muons’ transverse momentum (pT). After training our models, we compare their performances using Mean Absolute Error (MAE), F-1 Score and Accuracy as metrics. Since F-1 score and accuracy are not directly applicable for regression problems, we define them as follows:

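    import numpy as np
    import sklearn.metrics

    def f1_comp(y_true, y_pred):
        # F-1 score of the "pT >= threshold" decision for thresholds 0..99 GeV
        f1 = []
        for i in range(100):
            grnd = y_true >= i
            pred = y_pred >= i
            f1.append(sklearn.metrics.f1_score(grnd, pred))
        return f1

    def acc_comp(y_true, y_pred):
        # Accuracy (in percent) of the same decision for thresholds 0..99 GeV
        acc = []
        for i in range(100):
            grnd = y_true >= i
            pred = y_pred >= i
            cmp = np.sum(np.equal(grnd, np.squeeze(pred)))
            acc.append(cmp / len(grnd) * 100)
        return acc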
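As a worked example of these threshold-based metrics, the sketch below evaluates a single threshold on four toy events in plain Python. The helper `threshold_metrics` and the toy values are assumptions for illustration; it computes F-1 directly from the confusion counts rather than calling scikit-learn, and returns 0 when there are no positives at all.

```python
def threshold_metrics(y_true, y_pred, threshold):
    """F-1 score and accuracy (percent) of the 'pT >= threshold' decision."""
    grnd = [t >= threshold for t in y_true]
    pred = [p >= threshold for p in y_pred]
    tp = sum(g and p for g, p in zip(grnd, pred))          # true positives
    fp = sum((not g) and p for g, p in zip(grnd, pred))    # false positives
    fn = sum(g and (not p) for g, p in zip(grnd, pred))    # false negatives
    correct = sum(g == p for g, p in zip(grnd, pred))
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    accuracy = 100.0 * correct / len(grnd)
    return f1, accuracy


# Toy pT values in GeV: one event is pushed above and one below the threshold.
y_true = [5.0, 15.0, 25.0, 40.0]
y_pred = [8.0, 22.0, 18.0, 45.0]
f1, accuracy = threshold_metrics(y_true, y_pred, threshold=20)
print(f1, accuracy)  # 0.5 50.0
```

At the 20 GeV threshold there is one true positive, one false positive and one false negative, so both precision and recall are 0.5 and two of the four decisions are correct.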

We obtained the following results for the defined metrics:

As can be seen from the first graph, the GNN outperforms the FCNN and CNN in terms of error (MAE), but it is not the best performer on the other metrics.

In order to examine the effect of the graph structure, we modified it by using features as nodes and detectors as node features. The resulting graph looks as follows:

The reason for structuring the graph like this is that the first three nodes (0, 1, 2) define the hit locations of muons on the detectors, and the other features can be inferred from them. Hence, we use unidirectional edges from node #2 to the others (3, 4, 5, 6). The results we obtained from the modified graph structure are as follows:

As can be seen from the graphs, the modified graph structure does not perform well compared with the baseline GNN, the FCNN or the CNN. One possible reason for this performance drop is the small number of node features (i.e. 4).
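The modified graph can be sketched in plain Python by transposing the baseline node-feature table, so that each of the 7 features becomes a node carrying 4 values (one per detector). The toy values are assumptions, and only the edges explicitly stated in the text (from node 2 to nodes 3 through 6) are listed; any connections among nodes 0, 1 and 2 are not specified above and are left out here.

```python
# Baseline layout: 4 detectors (rows) x 7 features (columns), toy values only.
detector_rows = [[10 * d + f for f in range(7)] for d in range(4)]

# Modified layout: 7 feature nodes, each with 4 node features (one per detector).
feature_nodes = [list(column) for column in zip(*detector_rows)]

# Unidirectional edges from node 2 to the remaining feature nodes, as (source, target).
edge_index = [(2, dst) for dst in range(3, 7)]

print(len(feature_nodes), len(feature_nodes[0]))
print(edge_index)
```

Each node now has only 4 input values instead of 7, which is the small node-feature count mentioned above.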

The next method we evaluated is multi-task learning with inverse transverse momentum (1/pT) prediction as an additional task. For this, we add another branch to our model as follows:

Notice that we have two output layers to predict pT and 1/pT, and our total loss, $L_{\text{total}}$, is a weighted sum of the two task losses:

$$L_{\text{total}} = \lambda_{p_T} L_{p_T} + \lambda_{1/p_T} L_{1/p_T}$$

where λ and L are the loss weights and losses for the given task. The following results are obtained with multi-task learning:

As can be seen from the graphs, the multi-task learning model performs very close to the baseline GNN model in all metrics.
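A weighted two-task loss of the kind described above can be sketched in plain Python as follows. The choice of mean absolute error for both tasks, the weight values and the function names are assumptions for illustration; the text does not state which per-task losses or weights were used.

```python
def mean_absolute_error(targets, predictions):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


def multitask_loss(pt_true, pt_pred, inv_true, inv_pred, lambda_pt=1.0, lambda_inv=1.0):
    """Weighted sum of the pT loss and the 1/pT loss (both MAE here, by assumption)."""
    loss_pt = mean_absolute_error(pt_true, pt_pred)
    loss_inv = mean_absolute_error(inv_true, inv_pred)
    return lambda_pt * loss_pt + lambda_inv * loss_inv


# Toy batch of two events (pT in GeV); the 1/pT targets are derived from the pT targets.
pt_true = [10.0, 20.0]
pt_pred = [12.0, 17.0]
inv_true = [1.0 / p for p in pt_true]
inv_pred = [0.2, 0.05]

total = multitask_loss(pt_true, pt_pred, inv_true, inv_pred, lambda_pt=1.0, lambda_inv=0.5)
print(total)
```

Because pT and 1/pT live on very different scales, the weights control how much each task contributes to the gradient.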

Finally, we evaluate the performance of fusion of three models (FCNN, CNN and GNN). Fusion can be done in two ways, namely, decision level fusion and feature level fusion. In decision level fusion, we take the mean of predictions of different models while in feature level fusion, we merge the embeddings of the models by concatenating them. In order to merge the embeddings, we modify our architecture as follows:

    class MPNN(torch.nn.Module):
        def __init__(self):
            super(MPNN, self).__init__()
            # GNN
            self.conv1 = MPL(int(len(features) / 4), 128)
            self.conv2 = MPL(128, 64)
            self.conv3 = MPL(64, 64)
            self.conv4 = MPL(64, 64)
            # pT head on the fused (GNN + CNN + FCNN) embedding
            self.lin1 = torch.nn.Linear(128 + 128 + 128, 128)
            self.lin2 = torch.nn.Linear(128, 16)
            self.lin3 = torch.nn.Linear(16, 16)
            self.lin4 = torch.nn.Linear(16, 1)
            # 1/pT head on the same fused embedding
            self.lin5 = torch.nn.Linear(128 + 128 + 128, 128)
            self.lin6 = torch.nn.Linear(128, 16)
            self.lin7 = torch.nn.Linear(16, 16)
            self.lin8 = torch.nn.Linear(16, 1)
            self.global_att_pool1 = gnn.GlobalAttention(torch.nn.Sequential(torch.nn.Linear(64, 1)))
            self.global_att_pool2 = gnn.GlobalAttention(torch.nn.Sequential(torch.nn.Linear(64, 1)))
            # CNN
            self.cnv1 = torch.nn.Conv2d(1, 64, (2, 2), padding='same')
            self.cnv2 = torch.nn.Conv2d(64, 128, (2, 2), padding='same')
            self.cnv3 = torch.nn.Conv2d(128, 256, (2, 2), padding='same')
            self.cnv4 = torch.nn.Conv2d(256, 256, (2, 2), padding='same')
            self.cnv5 = torch.nn.Conv2d(256, 128, (2, 2), padding='same')
            self.cnv6 = torch.nn.Conv2d(128, 128, (2, 2), padding='same')
            # 768 inputs = 128 channels * 2 * 3 after the (2, 2) max pool on a 4 x 7 input
            self.linr1 = torch.nn.Linear(7 // 2 * 4 // 2 * 128, 256)
            self.drp1 = torch.nn.Dropout(0.5)
            self.linr2 = torch.nn.Linear(256, 128)
            # FCNN
            self.linr3 = torch.nn.Linear(len(features), 512)
            self.linr4 = torch.nn.Linear(512, 256)
            self.linr5 = torch.nn.Linear(256, 128)
            self.linr6 = torch.nn.Linear(128, 128)
            self.linr7 = torch.nn.Linear(128, 128)

        def forward(self, data):
            x_orig, edge_index, batch = data.x, data.edge_index, data.batch
            # Same inputs viewed as a 1-channel 4 x 7 image (CNN) and as a flat vector (FCNN)
            x_orig2 = torch.reshape(x_orig, (-1, 1, 4, 7))
            x_orig3 = torch.reshape(x_orig, (-1, len(features)))

            # gnn
            x = F.relu(self.conv1(x_orig, edge_index))
            x = F.relu(self.conv2(x, edge_index))
            x1 = self.global_att_pool1(x, batch)
            x = F.relu(self.conv3(x, edge_index))
            x = F.relu(self.conv4(x, edge_index))
            x2 = self.global_att_pool2(x, batch)
            x_out = torch.squeeze(torch.cat([x1, x2], dim=-1))

            # cnn
            x = F.relu(self.cnv1(x_orig2))
            x = F.max_pool2d(F.relu(self.cnv2(x)), (2, 2))
            x = F.relu(self.cnv3(x))
            x = F.relu(self.cnv4(x))
            x = F.relu(self.cnv5(x))
            x = F.relu(self.cnv6(x))
            x = torch.flatten(x, 1)
            x = F.relu(self.linr1(x))
            x = self.drp1(x)
            x_out2 = torch.squeeze(F.relu(self.linr2(x)))

            # fcnn
            x = F.relu(self.linr3(x_orig3))
            x = F.relu(self.linr4(x))
            x = F.relu(self.linr5(x))
            x = F.relu(self.linr6(x))
            x_out3 = F.relu(self.linr7(x))

            # merge: concatenate the three 128-dimensional embeddings
            x_out4 = torch.cat([x_out, x_out2, x_out3], dim=-1)

            # pT output
            x = F.relu(self.lin1(x_out4))
            x = F.relu(self.lin2(x))
            x = self.lin3(x)
            xf1 = self.lin4(x).squeeze(-1)

            # 1/pT output (torch.sigmoid replaces the deprecated F.sigmoid)
            x = F.relu(self.lin5(x_out4))
            x = F.relu(self.lin6(x))
            x = self.lin7(x)
            xf2 = torch.sigmoid(self.lin8(x).squeeze(-1))

            return xf1, xf2

Here `x_out4` contains all the embeddings from the FCNN, CNN and GNN, and `xf1` and `xf2` are the outputs for pT and 1/pT prediction. The performances we obtained with these fusion techniques are as follows:

From these graphs, it can be observed that although there is not much of a difference among the GNNs in terms of MAE, there is a noticeable performance boost in the F-1 score and accuracy metrics, especially for higher pT values. Decision level fusion performs slightly better than feature level fusion. Moreover, in our task, the trigger threshold mostly operates around 20 GeV, so the performance drops at higher pT values would not really hurt the performance of the trigger system.
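The difference between the two fusion strategies can be sketched in plain Python: decision level fusion averages the final predictions of the models, while feature level fusion concatenates their embeddings before a shared head. The function names and toy numbers are assumptions for illustration; the 128-dimensional embeddings match the `128+128+128` input size of `lin1` in the code above.

```python
def decision_level_fusion(*model_predictions):
    """Average the per-event predictions of several models."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


def feature_level_fusion(*embeddings):
    """Concatenate the per-event embeddings of several models into one vector."""
    fused = []
    for embedding in embeddings:
        fused.extend(embedding)
    return fused


# Toy pT predictions (GeV) from the three models for two events.
fcnn_pred = [18.0, 30.0]
cnn_pred = [21.0, 27.0]
gnn_pred = [24.0, 33.0]
fused_pred = decision_level_fusion(fcnn_pred, cnn_pred, gnn_pred)
print(fused_pred)  # [21.0, 30.0]

# Toy 128-dimensional embeddings for a single event from each model.
fused_emb = feature_level_fusion([0.0] * 128, [0.0] * 128, [0.0] * 128)
print(len(fused_emb))  # 384
```

Decision level fusion needs no retraining, since it only combines existing prediction files, whereas feature level fusion requires training the merged network end to end.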

# Final Thoughts

Although we observed some improvements by employing decision level and feature level fusion techniques, there is still room for improvement in GNNs, perhaps by exploring different graph structures and graph convolution techniques.

I would like to thank my mentors Prateek Kumar Agnihotri, Emanuele Usai, Davide Di Croce and Sergei Gleyzer for their support on both technical and non-technical matters. I would also like to thank Ali Hariri for his useful feedback.

It was a great journey to be a part of the GSoC program and the ML4SCI team!