Finding Elusive Particles With Deep Learning

Published in Tenzar · 4 min read · Jul 19, 2018

IMPACT
By Sam May
June 29, 2018

This is a research guest post by Sam May, in which he shares the challenges and opportunities of applying deep learning to particle physics.

About the author: Sam May is a researcher at the UC San Diego Physics Department working as part of the CMS Collaboration between CERN and the particle physics community.

The Experiment

Many analyses within the CMS experiment at the CERN LHC search for events containing W bosons, as they are often found as decay products in physics processes of interest, e.g., the decay of a Higgs boson into two W bosons. However, a W boson decays almost immediately after its production, and far before it reaches any detectors. So, to identify a W boson, we must infer its presence through the signatures of other particles which do reach our detectors.

The Physics

The W boson often decays to a muon and a neutrino — muons are measured very well by the CMS detector and serve as a useful method for identifying W bosons. However, muons can be produced through other physics processes as well, often in hadronic jets. Therefore, an important task is the ability to discriminate between muons which come from a W boson (often called “prompt” muons) and those which come from other processes, like hadronic jets (often called “fake” muons).

The Compact Muon Solenoid (CMS) at CERN. Source: CERN CMS

The Method

Prompt muons tend to be "isolated" within the detector, while fake muons are much less isolated. That is, prompt muons are typically found with fewer and less energetic neighboring particles than fake muons. Within high energy physics, a standard method for discriminating between prompt and fake muons is to sum up the energies of all other particles near the muon and normalize by the energy of the muon itself — this quantity is called the "relative isolation" of the muon. Prompt muons tend to have low values of relative isolation, while fake muons tend to have higher values.
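As a rough illustration — not the actual CMS implementation — the relative-isolation idea can be sketched in a few lines of Python. The cone size of 0.4 and the dictionary field names (`pt`, `eta`, `phi`) are assumptions for the sketch:

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance between two particles in the detector's eta-phi plane."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi]
    return math.hypot(eta1 - eta2, dphi)

def relative_isolation(muon, particles, cone=0.4):
    """Sum the transverse momenta of all particles within a cone around
    the muon, normalized by the muon's own transverse momentum."""
    neighbor_sum = sum(
        p["pt"] for p in particles
        if delta_r(muon["eta"], muon["phi"], p["eta"], p["phi"]) < cone
    )
    return neighbor_sum / muon["pt"]
```

A prompt muon with few soft neighbors would score near zero here, while a muon embedded in a hadronic jet would pick up a large neighbor sum and score high.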

This method is not perfect, however: prompt muons can sometimes have large values of relative isolation, and fake muons can sometimes have small values of relative isolation. In seeking to improve upon this method, one could try to come up with a better way of utilizing the information contained in the muon’s neighboring particles. Rather than simply summing up the energies of neighboring particles, perhaps there is another way of condensing the information they carry about the likelihood of this muon being prompt or fake. Deep Neural Networks (DNNs) have proved to be extremely promising in this aspect in high energy physics.

Applying Deep Learning

My studies use quantitative properties of the muon ("features" in machine learning jargon) and its neighboring particles as inputs to a DNN. The challenge in using features describing the neighboring particles is that the number of neighboring particles varies from event to event: one muon may have 10 particles near it while another may have 100. This is a challenge because most machine learning algorithms require a fixed number of input features. However, one type of DNN that can accept variable-length input is a Recurrent Neural Network (RNN). RNNs are typically used in applications like speech recognition and time-series prediction. To utilize an RNN, the neighboring particles are treated as an ordered 1D sequence, sorted by the component of their energy that lies along the direction of the muon. The output of the RNN is then combined with features describing the muon itself in several fully-connected dense layers, eventually yielding a single output that reflects the DNN's degree of belief that a given muon is prompt.
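A minimal PyTorch sketch of this kind of architecture is shown below. The layer sizes, feature counts, and choice of a GRU cell are illustrative assumptions, not the author's actual model — the point is the shape of the network: a recurrent layer consumes the variable-length particle sequence, and its final hidden state is concatenated with the muon's own features before the dense layers:

```python
import torch
import torch.nn as nn

class IsolationRNN(nn.Module):
    """Sketch: RNN over a muon's neighboring particles, combined with
    the muon's own features, producing P(prompt)."""

    def __init__(self, n_particle_feats=6, n_muon_feats=8, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(n_particle_feats, hidden, batch_first=True)
        self.dense = nn.Sequential(
            nn.Linear(hidden + n_muon_feats, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),  # degree of belief that the muon is prompt
        )

    def forward(self, particles, muon):
        # particles: (batch, seq_len, n_particle_feats), sorted by the
        # energy component along the muon direction
        # muon: (batch, n_muon_feats)
        _, h = self.rnn(particles)                     # h: (1, batch, hidden)
        combined = torch.cat([h.squeeze(0), muon], dim=1)
        return self.dense(combined)
```

In practice the sequences in a batch would be padded (or packed) to a common length before being fed to the recurrent layer.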

The Technical Challenges

However, there are also many technical challenges associated with the use of RNNs. There are typically 10–30 other particles identified in the region around the muon used for the isolation calculation, though some events can have as many as 100 particles. A single neighboring particle can have 5–7 features describing it, meaning the total number of input features is typically in the 100s, and the size of the training data can exceed 60 GB. Combined with the fact that millions of events are needed for training, the processing and memory needs of this study are tremendous.
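One standard way to cope with a training set too large for memory is to stream it in fixed-size, zero-padded batches. The sketch below is a generic illustration of that pattern, not the author's pipeline; the batch size and particle cap are assumed values:

```python
import numpy as np

def padded_batches(events, batch_size=256, max_particles=100):
    """Stream events in fixed-size batches, zero-padding (or truncating)
    each muon's neighbor list so the full multi-gigabyte training set
    never has to sit in memory at once."""
    batch = []
    for particles in events:  # each item: (n_i, n_feats) array, n_i varies
        batch.append(particles)
        if len(batch) == batch_size:
            yield _pad(batch, max_particles)
            batch = []
    if batch:  # final partial batch
        yield _pad(batch, max_particles)

def _pad(batch, max_particles):
    n_feats = batch[0].shape[1]
    out = np.zeros((len(batch), max_particles, n_feats), dtype=np.float32)
    for i, p in enumerate(batch):
        n = min(len(p), max_particles)
        out[i, :n] = p[:n]
    return out
```

Because each batch is a fixed `(batch, max_particles, n_feats)` tensor, it can be handed directly to the RNN regardless of how many particles each event originally contained.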

Deploying RNNs to RunBox

I used Tenzar's computing tool RunBox to train RNNs and search for W bosons in particle collision data. From a web interface and a simple Linux command-line tool, I deployed compute nodes in the cloud with 64 cores and multiple GPUs to train the RNNs. RunBox's ease of use and command syntax allowed me to use compute-optimized resources and circumvent non-trivial computing infrastructure problems that were already solved in RunBox. Without RunBox, building solutions to these high-throughput infrastructure challenges would have cost our team dozens of hours that could otherwise have been devoted to model development and training. In addition, RunBox's Smart Data Throughput feature sped up the data transfer between the local cluster and the cloud nodes, saving significant transfer wait times for multi-gigabyte workloads.

Ultimately, RunBox’s integration of GPU compatible Docker images and user-friendly (“natural”) commands allowed me to spend more time focusing on the physics and less time dealing with technical challenges.

For more, visit www.tenzar.com/library or follow us on Twitter.
