Google Summer of Code 2021

ML4Sci

Purva Chaudhari

Aug 21, 2021

Introduction

Project title: End-to-End Deep Learning Reconstruction for CMS Experiment

Developers:

Student: Purva Chaudhari
Mentors: Sergei Gleyzer, Davide Di Croce, Emanuele Usai, Bjorn Burkle and Nikolas Pervan

Organization: Machine Learning for Science (ML4Sci)

About E2E Framework: https://shra1-25.github.io/E2eDLrecReport/#

The End-to-End framework provides functionality to run CMSSW inference with deep learning models for taggers such as Electron-Photon, Quark-Gluon and Top, and to perform classification (regression is coming soon). The user can add the .pb file of a deep learning model to the framework and run the inference to benchmark its performance.
Refer to the blog linked above to get acquainted with the CMSSW framework and its modular architecture.

My contribution:
https://github.com/Purva-Chaudhari/RecoE2E/compare/taubranch4

This summer I had the pleasure of working on my project, End-to-End Deep Learning Reconstruction for CMS Experiment, as part of Google Summer of Code 2021. I extended the integration framework to a tau tagger, added a few secondary channels (BPIX 4 and the SiStrip layers TOB, TIB, TEC and TID) and benchmarked their performance.

E2E Project Structure:

The current project has the following directory structure.

Data Formats: Contains the classes and declarations of the jet arrays.
Frame Producers: The frame producers are the core directory. The extracted detector arrays are stored in the EDM ROOT file as flat 1-dimensional vectors. Seed coordinates are read from the EDM file, and the detector image arrays are cropped into 3-dimensional frames centered on those seed coordinates. Jet seeds are selected according to certain criteria: seeds whose eta coordinate lies at the edge of the detector image are rejected, as are seeds with energies below zero.
Taggers: The inference runs on the cropped frames and stores the predictions back in the EDM production file. Each tagger contains its corresponding jet and channel selection criteria. Currently the following channels are included in the framework:

TF Models: Inference with a trained model runs through the TensorFlow C++ API available in the CMSSW framework. A TensorFlow model trained in Python should be saved in protobuf (.pb) format. The name of the protobuf file can be passed as a parameter to the EDProducers.
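
The cropping described under Frame Producers can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in RecoE2E (the producers there are CMSSW modules). The function names, the toy image size and the frame size are all assumptions made for the example. Because phi is an azimuthal angle, the sketch also assumes the crop wraps around in phi, while seeds too close to the eta edge are rejected.

```python
# Illustrative sketch only: names and dimensions are assumptions,
# not values taken from the RecoE2E code.

def unflatten(flat, n_eta, n_phi):
    """Rebuild a 2-D (eta, phi) image from a flat 1-D vector (row-major)."""
    assert len(flat) == n_eta * n_phi
    return [flat[i * n_phi:(i + 1) * n_phi] for i in range(n_eta)]


def seed_is_valid(seed_eta, seed_energy, n_eta, half):
    """Reject seeds too close to the eta edge or with negative energy."""
    inside_eta = half <= seed_eta <= n_eta - half
    return inside_eta and seed_energy >= 0.0


def crop_frame(channels, seed_eta, seed_phi, size):
    """Crop a (channel, eta, phi) frame of size x size around the seed.

    The eta window is a plain slice; the phi window wraps around modulo n_phi.
    """
    half = size // 2
    n_phi = len(channels[0][0])
    frame = []
    for image in channels:
        rows = image[seed_eta - half:seed_eta + half]
        frame.append([[row[(seed_phi - half + j) % n_phi] for j in range(size)]
                      for row in rows])
    return frame


# Toy usage: two identical 8 x 12 channels, 4 x 4 frame around seed (4, 0).
N_ETA, N_PHI, SIZE = 8, 12, 4
flat = [float(i) for i in range(N_ETA * N_PHI)]
channels = [unflatten(flat, N_ETA, N_PHI)] * 2
if seed_is_valid(seed_eta=4, seed_energy=25.0, n_eta=N_ETA, half=SIZE // 2):
    frame = crop_frame(channels, seed_eta=4, seed_phi=0, size=SIZE)
    print(len(frame), len(frame[0]), len(frame[0][0]))  # prints: 2 4 4
```

The result is a 3-dimensional frame (channels, eta, phi), which is the shape of input the taggers run inference on.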

Benchmarking performance:

I used a ResNet model with 13 channels to benchmark the performance on the Top (TTbar) dataset.

TimeReport> Time report complete in 121.112 seconds  
Time Summary:   
- Min event:   0.256601  
- Max event:   4.54059  
- Avg event:   0.702128  
- Total loop:  74.5741  
- Total init:  31.6861  
- Total job:   121.112  
- EventSetup Lock:   3.48091e-05  
- EventSetup Get:   4.14612  
Event Throughput: 1.34095 ev/s  
CPU Summary:   
- Total loop:  83.4072  
- Total init:  20.5239  
- Total extra: 0  
- Total job:   115.205  
Processing Summary:   
- Number of Events:  100  
- Number of Global Begin Lumi Calls:  1  
- Number of Global Begin Run Calls: 1

=============================================

MessageLogger Summary

 type     category     sev      module       subroutine    count    total
 ---- ---------------- --- ---------------- -------------  -----    -----
    1 MemoryCheck      -w  DetFrameProducer                    8        8
    2 MemoryCheck      -w  JetFrameProducer                    3        3
    3 MemoryCheck      -w  PoolOutputModule                   11       11
    4 MemoryCheck      -w  TauTagger:TauTag                   26       26
    5 MemoryCheck      -w  source                              2        2
    6 TimeEvent        -w  PostProcessPath                   100      100
    7 TimeModule       -w  DetFrameProducer                  100      100
    8 TimeModule       -w  EndPathStatusIns                  100      100
    9 TimeModule       -w  JetFrameProducer                  100      100
   10 TimeModule       -w  PathStatusInsert                  100      100
   11 TimeModule       -w  PoolOutputModule                  100      100
   12 TimeModule       -w  TauTagger:TauTag                  100      100
   13 TimeModule       -w  TriggerResultIns                  100      100
   14 TimeReport       -e  AfterBeginJob                       1        1
   15 TimeReport       -e  AfterModEndJob                      1        1
   16 MemoryReport     -s  AfterModEndJob                      1        1
   17 fileAction       -s  file_close                          1        1
   18 fileAction       -s  file_open                           2        2

 type     category     Examples: run/evt  run/evt           run/evt
 ---- ---------------- ------------------ ----------------  ----------------
    1 MemoryCheck      1/2                1/3               1/47
    2 MemoryCheck      1/3                1/10              1/11
    3 MemoryCheck      1/2                1/9               1/67
    4 MemoryCheck      1/2                1/3               1/99
    5 MemoryCheck      PostProcessEvent   PostProcessEvent
    6 TimeEvent        1/1                1/2               1/100
    7 TimeModule       1/1                1/2               1/100
    8 TimeModule       1/1                1/2               1/100
    9 TimeModule       1/1                1/2               1/100
   10 TimeModule       1/1                1/2               1/100
   11 TimeModule       1/1                1/2               1/100
   12 TimeModule       1/1                1/2               1/100
   13 TimeModule       1/1                1/2               1/100
   14 TimeReport       BeforeEvents
   15 TimeReport       PostGlobalEndRun
   16 MemoryReport     PostGlobalEndRun
   17 fileAction       PostGlobalEndRun
   18 fileAction       pre-events         pre-events

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
System                  3                   3

dropped waiting message count 0
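
As a quick sanity check on the report, the event throughput it quotes matches the number of events divided by the total loop time. The small parser below is a sketch written for this post, not part of the framework. It only reads an excerpt of the summary, because the full report repeats names such as "Total loop" in both the time and the CPU sections.

```python
import re

# Excerpt of the values quoted in the report above (time summary only).
report = """\
- Total loop:  74.5741
- Total init:  31.6861
- Total job:   121.112
- Number of Events:  100
"""


def parse_summary(text):
    """Map each '- name: value' line of the excerpt to a float."""
    pattern = re.compile(r"^- (.+?):\s+([0-9.eE+-]+)\s*$", re.MULTILINE)
    return {name: float(value) for name, value in pattern.findall(text)}


summary = parse_summary(report)
throughput = summary["Number of Events"] / summary["Total loop"]
print(f"{throughput:.5f} ev/s")  # prints: 1.34095 ev/s
```

The gap between the total job time (121.112 s) and the loop time (74.5741 s) is startup and teardown cost, so the throughput figure reflects the event loop only.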

Steps to run the code:

1. Set up the CMSSW environment on Docker or lxplus (replace CMSSW_10_6_20 with your CMSSW version):

scram p CMSSW_10_6_20
cd CMSSW_10_6_20/src
cmsenv

2. Clone the repository:

git clone -b taubranch4 https://github.com/Purva-Chaudhari/RecoE2E

3. Compile/build. To build on multiple cores, add -j n:

scram b -j 5

4. Run the inference (e.g. the Tau tagger). Make sure you have added the ROOT files to your remote machine first:

cmsRun RecoE2E/TauTagger/python/TauInference_cfg.py inputFiles=file:./TTbar_TuneCUETP8M1_13TeV_pythia8_2018.root doTracksAtECALadjPt=False TauModelName=ResNet_8_channel_tf13.pb doBPIX3=False doBPIX4=False doTOB=False doTIB=False doTID=False
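
When benchmarking several channel combinations, it can help to build this command from a dictionary of options instead of editing it by hand. The helper below is a hypothetical convenience sketch, not something shipped with RecoE2E. The option names are copied from the command above, and reading the do* flags as per-channel switches is an assumption based on their names.

```python
def build_cmsrun_command(config, options):
    """Assemble a cmsRun argument list from key=value options (insertion order kept)."""
    return ["cmsRun", config] + [f"{name}={value}" for name, value in options.items()]


options = {
    "inputFiles": "file:./TTbar_TuneCUETP8M1_13TeV_pythia8_2018.root",
    "doTracksAtECALadjPt": False,
    "TauModelName": "ResNet_8_channel_tf13.pb",
    "doBPIX3": False,
    "doBPIX4": False,
    "doTOB": False,
    "doTIB": False,
    "doTID": False,
}

command = build_cmsrun_command("RecoE2E/TauTagger/python/TauInference_cfg.py", options)
print(" ".join(command))

# Inside a CMSSW environment the list could then be launched with
# subprocess.run(command, check=True) from the standard library.
```

Printing the joined list reproduces the command shown above, so flipping one flag in the dictionary is enough to set up the next benchmark run.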

Acknowledgement:

I wholeheartedly thank my mentors for all their guidance and for being available to clear up my doubts throughout the summer. It was indeed a summer full of learning, and I am ever grateful to the organization for the opportunity!