Google Summer of Code 2021

ML4Sci

Purva Chaudhari
Aug 21, 2021

Introduction

Project title: End-to-End Deep Learning Reconstruction for CMS Experiment

Developers:

  • Student: Purva Chaudhari
  • Mentors: Sergei Gleyzer, Davide Di Croce, Emanuele Usai, Bjorn Burkle and Nikolas Pervan

Organization: Machine Learning for Science (ML4Sci)


About E2E Framework: https://shra1-25.github.io/E2eDLrecReport/#

The End-to-End (E2E) framework runs inference inside CMSSW for deep learning models used by various taggers, such as Electron-Photon, Quark-Gluon and Top, and currently performs classification (regression is coming soon). The user can add the .pb file of a deep learning model to the framework and run the inference to benchmark its performance.
Refer to the blog linked above to get acquainted with the CMSSW framework and its modular architecture.

My contribution:
https://github.com/Purva-Chaudhari/RecoE2E/compare/taubranch4

This summer I had the pleasure of working on my project, End-to-End Deep Learning Reconstruction for CMS Experiment, as part of Google Summer of Code 2021. I extended the integration framework to a tau tagger, added a few secondary channels (BPIX 4 and the SiStrip layers TOB, TIB, TEC and TID) and benchmarked their performance.

E2E Project Structure:

The current project has the following directory structure.

E2E Project Structure (Highlighted are my additions)
  • Data Formats: Contains the classes and declarations for the jet arrays.
  • Frame Producers: The frame producer is the core directory. The extracted detector image arrays are stored in the EDM ROOT file as flat 1-dimensional vectors. Seed coordinates are read from the EDM file, and the detector image arrays are cropped into 3-dimensional frames centered on those seed coordinates. Jet seeds are selected according to certain criteria: seeds whose eta coordinate lies at the corner of the detector image are discarded, as are seeds with energy less than zero.
  • Taggers: The inference runs on the cropped frames and stores the predictions back in the EDM production file. Each tagger contains its own jet and channel selection criteria. The framework currently includes the following channels:
Channels used by taggers
  • TF Models: Inference on a trained model runs through the TensorFlow C++ API available in the CMSSW framework. The TensorFlow model trained in Python should be stored in protobuf (.pb) format. The name of the protobuf file can be passed as a parameter to the EDProducers.
Data flow in E2E
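
The cropping step described under Frame Producers can be illustrated with a small Python sketch. This is not the framework's C++ code: the function names, the row-major (eta, phi) layout, the window size and the wrap-around in phi are all assumptions made for illustration. Only the two rejection rules (seeds too close to the eta boundary and seeds with negative energy) come from the description above.

```python
def crop_frame(flat, n_eta, n_phi, seed_eta, seed_phi, size):
    """Crop a size x size window centered on (seed_eta, seed_phi).

    `flat` is a row-major 1-D list of length n_eta * n_phi (assumed layout).
    Returns None when the window would run past the eta boundary.
    Phi is treated as periodic, so columns wrap around (assumption).
    """
    half = size // 2
    eta_lo = seed_eta - half
    eta_hi = eta_lo + size
    if eta_lo < 0 or eta_hi > n_eta:
        return None  # seed too close to the eta boundary: reject
    frame = []
    for i in range(eta_lo, eta_hi):
        row = flat[i * n_phi:(i + 1) * n_phi]
        frame.append([row[(seed_phi - half + j) % n_phi] for j in range(size)])
    return frame


def build_frames(channels, n_eta, n_phi, seeds, size):
    """Stack one crop per channel into a 3-D frame [channel][eta][phi].

    `seeds` is a list of (eta_index, phi_index, energy) tuples (hypothetical format).
    """
    frames = []
    for eta, phi, energy in seeds:
        if energy < 0:
            continue  # seeds with negative energy are neglected
        crops = [crop_frame(c, n_eta, n_phi, eta, phi, size) for c in channels]
        if crops[0] is None:
            continue  # window does not fit in eta
        frames.append(crops)
    return frames
```

With two 6 x 8 channels and three seeds, of which one sits at the eta boundary and one has negative energy, build_frames would keep a single 2 x 3 x 3 frame.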

Benchmarking performance:

I used a ResNet model with 13 channels to benchmark the performance on the Top (TTbar) dataset.

TimeReport> Time report complete in 121.112 seconds  
Time Summary:
- Min event: 0.256601
- Max event: 4.54059
- Avg event: 0.702128
- Total loop: 74.5741
- Total init: 31.6861
- Total job: 121.112
- EventSetup Lock: 3.48091e-05
- EventSetup Get: 4.14612
Event Throughput: 1.34095 ev/s
CPU Summary:
- Total loop: 83.4072
- Total init: 20.5239
- Total extra: 0
- Total job: 115.205
Processing Summary:
- Number of Events: 100
- Number of Global Begin Lumi Calls: 1
- Number of Global Begin Run Calls: 1
=============================================
MessageLogger Summary
type  category      sev  module            subroutine  count  total
----  ------------  ---  ----------------  ----------  -----  -----
   1  MemoryCheck   -w   DetFrameProducer                  8      8
   2  MemoryCheck   -w   JetFrameProducer                  3      3
   3  MemoryCheck   -w   PoolOutputModule                 11     11
   4  MemoryCheck   -w   TauTagger:TauTag                 26     26
   5  MemoryCheck   -w   source                            2      2
   6  TimeEvent     -w   PostProcessPath                 100    100
   7  TimeModule    -w   DetFrameProducer                100    100
   8  TimeModule    -w   EndPathStatusIns                100    100
   9  TimeModule    -w   JetFrameProducer                100    100
  10  TimeModule    -w   PathStatusInsert                100    100
  11  TimeModule    -w   PoolOutputModule                100    100
  12  TimeModule    -w   TauTagger:TauTag                100    100
  13  TimeModule    -w   TriggerResultIns                100    100
  14  TimeReport    -e   AfterBeginJob                     1      1
  15  TimeReport    -e   AfterModEndJob                    1      1
  16  MemoryReport  -s   AfterModEndJob                    1      1
  17  fileAction    -s   file_close                        1      1
  18  fileAction    -s   file_open                         2      2

type  category      Examples: run/evt  run/evt           run/evt
----  ------------  -----------------  ----------------  -------
   1  MemoryCheck   1/2                1/3               1/47
   2  MemoryCheck   1/3                1/10              1/11
   3  MemoryCheck   1/2                1/9               1/67
   4  MemoryCheck   1/2                1/3               1/99
   5  MemoryCheck   PostProcessEvent   PostProcessEvent
   6  TimeEvent     1/1                1/2               1/100
   7  TimeModule    1/1                1/2               1/100
   8  TimeModule    1/1                1/2               1/100
   9  TimeModule    1/1                1/2               1/100
  10  TimeModule    1/1                1/2               1/100
  11  TimeModule    1/1                1/2               1/100
  12  TimeModule    1/1                1/2               1/100
  13  TimeModule    1/1                1/2               1/100
  14  TimeReport    BeforeEvents
  15  TimeReport    PostGlobalEndRun
  16  MemoryReport  PostGlobalEndRun
  17  fileAction    PostGlobalEndRun
  18  fileAction    pre-events         pre-events
Severity # Occurrences Total Occurrences
-------- ------------- -----------------
System 3 3
dropped waiting message count 0
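
To compare runs with different channel settings, it can help to pull the numbers out of the report automatically. The following is a minimal sketch, not part of the framework: the function names and the assumption that each summary block is a "Name:" header followed by "- key: value" lines are mine. It also shows that the reported throughput is consistent with the number of events divided by the total loop time of the Time Summary (100 / 74.5741 ≈ 1.341 ev/s).

```python
# Excerpt copied from the report above.
SAMPLE = """\
Time Summary:
- Min event: 0.256601
- Max event: 4.54059
- Avg event: 0.702128
- Total loop: 74.5741
- Total init: 31.6861
- Total job: 121.112
Event Throughput: 1.34095 ev/s
CPU Summary:
- Total loop: 83.4072
- Total init: 20.5239
- Total extra: 0
- Total job: 115.205
Processing Summary:
- Number of Events: 100
"""


def parse_summary(text):
    """Return {section name: {key: float}} for '- key: value' lines."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":") and not line.startswith("-"):
            # A header such as "Time Summary:" opens a new section.
            current = sections.setdefault(line[:-1], {})
        elif line.startswith("- ") and current is not None and ":" in line:
            key, value = line[2:].rsplit(":", 1)
            current[key.strip()] = float(value)
    return sections


def event_throughput(sections):
    """Events per second of wall-clock time spent in the event loop."""
    n_events = sections["Processing Summary"]["Number of Events"]
    return n_events / sections["Time Summary"]["Total loop"]


summary = parse_summary(SAMPLE)
print(f"{event_throughput(summary):.3f} ev/s")
```

Keeping the sections separate matters because "Total loop" appears in both the Time Summary and the CPU Summary with different values.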

Steps to run the code:

  1. Set up the CMSSW environment on Docker or lxplus (replace CMSSW_10_6_20 with your CMSSW version):

scram p CMSSW_10_6_20
cd CMSSW_10_6_20/src
cmsenv

  2. Clone the repository:

git clone -b taubranch4 https://github.com/Purva-Chaudhari/RecoE2E

  3. Compile/build. To build on multiple cores, add -j n:

scram b -j 5

  4. Run the inference (e.g. the Tau Tagger). Make sure you have added the ROOT files to your remote machine first:

cmsRun RecoE2E/TauTagger/python/TauInference_cfg.py inputFiles=file:./TTbar_TuneCUETP8M1_13TeV_pythia8_2018.root doTracksAtECALadjPt=False TauModelName=ResNet_8_channel_tf13.pb doBPIX3=False doBPIX4=False doTOB=False doTIB=False doTID=False
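
When benchmarking several channel combinations, typing the long key=value argument list by hand gets error-prone. Here is a small helper sketch that assembles the same command line from a dictionary of flags. The helper name and its signature are hypothetical; the option names are the ones used in the command above, and the script only builds the command, it does not run it.

```python
import shlex


def build_cmsrun_command(cfg, input_file, model_name, flags):
    """Return the cmsRun argument list for one benchmark configuration.

    `flags` maps option names (e.g. "doBPIX4") to booleans; each one is
    rendered as name=True or name=False, as in the command above.
    """
    args = [
        "cmsRun",
        cfg,
        f"inputFiles=file:{input_file}",
        f"TauModelName={model_name}",
    ]
    args += [f"{name}={value}" for name, value in flags.items()]
    return args


command = build_cmsrun_command(
    "RecoE2E/TauTagger/python/TauInference_cfg.py",
    "./TTbar_TuneCUETP8M1_13TeV_pythia8_2018.root",
    "ResNet_8_channel_tf13.pb",
    {"doTracksAtECALadjPt": False, "doBPIX3": False, "doBPIX4": False,
     "doTOB": False, "doTIB": False, "doTID": False},
)
# shlex.join quotes arguments where needed, so the line can be pasted into a shell.
print(shlex.join(command))
```

The returned list could also be handed to subprocess.run inside a CMSSW environment, with one flag flipped per run to time each secondary channel separately.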

Acknowledgement:

I wholeheartedly thank my mentors for all their guidance and for being available to clear up all my doubts throughout the summer. It was indeed a summer full of learning, and I am ever grateful to the organization for the opportunity!

References:

  1. https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCMSSWFramework
  2. https://ml4sci.org/
