MLSys 2020: Experiencing cutting-edge research in machine learning and systems

By Yifan Bai, MEng ’20 (EECS)

--

The Conference on Machine Learning and Systems (MLSys) is an annual conference targeting research at the intersection of systems and machine learning (ML). This year it was held in Austin, Texas. I was honored to have the opportunity to attend MLSys from March 2–4, 2020, to present my research alongside my teammates, as well as to learn from and network with other researchers in the field.

Scenes from MLSys 2020.

The first two days of the conference included oral presentations, keynote talks, project demos, and poster sessions. Presentations were grouped by topic, such as distributed and parallel learning algorithms, efficient model training, and efficient inference and model serving. The last day featured workshops on seven topics in systems and ML, including on-device intelligence and automated machine learning for networks and distributed computing. Due to the novel coronavirus (COVID-19), attendance decreased slightly compared to last year, and some speakers gave their presentations remotely via Zoom.

My own research project, “BPPSA: Back-propagation by Parallel Scan Algorithm,” was joint work with Shang Wang, a second-year MSc student at the University of Toronto, supervised by Gennady Pekhimenko, an assistant professor at the University of Toronto and the Vector Institute. The project began as my undergraduate capstone at the University of Toronto, where most of the work was done; by the time we submitted the paper to MLSys in September 2019, I had started at UC Berkeley, which I listed as my affiliation. My partner Shang gave our presentation on the morning of March 2.

Some background on our research: back-propagation (BP) is the core algorithm for training deep neural networks; it computes how much the loss changes with respect to each input and parameter. These gradients are usually computed layer by layer via the chain rule, and that strong sequential dependency hinders BP’s scalability on parallel systems. Our approach builds on the scan operation, which performs an in-order aggregation over a sequence of input values and returns the partial result at each step. The Blelloch scan is a classic algorithm that computes a scan in logarithmic rather than linear parallel steps.
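To make the scan primitive concrete, here is a minimal Python sketch of a Blelloch-style exclusive scan (my own illustration, not the paper’s implementation). Both phases are written as sequential loops for clarity, but every iteration of each inner loop is independent, which is what yields O(log n) steps on a parallel machine:

```python
def blelloch_scan(values, op, identity):
    """Blelloch (1990) exclusive scan for an associative operator `op`.

    Sequential reference version: each inner `for` loop is
    embarrassingly parallel, so a parallel machine needs only
    O(log n) steps. Assumes len(values) is a power of two.
    """
    n = len(values)
    x = list(values)

    # Up-sweep (reduce) phase: build partial aggregates in a tree.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):  # parallel across i
            x[i + 2 * d - 1] = op(x[i + d - 1], x[i + 2 * d - 1])
        d *= 2

    # Down-sweep phase: push prefixes back down the tree.
    x[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):  # parallel across i
            left = x[i + d - 1]
            x[i + d - 1] = x[i + 2 * d - 1]
            x[i + 2 * d - 1] = op(left, x[i + 2 * d - 1])
        d //= 2
    return x

print(blelloch_scan([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b, 0))
# [0, 1, 3, 6, 10, 15, 21, 28]: each entry aggregates everything before it
```

One subtlety worth noting: the operator in BPPSA is matrix multiplication, which is associative but not commutative, so the argument order passed to `op` matters when the scan is applied to BP.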

Our major contributions are as follows: we reformulated BP as a scan operation and modified the Blelloch scan algorithm to efficiently scale BP in a parallel computing environment, reducing the theoretical step complexity from O(n) to O(log(n)). We also developed routines to efficiently generate the sparse transposed Jacobian matrices of various operators. We evaluated BPPSA by training a vanilla Recurrent Neural Network (RNN) on synthetic datasets and an RNN with Gated Recurrent Units (GRU) on the IRMAS dataset, achieving up to a 2.75× speedup in end-to-end training time and up to a 108× speedup on the backward pass.
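As a rough illustration of the reformulation, consider a toy chain of layers with hypothetical dense Jacobians (the real system generates and multiplies sparse transposed Jacobians). The backward pass is just a running product of transposed Jacobians applied to the output gradient, i.e., exactly the kind of in-order aggregation a parallel scan computes:

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
n, dim = 8, 4
# Transposed Jacobian of each layer (dense toy values here).
JT = [rng.standard_normal((dim, dim)) for _ in range(n)]
grad_out = rng.standard_normal(dim)  # dL/dy at the network output

# Standard back-propagation: a strictly sequential chain, O(n) steps.
grads = []
g = grad_out
for i in range(n - 1, -1, -1):
    g = JT[i] @ g  # chain rule: dL/dx_i = J_i^T @ dL/dx_{i+1}
    grads.append(g)

# Scan view of the same computation: entry k holds the partial product
# JT[n-k] @ ... @ JT[n-1] @ grad_out. Since matrix multiplication is
# associative, a Blelloch-style scan can produce all n partial products
# in O(log n) parallel steps instead of n sequential ones.
scan_view = [reduce(lambda acc, M: M @ acc, reversed(JT[n - k:]), grad_out)
             for k in range(1, n + 1)]

assert all(np.allclose(a, b) for a, b in zip(grads, scan_view))
```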

Another presentation that I found particularly interesting was Checkmate, presented by Paras Jain, a PhD student in the RISE, BAIR, and Berkeley DeepDrive labs at UC Berkeley. Amir Gholami and Kurt Keutzer, two other authors of this paper, are my current MEng capstone project partners. The full title of the paper is “Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization.” Rematerialization means recomputing intermediate values rather than retaining them in memory, trading extra computation for a smaller memory footprint. The Checkmate team formalized the rematerialization problem as a mixed-integer linear program with a substantially more flexible search space than prior approaches, and implemented the system on top of TensorFlow, an open-source machine learning library. Checkmate enables training models with up to 5.1 times larger input sizes and up to 2.6 times larger batch sizes than prior work, with minimal overhead.
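To give a feel for the rematerialization trade-off Checkmate optimizes (this sketch shows only the underlying idea via TensorFlow’s built-in tf.recompute_grad, not Checkmate’s MILP-based scheduler), a block can discard its activations in the forward pass and recompute them during the backward pass:

```python
import tensorflow as tf

w1 = tf.Variable(tf.random.normal([1024, 1024]))
w2 = tf.Variable(tf.random.normal([1024, 1024]))

@tf.recompute_grad  # do not keep this block's activations in memory
def block(x):
    h = tf.nn.relu(tf.matmul(x, w1))
    return tf.nn.relu(tf.matmul(h, w2))

x = tf.random.normal([32, 1024])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(block(x))

# The backward pass re-runs block's forward computation to rebuild the
# freed activations: extra compute in exchange for lower peak memory.
grads = tape.gradient(loss, [w1, w2])
```

Checkmate’s contribution is choosing which tensors to free and recompute optimally under a memory budget, rather than relying on a fixed heuristic.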

Besides oral presentations of accepted papers, keynote talks, demonstrations, and workshops, MLSys also featured networking events organized by industry sponsors. On March 3, I went to the Facebook MLSys Happy Hour in the evening. At this informal event, I got a chance to talk to several machine learning experts at Facebook and hear their stories. Attendees were also able to try out the newest Oculus VR headsets.

In summary, I was grateful for the opportunity to attend MLSys this year, and I learned a lot about the latest research developments in the systems and ML area. The experience not only broadened my knowledge but will also help with my current MEng capstone project. I would like to thank my collaborator Shang Wang and my supervisor Gennady Pekhimenko for their hard work in getting our paper accepted, and the Fung Institute for supporting my travel.

About the author

Yifan Bai is a current Master of Engineering candidate in the Electrical Engineering & Computer Sciences (EECS) department at UC Berkeley.

