At LightOn, we aim to stay at the cutting edge of machine learning research. It was therefore a pleasure to attend the 37th International Conference on Machine Learning, ICML 2020!
The organizers did an amazing job turning this edition virtual and presenting a variety of tutorials, papers, workshops, and panels.
While the conference covered a vast range of topics, this blog post focuses on three of our favourite papers accepted and presented at ICML 2020. Let us introduce them, in no particular order!
#1 Variational Imitation Learning with Diverse-quality Demonstrations
In Reinforcement Learning (RL, which we explored here), an agent interacts with an environment and takes actions to maximize long-term rewards. However, it is not always possible to clearly define a reward function, for example when teaching a self-driving car. A solution to this problem is Imitation Learning (IL): the agent imitates actions performed by experts. But what happens if expert demonstrations are too costly to collect? Or if we don't really have "experts" around (e.g., can we really find an "expert" in driving)? Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, and Masashi Sugiyama tackle these issues by introducing Variational Imitation Learning with Diverse-quality demonstrations (VILD).
Their idea is to bypass the difficulty of estimating the quality of the actions chosen in non-expert demonstrations (see Figure 1) using a variational approach. They also estimate a reward function that can be used to further improve the agent. VILD matches state-of-the-art results on benchmarks. More importantly, it shows that IL can be used in realistic scenarios!
#2 T-GD: Transferable GAN-generated Images Detection Framework
Hyeonseong Jeon, Youngoh Bang, Junyaup Kim, and Simon S. Woo chose to address the issue of detecting images produced by Generative Adversarial Networks (GANs). Indeed, these networks are becoming increasingly capable, generating higher-quality images without obvious artifacts, and could be misused in the future (see Figure 2 for some of this progress).
They introduce the Transferable GAN-generated images Detection (T-GD) framework, in which student and teacher models iteratively teach and evaluate each other to improve their performance. More importantly, they use transfer learning (more details here) to apply knowledge already learned on source datasets to smaller target datasets. Their approach not only achieves state-of-the-art results on source data, but is also data-efficient thanks to transfer learning.
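To make the transfer-learning idea concrete, here is a minimal NumPy sketch. It is not the T-GD procedure itself (which couples a student and a teacher through self-training): it only illustrates freezing features learned on a large source dataset and fine-tuning a small head on a much smaller target dataset. The "pretrained" extractor here is just a fixed random projection, and all names (`W_pre`, `features`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a feature extractor pretrained on a large source dataset:
# a fixed random projection followed by a tanh non-linearity.
W_pre = rng.normal(size=(16, 8)) / 4.0

def features(x):
    return np.tanh(x @ W_pre)  # kept frozen during fine-tuning

# Small "target" dataset, e.g. images from a GAN unseen at pretraining time.
X = rng.normal(size=(64, 16))
y = (X[:, 0] > 0).astype(float)

# Fine-tune only a lightweight logistic head on top of the frozen features.
F = features(X)
w, b = np.zeros(8), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # sigmoid predictions
    w -= 0.1 * F.T @ (p - y) / len(y)       # logistic-loss gradient step
    b -= 0.1 * (p - y).mean()
```

Because only the small head is trained, very few target examples are needed, which is the data-efficiency benefit the paper exploits (with a far more sophisticated student-teacher scheme).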
#3 Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
If you are on Twitter, you have probably seen a sample of the capabilities of OpenAI's GPT-3, for example writing React or JSX code, or writing like a lawyer! For more examples, look at this Twitter thread on experiences with GPT-3. This work has shown that massively scaling up language models greatly improves task-agnostic, few-shot performance, but training such a model on a single V100 cloud instance would cost over 4.6 million dollars! A central challenge is the memory and computational cost of the attention mechanism, which is quadratic in the sequence length.
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret devised a new formulation of self-attention using kernel feature maps and the associativity of matrix products, reducing the complexity from quadratic to linear in the sequence length and making autoregressive inference thousands of times faster. Finally, they also show that:
[…] any transformer layer with causal masking can be written as a model that, given an input, modifies an internal state and then predicts an output, namely a Recurrent Neural Network (RNN).
Hence the first part of the title!
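This RNN view is easy to see in code. Below is a minimal NumPy sketch (not the authors' optimized implementation) of causal linear attention with the elu(x) + 1 feature map from the paper, computed two ways: once with cumulative sums over the whole sequence, and once as a recurrence that updates an internal state one step at a time. Both give the same output, which is exactly the claim quoted above.

```python
import numpy as np

def phi(x):
    # elu(x) + 1 feature map used in the paper; positive everywhere
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    # Sequence-level formulation: cumulative sums give O(T) total cost
    Qp, Kp = phi(Q), phi(K)
    S = np.cumsum(Kp[:, :, None] * V[:, None, :], axis=0)  # (T, d, d_v)
    Z = np.cumsum(Kp, axis=0)                              # (T, d)
    num = np.einsum('td,tdv->tv', Qp, S)
    den = np.einsum('td,td->t', Qp, Z)
    return num / den[:, None]

def rnn_linear_attention(Q, K, V):
    # Same computation as an RNN: a state (S, z) updated at each step
    Qp, Kp = phi(Q), phi(K)
    S = np.zeros((Q.shape[1], V.shape[1]))
    z = np.zeros(Q.shape[1])
    out = []
    for t in range(Q.shape[0]):
        S += np.outer(Kp[t], V[t])   # accumulate key-value outer products
        z += Kp[t]                   # accumulate normalizer
        out.append(Qp[t] @ S / (Qp[t] @ z))
    return np.stack(out)
```

At inference time the recurrent form only keeps the fixed-size state (S, z), so generating each new token costs a constant amount of compute and memory instead of growing with the sequence length.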
With over a thousand papers, it is of course not possible to mention all the great work presented at ICML 2020. However, here are some other papers that caught our attention:
- Bogatskiy, A., Anderson, B., Offermann, J. T., Roussi, M., Miller, D. W., & Kondor, R. (2020). Lorentz Group Equivariant Neural Network for Particle Physics. While this paper is a bit heavy on the mathematical side, it provides an interesting path to physically interpretable machine learning models.
- Sim, R. H. L., Zhang, Y., Low, B. K. H., & Chan, M. C. (2020). Collaborative Machine Learning with Incentive-Aware Model Rewards. The authors tackle the issue of fairly rewarding participants in collaborative machine learning.
- Belilovsky, E., Eickenberg, M., & Oyallon, E. (2020). Decoupled Greedy Learning of CNNs. The paper introduces a simple training scheme that decouples the training of each layer in a model, with strong results on difficult datasets.
- Ryali, C., Hopfield, J., Grinberg, L., & Krotov, D. (2020). Bio-Inspired Hashing for Unsupervised Similarity Search. Inspired by the olfactory circuit of the fruit fly, the authors design a biologically plausible hashing scheme to perform unsupervised similarity search.
We would like to thank the organizers of the International Conference on Machine Learning 2020 for setting up a great virtual edition! What were your personal highlights of this conference?
LightOn is a hardware company that develops new optical processors that considerably speed up Machine Learning computation. LightOn’s processors open new horizons in computing and engineering fields that are facing computational limits. Interested in speeding your computations up? Try out our solution on LightOn Cloud! 🌈
Tangkaratt, V., Han, B., Khan, M. E., & Sugiyama, M. (2019). VILD: Variational Imitation Learning with Diverse-quality Demonstrations. arXiv preprint arXiv:1909.06769.
Jeon, H., Bang, Y., Kim, J., & Woo, S. (2020). T-GD: Transferable GAN-generated Images Detection Framework. In Proceedings of the International Conference on Machine Learning (ICML 2020).
Choi, Y., Choi, M., Kim, M., Ha, J. W., Kim, S., & Choo, J. (2018). StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8789–8797).
Choi, Y., Uh, Y., Yoo, J., & Ha, J. W. (2020). StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8188–8197).
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Agarwal, S. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. (2020). Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. arXiv preprint arXiv:2006.16236.