LightOn Highlights from ICML 2020

Jul 20, 2020

*The LightOn Machine Learning team attending the International Conference On Machine Learning. Picture courtesy of* *A. Chatelain*.

At LightOn, we aim to stay at the cutting edge of machine learning research. It was therefore a pleasure to attend the 37th International Conference On Machine Learning, ICML 2020!

The organizers did an amazing job turning this edition virtual and presenting a variety of tutorials, papers, workshops, and panels.

While the conference covered a vast variety of topics, we focus this blog post on three of our favourite papers accepted and presented at ICML 2020. Let us introduce them, in no particular order!

#1 Variational Imitation Learning with Diverse-quality Demonstrations

In Reinforcement Learning (RL, which we explored here), an agent interacts with an environment and takes actions to maximize long-term rewards. However, it is not always possible to clearly define a reward function, for example when teaching a self-driving car. A solution to this problem is Imitation Learning (IL): the agent imitates actions performed by experts. But what happens if expert demonstrations are too costly? Or if we don’t really have “experts” around (e.g., can we really find an “expert” in driving)? Voot Tangkaratt, Bo Han, Emti Khan, and Masashi Sugiyama tackle these issues by introducing Variational Imitation Learning with Diverse-quality demonstrations (VILD) [1].

*Figure 1: During the k-th demonstration performed by an expert (left), the action a*ᵢ *taken in the state s*ᵢ *is optimal. In the case of diverse-quality demonstrations (right), the observed actions u*ᵢ *are not optimal. Figure adapted from Ref. [1].*

Their idea is to use a variational approach to bypass the difficulty of estimating the quality of the actions chosen by “non-expert” demonstrators (see Figure 1). They also estimate a reward function, which can be used to improve the agent. VILD matches state-of-the-art results on benchmarks. More importantly, it shows that IL can be used in realistic scenarios!

#2 T-GD: Transferable GAN-generated Images Detection Framework

Hyeonseong Jeon, Youngoh Bang, Junyaup Kim, and Simon S. Woo [2] chose to address the issue of detecting images produced by Generative Adversarial Networks (GANs). These networks are becoming increasingly capable, generating higher-quality images without obvious artifacts, and could be misused in the future (see Figure 2 for an illustration of recent progress).

Figure 2: Images generated from the CelebA dataset by StarGAN [3], first introduced in 2017, and StarGAN v2 [4], introduced at the end of 2019. The first column shows input images, while the other two columns are synthesized by the GANs.

They introduce the Transferable GAN-Detection (T-GD) framework, in which student and teacher models iteratively teach and evaluate each other to improve their performance. More importantly, they use transfer learning (more detail here) to apply knowledge already learned on source datasets to smaller target datasets. Their approach not only achieves state-of-the-art results on source data, but is also data-efficient thanks to transfer learning.

#3 Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

If you are on Twitter, you have probably seen a sample of the capabilities of OpenAI's GPT-3 [5], for example writing React or JSX code, or writing like a lawyer! For more examples, look at this Twitter thread on experiences with GPT-3. This work has shown that massively scaling up language models greatly improves task-agnostic and few-shot performance, but training it on a single V100 cloud instance would cost an estimated 4.6 million dollars or more! A central challenge is the memory and computational cost of the attention mechanism, which is quadratic in sequence length.

Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret [6] devised a new formulation of self-attention based on kernel feature representations and the associativity property of matrix products. This reduces the complexity from quadratic to linear in sequence length and makes autoregressive inference up to thousands of times faster on long sequences. Finally, they also show that:

[…] any transformer layer with causal masking can be written as a model that, given an input, modifies an internal state and then predicts an output, namely a Recurrent Neural Network (RNN).

Hence the first part of the title!
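To make the equivalence more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names are our own, and we assume the elu(x) + 1 feature map on queries and keys. Note that the quadratic reference below is the kernelized attention itself, not standard softmax attention. The first function computes causal attention pairwise, while the second keeps only a fixed-size state that is updated once per token, exactly like an RNN.

```python
import math
import random


def feature_map(x):
    """Positive feature map elu(x) + 1 (assumed here for illustration)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_attention_quadratic(queries, keys, values):
    """Causal kernel attention computed pairwise: O(N^2) similarities."""
    dim_v = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        phi_q = feature_map(q)
        # Position i only attends to positions j <= i (causal masking).
        weights = [dot(phi_q, feature_map(keys[j])) for j in range(i + 1)]
        norm = sum(weights)
        outputs.append([
            sum(w * values[j][c] for j, w in enumerate(weights)) / norm
            for c in range(dim_v)
        ])
    return outputs


def causal_attention_recurrent(queries, keys, values):
    """Same computation as an RNN: one fixed-size state update per token."""
    dim_k, dim_v = len(keys[0]), len(values[0])
    state = [[0.0] * dim_v for _ in range(dim_k)]  # running sum of phi(k) v^T
    normalizer = [0.0] * dim_k                     # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for a in range(dim_k):
            normalizer[a] += phi_k[a]
            for c in range(dim_v):
                state[a][c] += phi_k[a] * v[c]
        phi_q = feature_map(q)
        denom = dot(phi_q, normalizer)
        outputs.append([
            sum(phi_q[a] * state[a][c] for a in range(dim_k)) / denom
            for c in range(dim_v)
        ])
    return outputs


def random_sequence(rng, length, dim):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]


def max_abs_difference(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
queries = random_sequence(rng, 6, 4)
keys = random_sequence(rng, 6, 4)
values = random_sequence(rng, 6, 3)

quadratic = causal_attention_quadratic(queries, keys, values)
recurrent = causal_attention_recurrent(queries, keys, values)
# The two outputs should agree up to floating-point rounding.
print(max_abs_difference(quadratic, recurrent))
```

The recurrent version never materializes the N × N attention matrix: its state has a size that depends only on the key and value dimensions, which is why generating each new token costs the same regardless of how long the sequence already is.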

Figure 3: Image completions generated by the linear transformer on CIFAR-10, taken from Fig. 4 in the arXiv preprint [6].

Honorable Mentions

With over a thousand papers, it is of course not possible to mention all the great work that was presented at ICML 2020. However, we wanted to mention some other papers that caught our attention:

Bogatskiy, A., Anderson, B., Offermann, J. T., Roussi, M., Miller, D. W., & Kondor, R. (2020). Lorentz Group Equivariant Neural Network for Particle Physics. While this paper is a bit heavy on the mathematical side, it provides an interesting path to physically interpretable machine learning models.
Hwee Ling Sim, R., Zhang, Y., Kian Hsiang Low, B., & Choon Chan, M. (2020). Collaborative Machine Learning with Incentive-Aware Model Rewards. The authors tackle the issue of fairly rewarding the participants in collaborative machine learning.
Belilovsky, E., Eickenberg, M., Oyallon, E. (2020), Decoupled Greedy Learning of CNNs. It introduces a simple training scheme that decouples the training of each layer in a model, with strong results on difficult datasets.
Ryali, C., Hopfield, J., Grinberg, L., & Krotov, D. (2020). Bio-Inspired Hashing for Unsupervised Similarity Search. Inspired by the olfactory circuit of the fruit fly, the authors design a biologically plausible hashing scheme to perform unsupervised similarity search.

We would like to thank the organizers of the International Conference on Machine Learning 2020 for setting up a great virtual edition! What were your personal highlights of this conference?

About Us

LightOn is a hardware company that develops new optical processors that considerably speed up Machine Learning computation. LightOn’s processors open new horizons in computing and engineering fields that are facing computational limits. Interested in speeding your computations up? Try out our solution on LightOn Cloud! 🌈

Follow us on Twitter at @LightOnIO, subscribe to our newsletter, and/or register for our workshop series. We live stream, so you can join from anywhere. 🌍

The authors

Amélie Chatelain, Machine Learning Engineer at LightOn AI Research, and Iacopo Poli, Lead Machine Learning Engineer at LightOn AI Research.

References

[1] Tangkaratt, V., Han, B., Khan, M. E., & Sugiyama, M. (2019). VILD: Variational Imitation Learning with Diverse-quality Demonstrations. arXiv preprint arXiv:1909.06769.

[2] Jeon, H., Bang, Y., Kim, J., & Woo, S. (2020). T-GD: Transferable GAN-generated Images Detection Framework. In Proceedings of the ICML Conference.

[3] Choi, Y., Choi, M., Kim, M., Ha, J. W., Kim, S., & Choo, J. (2018). Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8789–8797).

[4] Choi, Y., Uh, Y., Yoo, J., & Ha, J. W. (2020). Stargan v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8188–8197).

[5] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Agarwal, S. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

[6] Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. (2020). Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. arXiv preprint arXiv:2006.16236.