Our Five Favorite Papers from ICLR 2019

May 24, 2019 · 11 min read

by Akshay Budhkar

The International Conference on Learning Representations (ICLR) is one of the biggest machine learning conferences in the world. It’s a competitive conference, with the main conference accepting just 31.4% of submissions (500/1591), so the standard of content is high.

I attended the seventh edition of the conference in New Orleans along with Eddie Du, another Applied Research Scientist on the Georgian Impact team. In this post, we share our highlights — papers that we found interesting in areas where we saw the potential to add value in applied projects with the Georgian Partners portfolio.

1. [Contributed Talk] ImageNet-trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness

Why we like it: A simple but very effective approach that improves object recognition by relying on shapes (as humans do) rather than textures. The approach can be extended to other domains.

Convolutional Neural Nets (CNNs) tend to focus more on the texture of an object than its shape. This is not how we as humans parse images: humans have a very strong shape bias, which means that if, for example, you see an elephant-textured cat, you still know it’s a cat. CNNs don’t…yet.

In this talk, the authors showed how they induce a shape bias by randomly stylizing their ImageNet dataset, using style transfer to paint random textures over each image. This approach led to a shift in bias towards shapes.
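The stylization step can be pictured as overwriting an image’s local texture while keeping its spatial layout. The sketch below is a toy stand-in for that idea, assuming tiny grayscale images as nested lists; the authors used real style transfer rather than the simple alpha blend shown here, and all names are hypothetical.

```python
import random

def stylize(image, textures, alpha=0.7):
    """Blend an image with a randomly chosen texture.

    A crude stand-in for style transfer: the image keeps its
    shape (pixel layout) while its texture statistics are
    largely overwritten by the texture source.
    """
    texture = random.choice(textures)
    return [
        [(1 - alpha) * p + alpha * t for p, t in zip(img_row, tex_row)]
        for img_row, tex_row in zip(image, texture)
    ]

# Toy 2x2 grayscale "images": a cat's shape, an elephant's skin.
cat = [[0.0, 1.0], [1.0, 0.0]]
elephant_skin = [[0.5, 0.5], [0.5, 0.5]]
stylized_cat = stylize(cat, [elephant_skin])
# The contrast pattern (shape) survives; the texture values change.
```

A network trained on such images can no longer rely on texture cues and is pushed towards shape.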

Their results are promising and have benefits in scenarios where noise is added to images. A common use case might be self-driving cars, where weather — rain, snow or fog — may reduce the effectiveness of traditional CNNs.

We talked to the author during the poster session, who confirmed that the original ImageNet training set and this modified dataset have the same number of data points. We also discussed potential extensions to other domains; the author thought this is a viable strategy for audio and even text, provided the biases are known and style transfer is a solved problem in that domain.

Read the paper here: ImageNet-trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness

2. [Poster] Generating Multiple Objects at Spatially Distinct Locations

Why we like it: Generative adversarial networks (GANs) dominated ICLR 2019. We liked this paper because the ideas presented are transferable to other use cases where we could use GANs to generate specific objects in specific locations.

This poster explores an interesting extension of GANs for images that should lead to more controlled image generation. The authors achieve this by extending the GAN architecture to specify which objects should appear in generated images and where they should appear within the image.

A common challenge with GANs is that it is hard to control the layout of the pictures created. Often the solution is to introduce a scene layout as an additional input. The downside is that this approach requires a lot of extra labels.

Here the authors suggest using a bounding box and a class label for each foreground object we want to see in the final image. So, for example, we could specify that we need a couch in this box and people in this other box and ask the GAN to generate within these confines.

This method keeps the normal (global) GAN pathway separate from the object pathways, one per object conditioned on its bounding box and class label, and concatenates them before upsampling to the final output. The comparison to other methods is not entirely fair, since this method is given more information, but it does outperform the other GANs. Qualitatively, the images also look better and more controlled.
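The pathway separation can be sketched with toy feature maps: each object writes its features only inside its bounding box, and the object map is then concatenated channel-wise with the global map before upsampling. This is a simplified, hypothetical illustration (the real model uses learned convolutional features), with made-up box coordinates:

```python
def render_layout(canvas_size, objects):
    """Place each object's feature value inside its bounding box.

    Sketch of the object pathway: an object contributes features
    only within its box; cells outside every box stay at zero.
    """
    h, w = canvas_size
    grid = [[0.0] * w for _ in range(h)]
    for (x0, y0, x1, y1), feature in objects:
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = feature
    return grid

def concat_pathways(global_map, object_map):
    """Channel-wise concatenation: each cell carries both pathways."""
    return [
        [(g, o) for g, o in zip(g_row, o_row)]
        for g_row, o_row in zip(global_map, object_map)
    ]

# "Couch" in the top-left box, "person" in the bottom-right box.
obj_map = render_layout((4, 4), [((0, 0, 2, 2), 1.0),
                                 ((2, 2, 4, 4), 2.0)])
global_map = [[0.5] * 4 for _ in range(4)]
combined = concat_pathways(global_map, obj_map)
```

In the real architecture the combined map is what gets upsampled into the final image, so each object appears where its box was placed.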

Read the paper here: Generating Multiple Objects at Spatially Distinct Locations

3. [Contributed Talk] A Unifying Framework for Early Visual Representations from Retina to Cortex Through Anatomically Constrained CNNs

Why we like it: This paper brings together cutting-edge research from neuroscience and computer vision.

It’s fascinating when ML can learn from other disciplines. This talk described work at the intersection of neuroscience and computer vision by Jack Lindsey and others at Stanford on what we can learn from mammalian visual systems. The mammalian visual system processes visual information in stages; the first two are the retina and the visual cortex, and the representations at these two stages are notably different. This talk explored whether we can learn anything from this neurological structure and replicate it in ML.

When we use traditional Convolutional Neural Nets (CNNs) as a model for visual representations, we create a model that learns representations similar to those in the cortex alone, skipping the center-surround receptive field structures that we observe at the output of real retinas. The simple image classification model in the image below shows this mismatch at the output of the retina stage.
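Center-surround receptive fields can be illustrated with a toy unit that responds to local contrast rather than absolute brightness: an excitatory center minus the average of its surround. This is a minimal, hypothetical illustration, not the authors’ model:

```python
def center_surround(grid, y, x):
    """Response of a toy center-surround unit at (y, x):
    the center pixel minus the mean of its 8 neighbours,
    a crude version of a difference-of-Gaussians filter."""
    neighbours = [grid[y + dy][x + dx]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0)]
    return grid[y][x] - sum(neighbours) / len(neighbours)

uniform = [[1.0] * 3 for _ in range(3)]   # flat input
spot = [[0.0] * 3 for _ in range(3)]
spot[1][1] = 1.0                          # a bright dot on black

flat_response = center_surround(uniform, 1, 1)  # 0.0: no response to flat light
spot_response = center_surround(spot, 1, 1)     # 1.0: strong response to contrast
```

A standard CNN trained end-to-end has no pressure to develop units like this at its first stage, which is the mismatch the paper investigates.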

The way mammals encode visual information is largely due to anatomical constraints imposed by the optic nerve. For example, humans have 120M photoreceptors in the retina, which converge onto just 1M output neurons in the optic nerve, whose signals are then received by 140M neurons in the cortex. The authors recreated these constraints to see whether they would produce the expected center-surround representations, which they did.
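The constraint amounts to squeezing a high-dimensional retinal signal through a far narrower channel. The toy function below mimics that bottleneck with simple average pooling; it is a hypothetical illustration of the dimensionality pressure, not the authors’ constrained CNN:

```python
def bottleneck(signal, n_out):
    """Force a signal through n_out channels by average pooling,
    mimicking the optic-nerve constraint (on the order of 120M
    photoreceptors squeezed into roughly 1M nerve fibers)."""
    assert len(signal) % n_out == 0
    stride = len(signal) // n_out
    return [sum(signal[i:i + stride]) / stride
            for i in range(0, len(signal), stride)]

# 120 "photoreceptor" activations compressed 10x, to 12 outputs.
photoreceptors = [float(i % 2) for i in range(120)]
nerve = bottleneck(photoreceptors, 12)
```

In the paper the bottleneck is a narrow layer inside a trained CNN, and the interesting result is what representations emerge on either side of it.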

This constraint differs across species, which is interesting because it suggests that visual systems are optimized for different objectives, determined by the relationship between the retinal output, the optic nerve and the complexity of the cortex. “Small vertebrates perform sophisticated nonlinear computations, extracting features directly relevant to behavior, whereas retinas of large animals such as primates should mostly encode the visual scene linearly and respond to a much broader range of stimuli.”

The paper compares a mouse and a macaque. When the cortex is more complex, as in a macaque, the retina is more linear; for a mouse, where the cortex is less complex, the retina focuses on useful feature extraction. “[This] reconciles the two seemingly incompatible views of the retina as either performing feature extraction or efficient coding of natural scenes, by suggesting that all vertebrates lie on a spectrum between these two objectives, depending on the degree of neural resources allocated to their visual system.” An understanding of this spectrum should allow us to optimize for either feature extraction or generalization in computer vision, depending on our objectives.

We talked to the author at the poster session about downstream applications of this research. The constrained model naturally doesn’t beat models with more parameters, but the author expects this kind of work to generalize better to unseen inputs, just as a monkey would generalize better than a mouse. Experiments are underway.

You can find the paper here: A Unifying Framework for Early Visual Representations from Retina to Cortex Through Anatomically Constrained CNNs.

4. [Invited Talk] Learning Natural Language Interfaces with Neural Models

Why we like it: We believe natural language interfaces are going to be important over the coming decade. Take a read through our Principles of Conversational AI to get an idea of how we think about this at Georgian Partners. This talk by Mirella Lapata presented some challenges faced by these interfaces along with some elegant ideas to address them.

Natural language interfaces allow users to interact with computers using human language. They work by parsing human language into a machine-usable representation. Google uses the same process in its search engine, as do Alexa, Siri and Google Home, with an added voice-to-text step.

This talk proposed some interesting solutions to three considerable challenges:

  1. Structural mismatches between human and machine language can be tackled by finding ways for machines to map natural language expressions to their logical forms using modified neural encoder-decoder architectures.

By using seq2tree instead of seq2seq, we can mimic the hierarchical structure of human language, which traditional seq2seq ignores. Seq2tree, a variant of the LSTM that takes a sequence and outputs a tree, provides a way to use known linguistic structure in sequence generation problems. It works by generating branches until an end node is hit.
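The branch-until-end-node idea can be sketched with a toy decoder. Here `predict` is a hypothetical stand-in for the trained model: it maps the path to a node onto that node’s children, where a `<n>` token opens a nested branch and `</n>` ends the current one.

```python
def decode_tree(predict, path=()):
    """Toy seq2tree-style decoding: emit a node's children until
    the end-of-branch marker; recurse whenever a child opens a
    nested branch."""
    tree = []
    for token in predict(path):
        if token == "</n>":
            break
        if token == "<n>":
            tree.append(decode_tree(predict, path + (len(tree),)))
        else:
            tree.append(token)
    return tree

# Hypothetical "model": logical form for "how many rivers are there?"
grammar = {
    (): ["count", "<n>", "</n>"],
    (1,): ["river", "all", "</n>"],
}
tree = decode_tree(lambda path: grammar[path])
# tree == ["count", ["river", "all"]]
```

A real seq2tree model scores tokens with an LSTM instead of a lookup table, but the control flow during decoding is the same.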

  2. Coarse-to-fine decoding adds an intermediary ‘meaning sketch’ phase when moving from natural language to low-level details. The meaning sketch allows us to disentangle high-level from low-level semantics and to explicitly share coarse structure across examples with the same basic meaning, providing global context to the fine meaning decoding.
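The two stages can be sketched as: predict a coarse sketch with typed slots, then fill the slots conditioned on the utterance and the sketch. Both models below are hypothetical stand-ins, as is the SQL-style target language:

```python
def coarse_to_fine(utterance, sketch_model, fill_model):
    """Two-stage decoding: a coarse 'meaning sketch' first,
    then fine details filled into its slots, left to right."""
    sketch = sketch_model(utterance)
    filled = sketch
    for slot, value in fill_model(utterance, sketch):
        filled = filled.replace(slot, value, 1)  # fill leftmost open slot
    return sketch, filled

# Hypothetical stand-ins for the trained encoder-decoders.
sketch_model = lambda utterance: "SELECT <col> WHERE <col> = <val>"
fill_model = lambda utterance, sketch: [
    ("<col>", "name"), ("<col>", "city"), ("<val>", "'New Orleans'")
]

sketch, query = coarse_to_fine("who lives in New Orleans?",
                               sketch_model, fill_model)
# query == "SELECT name WHERE city = 'New Orleans'"
```

Because many utterances share the same sketch, the fine decoder gets a strong structural prior before it commits to any low-level detail.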

  3. One possible route to cope with unseen variations of a given language is to generate paraphrases, using neural machine translation to go from the target language into another language and back: for example, from English to French and back into English. Architectures can then be extended to use these new examples in tandem with the original phrase.
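The round trip can be sketched with toy word-level dictionaries standing in for the two neural machine translation models (all vocabulary here is made up for illustration):

```python
def paraphrase(sentence, forward, backward):
    """Round-trip a sentence through a pivot language; words the
    toy 'models' don't know pass through unchanged."""
    pivot = " ".join(forward.get(w, w) for w in sentence.split())
    return " ".join(backward.get(w, w) for w in pivot.split())

# Hypothetical English->French and French->English lookups.
en_fr = {"show": "montrer", "movies": "films"}
fr_en = {"montrer": "display", "films": "films"}

new_form = paraphrase("show the movies", en_fr, fr_en)
# new_form == "display the films": new surface form, same meaning
```

With real NMT models the round trip produces natural rewordings of whole sentences rather than word swaps, giving the parser extra training variety for free.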

Code and data for this talk are available here: Learning Natural Language Interfaces with Neural Models.

5. [Invited Talk] Can Machine Learning Help to Conduct a Planetary Health Check?

Why we like it: This presentation highlighted real challenges faced by our planet and discussed different ways that ML can help.

It’s clear that we need actionable information on climate risk. With such vast datasets available to us from sensors, simulations, satellites, climate records and weather stations, there should be many ways to employ recent advances in machine learning and data science to this end. However, as this talk highlighted, ML in climate science faces challenges that traditional ML does not.

Specifically, these include:

  • High uncertainty and a low signal-to-noise ratio (SNR)

So what’s the best approach to using ML in climate science? This talk encouraged us to think of it in terms of conducting a health check:

  1. Monitoring the patient
  • We need to synthesize and interpolate different datasets to allow for interrogation by users and ingestion of new data in near real-time.

  2. Treating the symptoms
  • We should create a standardized methodology for climate risk assessment, which will lead to evidence-based decision making.

  3. Curing the disease
  • Through a blend of data-driven and physics-based approaches, we can improve our understanding of the key processes and improve future projections of climate change to inform policymaking.

Climate change is the defining issue of our time. We should rise to the data science challenge, leveraging the large amounts of data available to help us assess the planet’s climate health. The presenter of this talk, Emily Shuckburgh, urged data scientists to generate benchmarks for climate science, similar to ImageNet, to help tackle these pressing issues.

6. BONUS [Poster] Unsupervised Recommendation and Discovery of an Education Marketplace

Why we like it: This paper provides a good example for companies designing recommendation systems with limited or no data.

I also presented this poster of my work with our portfolio company Top Hat at the AI for Social Good workshop. Top Hat sells classroom engagement software and electronic textbooks to colleges and universities. The company wanted to improve active learning opportunities for students by providing additional practice questions for study. The project focused on helping professors create questions specific to their students and class, but also on supporting mastery learning: if a student tends to get a certain kind of question wrong over and over again, the system can recommend similar questions so that they can improve through practice.

The solution leverages historic data as a proxy label to tune unsupervised recommendation models. We validated the models with expert A/B testing, checking whether the recommendation engine selected questions as useful as those chosen by an expert.
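As a rough illustration of recommending without labels, a similarity-based retriever over question text can serve as a stand-in for the paper’s unsupervised models; historic usage data would then act as the proxy label for tuning. Everything below (questions, scoring) is hypothetical:

```python
import math
from collections import Counter

def recommend(target, candidates, k=2):
    """Rank candidate questions by cosine similarity of their
    bag-of-words vectors to a question the student got wrong."""
    def vec(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    t = vec(target)
    return sorted(candidates, key=lambda c: cosine(t, vec(c)),
                  reverse=True)[:k]

questions = ["derivative of x squared",
             "integral of x squared",
             "capital of France"]
top = recommend("derivative of x cubed", questions, k=1)
# top == ["derivative of x squared"]
```

The appeal of this family of approaches is that they need no interaction history to get started, which is exactly the cold-start situation the paper addresses.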

While this approach has seen some early success, the plan is to start collecting data for more advanced approaches like collaborative filtering. The method discussed in the paper is a good example for companies designing recommendation systems with limited or no data.

Read the paper here: Unsupervised Recommendation and Discovery of an Education Marketplace

ICLR and Diversity

The papers we selected for this post do not do justice to the breadth and depth of thinking on display at ICLR. For more highlights, we recommend Chip Huyen’s Top 8 Trends from ICLR 2019 and David Abel’s very detailed summary of the conference (56 pages!), available at https://david-abel.github.io/notes/iclr_2019.pdf. All posters for ICLR 2019 can be found here and all proceedings of the conference can be found here. ICLR also had 9 workshops running in parallel; slides and videos for those are available here.

To wrap up, we think it’s important to salute ICLR’s work in proactively improving the diversity of this conference. ICLR is a fairly new conference and has the energy and opportunity to tackle complex problems like diversity. One of Georgian’s investment thesis areas is trust, and a core part of building trust involves identifying and managing bias. We believe that organizations that address bias not only in their data but also at a human level will prosper in the long run. At ICLR, the first two keynote talks, Sasha Rush’s opening remarks and Cynthia Dwork’s invited talk, focused on fairness and equality. Each workshop had at least two female organizers, area chair representation went up from 13% to 25% this year, and there was gender parity among invited speakers and program chairs. Of course, there is still work to be done, but these statistics show that the focus is paying off. Next year ICLR will be held in Addis Ababa, Ethiopia, in a push to improve ML involvement in Africa.
