NeurIPS 2018 Highlights (Part 1)

Topics in this post include conversational AI; autonomous driving; ML for health and creativity; fairness and bias in ML,…

Published in

DAIR.AI

15 min readDec 20, 2018

The Neural Information Processing Systems (NeurIPS) 2018 conference, hosted in Montreal, Canada, was host to a number of works ranging from machine learning for creativity and health to improvements in autonomous driving systems. In this post, we will expound on some of the recurrent themes at the conference, keeping in mind that the topics were carefully selected from among a large body of excellent and diverse presentations. Please note that this overview only covers material from the invited talks, the main conference, and workshops. After this read, we hope the readers will gain an intuitive understanding of some of the most important highlights of the conference which they can further pursue with the references provided. Specifically, we will be focusing on the following topics:

Conversational AI
Machine learning for creativity and health
Bias and Fairness in ML
Autonomous driving
Advances in NLP
Miscellaneous topics
Diversity in AI

Conversational AI

Conversational artificial intelligence (Conversational AI) is a research area that deals with neural based approaches used to build conversational systems such as task-oriented dialogue systems and question answering. Conversational AI is an important area of research because it encompasses the majority of sub-disciplines in natural language processing (NLP). A common thought and conversation among participants and presenters at the conference was the use of visual information to assist in the development of smarter and more context-aware conversational AI or NLP systems in general. The overall consensus is that learning from text alone may not be enough for a conversational agent to engage in a natural conversation. The hope is that other modalities such as images may be useful to extract important information to help systems understand context and engage in more natural conversations.

In the same workshop, Ruhi Sarikaya (Director of Applied Science at Amazon), discussed a mechanism to remove natural language interaction frictions using contextual information. Context information, he claims, could be defined by facts about a particular event or entity that involves senses such as see, hear, feel, etc. This capability was recently embedded into Amazon conversational agent, Alexa. With these new capabilities, Alexa now has the ability to naturally interact in a conversation without the user having to refer to it by its name everytime they want to follow up on a previous conversation, dialog, or question. One of the other interesting features presented was “Contextual Self Learning and Self Healing”: given different conversational scenarios, there is a query rewrite engine that helps the conversational agent to better understand human requests. For instance, if a human asked “Alexa, play buddha”, then the query rewrite engine would convert the statement to a machine-readable instruction such as “play boo’d up”. The key models used were Seq2Seq DNN and Absorbing Markov Chain models. You can read more about these advances directly from Alexa Blogs.

Yi-Chia Wang spoke about how Uber is embedding social capabilities into their conversational agents and how it has affected their driver’s engagement rate. They discovered that more natural conversations make a difference in how often drivers engage with their conversational bots. One of the interesting parts of the study was how language style transfer was used to generate responses with social language. Read more about their work here.

To find out more interesting posters and presentations from the 2nd Conversational AI Workshop you can visit the website here.

Machine Learning for Creativity and Health

One of the more fun places to hang at the conference was at the workshop on “Machine Learning for Creativity and Design” because of the incredible range of creative works that were presented. There was “Piano Genie”, an intelligent musical interface which allows you to improvise on the piano via an intuitive controller. Holly Grim presented on a CycleGAN used to generate art via the influence of art composition attributes (through domain knowledge). DaDA is a generative approach to convert sketches into Chinese Shanshui style painting. Pablo Samuel Castro presented an improved lyric generation method based on combined learned lyrical structures and vocabulary. Tarin Clanuwat and others presented a deep learning approach to understanding classical Japanese literature (they even introduced a dataset for Japanese literature called Kuzushiji-MNIST). You can check out more of the works presented at the workshop at their online gallery.

The “Machine Learning for Health” workshop hosted works that aimed to leverage deep learning and machine learning techniques to address problems in the health domain. Dr. Fei Fei and others presented their work on detecting the severity of depression using multimodal information. To the best of our knowledge, there is no other work that has combined facial and voice features to detect the severity of depression in humans. Most of the previous works in depression detection have been done using single modalities such as text or audio. This is a major step towards a more unified model and building real-world applications for the health sector. And this work also helps to emphasize the importance of combining modalities to built more expressive and accurate decision-making and machine learning systems.

This year saw a rise of deep learning methods used for radiology and other tasks that involve medical imaging. This is an important research space since medical imaging involves different challenges and problems as compared to traditional computer vision tasks. As an example, there was an interesting poster that used graph convolutional networks for radiotherapy target countering. This was work is under submission but you can see the poster below, and if you are interested, you can also contact the authors directly.

Another work presented a clever way to use attention mechanism to predict gestational age of the fetal brain. (See poster below)

You can find the full list of posters and talks presented at the workshop here.

Bias and Fairness in ML

In the Latinx in AI keynote, Omar Florez discussed whether an AI algorithm can be biased. His preliminary findings indicate that there is no possible way machine learning algorithms can be biased. He further claims that this is usually introduced in different parts of the AI pipeline that all involve human decision making such as gathering and labeling data. The fact is that AI algorithms are almost never biased as also clarified and elaborated in the “Improving Fairness in Machine Learning Systems” talk by Hannah Wallach (Microsoft Researcher). Hannah further makes the observation that when we use the word “algorithm” to refer to AI systems it tends to be misused by the media. So she poses the question of whether we should be communicating about our systems using a different vocabulary. She suggested that we start to use words such as “models” to refer to these AI systems.

Rich Caruana discussed the risks involved with using Black-Box models in healthcare and criminal justice. David Spiegelhalter, in his invited talk, also spoke about the benefits of statistical science when building machine learning algorithms, and how it can contribute to transparency, explanation, and validation. Here is a nice episode of him discussing more on the topic of what it means to be “trusted” and being “trustworthy”. Another great invited talk was delivered by Edward W. Felten on the topic of machine learning and public policy. The whole point of the talk was to encourage machine learning researchers to be more active and engage in public policy and other public duties and discussions. Jon Kleinberg digs deep into what it means to build machine learning classifiers to be fair to different groups. Roel Dobbe delivered a talk on the importance of improving fairness metrics and the idea of introducing diagnostic tools for identifying limitations, value, and challenges of integrating algorithms in real-world settings.

If you want to more about the other talks given on the topic of bias and fairness in ML, head over to the “Workshop on Ethical, Social and Governance Issues in AI” webpage.

Autonomous driving

Autonomous driving has become one of the leading applications in driving progress in AI. As such, this topic had a central role in the conference with two dedicated workshops. The first one, the MLAuto Workshop, was hosted by Pony.AI and focused on recent advances and research opportunities for autonomous driving. The second, “Machine Learning for Intelligent Transportation Systems (MLITS) Workshops”, had a much broader scope and focused on addressing the challenges arising in our future transportation systems. Apart from autonomous vehicles, it also covered vehicle-to-vehicle (V2V) and vehicle-to-everything (V2X) communication infrastructures, and smart road infrastructures like smart traffic lights. I had the opportunity to attend the latter, and below I’m sharing my highlights of the session.

Although dramatic progress has been made in this field, there are still many great challenges in achieving full-autonomy. For example, how do we make perception robust and accurate to ensure safe driving? How do we learn policies that equip vehicles with adaptive human negotiation skills when merging, overtaking, or yielding? How do we decide when a system is safe enough to be deployed onto real roads?

The workshop provided an opportunity to learn how industry players approach tackling some of these challenges. Below is a summary of the most interesting talks and papers discussed during the workshop:

Alfredo Canzani (NYU) kicked off the session presenting his work on “Prediction & Planning Under Uncertainty”. He discussed the importance of accurate predictions of the environment, stochastic by nature, for the path planning phase on autonomous vehicles.
Yimeng Zhang (Pony.AI) discussed the challenges of generalization, especially as most autonomous driving companies are spending their efforts on collecting data and testing on 1 or 2 cities only. She also covered some interesting problems their team encountered when testing Pony.AI system in different countries.
Nathaniel Fairfield (Waymo) gave a sneak peek of Waymo’s latest approach to self-driving by imitation learning. His team created ChaufferNet, a deep recurrent neural network (RNN) trained to emit a driving trajectory by observing real-world expert demonstrations. In their research, they discovered that standard behavioral cloning (an Imitation Learning technique) was insufficient for handling complex driving scenarios (i.e.: traffic lights, stop signs, nudging around a parked car), despite using 30 million examples collected. For example, they often observed that the (simulated) car would collide with other vehicles or get stuck. To tackle this, they improved model performance with “imitation dropout”, by exposing the learner to additional behaviors such as off-road driving and collisions. Rather than purely imitating all data, they augmented the imitation loss with additional losses that penalized undesirable events and encouraged progress, leading to the robustness of the learned model.

John Leonard (MIT & TRI) introduced his work on Toyota Guardian system. The objective of Guardian is to create a highly automated driving system that can act as a safety net for the human driver to help prevent an accident. He also presented a recent paper from his team at TRI called SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation.

Dorsa Sadigh (Stanford) presented a very interesting study in which she addressed the challenges of mixed-autonomy traffic networks (where autonomous vehicles share the road with human-driven cars) via leveraging the power of autonomous vehicles to positively influence congestion. She introduced the concept of “altruistic autonomy” in which autonomous vehicles are incentivized to take less efficient routes to alleviate congestion.

The workshop ended up with a great panel on the key challenges and approaches of AI for Autonomous Driving. Some of the most common challenges discussed involved how to deal with uncertainty, and also generalizing to unseen situations and environments, especially as autonomous vehicle companies look to expand internationally.

Alfredo Canziani (NYU) proposed the use of latent variables to enable some sort of internal switch that will let you tune the model to different aspects of the environment. Marco Pavone (Stanford) highlighted the differences in driving behaviors and norms from country to country and expressed the need for companies to spend efforts on collecting local data for every city they plan to deploy.
Similarly, Sarah Tariq (Zoox) mentioned that one approach they are using at Zoox is to collect data from one city, and then recreate it virtually in a simulator. This way they can test a system without the need to deploy in every city.
Ekaterina Taralova, also from Zoox, added that what’s really important is not to drive millions of miles but to drive the right kind of miles, which makes simulation critical to virtually reproduce the less common situations. We can assume that the probability distribution of possible scenarios that can occur during driving follow a Gaussian distribution with a very long tail with infinite uncommon situations.
Finally, Prof Kurt Keutzer (UC Berkeley) warned of the risk of letting the self-driving market developed on its own, with little governmental intervention, as this could lead to perpetuating inequality, having companies deploying their vehicles in areas that make more economic sense for self-driving cars to operate, leaving places where its harder or there’s less population density left out/marginalized.

See here for all the papers accepted for the MLITS Workshop 2018.

Advances in NLP

There were various interesting works in natural language processing (NLP) presented throughout the conference. In the first NLP session, Yin and Shen discussed a theoretically motivated framework to understand and optimize word embedding dimensionality. Another work sought to target learning a cross-modal alignment between speech and text embedding spaces in an unsupervised manner. They claim that their method is useful for cases where there is little parallel audio-text data for training modern supervised automatic speech recognition (ASR) systems. Other interesting NLP works and posters presented at the conference are listed below:

Learn to Reason with Third-Order Tensor Products — Aims to replace the hidden state of an RNN with a tensor product representation to achieve SOTA on the bAbi question answering dataset, while offering reasoning capabilities.
Relational Recurrent Neural Networks — DeepMind researchers propose a new memory module, called Relational Memory Core, which has the capability to perform complex relational reasoning by enabling memories to interact. They test on language modeling and program evaluation tasks.
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models — This work proposes “Fast Graph Decoder”, which aims to speed up the decoding process of neural language models at the softmax layer, which is especially useful in cases where there is a large vocabulary to process.
Towards Text Generation with Adversarially Learned Neural Outlines — A method that generates text (in the form of outlines) using a combination of autoregressive and adversarial models while leveraging pre-trained sentence representations (obtained through sentence decoder).
Middle-Out Decoder — A method that uses self-attention mechanism to generate sequences from the middle-out rather than the common left-to-right strategy. The authors are able to improve caption quality and also have more control of the generation process of the decoder.
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis — Generate speech audio from text in the voice of many different speakers, even from speakers that were not seen during training.
Turbo Learning for CaptionBot and DrawingBot — Shows how to jointly and efficiently train an image-to-text generator and text-to-image generator. The intuition is that by training both models together, it is possible that they share feedback with each other. One nice effect with this method is that it claims to enable semi-supervised learning by providing pseudo-labels to unlabeled examples.
Partially-Supervised Image Captioning — Achieve state of the art results on image object captioning (COCO dataset) by learning new visual concepts from object detection datasets and labeled images. This could be more applicable in real-world settings and remove the domain restriction.
Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog — This work proposes “Answerer in Questioner’s Mind” an information theoretic algorithm to perform effective goal-oriented visual dialog by training the questioner to learn in a probabilistic way the intentions of the answerer. The method adopts ideas from the theory of mind.
Learning Conditioned Graph Structures for Interpretable Visual Question Answering — This method adopts graph convolutions to learn a question specific graph representations from input images. The idea is that semantic and spatial relationships can be used to efficiently capture question specific interactions from images.
TIFTI: A Framework for Extracting Drug Intervals from Longitudinal Clinic Notes — This work aims to use a rule-based approach to automatically extract drug regimes for oral cancer drugs from clinical notes.

Miscellaneous

Fernanda Viegas (Google) and Martin Wattenberg (Google) gave a tutorial on best practices for visualizations in machine learning.
The invited talk entitled “Machine Learning Meets: What to Expect and How to Cope” delivered by Edward W. Felten posits that in order to have better laws governing our use of technology it is important to engage in constructive conversations with policymakers so that it could have broader positive impact for the field, government, and society.
One of the more interesting and fun presentations at the conference was the talk delivered by Professor Jason Eisner on what we can learn from deep learning methods for linguistics, and how that knowledge can be leveraged to improve traditional methods as he has done in his popular and groundbreaking work on bi-LSTM Finite State Transducers.

At the “Visually Grounded Interaction and Language” workshop, Angeliki Lazariduo (DeepMind Research Scientist) presented her work on using virtual environments as the basis for language learning. Some of the questions answered in the presentation were whether input modality affects the compositionality of emergent languages in agents, and why we should even care about compositionality and how do we measure it.
One of the important takeaways from the conference was the following: Most of the innovation we see today, such as in the field of deep learning, are just minor improvements or clever applications of older algorithms. In order to advance the field, it is important to also be a contrarian and not always aim at beating the state of the art (SOTA) results just because we can. It is also important that we aim to obtain a deeper understanding of phenomena and the problems you are working on.
From my observations, one of the less discussed topics in the entire conference was causality and the role it could play in building more effective and accurate AI systems. It is not entirely clear how causality can be bridged with deep learning but as we continue to build more autonomous systems that interact with unpredictable environments, it will be crucial to move away from the common supervised methods that only rely on strong associations found in data.
Zachary Lipton continues to voice his concerns on some of the troubling trends in machine learning. Lipton’s concerns are mostly on the topic of leader-board chasing and other troubling trends such as the use of anthropomorphic language to explain the capabilities of AI systems. If you have read all the way here, you may have noticed that this a problematic theme in ML, which has raised awareness in the community on what not to do when publishing and communicating about ML algorithms.
Hal Pham and colleagues presented their work on “Learning Robust Joint Representations for Multimodal Sentiment Analysis”, in which they jointly train a sentiment classifier using multi-modalities which only relied on the language modality at test time.

Diversity in AI

We left this category for last, not because it was the least important, but because of how crucial it is for the progress of AI going forward. This year, NeurIPS hosted a wide range of workshops, which included different under-presented groups in AI, in an effort to increase diversity. Each group hosted their own workshops, which showcased the incredible talent pool and research that exists in those communities. To summarize all the great and amazing works that were presented by participants in these workshops we would need more than a blog post, so we decided to include links to some of those workshops below and will highlight some of these works in the future (readers can access the complete programs below):

Other Useful Resources

Full List of accepted papers for NeurIPS 2018
List of NeurIPS 2018 accepted papers in NLP
NeurIPS recorded talks
A summary of Continual Learning workshop
Poster sessions summaries
Videos for causal learning
Women in Machine Learning workshop at NeurIPS
Invited talks at NeurIPS (videos)
An overview of ML for creativity
2nd Conversational AI Workshop
NLP Session 1
AI for Social Good
Machine learning for health
Intel AI NeurIPS roundup
Workshop on Ethical, Social and Governance Issues in AI
Interpretability and Robustness in Audio, Speech, and Language
Meta-Learning Workshop

Special thanks to Ignacio López-Francos for his major contributions to this article (particularly on the autonomous driving portion). This article couldn’t have been possible without him. We are already working on the second part of NeurIPS 2018 Highlights which will include more detailed overviews of other workshops such as deep reinforcement learning, Bayesian deep learning, among others. If you have tips, suggestions, or references, please reach out to either Elvis Saravia or Ignacio López-Francos. Thanks for reading and enjoy the holidays!