WiEECS at NeurIPS 2018

Rose Wang
9 min read · Jan 23, 2019



Happy holidays and happy new year from Women in EECS (WiEECS)! Last year was an exciting one for WiEECS for plenty of reasons: our Publicity, Community Events, Mentorship, and Professional Development committees did an amazing job creating new materials, growing our mentorship base, and hosting study break events for the entire community.

One of the newer initiatives we started last year (2018) was the WiEECS conference program, generously funded by the EECS Department and IEEE. The conference program aims to expose students to groundbreaking research from around the world. This year, our team organized grants for students to attend NeurIPS, the Neural Information Processing Systems conference. Here is what some of our participants had to say about their experience at the conference!

Why did you want to attend NeurIPS?

Hi! I’m Baula, currently a senior at MIT studying Electrical Engineering and Computer Science. I’m a director on the mentorship committee for WiEECS and am also highly involved in MIT SWE. I love attending conferences because they give me a chance to learn about new research and technologies, meet interesting people, and get out of the MIT bubble. The number of MIT professors giving talks and presenting at NeurIPS also reminded me of how lucky I am to attend a university where so much groundbreaking research is being conducted. NeurIPS was especially interesting to me because I had taken The Human Intelligence Enterprise the semester before, where we read a number of papers relating to artificial intelligence. Some of the ideas and methods we read about were crazy for their time, and NeurIPS provided a great opportunity to satisfy my curiosity about new advances in artificial intelligence research.

My name is Jingwei. I’m a junior studying computer science and working on a computer vision research project. I wanted to attend NeurIPS for a few reasons: to see what a conference looks like, to learn the names of important people in the field, to get exposure to the newest research across machine learning subfields, and hopefully to come away with a slightly better understanding of how machine learning works.

The last two reasons are probably the most important. On the second-to-last: I believe that to do good research and get closer to solving a research problem, one needs to keep learning the newest methods and get exposure to fields outside of one’s own. On the last: when doing research, I have found it very difficult to debug a model or explain why it fails in certain cases. At NeurIPS, I wanted to see people’s work on understanding the machine learning black box.

Hi, my name is Rose! I’m a junior at MIT studying computer science and mathematics. I’m currently working on a multi-agent reinforcement learning project that assumes limited communication in a cooperative setting. Given the growing popularity of reinforcement learning, NeurIPS seemed like an amazing platform to talk directly with researchers in the same domain and ask them about potentially shared issues. I also wanted to gain exposure to problem statements from other domains.

Going in, what did you hope to learn from the conference? What did you learn from the conference?

Before the conference, I made a list of the talks that sounded interesting to me, even though they were spread across a variety of workshops. I also tried to attend talks from MIT faculty after I realized how little I knew about even my own professors’ research. While I learned a lot about new machine learning techniques and their applications, one talk that stood out to me was on “sociotechnical security,” given by Danah Boyd. This talk dug deep into how our current methods of machine learning may allow for unintended consequences. For example, search engines rely on high-quality data to achieve good results. When a search engine has nothing to learn from except sites like Reddit and Twitter, where facts get lost and convoluted extremely quickly, adversaries can pre-make data environments for search terms that convey incorrect information. As society changes more and more quickly, terms cycle in and out of use; terms that are no longer popular are known as “left behind terms.” Adversaries can reappropriate these “left behind terms,” and we are none the wiser. In research, we usually assume people are well-intentioned and helpful. However, this is not necessarily true. It’s important to take a moment to think about the consequences of the technology we create and how we can adapt in the age of media, rather than just hoping problems will disappear.

Going in, I thought it was more valuable to focus on one topic, ideally something completely new to me. I chose meta-learning because the description said that meta-learning methods could “generate new learning methods from scratch.” The whole “learning to learn” concept sounded abstract but interesting, so I decided to explore more.

Here I want to mention some other talks I really enjoyed.

Good & Bad Assumptions in Model Design and Interpretability by Jason Yosinski. Yosinski talked about his research on visualizing image classifiers. His visualization tool, the DeepVis Toolbox, highlights the parts of an image that helped the network make its classification. He also talked about his research on training networks on the JPEG representation of images instead of raw pixels.
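For intuition, here is a generic occlusion-sensitivity sketch in PyTorch, in the spirit of the visualizations he described (this is not the DeepVis Toolbox itself; the `model` classifier and the image shapes are assumptions):

```python
import torch

def occlusion_map(model, image, label, patch=16, stride=16):
    """Slide a grey patch over `image` and record how much the score for
    `label` drops; large drops mark regions the network relied on."""
    model.eval()
    _, height, width = image.shape                     # image: (C, H, W) tensor
    with torch.no_grad():
        base = model(image.unsqueeze(0))[0, label].item()
        heat = torch.zeros((height - patch) // stride + 1,
                           (width - patch) // stride + 1)
        for i, y in enumerate(range(0, height - patch + 1, stride)):
            for j, x in enumerate(range(0, width - patch + 1, stride)):
                occluded = image.clone()
                occluded[:, y:y + patch, x:x + patch] = 0.5   # grey out a patch
                score = model(occluded.unsqueeze(0))[0, label].item()
                heat[i, j] = base - score                     # drop in confidence
    return heat
```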

Learning from the Move to Neural Machine Translation at Google by Mike Schuster. The talk was about how Google moved from phrase-based translation to end-to-end neural translation, a sequence-to-sequence model with attention. They call the neural system GNMT (Google Neural Machine Translation). Translation quality improved the most for Asian languages. The GNMT paper is worth reading.

Learning to Drive in a Day by wayve.ai. They trained a model for lane following from scratch using deep reinforcement learning. A safety driver sits in the car and corrects with the steering wheel if the car starts to leave the lane. The reward is the distance travelled by the vehicle before the safety driver takes control. After just 15–20 minutes the car learned how to follow a lane. The task itself was not too impressive, but it was very impressive that the RL approach worked so well with nothing but driver feedback and monocular images as input. Paper, blog post.
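A minimal sketch of the episode and reward structure described in the talk; the `car` and `policy` interfaces are hypothetical stand-ins, and the actual wayve.ai system (sensors, RL algorithm, training loop) is not shown:

```python
def run_episode(car, policy):
    """One training episode: drive until the safety driver intervenes.
    `car` and `policy` are hypothetical stand-ins for the real system."""
    obs = car.reset()                      # e.g. monocular image + speed
    total_distance = 0.0
    while True:
        action = policy(obs)               # steering and throttle
        obs, distance, intervened = car.step(action)
        total_distance += distance         # reward: metres driven autonomously
        if intervened:                     # safety driver grabs the wheel
            break
    return total_distance                  # episode return fed to the RL update
```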

The conference gave me rich exposure to branches of reinforcement learning (RL), ranging from theory to practice. Since there isn’t an RL-focused class offered (yet) at MIT, I thought I’d gain more from sampling different tracks, each of which introduced a different side of the larger field.

I found the following two talks to be very interesting:

Waymo’s multi-agent system for modelling, perceiving, and planning: centered on the tradeoff between biases in the system components. Put into the context of the DAgger problem (S. Ross) and autonomous driving: how can we create a system that generalizes behavior rather than simply mimicking it? Waymo recently released this paper on ChauffeurNet, which covers most of the points mentioned in the talk. I found this talk especially interesting because of the involvement of other (heterogeneous) agents affecting the behavior and decision-planning of an agent in a high-stakes context.
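Since the talk framed the problem in terms of DAgger, here is a minimal DAgger-style loop for reference; `env`, `expert`, and `train` are hypothetical helpers, and this is the generic algorithm (Ross et al.), not Waymo’s pipeline:

```python
def dagger(env, expert, train, iterations=10, horizon=200):
    """Generic DAgger loop: roll out the current policy, label the visited
    states with the expert's actions, and retrain on the aggregated data."""
    dataset = []
    policy = expert                          # first iteration follows the expert
    for _ in range(iterations):
        state = env.reset()
        for _ in range(horizon):
            dataset.append((state, expert(state)))   # expert label for this state
            state, done = env.step(policy(state))    # but act with the learner
            if done:
                break
        policy = train(dataset)              # supervised learning on all data
    return policy
```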

DeepMind’s MetaGradient Reinforcement Learning: online adaptive changes to “the nature of the return,” i.e. a learning approach to characterizing the learned value function. It is something of a meta-take on how agents learn a behaviorally correct policy. David Silver spoke about this work, compared DQN against a meta-gradient DQN, and mentioned that the latter performed considerably better. This talk made me think a lot more about what we take for granted in these learning algorithms: the parameter tuning and the different representations that sets of parameters might induce.
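To make the idea concrete, here is a toy sketch of a meta-gradient update in PyTorch: the discount factor gamma is treated as a differentiable meta-parameter and adapted by backpropagating a meta-objective through one inner TD update. The linear value function, step sizes, and meta-objective are illustrative assumptions, not DeepMind’s exact formulation:

```python
import torch

gamma = torch.tensor(0.9, requires_grad=True)     # meta-parameter: discount
w = torch.zeros(4, requires_grad=True)            # linear value-function weights
alpha, beta = 0.1, 0.01                           # inner / meta step sizes

def value(w, s):
    return w @ s                                  # toy linear value estimate

def meta_step(s, r, s_next, s_meta, g_meta):
    """One inner TD update followed by a meta-gradient update of gamma."""
    global w
    # inner TD(0) update, kept differentiable with respect to gamma
    td_loss = (value(w, s) - (r + gamma * value(w, s_next).detach())) ** 2
    grad_w, = torch.autograd.grad(td_loss, w, create_graph=True)
    w_new = w - alpha * grad_w
    # meta-objective: how well the *updated* values predict a held-out return
    meta_loss = (value(w_new, s_meta) - g_meta) ** 2
    grad_gamma, = torch.autograd.grad(meta_loss, gamma)
    with torch.no_grad():
        gamma -= beta * grad_gamma                # adapt the return online
    w = w_new.detach().requires_grad_(True)       # commit the inner update
```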

What was your favorite workshop and why?

My favorite workshop was on Machine Learning for Creativity and Design. This semester, I took a class on Theater Design, and I love learning about ways to combine technology and art. The topics in this workshop ranged from creating jewelry to playing music to generating Chinese opera. One talk I particularly enjoyed was Allison Parrish’s. Allison creates computer-aided poetry and publishes books of her work. In her talk, Allison explained how she tried to replicate work in the spirit of 80 Flowers by Louis Zukofsky. However, a large problem with art is that it’s difficult to say whether something is “good” or not. She found that when she listed out properties of “good” poems, not many famous poems would be classified as “good” under those standards. Instead, she focused on capturing the feel of Zukofsky’s poems: beautiful and smooth to read. Many of Zukofsky’s poems contain made-up words in order to achieve a certain sound when read aloud. Allison attempted to replicate this by representing words as vectors and blending these so-called “word vectors.” As a result, she was able to find words in between others that allowed for a smooth transition from one word to another. For example, between “w” and “x” there would be the words “acceptable,” “delectable,” “extendable,” and “adaptec’s.” When the words are read aloud, the transition sounds pleasing to the ear.
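A rough sketch of the word-vector blending idea, assuming a hypothetical `vocab` dictionary that maps words to unit-normalized embedding vectors (e.g. loaded from pretrained word vectors); Parrish’s actual embeddings and process are more involved than this:

```python
import numpy as np

def blend(vocab, word_a, word_b, steps=6):
    """Interpolate between two word vectors and return the nearest vocabulary
    word at each step, giving a smooth path of 'in-between' words."""
    words = list(vocab)
    matrix = np.stack([vocab[w] for w in words])      # (V, d), unit-normalized
    v_a, v_b = vocab[word_a], vocab[word_b]
    path = []
    for t in np.linspace(0.0, 1.0, steps):
        v = (1 - t) * v_a + t * v_b                   # linear blend
        v = v / np.linalg.norm(v)
        sims = matrix @ v                             # cosine similarities
        path.append(words[int(np.argmax(sims))])
    return path
```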

Meta-learning was my favorite workshop. There were several very interesting ideas under this topic: using a network to optimize another network (optimizer learning), learning from few examples (few-shot learning, mostly work on image classification), and training one model to perform multiple tasks (versatility).

Here are some key takeaways from the talks I went to:

What’s Wrong with Meta-learning by Sergey Levine.

In supervised image classification, the input is an image and the output is a label. Given a large dataset with many classes and plenty of examples per class, the model can predict the class of a test image.

In supervised meta-learning, the input is a few-shot dataset (n-shot, k-way) plus a test image, and the output is that image’s label. Given many examples of how to classify images into k categories (with n examples per category), at test time the model can do a similar classification on unseen data. To read in the few-shot dataset and the test image, we can use an RNN. It kind of converges, but we don’t know what it converges to or whether it’s good enough.
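A minimal sketch of that RNN-based reader, with made-up shapes and a stand-in image encoder: the support pairs (image features plus one-hot label) and the unlabeled query are concatenated into a sequence, and an LSTM’s final output predicts the query’s class:

```python
import torch
import torch.nn as nn

k_way, n_shot, feat = 5, 1, 64                    # assumed episode sizes

encoder = nn.Linear(28 * 28, feat)                # stand-in image encoder
reader = nn.LSTM(feat + k_way, 128, batch_first=True)
head = nn.Linear(128, k_way)

def episode_logits(support_x, support_y, query_x):
    """support_x: (k*n, 784) images, support_y: (k*n,) integer labels,
    query_x: (784,) single test image. Returns class logits for the query."""
    labels = nn.functional.one_hot(support_y, k_way).float()
    support = torch.cat([encoder(support_x), labels], dim=-1)
    query = torch.cat([encoder(query_x)[None], torch.zeros(1, k_way)], dim=-1)
    sequence = torch.cat([support, query], dim=0)[None]   # (1, k*n+1, feat+k)
    out, _ = reader(sequence)
    return head(out[0, -1])                               # prediction for the query
```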

MAML (model-agnostic meta-learning) gives a good model initialization so that learning a new task takes only a few gradient-descent steps and the model doesn’t overfit. The best way to understand this is to go over the algorithm and loss function in the MAML paper.
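For intuition, here is a minimal MAML-style loop in PyTorch on a made-up family of linear-regression tasks (random slopes); the model, task distribution, and hyperparameters are toy assumptions, so see the MAML paper for the real algorithm and loss:

```python
import torch

def predict(params, x):
    w, b = params
    return x @ w + b

def loss_fn(params, x, y):
    return ((predict(params, x) - y) ** 2).mean()

# meta-parameters: the shared initialization being learned
meta_params = [torch.zeros(1, 1, requires_grad=True),
               torch.zeros(1, requires_grad=True)]
meta_opt = torch.optim.Adam(meta_params, lr=1e-2)
inner_lr = 0.05

for meta_iteration in range(1000):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):                                 # a batch of toy tasks
        slope = torch.randn(1)                         # task: y = slope * x
        x_support, x_query = torch.randn(10, 1), torch.randn(10, 1)
        y_support, y_query = slope * x_support, slope * x_query
        # inner loop: one gradient step from the shared initialization
        grads = torch.autograd.grad(loss_fn(meta_params, x_support, y_support),
                                    meta_params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(meta_params, grads)]
        # outer loss: how well the adapted parameters do on held-out data
        meta_loss = meta_loss + loss_fn(adapted, x_query, y_query)
    meta_loss.backward()                               # backprop through the inner step
    meta_opt.step()
```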

The performance of meta-learning methods depends on the tasks available for training. If we can propose tasks automatically, the meta-learning method becomes automated. To learn more about random task proposal, this paper is very useful. And for unsupervised meta-learning, this is a good paper.

Their next research directions: probabilistic meta-learning, deep online learning via meta-learning, meta-learning for language-guided policy learning, and one-shot imitation meta-learning.

Tools that Learn by Nando de Freitas.

Learn to experiment:

In a simulated environment, an agent needs to experiment to find the heaviest block. [link to paper]

Learn to optimize:

Replace a hand-designed gradient-descent update rule with a learned update rule. [link to paper]
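A minimal sketch of the “learn to optimize” idea with made-up sizes: a small LSTM maps each parameter’s gradient to an update, in place of the hand-designed rule. Training the optimizer itself would additionally require unrolling this loop and backpropagating a meta-loss through it, which is omitted here:

```python
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    """Maps a per-parameter gradient to a per-parameter update."""
    def __init__(self, hidden=20):
        super().__init__()
        self.cell = nn.LSTMCell(1, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, grad, state):
        h, c = self.cell(grad, state)
        return self.out(h), (h, c)

opt_net = LearnedOptimizer()
param = torch.randn(5, 1, requires_grad=True)             # optimizee parameters
state = (torch.zeros(5, 20), torch.zeros(5, 20))          # one LSTM state per parameter

for step in range(10):
    loss = (param ** 2).sum()                             # toy objective
    grad, = torch.autograd.grad(loss, param)
    update, state = opt_net(grad, state)                  # replaces "-lr * grad"
    param = (param + update).detach().requires_grad_(True)
```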

Automatically design neural net architecture. [link to paper]

Learn to program:

Neural Programmer-Interpreters (NPI) can use a set of low-level programs to assemble a higher-level program that solves a problem. For example, it can come up with a sequence of commands (selected from a given set, e.g. {ADD, CARRY, LSHIFT}) for adding large numbers. The core is an LSTM that takes as input a program embedding, the arguments for that program, and a feature representation of the environment, and outputs the next program and its arguments. [link to paper]
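A rough sketch of that core with made-up dimensions; this shows only the recurrent step, not the program memory or the training procedure from the paper:

```python
import torch
import torch.nn as nn

n_programs, prog_dim, arg_dim, env_dim, hidden = 16, 32, 8, 64, 256

program_embed = nn.Embedding(n_programs, prog_dim)
core = nn.LSTMCell(prog_dim + arg_dim + env_dim, hidden)
next_program = nn.Linear(hidden, n_programs)   # which subprogram to call next
next_args = nn.Linear(hidden, arg_dim)         # arguments for that subprogram
halt = nn.Linear(hidden, 1)                    # probability of returning

def npi_step(program_id, args, env_features, state=None):
    """One step of the core: current program, its arguments, and an environment
    encoding in; next program, its arguments, and a halting probability out."""
    x = torch.cat([program_embed(program_id), args, env_features], dim=-1)
    h, c = core(x[None], state)                # add a batch dimension of 1
    outputs = (next_program(h), next_args(h), torch.sigmoid(halt(h)))
    return outputs, (h, c)
```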

Learn to imitate:

Few-shot text-to-speech: few-shot WaveNet and WaveRNN achieve, with 5 minutes of data, the same sample quality as a model trained from scratch on 4 hours of data. [link to paper]

One-shot imitation learning: given a single demonstration (stacking blocks), a robot can learn to perform the same task with an arbitrary initial setup. [link to paper]

In case you’d like to view other conference materials, the conference organization team has posted this wonderful resource with videos and slides of the talks.

If you’d like to get involved with Women in EECS, feel free to contact us at wieecs-president[at]mit[dot]edu.

