The revolution will be unsupervised and other takeaways from the RE•WORK Deep Learning Summit

CBC Digital Labs
Published Oct 31, 2017 · 4 min read
Networks connecting to networks connecting to networks — one aspect of machine learning; image via Unsplash

A few weeks ago, six members of CBC’s Digital Products team attended a Deep Learning Summit in Montreal.

(For those new to deep learning: it’s a machine learning technique that trains many-layered neural networks on data, typically to make predictions on a narrowly defined task.)

Here are 11 takeaways.

1. The revolution will be unsupervised

Yann LeCun believes the next breakthroughs in deep learning will result from utilizing unsupervised learning methods.

“A key element we are missing is predictive (or unsupervised) learning: the ability of a machine to model the environment, predict possible futures and understand how the world works by observing it and acting in it.” — Yann LeCun

2. Visual reasoning via Feature-wise Linear Modulation

Aaron Courville says it’s worthwhile to sweat the architecture of your model. When their visual-reasoning model at first didn’t work, they applied Feature-wise Linear Modulation (FiLM), which conditions a network’s intermediate features on a secondary input such as a question. Then, it did.
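At its core, FiLM is a feature-wise affine transformation whose scale (gamma) and shift (beta) are predicted from the conditioning input. A minimal sketch of the modulation step alone, with the conditioning network that would produce gamma and beta omitted:

```python
def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each feature
    channel with conditioning-dependent parameters gamma and beta."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]

# Two feature channels, modulated by (gamma, beta) predicted elsewhere.
modulated = film([1.0, -2.0], gamma=[2.0, 0.5], beta=[1.0, 0.0])
```

In the full model this transform is applied per channel inside each residual block, letting the question reshape what the vision pathway computes.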

Yoshua Bengio explores how learning models can cheat, using the example of a dog and an ostrich; photos by Justin Veenema

3. Machine learning models can “cheat” by picking up on surface regularities

Yoshua Bengio gave an example from image recognition: a model correctly categorized an image as a dog until a few carefully chosen pixels were changed. After that, the model categorized the same image as an ostrich.

Why? Because “deep understanding” and “abstract representations” are still missing.
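The classic recipe behind such demonstrations is the fast gradient sign method (FGSM): nudge every pixel slightly in the direction that increases the model’s loss. A sketch, assuming the loss gradient with respect to the pixels has already been computed:

```python
def fgsm_perturb(pixels, grad, eps=0.25):
    """Fast Gradient Sign Method: shift each pixel by +/- eps in the
    direction that increases the loss. The per-pixel change is tiny,
    yet can flip a classifier's output (e.g. dog -> ostrich)."""
    sign = lambda g: (g > 0) - (g < 0)  # -1, 0, or +1
    return [p + eps * sign(g) for p, g in zip(pixels, grad)]
```

The perturbation is bounded by eps everywhere, which is why the altered image looks unchanged to a human while the surface regularities the model relied on are gone.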

We’ll be checking out his paper, The Consciousness Prior. Bengio is continuing to explore how to build high-level cognitive abilities on top of low-level, “grounded” neural networks.

4. Meta-learning: generalize from few examples

Hugo Larochelle and a few others discussed “one-shot” learning. This is something humans do quite easily: we learn new words or concepts after encountering them a few times, sometimes even just once.

Hugo has developed an approach to machine learning that requires only a small number of examples, as few as five per category, allowing a network to “learn how to learn,” or meta-learn.

Given that access to large datasets can be prohibitive in machine learning, this work can help to increase the effectiveness of learning routines.
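One common way to classify from a handful of labelled examples (a standard few-shot recipe, not necessarily the speaker’s exact method) is to average each category’s few embeddings into a prototype and assign a query to the nearest one:

```python
def classify_by_prototype(support, query):
    """support maps each label to a handful of embedding vectors
    (e.g. five per category). Average them into a class prototype,
    then return the label whose prototype is closest to the query."""
    def mean(vectors):
        return [sum(xs) / len(xs) for xs in zip(*vectors)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    prototypes = {label: mean(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda label: sqdist(prototypes[label], query))
```

The meta-learning part lives in how the embedding function is trained across many small tasks; at test time, adding a new category costs only a few examples and an average.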

5. How machines could learn as efficiently as animals and humans

Yann LeCun suggested Recurrent Entity Networks, or EntNets, are useful for keeping track of the state of a conversation.

We wonder: could this be used to represent a user’s profile or history and be integrated into another neural network, say a recommendation model?
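A heavily simplified sketch of the EntNet idea: several memory slots, each paired with a key, and a gate deciding how strongly each new input updates each slot. (The full model also has a learned candidate transformation and a normalization step, omitted here.)

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def update_slots(slots, keys, new_input):
    """Gated slot update: a slot changes in proportion to how related
    the new input is to its current content and to its key."""
    updated = []
    for h, w in zip(slots, keys):
        gate = sigmoid(dot(new_input, h) + dot(new_input, w))
        updated.append([hi + gate * si for hi, si in zip(h, new_input)])
    return updated
```

Each slot ends up tracking one entity (a speaker, a topic), which is what makes the architecture attractive for conversation state.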

6. Estimate lighting from photographs

Jean-François Lalonde described a loss function that becomes increasingly precise over the course of training. For estimating lighting conditions, he suggests starting by calculating the loss on low-frequency lighting, then moving to higher frequencies as training epochs progress.

The kind of problem he and his team were facing is shown in the following slide from his presentation, where the blue-ish images are their predictions of the light source. The prediction partially overfits towards the right but misses the small light source towards the left.

Loss function in image recognition; images from Jean-François Lalonde, added here with his permission

The solution can be implemented from the beginning of training: first calculate the loss against a blurry target Y (low-frequency lighting); then, as epochs progress, calculate it against a progressively more accurate Y.
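A toy one-dimensional version of this schedule, where the target starts heavily blurred and sharpens over training. (The blur here is a simple moving average; the actual work operates on lighting frequencies of environment maps.)

```python
def blur(signal, width):
    """Moving-average blur; width shrinks as training progresses."""
    n = len(signal)
    return [
        sum(signal[max(0, i - width):min(n, i + width + 1)])
        / (min(n, i + width + 1) - max(0, i - width))
        for i in range(n)
    ]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def curriculum_losses(pred, y, widths=(4, 2, 0)):
    """Score the prediction against increasingly sharp targets:
    early phases use a blurry Y, the final phase the exact Y."""
    return [mse(pred, blur(y, w)) for w in widths]
```

Early on, the blurry target only penalizes getting the overall light distribution wrong; small, sharp sources like the one missed on the left start to matter once the blur width shrinks.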


7. When doing deep learning on DNA, use CNNs over RNNs

Jasper Snoek showed that convolutional neural networks (CNNs) outperform recurrent neural networks (RNNs) when doing deep learning on DNA. He says CNNs cope better with very long sequences of data.
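Part of the intuition: a small convolutional kernel acts as a motif detector applied at every position of a one-hot encoded sequence, so its cost and behaviour don’t depend on total sequence length the way a recurrent state does. A minimal sketch:

```python
ONEHOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
          "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}

def conv1d_dna(seq, kernel):
    """Slide a (k x 4) motif-detector kernel along a DNA string;
    the peak response marks where the motif occurs."""
    x = [ONEHOT[base] for base in seq]
    k = len(kernel)
    return [
        sum(kernel[j][c] * x[i + j][c] for j in range(k) for c in range(4))
        for i in range(len(seq) - k + 1)
    ]

# A kernel that fires on the two-base motif "TA".
ta_kernel = [ONEHOT["T"], ONEHOT["A"]]
```

In a real genomics model the kernels are learned, many filters run in parallel, and pooling summarizes responses across the sequence.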

8. Group, Focus, Forget, Repeat

Kyunghyun Cho sees neural networks as cycling through three phases:

  1. Grouping clusters
  2. Focusing in on one cluster
  3. Forgetting the rest

9. Interest in dynamic architectures?

Joan Bruna talked about using machine learning to design architectures that dynamically split and merge their inputs in order to better perform tasks like sorting. The idea coheres with Hinton’s dynamic routing concept.

We now wonder: is industry interest in dynamic architectures growing?

10. Consult a thesaurus

Hanlin Tang reminded attendees that, in natural language processing, replacing words in sentences with thesaurus synonyms (for example, as data augmentation) performs quite well.
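As a toy illustration (the tiny thesaurus below is invented for the example), synonym substitution generates paraphrases that can serve as extra training data with the same label:

```python
THESAURUS = {"good": "great", "movie": "film"}  # toy thesaurus

def synonym_augment(sentence, thesaurus):
    """Replace each word with a thesaurus synonym when one exists,
    producing a paraphrase that keeps (roughly) the same meaning."""
    return " ".join(thesaurus.get(word, word) for word in sentence.split())
```

A real pipeline would sample among several synonyms per word and replace only a random subset, so each pass over the data sees a slightly different sentence.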

11. At this point, conversational dialogue is more about questions than answers

Alex Acero said that while deep learning in speech recognition has come a long way, we’re far from true conversational dialogue. The people working on semantic analysis of words need to work more closely with the people working on understanding the sounds used to make those words.

If you went to the conference, what were your takeaways?
