On Deep Learning

A Tweeted Bibliography

Here’s a collection of my tweets on interesting/exciting developments in Deep Learning or Machine Learning in general.

It’s in no particular order, but it serves as a convenient reference and provides some context.

For every student of Stochastic Gradient Descent
The definitive comparison of Discriminative (p(y|x) directly) vs Generative (p(y|x) from p(x,y)) learners
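The distinction can be made concrete with a toy example. The sketch below is illustrative only: the joint table `joint` and its numbers are invented, and the helper name `posterior_from_joint` is hypothetical. It shows the generative route, where p(y|x) is recovered from p(x,y) by dividing by the marginal p(x). A discriminative learner would instead fit p(y|x) directly and never model p(x).

```python
# Toy joint distribution p(x, y) over a binary feature x and a binary label y.
# The numbers are invented for illustration; they sum to 1.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.15, (1, 1): 0.45,
}

def posterior_from_joint(joint, x):
    """Generative route: p(y | x) = p(x, y) / sum over y' of p(x, y')."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal p(x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

posterior = posterior_from_joint(joint, x=1)
print(posterior)
```

The generative model pays for the extra step by having to estimate the full joint, which is the trade-off the comparison is about.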
One insight into why the “Deep” in “Deep Learning” matters
Related to the number of layers: what 3-layer Universal Function Approximation means, and what it *doesn’t* mean
One of my ah-ha moments came from watching this generative machine *imagine* the concept “8”
A broad survey of Object Recognition c. end of 2014
Although Distributed Representations have been studied for a long time, AFAIK this is one of the first works to show conceptually meaningful, second-order relationships among those representations
The seminal paper by Yoshua Bengio et al. on Distributed Representation
Do you know where Back Prop came from?
Microsoft has actually been looking at FPGAs for quite some time, not just for Deep Learning, but for general distributed computation
A Parametric Rectifier unit introduced by Microsoft significantly improves training convergence. This also obsoletes the Object-Recognition Survey paper published just 2 months earlier.
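For reference, the parametric rectifier (PReLU) is usually written as f(x) = max(0, x) + a * min(0, x), where the negative-side slope a is learned along with the weights rather than fixed. A minimal pure-Python sketch follows. The function names and the example slope are chosen here for illustration and are not taken from the tweet.

```python
def prelu(x, a):
    """Parametric rectifier: identity for x > 0, slope a for x <= 0."""
    return max(0.0, x) + a * min(0.0, x)

def prelu_grad_a(x):
    """Derivative of prelu(x, a) with respect to a: 0 for x > 0, else x."""
    return min(0.0, x)

a = 0.25  # illustrative slope; a = 0 recovers the ordinary ReLU
outputs = [prelu(x, a) for x in (-2.0, 0.0, 3.0)]
print(outputs)
```

Because the gradient with respect to a is nonzero only for negative inputs, the slope is updated only by the units that would otherwise be switched off.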
Microsoft team shows how we can “cheat” and get a shallow network to perform just as well as a deep network. Note that training on “normal” data sizes still requires a deep network; that’s the crux of the matter.
How did we so quickly get to the point where this is no longer exciting?
Andrew Ng and his team at Baidu have access to lots of compute and even more data, and that’s going to set the pace of innovation going forward
But Deep Learning is also being democratized by accessibility of massive computational power. Good human ideas are still needed for significant innovation.
Techniques such as input normalization and parameter initialization, which aid training speed and convergence, are among the hottest research directions
We still want to “see” our data, even if it’s in 784 dimensions.
Beyond the limits of our current senses: can we train our own neural networks to “see” or “feel” directly more of the data from our world? Feel the positive/negative vibe of the market ahead of everyone else, like seeing the coming of a rainstorm? The intuition is “Why Not?”
We can now build machines that do entire sentence-to-sentence translation from just having them read a lot of text in both languages.
They can even map images directly to appropriate captions. Isn’t this a kind of Understanding?
Meanwhile, Google’s DeepMind team is discovering that machines can indeed create their own algorithms
In this computational race, it’s helpful to put the various patterns of distributed computation in clear relief. Don’t be stuck thinking that MapReduce is all there is. MR excels when communication is too expensive, relative to other costs.
Although this approach still requires quite a few high-level concept hints, this work by Facebook AI points to the day when our machines demonstrate “understanding”
The first work to try to tease out what is really essential for LSTMs to encode long-range temporal structure.