Up to Speed on Deep Learning: August Update, Part 2

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post on August 16th. In case you missed it, here’s the August update (part 1), here’s the July update (part 2), here’s the July update (part 1), here’s the June update, and here’s the original set of 20+ resources we outlined in April. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further.

Learning to Segment by Piotr Dollar of Facebook. Piotr explains Facebook’s efforts and progress in image segmentation, highlights use cases, and explains why such advancements matter. From the post: “When humans look at an image, they can identify objects down to the last pixel. At Facebook AI Research (FAIR) we’re pushing machine vision to the next stage: our goal is to similarly understand images and objects at the pixel level.”

Google Brain begins accepting applications to its Residency program on September 1st. Google’s Jeff Dean will host a YouTube livestream to describe the Google Brain team and the Residency program. The Google Brain Residency Program is a one-year intensive residency program focused on deep learning. Residents will have the opportunity to conduct cutting-edge research and work alongside some of the most distinguished deep learning scientists within the Google Brain team. To learn more about the team, visit g.co/brain. Consider applying here when applications open.

Text summarization with TensorFlow by Peter Liu of Google Brain. The Brain team open-sources its TensorFlow model code for generating news headlines, trained on a large dataset frequently used for summarization tasks. Peter explains two approaches (extractive and abstractive summarization), describes the model, and highlights areas of future interest.
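
To make the extractive approach concrete, here is a minimal sketch in plain Python. It is not the Brain team's model or code: it assumes a simple word-frequency scoring scheme, and the names `STOPWORDS`, `split_sentences`, and `extractive_summary` are hypothetical, chosen only for illustration. An extractive summarizer can only copy sentences that already exist in the input, whereas an abstractive model generates new wording.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "to", "for", "was", "of", "and", "in"}


def tokenize(text):
    """Lowercase the text and keep only non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order.

    A sentence's score is the average document-level frequency of its
    non-stopword tokens, so sentences built from recurring words rank higher.
    """
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

Calling `extractive_summary(article_text, max_sentences=2)` returns two sentences copied verbatim from the input. Producing a headline that does not appear anywhere in the article requires an abstractive model instead.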

Google Brain robot datasets by Sergey Levine, Chelsea Finn, and Laura Downs. The Google Brain team releases massive robotics datasets from two of their recent papers to further drive the field forward. Their grasping dataset contains roughly 650,000 examples of robot grasping attempts (original paper here). Their push dataset contains roughly 59,000 examples of robot pushing motions, split into one training set (train) and two test sets, one with previously seen objects (testseen) and one with unseen objects (testnovel) (original paper here).

End-to-End Deep Learning for Self-Driving Cars by NVIDIA. The autonomous car team at NVIDIA describes their end-to-end approach to self-driving vehicles, using convolutional neural networks (CNNs) to map the raw pixels from a front-facing camera to the steering commands for a self-driving car. Original paper here.

NIPS list of accepted papers. The 2016 Conference on Neural Information Processing Systems, often cited as the top machine learning conference, takes place from December 5th through 10th in Barcelona, Spain. The list of accepted papers highlights some of the bleeding-edge machine learning & AI research that will be presented, as well as the researchers & practitioners driving the field forward who may be present. Consider attending this year; details here.

Combining satellite imagery and machine learning to predict poverty by Neal Jean et al. of Stanford. Nighttime lighting is a rough proxy for economic wealth, and nighttime maps of the world show that many developing countries are sparsely illuminated. Jean et al. combined nighttime maps with high-resolution daytime satellite images. With a bit of machine-learning wizardry, the combined images can be converted into accurate estimates of household consumption and assets, both of which are hard to measure in poorer countries. Furthermore, the nighttime and daytime data are publicly available and nonproprietary.

Speech Is 3x Faster than Typing for English and Mandarin Text Entry on Mobile Devices by Sherry Ruan et al. of Stanford & Baidu. Researchers evaluate Deep Speech 2, a deep learning-based speech recognition system, and find that it makes English text input 3.0x faster and Mandarin Chinese input 2.8x faster than standard keyboard typing. Error rates were also markedly lower with speech input than with the keyboard, and the results further highlight the potential & strength of speech interfaces.


By Isaac Madan. Isaac is an investor at Venrock (email). If you’re interested in deep learning or working on something in this area, we’d love to hear from you.

Subscribe to our email newsletter here. Requests for Startups is a newsletter of entrepreneurial ideas & perspectives by investors, operators, and influencers.