Thoughts from ICML

David Williams
Entrepreneur First
Jul 5, 2016

The 33rd International Conference on Machine Learning, held recently in New York, saw both submissions and attendees double compared with the previous year. Such rapid growth is fitting given the remarkable progress the field has made over the last few years. Entrepreneur First sent two members of our team to ICML as part of our ongoing effort to support founders who are pushing the boundaries of technology.

Magic Pony Technology, one of the startups formed at EF in 2014, has over 20 patents covering the use of machine learning for image enhancement and video compression, and its recent acquisition by Twitter is a clear indicator of how highly valued this kind of technology is.

The ICML invited talk by Fei-Fei Li, director of the Stanford AI Lab and Stanford Vision Lab, illustrated the pace of progress well. In 2011, state-of-the-art computer vision systems achieved around 74% top-5 accuracy on the ImageNet classification task (classifying 1.5 million images into 1000 categories, such as ‘cat’ or ‘plane’), and results had been improving by only 1 or 2 percentage points year on year. By 2014, accuracy had skyrocketed to 93%. Today, the best systems can match, or even outperform, humans on this task.

So what changed between 2011 and 2014? The answer is the rise of Deep Learning: the extension of classical machine learning techniques, such as artificial neural networks, to more complex model structures trained on much larger datasets. This has been enabled by the wider availability of such datasets and by the development of GPU-based algorithms for training the models. Convolutional Neural Networks, for example, use a layered structure inspired by the visual cortex of animals and are, aptly, most commonly applied to computer vision tasks.

An example of the structure of a Convolutional Neural Network
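The basic operation that gives convolutional networks their name can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how production frameworks implement it: `conv2d` and `relu` are hypothetical helper names, the 3x3 kernel is hand-picked rather than learned from data, and, as in most deep learning libraries, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the image patch currently under the kernel
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(s)
        out.append(row)
    return out


def relu(feature_map):
    """Element-wise non-linearity: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 3x6 image: a bright vertical stripe on a dark background
image = [[0, 0, 1, 1, 0, 0] for _ in range(3)]

# Hand-picked vertical-edge detector (a trained CNN would learn its kernels)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

raw = conv2d(image, kernel)   # dark-to-bright edges respond +, bright-to-dark respond -
feature_map = relu(raw)
print(feature_map)            # prints [[3.0, 3.0, 0.0, 0.0]]
```

A real CNN stacks many such kernels per layer, learns their weights by gradient descent, and feeds each layer's feature maps into the next, so that early layers respond to edges like this one and deeper layers to larger patterns.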

By combining Convolutional Networks with other techniques (such as Recurrent Neural Networks), Fei-Fei Li demonstrated in her talk a potential application of Deep Learning to a new kind of task: dense captioning. Dense captioning goes far beyond simple object recognition: the network learns to generate novel, complex descriptions (e.g. ‘white laptop on table’) for up to 300 objects in any given image, and localizes each of those objects by drawing a box around it. The technology is far from perfect at this stage, but the progress so far is impressive.

Example of Dense Captioning (DenseCap: Fully Convolutional Localization Networks for Dense Captioning — Justin Johnson, Andrej Karpathy, Li Fei-Fei. Nov 2015)

Advances in Deep Learning, such as the dense captioning technique mentioned above, open up a range of high-impact products. The challenge, though, lies in commercialisation, and this is an area where we at EF can really provide support. We saw this with Tractable, another EF-incubated startup. With the support of our Venture Partners and Science Partners, we helped Razvan Ranca, who had an academic background in applying Machine Learning to document recovery, find a co-founder, Alex Dalyac, and build a startup founded on his expertise. Tractable now uses image recognition and text analytics to automate expert tasks for enterprise businesses.

Applications for our 2017 programmes in London and Singapore are open and close in September, so apply now and maximise the impact of your research.
