Notes from Deep Learning Summit 2015 London — Day 2

This post covers the second and final day of the Deep Learning Summit that took place in London on September 24th-25th 2015. You can find the first post here. Videos are also being posted on YouTube.

After a welcome from Alison Lowndes of NVIDIA, the day started with the startup session.

First up were Wally Trenholm, Founder & CEO, and Jason Cassidy, MD & Chief Science Officer, of Sightline Innovation, talking about The Commercialisation of Deep Learning. They started out going after military customers (2011), then looked for other markets because of the long military procurement process (five years to an order). They first took the image analysis they had developed at Geo-Scale (satellite, UAV) and applied it to agriculture. Then, to serve even more customers, they went from Geo-Scale to Macro-Scale images, addressing industrial problems (automated manufacturing quality control). Next they will go further down and apply their image analysis at Nano-Scale (genomics). Their MLaaS (Machine Learning as a Service, a term they claim to own) platform will be released next month: an on-site server collects and preprocesses the data and provides reporting and dashboards, while algorithm training and prediction run on their cloud. In case you are looking, Clarify is hiring.

MLaaS according to Sightline

Next up was Paul Murphy, CEO of Clarify, on Deep Learning & Speech: Adaptation, the Next Frontier, with some funny cartoonish slides. Clarify, started in London and now based in Texas, provides an API that analyses audio and video and makes it searchable. The main issue with speech is adaptation, as also discussed by Sébastien Bratières in the last session of the first summit day. There are different adaptation problems, like speaker adaptation (e.g. accents, or the speaker not being a native speaker while most of the training data is native and male) and noise and attenuation (moving away from the microphone). The bleeding edge in speech recognition research is:

  • Hybrid work between DNNs and HMMs, bringing incremental improvements
  • Crossover of techniques from vision research to the speech world
  • Context information will bring improvements; currently it is mainly addressed by research on dialogue, but speech recognition will get there too

Paul Murphy illustrating the bleeding edge

Then came Appu Shaji, Head of R&D at EyeEm, talking about Deep Learning for Real Photography. EyeEm is a social network for photography, and one of Appu's goals is to improve content discovery, helping photographers get found and sell more photos. He showed EyeVision, which is currently in early access. The engine assesses the aesthetic quality of a photo and also tags it with 20k concepts, using data from both community tagging and expert curation. They are using CNNs with word embeddings, based on these research papers: Paper1, Paper2, Paper3.

EyeVision at work tagging a photo
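
To make the CNN-plus-word-embeddings idea a bit more concrete, here is a minimal sketch (my own illustration, not EyeEm's implementation) of one common approach: project a CNN image feature into a word-embedding space and rank candidate tags by cosine similarity. The dimensions, tags and the random projection matrix are placeholders; in practice the projection would be learned from (image, tag) pairs.

```python
# Minimal sketch (not EyeEm's code): score candidate tags for a photo by
# projecting a CNN image feature into a word-embedding space and ranking
# tags by cosine similarity. All values below are random placeholders.
import numpy as np

rng = np.random.default_rng(0)

image_feature = rng.normal(size=4096)        # e.g. a CNN's penultimate-layer activation
tag_embeddings = {                           # hypothetical 300-d word embeddings
    "sunset": rng.normal(size=300),
    "portrait": rng.normal(size=300),
    "street": rng.normal(size=300),
}
W = rng.normal(size=(300, 4096)) * 0.01      # image -> word-space projection (would be learned)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

projected = W @ image_feature                # image feature mapped into word space
scores = {tag: cosine(projected, vec) for tag, vec in tag_embeddings.items()}
for tag, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {score:.3f}")
```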

John Overington, Director of Bioinformatics at Stratified Medical, followed with Artificial Intelligence in Drug Discovery. John said that drug discovery is currently extremely expensive and unpredictable: R&D expenses per approved drug range from $4 billion to $12 billion (source). He brought his drug discovery experience to Stratified Medical, which is developing its own drug pipeline. The goal is to use AI to filter down potential molecules, accelerating discovery and reducing costs. They are building a knowledge graph using data from structured sources (molecule databases, vocabularies) and unstructured data (papers, patents, etc.), the latter extracted with NLP techniques. They will also leverage new public datasets such as UK10K, the genome sequencing data of 10,000 people, which will help uncover rare variants contributing to disease. They are making progress, having achieved key milestones in a multimillion-dollar partnered Alzheimer's programme.

From unstructured and uncorrelated data to new drug discovery
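
As a purely illustrative sketch of the general pattern described (combining structured sources with NLP-extracted facts into a knowledge graph), here is a toy example using networkx. The entities, relations and sources are made up; Stratified Medical's actual schema was not shown.

```python
# Toy knowledge graph mixing "structured" facts with relations hypothetically
# extracted from papers via NLP. All entity and relation names are invented.
import networkx as nx

kg = nx.MultiDiGraph()

# Facts from structured sources (molecule databases, vocabularies)
kg.add_edge("compound:X123", "target:BACE1", relation="inhibits",
            source="molecule DB (hypothetical)")

# Facts extracted from unstructured text (papers, patents) with NLP
kg.add_edge("target:BACE1", "disease:Alzheimer's", relation="implicated_in",
            source="paper (hypothetical)")

# A naive way to surface candidate molecules for a disease: follow the edges
for compound, target, data in kg.edges(data=True):
    if data["relation"] == "inhibits":
        for _, disease, d2 in kg.edges(target, data=True):
            if d2["relation"] == "implicated_in":
                print(f"{compound} -> {target} -> {disease}")
```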

The last talk of the startup session was given by Marius Cobzarenco, Co-Founder & CTO of re:infer, on Building Conversational Interfaces with Deep Nets. Marius said they are building business bots that collect data from different systems (Slack, CRM, wiki, etc.) and can answer natural-language queries. Currently it is hard to understand intent and context; there is active research on embeddings, for example by Geoff Hinton at Google on thought vectors. They are using CNNs to compute the embeddings; they found these networks faster to train than RNNs while still giving good results. They also use deep learning for named entity recognition, since you still need to extract entities to translate intent into actions.

Sentence vectors capturing meaning
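
Here is a minimal sketch of what a convolutional sentence-embedding classifier can look like, written with tf.keras; it is not re:infer's model, and the vocabulary size, layer sizes and the three intents are arbitrary placeholders.

```python
# Minimal sketch (not re:infer's model): a CNN that turns a tokenised sentence
# into a fixed-size embedding and classifies intent on top of it.
import numpy as np
import tensorflow as tf

VOCAB, MAXLEN, N_INTENTS = 10_000, 30, 3   # e.g. intents: ask_status, create_ticket, other

inputs = tf.keras.Input(shape=(MAXLEN,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB, 128)(inputs)          # word embeddings
x = tf.keras.layers.Conv1D(256, 3, activation="relu")(x)   # n-gram feature detectors
sentence_vec = tf.keras.layers.GlobalMaxPooling1D()(x)     # fixed-size sentence embedding
outputs = tf.keras.layers.Dense(N_INTENTS, activation="softmax")(sentence_vec)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy forward pass on random token ids
fake_batch = np.random.randint(0, VOCAB, size=(4, MAXLEN))
print(model.predict(fake_batch).shape)   # (4, 3) intent probabilities
```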

The second part of the morning was on Deep Learning Applications.

David Plans, CEO, and Davide Morelli, CTO, of BioBeats talked about Machine Intelligence for the Essential Self. Their initial work was on neural networks and creativity, releasing an app called Pulse that generates music based on the heartbeat. With the Pulse app they collected a large cardiovascular dataset, enhanced by information coming from sensor data (accelerometer, GPS, gyro, etc.). They have now pivoted and use this information to train models for people's wellness. David, who also gave a terrific talk during the summit dinner the night before, said we are constantly under stress; as a result we live in sympathetic mode (fight or flight), with our bodies acting as if we were in a jungle facing a lion. In the long run this damages our health and may result in premature death, but with interventions we can be brought back to living in the much saner parasympathetic mode (feed and rest). Couple this with the fact that 70% of company healthcare spending goes to preventable chronic diseases, and it is clear why they are bringing their system inside organisations, collaborating with Bupa, AXA and Samsung, to predict employee stress and fatigue levels and take action before it is too late. They also have a couple of public apps in beta testing:

  • Get on Up, which aims at increasing people's activity
  • Hear and Now, a micro-meditation app bringing some of the benefits of the MBSR (Mindfulness-Based Stress Reduction) programme in a quick and practical way. I have tried it and felt immediate benefits; I definitely suggest you join the beta programme too

In the last part of the talk, Davide discussed their technology, where there are several challenges, like understanding whether stress is good (e.g. you're happy) or bad. There are some indicators; for example, under bad stress the heartbeat becomes more regular, and heartbeat information can be correlated with activity (e.g. you are not moving and the heartbeat suddenly becomes regular) and with information coming from social networks to label datasets. On top of that, they need to manage large datasets (each user generates 500 MB/day) without killing batteries or exhausting user data plans. Their solution is to extract features locally, send them to the server where models are trained, then send the trained model back and make predictions on the device. The API + SDK will be released by the end of the year. They concluded by saying that the most important open challenges are ethical: bringing emotional intelligence to the algorithms so that interventions benefit the user receiving them and do not cause additional stress.

BioBeats pipeline to classify behaviour and trigger interventions
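
As an illustration of the "extract features locally, send only the features" step, here is a small sketch computing a few standard heart-rate-variability summaries from a window of beat-to-beat intervals. The choice of features is my assumption; the talk did not detail which features BioBeats extracts on the device.

```python
# Sketch of on-device feature extraction (an assumption of what such a step
# could look like, not BioBeats' code): summarise a window of beat-to-beat
# (RR) intervals into a few compact features that are cheap to upload.
import numpy as np

def hrv_features(rr_intervals_ms):
    """Summarise RR intervals (milliseconds) from one time window."""
    rr = np.asarray(rr_intervals_ms, dtype=float)
    diffs = np.diff(rr)
    return {
        "mean_rr": rr.mean(),                    # average beat interval
        "sdnn": rr.std(ddof=1),                  # overall variability
        "rmssd": np.sqrt(np.mean(diffs ** 2)),   # short-term variability
        "n_beats": int(rr.size),
    }

# Example window: a very regular heartbeat (low variability could be one
# signal, together with activity data, of bad stress as mentioned in the talk)
window = [810, 805, 812, 808, 806, 811, 809]
print(hrv_features(window))   # only these few numbers leave the device
```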

I then attended the parallel session on investing in AI. It started with a panel of VCs, Nathan Benaich of Playfair Capital, John Henderson of White Star Capital and Simon King of Octopus Investments, together with Alex Dalyac, Co-Founder & CEO of Tractable, and moderated by Sally Davies of the Financial Times. Most of the discussion was on how to evaluate an AI startup; here are some of the aspects considered:

  • Is AI a feature (e.g. the Netflix recommender system) or the product itself?
  • Does the AI address things people can already do, reducing costs, or does it open up new possibilities by doing things people can't do?
  • What is the accuracy of the algorithms compared to existing solutions? Can they scale?
  • In general, it is difficult for a VC to assess the technology, so the main things they focus on are the quality of the team and the business model. For technical evaluation they ask third parties.
  • If the startup is providing a product, having a demo is disproportionately valuable; even better is having an open-source component (e.g. H2O), so that it can be easily tested and there is information on adoption.

They agreed that the acquisition of DeepMind by Google is a very important signal for Europe; before, US companies tended to buy only US startups. This opens new exit possibilities for European startups, making them more attractive to VCs.


After a very good lunch break, the afternoon started with Alex Matei, mHealth Manager, and Ekaterina Volkova-Volkmar, Researcher, of Bupa on Deep Learning for Digital Health. Bupa is an international healthcare group whose activities span from hospitals to company health insurance. They showed an interesting series of proofs of concept:

  • Self Monitoring app. The goal is to help people keep a food diary. There are already projects going this way, like Project Adam by Microsoft and Im2Calories from Google. For the PoC, they used the metamind.io API to classify the food in the images, then linked it with a nutritional database for calories and other nutrition facts, and provided a healthiness score (1–10) based on a nutrient profiling model.
  • Conversational interface. They wanted to develop an interface that can hold spoken conversations with the user. They used existing APIs that can do speech-to-text, text-to-speech and intent classification, like api.ai, wit.ai and IBM Watson. The PoC was tested in a Diabetes Risk Assessment tool.
  • Behavioural change. The PoC was an app to help quit smoking, with the goal of staying smoke-free for 28 days. The app intervenes when the user is at high risk of starting to smoke again. An H2O binary classifier was used to predict the likelihood of failing the quit attempt.

I really appreciated their approach, using available software/APIs for fast prototype development. They also showed some good practices, like defining at the start of each project the evaluation criteria for deciding which software/API to use (example criteria: what is the software/API's potential to scale? How do costs grow for large deployments?).

The pipeline behind the Self Monitoring app
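
The talk only mentioned that an H2O binary classifier predicts the likelihood of failing the quit attempt; the sketch below shows what that step could look like with H2O's Python API. The dataset path, column names and the choice of a gradient boosting model are my assumptions.

```python
# Minimal sketch of training a binary classifier with H2O, in the spirit of
# the quit-smoking PoC. CSV path, column names and the GBM choice are assumed.
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()

data = h2o.import_file("quit_attempts.csv")          # hypothetical dataset
data["relapsed"] = data["relapsed"].asfactor()       # binary target: did the quit attempt fail?

train, valid = data.split_frame(ratios=[0.8], seed=42)
features = ["days_smoke_free", "cravings_per_day", "sleep_hours", "activity_level"]

model = H2OGradientBoostingEstimator()
model.train(x=features, y="relapsed", training_frame=train, validation_frame=valid)

# The predicted relapse probability is what would drive when the app intervenes
print(model.predict(valid).head())
```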

Rodolfo Rosini, CTO of Weave.ai, came next. The presentation was not very informative; they seem to be in stealth mode. Their idea is to use contextual information to provide improved search. He also talked about aggregating corporate information and making it easily searchable, something similar to what re:infer was talking about in the morning.

Screenshots of contextual information inside an app

Joerg Bornschein, Global Scholar at CIFAR, followed with a talk on Combining Directed & Undirected Generative Models. Joerg's talk was about unsupervised learning, where progress has not been as impressive as in supervised learning and there are still fewer real-world applications. Nevertheless, it might help us understand how the brain works, and it will enable new applications where machines generate content. Joerg presented his work on Training Bidirectional Helmholtz Machines (Paper). Helmholtz Machines (HMs) are made of a generative model coupled with an auxiliary model which performs approximate inference. Joerg presented a new way to train HMs where the probabilities of both models are interpreted as approximate inference distributions and the goal is to minimise the difference between the two distributions. He showed some examples of the algorithm in action, reconstructing digits and faces with missing parts.

Helmholtz Machines reconstructing digits with missing parts
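
For readers who want a bit more detail, here is my reading of the construction, with notation simplified (see the paper for the actual derivation and guarantees):

```latex
% Sketch of the setup as I understand it (simplified notation):
\begin{aligned}
  &\text{top-down generative model:}   && p(\mathbf{x}, \mathbf{h}) = p(\mathbf{h})\, p(\mathbf{x} \mid \mathbf{h}) \\
  &\text{bottom-up recognition model:} && q(\mathbf{x}, \mathbf{h}) = q(\mathbf{x})\, q(\mathbf{h} \mid \mathbf{x}) \\
  &\text{combined model:}              && p^{*}(\mathbf{x}, \mathbf{h}) = \tfrac{1}{Z}\, \sqrt{p(\mathbf{x}, \mathbf{h})\, q(\mathbf{x}, \mathbf{h})}
\end{aligned}
```

Training maximises a variational lower bound on the log-likelihood of the combined model; the bound becomes tight only when the two joint distributions agree, which is what pushes p and q towards each other.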

The last talk of the summit was given by Marie-Francine Moens, Professor at KU Leuven, on Learning Representations for Language Understanding: Experiences from the MUSE Project. MUSE, which stands for Machine Understanding for interactive StorytElling, is working on algorithms that translate text into virtual worlds. Applications include rendering children's stories and providing patient guidelines (e.g. for foreigners in a hospital) as 3D virtual worlds. The algorithms play a double role:

  • At the sentence level, they recognise actions/events and their semantic roles (actor, patient, instrument, …)
  • At the discourse level, they recognise coreferent noun phrases, temporal relations between actions, spatial relations between objects

The main difficulties come from having very few annotated training datasets, so they are researching the use of other data sources, like language models, to improve results. There is also a lack of world knowledge (e.g. "practice with a spear" implies the spear is held in the hand), so they are working on multimodal deep learning, using both images and phrases to acquire more knowledge.

A children's story rendered in 3D by the MUSE engine
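
To make the double role more tangible, here is a purely hypothetical sketch of the kind of intermediate representation a text-to-scene pipeline needs before rendering: sentence-level action frames with semantic roles, plus discourse-level temporal and spatial links. The field names and the example story are invented and do not reflect MUSE's actual data format.

```python
# Purely illustrative: the sort of intermediate representation a text-to-scene
# pipeline could produce. Field names and values are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ActionFrame:
    """Sentence-level semantic frame: an action/event plus its semantic roles."""
    action: str
    actor: str | None = None
    patient: str | None = None
    instrument: str | None = None

@dataclass
class Discourse:
    """Discourse-level links the renderer needs: time order and space."""
    frames: list[ActionFrame] = field(default_factory=list)
    temporal_order: list[tuple[int, int]] = field(default_factory=list)          # frame i before frame j
    spatial_relations: list[tuple[str, str, str]] = field(default_factory=list)  # (object, relation, object)

# "The knight practices with a spear near the castle, then rides his horse."
story = Discourse(
    frames=[
        ActionFrame(action="practice", actor="knight", instrument="spear"),
        ActionFrame(action="ride", actor="knight", patient="horse"),
    ],
    temporal_order=[(0, 1)],
    spatial_relations=[("knight", "near", "castle"), ("spear", "held_in", "knight.hand")],
)
print(story.frames[0])
```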

That concluded the Deep Learning Summit London 2015. The organisation by the RE.WORK team (Nikita, Pip, Sophie) was great. The summit had a good mix of industry and research talks, and it was a terrific opportunity to network and get to know lots of interesting people in the Deep Learning field. Coming up are the San Francisco summit and then Europe again; highly recommended!
