The state of AI in 2019: Historical trends and research in voice, vision, and robotics.

Steven Kuyan
Published in Future Labs
Dec 20, 2018

At the recent AI Summit New York, the Future Labs organized and ran the AI Research to Real World track, bringing together speakers from across academia and industry. I presented as part of the day, and I wanted to write about why we named the track AI Research to Real World and why you should care.

Your AI predictions are based on intuition

Many organizations put together predictions on how AI will impact business, life, and the future of technology. Unfortunately, AI is as hard to predict as the impact of electricity was when it was first commercially deployed. Certainly, there were aspects of it one could predict: there would be lights everywhere, and at some point it would be inexpensive enough to be an afterthought. But no one could have predicted cell phones, let alone smartphones, which arrived more than two centuries after Benjamin Franklin’s famous kite-and-key experiment.

Our inability to predict the future means we tend to rely heavily on intuition and gut to make decisions. It’s not our fault; we are limited by our current knowledge and by our linguistic description of that knowledge. As an experiment, consider explaining Generative Adversarial Networks (GANs) to a version of yourself from 15 years ago. Instead of guessing, we should consider two variables in our decisions: historical technology trends, adjusted for time, and trends in research, adjusted for reality.

Historical technology trend — Electricity

We’ve all seen, or heard, Andrew Ng’s famous quote that AI is the new electricity. But how many of us have devoted enough time to draw a parallel between AI and electricity? Without going too deep, I’ll try to draw those parallels and let you make your own conclusions.

There are a number of great writeups on electricity, including this timeline, but I’ll sum up my lessons learned from the business applications.

When electricity was first introduced to factories, it was adapted to their existing infrastructure, which was built around steam. While it augmented the capabilities of those factories, they were still limited by that infrastructure. The result was similar to what we see today with leading companies: AI is augmenting them (notice how many are quick to make that nomenclature change when asked how AI will support their businesses). It was years before factories were rebuilt around electricity to maximize its potential, a process that only a few businesses around the world have begun with AI, but one that will be the true turning point in AI’s impact on business. Until then the predictions are assumptions, at least until we can see how such deployments affect industry and society.

Current State of AI Research

When Yann LeCun first showcased how Convolutional Neural Nets (CNNs) could be used for character recognition, the technology was not general enough to make waves. It took years for its potential to be realized and for Yann to gain the recognition he deserved. For decades before, and in the years since, academia has been the breeding ground of AI research. Every major tech company in the world has been tapping into academia for AI research and talent, yet media publications base the vast majority of their predictions on theories that are not grounded in that research. While reading research papers isn’t feasible for everyone, they contain the key to AI progress and predictions. To help, I’m going to dive into voice, language, vision, and robotics research and share what I presented at the AI Summit on what to expect from cutting-edge research.

Vision

To understand the problem of vision you must understand one concept: world models. The short version of the paper (which is incredible, and worth reading in full) is this: humans develop a mental model of the world from experience and use it to function in the world, and the challenge is translating that into a model a machine can use. The paper is important because it highlights one of the many barriers in AI, especially in vision. For our vision to work, we need a robust model of the world, robust enough to support instinctual behaviors.
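To make the idea slightly more concrete, here is a deliberately minimal sketch of the agent loop the World Models paper describes: a vision component compresses each observation into a compact code, a memory component keeps a running summary of what has happened, and a small controller picks actions from both. The component names, dimensions, and random-projection "encoder" below are placeholders of my own, not the paper's actual architecture (which uses a VAE, an MDN-RNN, and a trained linear controller).

```python
import numpy as np

# Hypothetical, heavily simplified stand-ins for the three components in
# the World Models paper (V: vision, M: memory, C: controller). A real
# implementation uses a VAE, an MDN-RNN, and an optimized controller.
rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, STATE_DIM, ACTION_DIM = 64, 8, 16, 3

W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1                # "vision"
W_mem = rng.normal(size=(STATE_DIM, STATE_DIM + LATENT_DIM)) * 0.1  # "memory"
W_ctrl = rng.normal(size=(ACTION_DIM, LATENT_DIM + STATE_DIM)) * 0.1  # "controller"

def encode(obs):
    """V: compress a raw observation into a small latent code."""
    return np.tanh(W_enc @ obs)

def update_memory(state, z):
    """M: fold the latest code into a running summary of the past."""
    return np.tanh(W_mem @ np.concatenate([state, z]))

def act(z, state):
    """C: choose an action from the current code and the memory state."""
    return np.tanh(W_ctrl @ np.concatenate([z, state]))

state = np.zeros(STATE_DIM)
for t in range(5):
    obs = rng.normal(size=OBS_DIM)   # stand-in for a camera frame
    z = encode(obs)
    state = update_memory(state, z)
    action = act(z, state)
    print(f"t={t} action={np.round(action, 3)}")
```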

As of today, we are unable to do two things: build general models for computer vision (a model, and a dataset, that covers the entire world), and transfer learning from one model or application to another. As a result, computer vision applications are very narrow in scope. They are extraordinary in their performance within those applications, but you cannot take a model trained for autonomous driving and use it to detect tumors in CT scans. To be fair, even people need specialization to tell the difference, but because we carry general world models we need far less data. We wouldn’t necessarily be more accurate, but we need far less data to be useful for the application.
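For context, here is a minimal sketch of what "transfer" looks like in practice today: reusing a network pretrained on ImageNet and retraining only a small task-specific head. This is the narrow kind of transfer that does work (reusing generic visual features for a related task), not the general, world-model-level transfer described above. The class count and data below are hypothetical placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal transfer-learning sketch: freeze a pretrained backbone and train
# only a new classification head for a different, narrow task.
NUM_CLASSES = 2  # e.g., "tumor" vs. "no tumor" in some imagined dataset

# (Older torchvision versions use models.resnet18(pretrained=True) instead.)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():          # freeze the pretrained backbone
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new task head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for a real DataLoader over labeled 3x224x224 images.
dummy_images = torch.randn(8, 3, 224, 224)
dummy_labels = torch.randint(0, NUM_CLASSES, (8,))

model.train()
logits = model(dummy_images)
loss = loss_fn(logits, dummy_labels)
loss.backward()
optimizer.step()
print("one fine-tuning step, loss =", loss.item())
```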

Here is another fairly digestible paper on deep learning for computer vision, and one on Theoretical Guarantees of Transfer Learning.

Read more about the vision segment from the Future Labs AI Summit stage here.

Voice/Language

Voice and language progress is important for AI. Without it, we are unable to interact with whatever intelligence is happening under the hood. Technology in this space has come a long, long way in recent years, but it is still limited to providing answers rather than having a freeform conversation. Part of the problem is that having a conversation is hard, much harder than visually identifying objects. Besides requiring a shared world model, a complex, multi-step process occurs in a normal conversation without our realizing it.

The complexity of that process presents an unpopular reality: current state-of-the-art systems are still a long way from engaging in truly natural, everyday conversation with people. Beyond the complexity of the process itself, this is true for two reasons:

· In natural conversation, intents and topics change based on the interests of the interactors and the state of the conversation.

· Conversations are also highly path-dependent. Two sets of interactors with similar backgrounds and knowledge could still have two completely different conversations.

This was demonstrated in Amazon’s Alexa Prize competition a few months ago. Here’s the research paper for those who want to dive deep into it, but the key outcomes are: by the end of the semifinal phase, all 15 socialbots had an average customer rating of 2.87 (out of a possible 5), with conversation durations of 1:35 min (median) and 5:43 min (90th percentile). Conversation duration for the finalists across the entire competition was 1:53 min (median) and 8:08 min (90th percentile), improving 14.0% and 56.8% respectively from the start of the competition, with 11 turns (median) per conversation. Let me rephrase: roughly 8 minutes was the ceiling for the current state of the art, after the systems had learned from semifinal conversations that topped out around 5:43. That may seem great, but the problem gets exponentially harder with each turn: if every turn has some chance of going off the rails, the odds of a fully coherent conversation shrink multiplicatively the longer it runs. Not to worry, research in this field is extensive. Some of my favorites are listed below; pick one you’re particularly interested in and dive in!

· Conversational automatic speech recognition for free-form multiturn speech and dialogues
· Commonsense reasoning for understanding concepts; context modeling for relating past concepts
· Response generation and natural language generation for generating relevant, grammatical, and nongeneric responses
· Sentiment detection for systematically identifying, extracting, quantifying, and studying affective states, for handling sensitive content (such as profanity, inflammatory opinions, inappropriate jokes, and hate speech), and for driving quality conversations (see the sketch after this list), and
· Conversational experience design for maintaining a great experience for the interactors.
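As one concrete example of the sentiment-detection item above, here is a minimal sketch using Hugging Face’s transformers sentiment pipeline to score conversational turns and flag strongly negative ones. The default model it downloads and the flagging threshold are assumptions of mine for illustration, not part of any particular research system.

```python
from transformers import pipeline

# Minimal sketch: score each conversational turn for sentiment and flag
# strongly negative turns for special handling (e.g., de-escalation or
# content filtering). Model choice and threshold are illustrative only.
sentiment = pipeline("sentiment-analysis")

turns = [
    "I love talking about space, tell me more!",
    "This is the worst conversation I've ever had.",
    "Can you recommend a good book?",
]

for turn in turns:
    result = sentiment(turn)[0]  # e.g., {'label': 'NEGATIVE', 'score': 0.99}
    flagged = result["label"] == "NEGATIVE" and result["score"] > 0.9
    marker = "  [flag for review]" if flagged else ""
    print(f"{turn!r} -> {result['label']} ({result['score']:.2f}){marker}")
```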

Read more about the Voice/Language segments from the Future Labs AI Summit stage here.

Robotics

While not in the same vein as voice, language, or vision, robotics depends on all of them to function well for the application I want to cover: autonomous driving. Cars may not look like robots, but as early as 2012 the notion was presented that they would be the first robots people learn to interact with. Something about the lack of a face makes them feel far less hostile.

While these self-driving systems are being deployed and they work, there is simultaneous growth in the number of sensors necessary to make them work. This presents an interesting but unintended problem, since the technology is far from maturity. To increase the accuracy of the systems, more sensors are necessary. With more sensors, the software (the models) becomes more complex, and the probability of errors increases. The graphic below, from this paper, clearly displays the multidimensional problem these machines are solving in real time.

Credit: Simon Hecker, Dengxin Dai, and Luc Van Gool
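As a rough back-of-the-envelope illustration of the sensor-count point (and nothing more), assume each sensor independently misreads with some small probability; the chance that at least one sensor in the stack is wrong at any given moment grows quickly with the number of sensors. The failure rate below is made up for illustration, not a measured figure.

```python
# Back-of-envelope sketch: if each sensor independently misreads with
# probability p, the chance that at least one of n sensors is wrong is
# 1 - (1 - p)^n. The value of p here is illustrative, not a real rate.
p = 0.01  # assumed per-sensor error probability

for n in (1, 5, 10, 20, 40):
    at_least_one_error = 1 - (1 - p) ** n
    print(f"{n:>2} sensors -> P(at least one error) = {at_least_one_error:.1%}")
```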

One additional point: there is also growth in third-party hardware manufacturers making parts for robotics, further complicating the multidimensional problem I mentioned above.

At least we’re on the right track (pun intended!). Below are some interesting research areas to follow.

· Fast Reinforcement Learning
· Fast Imitation Learning
· Leverage Simulation
· Model-based RL
· Long Horizon / Hierarchical Reasoning
· Safe and Lifelong Learning
· Value Alignment

While this was a dense review, there is one main takeaway: when in doubt amid the hype, fall back on two things, historical technology trends and current research. Everything else is just noise.
