The Actuation Gap

At Attached (coming soon), we have been developing a platform for autonomous photo and video sharing. Our work in this area has raised a host of questions whose answers, I believe, bear on a variety of areas in machine learning and robotics and, more generally, on how humans and machines should and will interact.

How can we predict the intent behind capturing a photo or video? Should this photo be automatically shared or hidden? These questions naturally led us to computer vision and deep learning, where our objective was to discover patterns in the content of the media and, ultimately, to discriminate between private and shareable media. After exhausting several approaches and exploring a few deep learning architectures, we quickly began to approach an upper bound on our model's accuracy. And while we were generally content with how our embedded model generalized to users' camera rolls, we realized that we were missing an important dimension: our line of questioning needed to be considered in a much larger context.

Towards Autonomy

One can argue that complete autonomy is the ultimate realization of AI: a robot that can sense its environment, know its place (localization and mapping), be taught via supervision but also learn by exploring and exploiting its environment (reinforcement learning), and, most importantly, actuate. In essence, these machines aim to operate in the physical world without human input. They incorporate a variety of hardware, including sensors and motors, and their software focuses on perception, planning, and motion. Canonical examples include UAVs and self-driving vehicles, but also the Roomba.

In a narrower sense, we can also deploy learning functionally, within the (Big Data) processing pipelines of existing products and applications, equipping them with automated decision-making. Implementations range from consumer recommendation engines to real-time ad targeting. The decisions generated, while automatic, typically do not actuate in a physical sense, as they do in robotics. Often in these deployments, model adaptation and retraining lag well behind data capture, and a clear notion of reward may not be available.

Whatever the basis of learning, whether robotics or large-scale Big Data ML, we can aim to quantify automated decision-making via risk, or the cost of error in a learning system. This cost of error is different from the loss of an objective function: it quantifies the potential cost of a model's decision as a function of its environment.

In environments such as self-driving cars and autonomous medicine (e.g., in the ICU), the cost of error is clearly very high (e.g., human life). In ad targeting and recommendation, by contrast, the cost of error may have a monetary impact on a business but is low per decision: because decisions are made over large volumes of ads or recommendations, any single error has minimal impact. In autonomous media sharing, a violation of a user's privacy may carry a subjective but real cost, and the app may suffer from user churn or bad press.
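To make the distinction between loss and risk concrete, here is a minimal sketch for a binary share/hide decision. It assumes the model outputs a probability that a photo is shareable and that each type of error can be assigned a numeric cost; the function names and cost values are hypothetical illustrations, not Attached's implementation. The point is that the same prediction (and therefore the same log loss) can lead to different decisions once the costs of error change.

```python
import math


def log_loss(p_shareable: float, label: int) -> float:
    """Cross-entropy for one binary prediction; it ignores what an error costs."""
    p = p_shareable if label == 1 else 1.0 - p_shareable
    return -math.log(p)


def expected_cost(p_shareable: float, action: str, cost_fp: float, cost_fn: float) -> float:
    """Expected cost (risk) of an action given P(shareable).

    cost_fp: hypothetical cost of sharing a photo that should have stayed private.
    cost_fn: hypothetical cost of hiding a photo the user wanted to share.
    """
    if action == "share":
        return (1.0 - p_shareable) * cost_fp  # wrong only if the photo is private
    if action == "hide":
        return p_shareable * cost_fn  # wrong only if the photo is shareable
    raise ValueError(f"unknown action: {action}")


def min_risk_action(p_shareable: float, cost_fp: float, cost_fn: float) -> str:
    """Pick the action with the lowest expected cost."""
    return min(("share", "hide"),
               key=lambda a: expected_cost(p_shareable, a, cost_fp, cost_fn))


if __name__ == "__main__":
    p = 0.9  # the same model output in both settings
    print(min_risk_action(p, cost_fp=1.0, cost_fn=1.0))   # symmetric costs
    print(min_risk_action(p, cost_fp=50.0, cost_fn=1.0))  # privacy-sensitive costs
```

With symmetric costs, a 90% shareable prediction is shared; when leaking a private photo is assumed to be 50 times worse than hiding a shareable one, the same prediction is hidden.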

Herein lies the actuation gap: the chasm we will need to cross in moving from human to autonomous control in a variety of settings. I find that it is not only a computational or algorithmic gap but a behavioral one as well.

Robustness vs Performance

In the context of machine learning in the wild, Recht discusses the trade-off between robustness and performance:

We’re making great progress in machine learning systems, and we’re trying to push their tenets into many different kinds of production systems. But we’re doing so with limited knowledge about how well these things are going to perform in the wild.
This isn’t such a big deal with most machine learning algorithms that are currently very successful. If image search returns an outlier, it’s often funny or cute. But when you put a machine learning system in a self-driving car, one bad decision can lead to serious human injury. Such risks raise the stakes for the safe deployment of learning systems.
In engineering design problems, robustness and performance are competing objectives. Robustness means having repeatable behavior no matter what the environment is doing. On the other hand, you want this behavior to be as good as possible. There are always some performance goals you want the system to achieve. Performance is a little bit easier to understand — faster, more scalable, higher accuracy, etc. Performance and robustness trade off with each other: the most robust system is the one that does nothing, but the highest performing systems typically require sacrificing some degree of safety.

While the tension between robustness and performance is an important research area, we look at this problem through a different lens: human interaction and trust-building. As the cost of error, and thus risk, in these systems increases, pressure builds at the boundary of human-computer interaction. This pressure is caused by uncertainty and is generally relieved by enabling native, human-in-the-loop (HITL) interactions as well as transparency.

These two modalities — HITL and transparency — allow for human-machine trust building and reinforcement.

Human-in-the-loop

A recent Computerworld article titled “Why human-in-the-loop computing is the future of machine learning” sums this concept up nicely:

…Human-computer interaction is much more important for artificial intelligence than we ever thought. In each case: chess, driving, Facebook and ATMs, making sure computers and humans work well together is critical for all of these applications to work. Notably, however, there’s a different interface between the computer and the human in each but it’s the pairing of humans and machine–not the supremacy of one over the other–that yields the best results.
Artificial intelligence is here and it’s changing every aspect of how business functions. But it’s not replacing people one job function at a time. It’s making people in every job function more efficient by handling the easy cases and watching and learning from the hard cases. Which is to say: We don’t wake up one day to find self driving cars — we slowly cede driving functions one piece at a time.

One key element in this article is the ceding of discrete human functions to autonomous systems one function at a time. The idea is that, given an upper bound on the accuracy these systems can achieve, they benefit from learning from humans in situations where the cost of error is greater. Thus, these two systems, humans and machines, form a symbiotic relationship whose end goal is the complete handover of human tasks to an autonomous machine. I argue that the added and necessary side effect of this symbiotic relationship is the building of trust, which in many systems can be augmented by transparency and explainability.
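One way to picture "handling the easy cases and learning from the hard cases" is a router that acts on its own when the expected cost of its best action is low and defers to the user otherwise, keeping the user's answer as a new training label. The sketch below is hypothetical: `HITLRouter`, `risk_budget`, and the `ask_human` callback are illustrative names, and the cost model reuses the assumed per-error costs from a binary share/hide decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class HITLRouter:
    """Hypothetical router: act autonomously on low-risk cases, defer the rest."""
    cost_fp: float      # assumed cost of wrongly sharing a private photo
    cost_fn: float      # assumed cost of wrongly hiding a shareable photo
    risk_budget: float  # max expected cost accepted without asking a human
    feedback: List[Tuple[dict, int]] = field(default_factory=list)

    def decide(self, features: dict, p_shareable: float,
               ask_human: Callable[[dict], str]) -> Tuple[str, str]:
        risk_share = (1.0 - p_shareable) * self.cost_fp
        risk_hide = p_shareable * self.cost_fn
        if risk_share <= risk_hide:
            action, risk = "share", risk_share
        else:
            action, risk = "hide", risk_hide
        if risk <= self.risk_budget:
            return action, "auto"  # easy case: the machine decides
        # Hard case: defer to the human and keep the answer as a labeled example.
        human_action = ask_human(features)
        self.feedback.append((features, 1 if human_action == "share" else 0))
        return human_action, "human"


if __name__ == "__main__":
    router = HITLRouter(cost_fp=50.0, cost_fn=1.0, risk_budget=0.5)
    user = lambda f: "share"  # stand-in for a real prompt to the user
    print(router.decide({"id": 1}, 0.999, user))  # confident enough to act alone
    print(router.decide({"id": 2}, 0.6, user))    # too risky: ask the user
    print(len(router.feedback))                   # one new label collected
```

In this framing, ceding functions "one piece at a time" corresponds to raising `risk_budget` as trust and accuracy grow, while the collected `feedback` is the data used to retrain the model.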

Transparency and Trust

Transparency is the ability of a learning system to explain itself. With traditional linear ML algorithms, this was a much easier proposition. For instance, with logistic regression, we could present a rank-ordered list of the model's coefficients to explain which features drive an inference. With deep learning this becomes more complex, given the highly non-linear relationship between the input volumes and the output classifications (as in the case of convolutional neural networks). There has been some work on visualizing the output signals of intermediate layers and, more recently, on generative networks (e.g., GANs), which may provide deeper insight into what these machines see and how they discriminate between classes. It will be interesting to see how this research evolves towards the goal of explainability, which is useful in a variety of settings.
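As a rough illustration of why linear models are easy to explain, the sketch below ranks each feature's signed contribution (coefficient times feature value) to a logistic regression's logit for one input. The feature names, weights, and bias are invented for illustration and do not come from any real model.

```python
import math

# Hypothetical coefficients for a "shareable?" classifier (illustrative only).
WEIGHTS = {
    "faces_detected": 1.2,
    "is_screenshot": -2.5,
    "outdoor_scene": 0.8,
    "contains_text": -0.6,
}
BIAS = -0.3


def predict_proba(weights: dict, bias: float, x: dict) -> float:
    """Logistic regression: sigmoid of the weighted sum of features."""
    z = bias + sum(weights[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def explain(weights: dict, x: dict) -> list:
    """Rank features by the magnitude of their contribution w_i * x_i to the logit."""
    contributions = {name: weights[name] * value for name, value in x.items()}
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    photo = {"faces_detected": 2, "is_screenshot": 0, "outdoor_scene": 1, "contains_text": 1}
    print(round(predict_proba(WEIGHTS, BIAS, photo), 3))
    for name, contribution in explain(WEIGHTS, photo):
        print(f"{name:>15}: {contribution:+.2f}")
```

Per-input contributions like these explain a single decision; ranking the raw coefficients globally is only meaningful when the features are on comparable scales (for example, standardized). No such simple decomposition exists for a deep network's non-linear layers.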

Typical architecture of a CNN — filters learn key features in images
Output from a GAN of face data

Whatever technique is used, what we have learned at Attached thus far is that communicating the reasoning behind decisions and allowing for HITL feedback enables a gradual building of trust between the human and the machine, facilitating both successful adoption and continued reinforcement of this relationship. It also provides a new stream of data with which to keep improving these learning systems towards the goal of making them truly autonomous, ultimately bridging the actuation gap.