The Actuation Gap

Albert Azout
Feb 20, 2017 · 6 min read

At Attached (coming soon) we have been working on developing a platform for autonomous photo and video sharing. Our work in this area has spawned a myriad of questions, the answers to which, I believe, impact a variety of areas in machine learning and robotics — and more generally, how humans and machines should and will interact.

How can we predict the intent of media capture? Should this photo be automatically shared or hidden? These questions naturally led us to the area of computer vision and deep learning, where our objective was to discover patterns in the content of the media and ultimately discriminate between privacy and shareability. After exhausting several approaches and exploring a few deep learning architectures, we quickly reached an upper bound on our model accuracy. And while we were generally content with how our embedded model generalized to users’ camera rolls, we realized that we were missing an important dimension and that our line of questioning needed to be considered in a much larger context.

Towards Autonomy

One can argue that complete autonomy is the ultimate realization of AI: a robot that can sense its environment, know its place (localization and mapping), be taught via supervision but also learn by exploring and exploiting its environment (reinforcement learning), and, most importantly, actuate. In essence, the aim of these machines is to operate without human input in the physical world. They combine a variety of hardware, including sensors and motors, and their software is focused on perception, planning, and motion. Canonical examples include UAVs and self-driving vehicles, but also the Roomba.

In a reduced sense, we can also deploy learning functionally, within the (Big Data) processing pipelines of existing products and applications, enabling them with automated decisioning. Implementations include everything from consumer recommendation engines to real-time advertising targeting. The decisions generated, while automatic, typically do not actuate in the physical sense that robotic decisions do. Often, in these deployments, the time between data capture and model adaptation and re-training is not instantaneous, and there is no clear notion of actuation.

Whatever the basis of learning, whether in robotics or large-scale Big Data ML, we can aim to quantify our automated decision making via its cost of error, or the risk inherent in a learning system. Of course, this cost of error is different from the loss of an objective function; it quantifies the potential cost of a model decision as a function of its environment.
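
One way to make this concrete is to weight a model's error probability by the cost its environment attaches to a mistake. The function and the cost figures below are illustrative assumptions, not measurements; they simply show how the same error rate can carry wildly different risk across settings.

```python
# Minimal sketch: expected cost of acting on a model decision,
# where cost_of_error depends on the environment, not the model.

def expected_cost(p_error: float, cost_of_error: float) -> float:
    """Expected per-decision cost = P(error) * cost of being wrong."""
    return p_error * cost_of_error

# Hypothetical per-decision costs in arbitrary units.
ad_click = expected_cost(p_error=0.30, cost_of_error=0.05)     # cheap to be wrong
photo_share = expected_cost(p_error=0.05, cost_of_error=50.0)  # privacy violation
braking = expected_cost(p_error=0.001, cost_of_error=1e6)      # safety-critical

print(ad_click, photo_share, braking)
```

Note that the ad model is wrong far more often, yet its expected cost per decision is the lowest of the three, which is why high error rates are tolerable in that setting.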


It is obvious that in environments such as self-driving cars and autonomous medicine (e.g. in the ICU), the cost of error is very high: human life may be at stake. In ad targeting and recommendation, by contrast, the cost of error is low; while a wrong decision may have monetary impact for a business, we can test over large volumes of ads or recommendations, so the per-decision error impact is minimal. Autonomous media sharing sits in between: a violation of a user’s privacy may be costly, and the app may be impacted by user churn or bad press.

Herein lies the actuation gap: the chasm that we will need to cross in moving from human to autonomous control, in a variety of settings. I find that it is not only a computational or algorithmic gap but a behavioral one as well.

Robustness vs Performance

In the context of machine learning and control, Recht discusses the tradeoff between robustness and performance.

While the tension between robustness and performance is an important research area, we look at this problem through a different lens: human interaction and trust-building. As the cost of error, and thus the risk, in these systems increases, pressure builds at the boundary of the human-computer interaction. This pressure is caused by uncertainty and is generally relieved by enabling native, human-in-the-loop interactions as well as transparency.

These two modalities — HITL and transparency — allow for human-machine trust building and reinforcement.
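
A common way to realize the HITL modality is confidence gating: act autonomously only when the model is confident, and defer to a person otherwise. The threshold and labels below are illustrative assumptions for a photo-sharing setting like ours, not the actual Attached implementation.

```python
# Sketch of human-in-the-loop gating: autonomous action only at high
# confidence; uncertain cases are routed to the human.

CONFIDENCE_THRESHOLD = 0.9  # tune this as cost of error rises

def decide(p_shareable: float) -> str:
    """Route a photo to auto-share, auto-hide, or human review."""
    if p_shareable >= CONFIDENCE_THRESHOLD:
        return "share"
    if p_shareable <= 1 - CONFIDENCE_THRESHOLD:
        return "hide"
    return "ask_human"  # uncertain: keep the human in the loop

print(decide(0.97))  # share
print(decide(0.50))  # ask_human
```

Raising the threshold as the cost of error grows is exactly the pressure-relief valve described above: the riskier the decision, the more often the human stays in the loop, and each human answer becomes labeled data for the next model.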


A recent article in ComputerWorld sums this concept up nicely.

One key element in this article is the gradual delegation of discrete human functions to autonomous systems. The idea is that, given an upper bound on the accuracy these systems can achieve on their own, they benefit from learning from humans in situations where the cost of error is greater. Thus these two systems, humans and machines, form a symbiotic relationship whose end goal is the complete delegation of human tasks to an autonomous machine. I argue that the added and necessary side-effect of this symbiotic relationship is the building of trust, which in many systems can be augmented by transparency and explainability.

Transparency and Trust

Transparency is the ability of a learning system to explain its decisions. In the case of traditional, linear ML algorithms, this was a much easier proposition. For instance, with Logistic Regression we could bring forth the rank-ordered list of loadings on the various coefficients as a way to explain where the impact lies in an inference task. With Deep Learning this becomes more complex, given the highly non-linear relationship between the input volumes and the output classifications (as in the case of convolutional neural networks). There has been some work on visualizing the output signals of intermediate layers as well as, most recently, on generative networks (e.g. GANs), which may help provide deeper insight into what these machines see and how they discriminate between classes. It will be interesting to see how this research evolves toward the goal of explainability, which is useful in a variety of settings.
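
The linear case can be sketched in a few lines: rank the features of a fitted logistic regression by coefficient magnitude to see which ones drive a prediction most. The feature names and weights below are hypothetical, invented for a share-vs-hide model; a real model would supply its own learned coefficients.

```python
# Sketch of linear-model transparency: rank-order (hypothetical)
# logistic regression coefficients by absolute magnitude.

coefficients = {
    "contains_face": 1.8,
    "is_screenshot": -2.3,
    "taken_at_night": 0.4,
    "location_is_home": -1.1,
}

ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
for feature, weight in ranked:
    direction = "toward sharing" if weight > 0 else "toward privacy"
    print(f"{feature}: {weight:+.1f} ({direction})")
```

This is the kind of explanation a deep network cannot produce directly, which is why the intermediate-layer visualization work mentioned above matters.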

[Figure: Typical architecture of a CNN — filters learn key features in images]
[Figure: Output from a GAN of face data]

Whatever technique is used, what we have learned at Attached thus far in our journey is that communicating the rationale behind decisions and allowing for HITL feedback enables a gradual building of trust between the human and the machine, facilitating both successful adoption and continued reinforcement of this relationship. It also provides a new exhaust of data from which to continue improving these learning systems toward the goal of making them truly autonomous, ultimately filling the actuation gap.
