Human-in-the-loop in machine learning: What is it and how does it work?

Arne Wolfewicz
Published in Levity
4 min read · Sep 24, 2020

Introduction

Most machine learning models rely on data that has been prepared by humans. But the interaction between humans and machines doesn’t stop there — the most powerful systems are set up such that they allow both sides to interact continuously through a mechanism commonly referred to as “Human in the loop” (HITL).

We find the term a bit off-putting because it implies that the machine is calling the shots over humans, when the opposite is true. Our aim here is not to argue about the name, but to clarify what the concept is all about and how you can use it to your advantage.

Bad term, good concept: Defining HITL

AI models don’t make predictions with 100% confidence, because their “understanding” of data is largely based on statistics, which lacks the concept of absolute certainty that humans use in practice. To account for this inherent uncertainty, some AI systems allow humans to interact with them directly.
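To make “confidence” a bit more concrete: many classifiers produce a probability for each possible class, and the highest of these probabilities is commonly read as the model’s confidence. The minimal sketch below assumes a model that outputs raw scores (logits) which are turned into probabilities with a softmax; the scores and class names are made-up illustration values, not output from any real model.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(scores, labels):
    """Return the most likely label and its probability (the 'confidence')."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical raw scores for a picture of a cat
label, confidence = predict_with_confidence([2.0, 1.5, 0.1], ["cat", "dog", "bird"])
print(label, round(confidence, 2))  # prints: cat 0.57
```

Even though “cat” is the best guess here, the model is far from certain, which is exactly the kind of case where a human can step in.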

Through this interaction (feedback), the machine keeps adjusting its “view of the world”. This works much like teaching a child who points at a cat and says “woof woof”: through repeated feedback (“No, that’s a cat”), the child learns to connect the pieces.

With these two key terms (confidence and feedback) in place, we can formally define the concept:

HITL refers to systems that allow humans to give direct feedback to a model for predictions below a certain level of confidence.

In practice, you need to decide what level of confidence is acceptable for the process. If it is OK for some wrong predictions to “slip through”, you can set a rather low threshold, which in turn requires much less manual intervention. In other cases, where errors are costly, you want to be sure that the system only records “correct” predictions, so you set a high threshold and accept that more predictions will be passed to a human for review.
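The threshold trade-off can be sketched as a simple routing step. This is a minimal illustration, assuming each prediction already comes with a confidence between 0 and 1; the `Prediction` class, the `route` function, and the document labels are hypothetical names chosen for the example, not part of any particular product.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # probability in [0, 1]

def route(predictions, threshold):
    """Split predictions into auto-accepted ones and ones sent to a human."""
    accepted, needs_review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            needs_review.append(p)
    return accepted, needs_review

# Hypothetical predictions for four incoming documents
preds = [
    Prediction("doc-1", "invoice", 0.97),
    Prediction("doc-2", "receipt", 0.64),
    Prediction("doc-3", "invoice", 0.81),
    Prediction("doc-4", "contract", 0.45),
]

for threshold in (0.5, 0.8, 0.95):
    accepted, needs_review = route(preds, threshold)
    print(f"threshold={threshold}: {len(accepted)} automatic, {len(needs_review)} for human review")
```

Raising the threshold from 0.5 to 0.95 moves the split from 3 automatic / 1 reviewed to 1 automatic / 3 reviewed: fewer wrong predictions slip through, at the cost of more manual work.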

“Why can’t we just use better algorithms to achieve higher confidence?”

The field of AI has seen great technological advances over the past year. However, it seems that nothing beats a simple equation: more training data = better performance. The fundamental problem is that training data is hard to come by, as creating it requires human expertise. And while many public datasets are available, they generally don’t exist for your specific problem. Hence, the data has to be created.

Rather than spending three years building a dataset, you can start training a model on the data you already have and put it to use sooner. In many cases, this alone leads to considerable productivity gains.

Humans and machines, hand in hand

Human-in-the-loop aims to achieve what neither a human being nor a machine can achieve on their own. When a machine isn’t able to solve a problem with sufficient confidence, a human steps in and intervenes. This creates a continuous feedback loop: with constant feedback, the algorithm learns and produces better results over time.
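The feedback loop can be illustrated end to end with a toy model. This sketch assumes a nearest-centroid classifier on 2-D points, a confidence score based on how much closer a point is to its nearest class centroid than to the second-nearest one, and a hypothetical `ask_human` function that stands in for a real reviewer. None of these choices come from the article; they only show how human answers become new training data.

```python
import math

def fit_centroids(labeled):
    """Compute one centroid (mean point) per label from ((x, y), label) pairs."""
    sums, counts = {}, {}
    for (x, y), label in labeled:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab]) for lab, (sx, sy) in sums.items()}

def predict(centroids, point):
    """Return (label, confidence); confidence grows as the point gets closer to
    its nearest centroid relative to the second-nearest one (range 0.5 to 1)."""
    dists = sorted((math.dist(point, c), lab) for lab, c in centroids.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    confidence = 1.0 if d1 + d2 == 0 else 1 - d1 / (d1 + d2)
    return label, confidence

def ask_human(point):
    """Hypothetical stand-in for a human reviewer; the true rule is x < 5 -> 'small'."""
    return "small" if point[0] < 5 else "large"

def hitl_loop(labeled, stream, threshold=0.8):
    """Predict each incoming point; low-confidence points go to a human,
    and the human's answer is added to the training data."""
    labeled = list(labeled)
    results, reviewed = [], 0
    for point in stream:
        centroids = fit_centroids(labeled)  # retrain on everything labeled so far
        label, confidence = predict(centroids, point)
        if confidence < threshold:
            label = ask_human(point)          # the human steps in
            labeled.append((point, label))    # feedback becomes training data
            reviewed += 1
        results.append((point, label))
    return results, reviewed

seed = [((1, 1), "small"), ((9, 9), "large")]
stream = [(2, 2), (5, 5), (4, 4), (3, 3), (6, 6), (7, 7)]
results, reviewed = hitl_loop(seed, stream)
print(reviewed)                         # prints: 3
print([label for _, label in results])  # prints: ['small', 'large', 'small', 'small', 'large', 'large']
```

Refitting after every item keeps the sketch short; a production system would more likely retrain in batches. Only human-reviewed items are added to the training set here, so confident but wrong predictions are not reinforced.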

Typically, HITL approaches can be integrated into two kinds of machine learning: supervised and unsupervised learning.

In supervised learning, experts use labeled datasets to train an algorithm to learn a function that maps inputs to the correct outputs. The learned function can then be applied to new examples, allowing the algorithm to predict the labels of unlabeled data.
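As a minimal illustration of “learning a function from labeled data”, the sketch below uses ordinary least squares to fit a straight line, one of the simplest supervised methods. The article does not name a specific algorithm; the line-fitting technique and the training values are assumptions chosen for clarity.

```python
def fit_line(xs, ys):
    """Fit y = a*x + b to labeled examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def predict(a, b, x):
    """Apply the learned function to a new, unlabeled input."""
    return a * x + b

# Made-up labeled training data: inputs xs with known outputs ys
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)               # prints: 2.0 1.0
print(predict(a, b, 10))  # prints: 21.0
```

The labeled pairs are what humans provide; the fitted function is what the model then applies on its own to inputs it has never seen.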

In unsupervised learning, the algorithms are fed unlabeled datasets. They have to find structure in the data on their own, for example by grouping similar data points, and capture that structure in the model. Human-in-the-loop deep learning approaches also fall into this category.
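To show what “finding structure without labels” can look like, here is a minimal sketch of k-means clustering (Lloyd’s algorithm) on one-dimensional data. k-means is just one common unsupervised technique, picked here for illustration; the data points and starting centroids are made up. In a HITL setting, a human might then inspect the resulting groups and name or correct them.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group unlabeled 1-D points around k centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # assign each point to its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]  # no labels, only values
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # prints: [1.5, 8.5]
print(clusters)   # prints: [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
```

The algorithm discovers the two groups by itself, but it cannot say what they mean; that interpretation is where human input comes in.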

Importance of HITL in ML

Improving accuracy on rare data

As stated above, conventional machine learning models need a large number of labeled data points to produce accurate results. So, when data is scarce, machine learning models on their own are not very useful.

For example, if you look for specific information in a language that is spoken by only a few thousand people, the machine learning algorithm may find few or no examples to learn from. In such cases, a HITL approach, where humans step in wherever the model is unsure, helps ensure accuracy even on rare data.

Or take healthcare as an example. In this sector, there is, and always has been, much debate about whether or not systems should be automated. A 2018 Stanford study found that a human-in-the-loop AI model performs better than either AI or human doctors on their own.

These systems can help improve accuracy while maintaining human-level standards of work. This can apply to many industries and change the way people work all across the world!

Improving safety and precision

There are many situations in which you need the AI to deliver human-level precision to ensure safety and accuracy. For example, when manufacturing critical parts for vehicles or airplanes, the equipment must be up to standard. While machine learning can be helpful for inspections, it is still best to have the system monitored by humans.
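For inspection tasks like these, one possible design is a two-sided triage: only clear-cut cases are automated, and everything in between goes to a person. The sketch below is hypothetical. It assumes a model that outputs a defect probability per part, and the cut-off values `pass_below` and `reject_above` are illustrative, not recommended settings.

```python
def triage(defect_probability, pass_below=0.02, reject_above=0.98):
    """Hypothetical inspection triage: automate only clear-cut cases."""
    if defect_probability < pass_below:
        return "pass"                # model is very sure the part is fine
    if defect_probability > reject_above:
        return "reject"              # model is very sure the part is defective
    return "human inspection"        # anything uncertain goes to a person

for p in (0.01, 0.30, 0.99):
    print(p, triage(p))
```

Narrowing the automated bands (for example, `pass_below=0.005`) sends more parts to human inspectors, which trades throughput for safety.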

How to set up HITL AI systems in your company

There are many ways to set up a HITL-enabled AI system in your company. Unless you want to read a whole book about it (or know someone who did), the easiest option is to use software that has this process already built in.

Our automation software is built around this concept: we know that many companies don’t have sufficient data to reach near-perfect performance from the start, but they usually have enough to get reasonable results. By adding HITL to the mix, this performance can be greatly enhanced, both immediately and in the long run, as every single intervention feeds into continuous training.

Originally published at levity.ai.
